RFC 0031: Sentry Architecture

Original source: 0031-sentry-architecture.md

Summary

Introduce a new "hidden" and "sentry" mode to the daemon for building DoS-resilient network topologies.

Motivation

Block producer availability is important both for healthy consensus (blocks have a limited time to be broadcast before they are no longer useful) and also for node operator profits. Block producers (and, to a lesser extent, SNARK workers) need to be powerful enough to produce their proofs in a reasonable amount of time, and have operational security requirements around protecting the private key. If a DoS takes a node offline such that they can't broadcast their work, or put the CPUs to use, the operator loses money and consensus is weakened. It is also risky to directly expose a machine with sensitive key material to untrusted internet traffic. Sensitive nodes can be protected from these risks by running them in a new "hidden mode". These hidden nodes will communicate via nodes running in "sentry mode".

The recommended way to deploy this would be to have several sentries, possibly in different cloud providers/datacenters, configured per hidden node.

Detailed design

New RPCs:

Sentry_forward: sent from a hidden node to a sentry node with some new work, to be broadcast over the gossip net.
Please_sentry: sent from a hidden node to its sentries periodically as a sort of "keepalive" to ensure the sentry node continues forwarding messages even if it crashes and forgets about us. These should be sent at least once per slot.
Hidden_forward: sent from a sentry node to a hidden node on all new gossip.

Hidden Mode

Nodes in hidden mode will not participate in discovery at all, by never calling begin_advertising. They will also filter out all connections not from their configured sentries. These are "soft" mitigations - hidden nodes, when deployed, should not be publicly routable at all, or otherwise have the firewall carefully configured to only allow communications with sentries.

Because hidden nodes are not well connected, we can't rely on the usual guarantees of the pubsub implementation to receive block gossip. Thus Hidden_forward: instead of using subscribe_encode, new messages will be received from this RPC.

The sentries are configured with a new --sentry-address MULTIADDR daemon flag. On startup and on a timer, hidden nodes will send Please_sentry to their configured sentries.

Sentry Mode

Sentry nodes are normal participants in networking. When they receive a new message, they forward it to connected hidden nodes using Hidden_forward.

Hidden nodes can be configured with a --sentry-for MULTIADDR daemon flag, which will ensure they always receive new messages. In addition, sentries will accept Please_sentry RPCs from any IP address in an RFC1918 private address range and add them to the sentry-for list. This allows hidden nodes to be on unstable private addresses without having to reconfigure the sentries.

Drawbacks

Not using pubsub requires some additional code.
This unavoidably adds (at least) one hop of latency before the block producers will see new blocks from the network.

Rationale and alternatives

Instead of having a hidden mode, we could simply operate normally and configure the network filter to not allow non-sentry connections. This, however, requires a custom pubsub (see below), and is somewhat wasteful: hidden nodes shouldn't need to care about the DHT at all.
We could have a custom pubsub implementation, which preferentially forwards to hidden nodes before hitting (eg) randomsub. This requires writing some Go code, versus regular OCaml RPCs as described.
Instead of Please_sentry, we could monitor disconnects and then busy poll until the sentry comes back online. We probably should do this, at some point, as it's more precise.

Prior art

Similar to the Cosmos Hub architecture.

Unresolved questions

Should we use local mDNS discovery to find sentries?

Summary​

Motivation​

Detailed design​

Hidden Mode​

Sentry Mode​

Drawbacks​

Rationale and alternatives​

Prior art​

Unresolved questions​