Original source: 0008-persistent-ledger-builder-controller.md

Summary

This adds a persistent backend for the ledger builder controller.

Motivation

Having a persistent backend will allow nodes that go offline, whether for a short period or an extended period of time, to catch back up to the network with as little work as reasonably possible.

Detailed design

In order to achieve this, we will persist the locked tip using the existing Merkle database. The backend Merkle database is designed in a way such that queries for the most commonly performed operations can be batched together as one request to the database.

The best tip, materialized by the ledger builder controller, will be represented as a Merkle ledger mask on top of the persistent Merkle ledger database. This data structure will only store the modified nodes of the Merkle ledger in memory, and defer to the contents of the persistent Merkle ledger database for any information outside of that. The persistent Merkle ledger database will maintain a reference to all of these data structures derived for it. These references will be used to garbage collect outdated information in the mask representations whenever changes are written to the persistent copy. In other words, information will be removed from the mask whenever it is no longer different from the information in the underlying database, ensuring that the data structure will not leak memory. If a mask is empty, it may need to be retained, in case there are existing references to it.

There may be several points between the locked tip and the best tip, and we may need to examine any such point. For performance reasons, we'll daisy-chain masks on top of one-another all throughout the tree. Whenever an underlying db updates, changes should only be propagated to the immediate mask-children of that DB. Because masking is associative, we can merge mask-children in with there parents without needing to notify any children. In RFC 0009, we take advantage of this property.

These representation choices involve trade-offs that should be considered. The Map representation contains just key and value data, while the prefix tree representation add summary hashes, which could consume more memory. To find account data with the Map, where the account does not appear in that mask, would mean first consulting the mask, which fails, then making an additional database lookup. That could be slower than a lookup in the prefix tree, where summary hashes could be used to guide the database lookup without a failure step.

There's an existing Sync functor that takes a database module, to perform catchup operations. The masks just described work also with a database, so a MakeMask functor that takes a database module would be a way to make this setup work. The database module that's passed in is not copyable, but both the Sync and Diff functors will offer a copy function to copy their local data, leaving the underlying database unaffected.

Later, a cache can be introduced onto the Merkle ledger database, allowing us to perform fewer database queries for commonly looked up information. The details of caching, however, is not addressed as part of this particular RFC. See Use Merkle mask as a proxy, which requests such a feature.

The module type for the mask could be:

module type Mask = sig
  type t
	type key
	type value

  type db = Database_intf.S.t

  (* group identity, operations *)
  val empty : t
  val add : t -> t -> t
	val inverse : t -> t

  val insert : key -> value -> t -> t
	val remove : key -> t -> t
  val from_list : (key * value) list -> t

  (* look up in diff; if key not present, delegate to database for locked tip *)
  val get : db -> key -> value option
end

Drawbacks

The drawback of this is that the persistent Merkle database is not easy to copy. However, this is by design, as there is only one copy that should be persisted as the central source of truth. Any other information which we wish to persist should live outside of this representation. For instance, if we want to persist the best tip, then only the diff between that tip's database and the locked tip's database should be stored. This is much more efficient for both storage and quick synchronizing, in the case of short term downtimes.

Rationale and alternatives

An alternative design is to restructure the persistent database so that it can be cheaply copied. This can be done by keying nodes not by their path in the Merkle tree and instead use keys that directly identify nodes, and only copy the nodes that are modified for the copied database. However, there are a few issues with this. For one, it makes many of the common queries to the data structure far less efficient. If you want to find a leaf given a path, you would need to make k successive requests to the database, where k is the depth of your tree, versus the single request of bulk lookups required by the other representation. This concern can be alleviated with more indices into the database, but those in turn also introduce more request requirements, and also increase the cost of garbage collection. The second reason is that garbage collection could prove challenging. The main logic behind when something is considered garbage is simple: keep a ref count for each node, and when it's zero, delete it. But, that adds cost to each operation on nodes; or, alternatively, requires us to perform collection cycles in our scheduler. This also becomes exponentially more complex as we add in indices to make the representation more efficient, which will be necessary. Thirdly, this representation adds even more accesses to the database, since you need to do all work with ledger through the database. This will make caching more important for efficiency, and if you are already going to represent the active part of the ledger in memory via a cache, why not just opt for the more efficient representation on the persistent side and use an in memory abstraction for representing the active part of a ledger.

An algebra of masks

It may be useful if there are operations on masks with nice mathematical properties. For example, there could be an addition operation on masks that results in the composition of those masks. To "undo" a mask, one could produce the negation of a mask. These ideas suggests an algebraic group, where the carrier set is the set of masks.