RFC 0047: Versioning Changes For Hard Fork
Original source: 0047-versioning-changes-for-hard-fork.md
Summary
Remove unneeded version tags from Bin_prot-serialized data produced by
ppx_version. Generate version tags on an as-needed basis where they're useful.
Detect versioning violations in the CI version linter in a more useful way.
Motivation
Versioned data types are built from other versioned types and primitive OCaml
types. When generating Bin_prot-serialized data from instances of those type,
a version tag is generated not only for the containing type, but also for each
constituent type. That strategy allows determining the version of any piece of
the serialized data.
The Jane Street Async versioned RPC mechanism used in Mina_networking
already provides versioning of data sent between nodes. Therefore,
version-tagging of the serialized data is not needed for that use case. Removing
the version tags will reduce the amount of data sent over the network.
Some precomputed blocks from Mainnet show the amount of data currently used by version tags:
Bin_io size Number of tags Proportion
9956 930 9%
1682720 121776 7% 1826633 131088 7% 1803712 129340 7%
We may wish to save serialized data and be able to determine its version. For that use case, a single "top" tag for the containing type suffices.
Some user-visible data, such as public keys, are formatted using the current serialization mechanism, where all contained types have version tags. For those few cases, allow use of the existing mechanism.
The CI version linter currently compares changes to versioned types against the PR base branch. That's more fine-grained than needed, if the goal of preventing version changes is to allow interoperability with deployed software. Instead, compare the types against the branch of the latest released software, as well as the base branch. That strategy will reduce the amount of noise produced by the linter, and provide more useful results.
Detailed design
Version-tagging changes
Currently, tagging a Stable module with %versioned produces shadowing
bin_read_t and bin_write_t functions that read and write version tags,
positive integers, prepended to serialized data, via the ppx_version library.
The proposal is not to shadow those functions, and instead use those generated
by deriving bin_io, directly. It is still useful to use %versioned to assure
that serializations do not change for a given versioned type.
For backwards-compatibility, we can add the annotation
[@@@with_all_version_tags]
to the legacy Stable.Vn modules. For each such module, generate a module
within it:
module With_all_version_tags = struct
type t = ...
let bin_read_t = ... bin_read_t ...
let bin_write_t = ... bin_write_t ...
...
end
where the shadowing functions are generated as they are today. For this case,
the constituent types must also have the same annotation, which is enforced by
the construction of the type t: the constituent types of t are of the form
M.Stable.Vn.With_all_version_tags.t.
Within Stable modules, we can have the annotation
[@@@with_top_version_tag]
to generate:
module With_top_version_tag = struct
type t = ...
let bin_read_t = ... bin_read_t ...
let bin_write_t = ... bin_write_t ...
...
end
where the shadowing functions handle a single version tag to the serializations
of outermost type instances. The constituent types need not have the annotation:
the type t will be identical to the type t in the containing Vn module.
The existing versioning system generates a function in Stable modules:
val bin_read_to_latest_opt : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t option
That function deserializes data by dispatching on version tags. Serializations may or may not have version tags. If there are neither all-tagged or top-tagged serializations, generate nothing. If there are all-tagged serializations, generate:
val bin_read_all_tagged_to_latest : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t Or_error.t
This function will know only about the version in the legacy Vn modules where
we've maintained all-tagging.
If there are top-tagged serializations, generate:
val bin_read_top_tagged_to_latest : Bin_prot.Common.buf -> pos_ref:(int ref) -> Stable.Latest.t Or_error.t
This function will know about all the versions within a Stable module.
Because the default serialization does not contain version tags, it no longer
makes sense to generate such a function in Stable. Instead, generate it inside
With_all_version_tags and With_top_version_tag, if they're created. The
return type should be changed to an Or_error.t type, to describe errors,
instead of using an option type.
Version linting changes
The CI version linter looks for changes to versioned types in a PR branch by comparing those types with counterparts in the base branch. The linter will complain if a new version is added, and later modified, even if there are no deployed nodes using the new version.
Instead, the linter should use both the PR base branch and a release branch for comparison, according to this table:
Base diff Release diff Significance Warn?
N N No version changes N Y Y Version changed Y N Y Previous force merge N Y N Modified new version N
Under this regime, a new version can be added and modified as needed, without triggering a linter warning. If a PR contains an innocuous change to a versioned type, such as a field renaming, that can be accomplished with a force-merge, without triggering warnings in later PRs.
It should be enough to change the script ci_diff_types.sh to compare the PR
branch against the release branch, in addition to the comparison made with
$BASE_BRANCH_NAME. There may need to be a mechanism to automate that change
when releases become available, which is beyond the scope of this RFC.
The version linter had been using the text of type definitions for comparisons,
and the "Binable" linter had been using the text of the uses of the Binable
functors. We can instead use Bin_prot shape digests for comparisons in both
cases. For a type t, the digest is available as a string:
Bin_prot.Shape.eval_to_digest_string bin_shape_t
With this approach to comparisons, we can use a single linter task in CI, rather than separate version and Binable linters.
Using shape digests to detect changes to serializations also means we no longer
need unit tests for version-asserted types. ppx_version was checking for a
For_tests module in that case, to remind that such tests were needed; that
check can be removed.
Shape digests will also allow detecting changes to serializations provided by
hand-written bin_io code. We may wish to add a module annotation like
%versioned_custom for that use case. If the bin_shape_t for a type with a
custom serialization is built with Bin_prot.Shape.basetype, the shape should
use a new UUID if the serialization changes (effectively, a version), for change
detection to be effective.
Drawbacks
The existing versioning mechanism is working acceptably well. The only drawback
of making changes to the version-tagging is the effort to update ppx_version.
That should not be a major effort, since it's mostly reorganizing existing code.
Rationale and alternatives
If these changes are not made, network performance will be less than it could be.
We could omit the top-tagging mechanism, since there isn't a specific use case for it now. Adding it later is possible, if it's needed for some purpose. Then again, adding that feature now while working on other code changes may be easier than deferring it.
We could phase out the all-tagged mechanism, by allowing two syntaxes for public keys and other affected types, and eventually disallowing the old syntax. That strategy may be too drastic to impose upon users.
Prior art
The relevant prior art is the existing versioning mechanism.
Unresolved questions
- Do we need top-tagging?
- What types will require all-tagging? Any user-visible Base58Check-encoded data
created from
Bin_prot-serializations will need that. One way to enforce this would be to haveCodable.Make_base58_checktake a module containing a valueall_tagged, which would be contained in the generatedWith_all_version_tagsmodules.