RFC 0017: Module Versioning
Original source: 0017-module-versioning.md
Summary
We describe ways to version modules containing types with serialized representations.
Motivation
Within the Coda codebase, modules contain (ideally) one main type. As the software evolves, that type can change. When compiling, the OCaml type discipline enforces consistent use of such types. Sometimes we'd like to export representations of those types, perhaps by persisting them, or by sending them to other software. Such other software may be written in languages other than OCaml.
Detailed design
There are three main serialized representations of data used in the Coda
codebase: bin_io, sexp, and yojson. The readers and writers of those
representations are typically created using the derive annotation on types.
Explicit versioning
The bin_io representation and associated bin_prot library are claimed to be
type-safe when used in OCaml (see
Jane Street bin_prot). Moreover, other
programming languages don't appear to be able to produce or consume this
representation. Therefore, type and versioning information need not be stated
explicitly in this representation. Nonetheless, it is still important to use
versioning, so that we may convert older representations to the current module
version's type.
Many programming languages can produce and consume the sexp and yojson
representations. Given an OCaml type, the derived writers for those
representations do not mention the type or a version. So non-OCaml consumers of
these representations cannot easily detect the type distinctions in OCaml that
gave rise to the representations.
When a new version of a type is created, it can be worthwhile to retain the older version and associated code. That allows producing and consuming representations for older versions of the software. We can also convert an older representation into a value of the type associated with the latest version of a module.
Here is a proposed discipline for creating a module with a name and a version:
    module Some_data = struct
      module Stable = struct
        module V1 = struct
          module T = struct
            let version = 1
            type t = ... [@@deriving bin_io,...]
          end
          include T
          type latest = t
          let to_latest t = t
          include Make_version (T)
        end
        module Latest = V1
        (* declare module *)
        module Module_decl = struct
          let name = "some_data"
          type latest = Latest.t
        end
        (* register versions *)
        module Registrar = Make (Module_decl)
        module Registered_V1 = Registrar.Register (V1)
      end
    end
The Make_version functor generates boilerplate code for bin_io that shadows
the code generated by deriving. The shadowing code adds a version number in
serializations. The Registrar module maintains a set of registered modules,
indexed by version numbers. A Registrar can take a serialization containing a
version number and deserialize it to an instance of the type for that version
number. By applying the to_latest for that version number, we get a value of
the type for the latest module version:
Example:
    let buf = ... in
    ignore (Registered_Vn.bin_write_t ~pos:0 buf some_Vn_value);
    match Registrar.deserialize_binary_opt buf with
    | None -> failwith "shouldn't happen"
    | Some _ -> printf "got latest version of some_Vn_value"
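The version-tagging and dispatch idea described above can be sketched with the standard library alone. This is a minimal illustration, not the implementation of Make_version or Registrar: plain strings stand in for bin_io buffers, and the names Versioned, Make_registrar, serialize, deserialize_opt, and Decl are hypothetical, as is the one-byte version tag.

```ocaml
(* Sketch only: stdlib strings stand in for bin_io buffers. *)

(* Hypothetical signature for one versioned module. *)
module type Versioned = sig
  type t
  type latest
  val version : int (* assumed to fit in one byte, 0..255 *)
  val to_string : t -> string (* stand-in for the derived writer *)
  val of_string : string -> t (* stand-in for the derived reader *)
  val to_latest : t -> latest
end

module Make_registrar (M : sig type latest end) = struct
  (* version number -> reader producing a value of the latest type *)
  let readers : (int, string -> M.latest) Hashtbl.t = Hashtbl.create 8

  module Register (V : Versioned with type latest = M.latest) = struct
    let () =
      if Hashtbl.mem readers V.version then
        failwith (Printf.sprintf "version %d already registered" V.version);
      Hashtbl.add readers V.version (fun s -> V.to_latest (V.of_string s))

    (* Prefix the payload with a one-byte version tag. *)
    let serialize (v : V.t) : string =
      String.make 1 (Char.chr V.version) ^ V.to_string v
  end

  (* Read the tag, find the matching reader, convert to the latest type. *)
  let deserialize_opt (s : string) : M.latest option =
    if String.length s = 0 then None
    else
      match Hashtbl.find_opt readers (Char.code s.[0]) with
      | None -> None
      | Some read -> Some (read (String.sub s 1 (String.length s - 1)))
end

(* Two toy versions: V1 is an int, V2 adds a string. *)
module V1 = struct
  type t = int
  type latest = int * string
  let version = 1
  let to_string = string_of_int
  let of_string = int_of_string
  let to_latest n = (n, "")
end

module V2 = struct
  type t = int * string
  type latest = t
  let version = 2
  let to_string (n, memo) = string_of_int n ^ ":" ^ memo
  let of_string s =
    let i = String.index s ':' in
    (int_of_string (String.sub s 0 i),
     String.sub s (i + 1) (String.length s - i - 1))
  let to_latest t = t
end

module Decl = struct type latest = int * string end
module Registrar = Make_registrar (Decl)
module Registered_V1 = Registrar.Register (V1)
module Registered_V2 = Registrar.Register (V2)
```

In this sketch, a value written by Registered_V1.serialize comes back from Registrar.deserialize_opt as a value of the latest type, and a string carrying an unregistered version tag yields None.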
A new module version V2 consists of a new version number, a new type, which is
also the type latest, and the function to_latest.
So we'd have:
    module Some_data = struct
      module Stable = struct
        module V2 = struct
          module T = struct
            let version = 2
            type t = ... [@@deriving bin_io,...]
          end
          include T
          include Make_latest_version (T)
        end
        module Latest = V2 (* changed, was V1 *)
        module V1 = struct
          module T = struct
            let version = 1
            type t = ... [@@deriving bin_io,...]
          end
          include T
          type latest = Latest.t (* changed, was t *)
          let to_latest = ... (* changed, was the identity *)
          include Make_version (T)
        end
        module Module_decl = struct
          let name = "some_data"
          type latest = Latest.t
        end
        module Registrar = Make (Module_decl)
        module Registered_V1 = Registrar.Register (V1)
        module Registered_V2 = Registrar.Register (V2) (* new *)
        module Registered_Latest = Registered_V2 (* changed, was Registered_V1 *)
      end
    end
The Make_latest_version functor is like Make_version, except that it also
defines latest (as an alias for t) and to_latest (the identity function on
t).
Clients of a versioned module should always use the Stable.Latest version.
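To make the change to V1 concrete, here is a small sketch with invented record types. The fields amount and memo, and the choice of an empty string as the default, are assumptions for illustration only; the serialization functors are left out so that the example runs with the standard library alone.

```ocaml
module Some_data = struct
  module Stable = struct
    module V2 = struct
      type t = { amount : int; memo : string }
      type latest = t
      let version = 2
      let to_latest (t : t) : latest = t
    end

    module Latest = V2

    module V1 = struct
      type t = { amount : int }
      type latest = Latest.t
      let version = 1

      (* Upgrade by supplying a default for the field that V1 lacked. *)
      let to_latest (t : t) : latest =
        { Latest.amount = t.amount; memo = "" }
    end
  end
end

(* A client holding an old value converts it, then works with Latest only. *)
let upgraded : Some_data.Stable.Latest.t =
  Some_data.Stable.V1.to_latest { Some_data.Stable.V1.amount = 5 }
```

Here to_latest for V1 is no longer the identity: it has to build a V2 record, which is the edit marked "changed" in the skeleton above.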
Serialization restricted to Stable, versioned modules
Serialization should be allowed only for modules following this discipline,
including the use of the Stable submodule containing versioned modules.
An exception can be carved out for using sexp serialization of data from
modules without Stable and versioning. That kind of serialization is useful
for logging and other printing from within Coda, and is less likely to be used
with other software.
Coordination with RPC versioning
RFC 0012 indicates how to version types used with the Jane Street RPC mechanism. The types specified in versioned modules here can also be used in the query, response, and message types used for RPC. In that case, if a new version of a module is created, the RPC version should be updated at the same time.
Embracing this discipline; static enforcement
As of this writing, there are over 600 type definitions with deriving bin_io
in the Coda codebase, even more with sexp. Only about 60 type definitions are
versioned using the Stable.Vn naming discipline (and without the explicit
version information suggested here). To embrace this discipline fully will take
some effort.
We can make awareness of the existing recommendation, or of the extended discipline suggested here, a checklist item for GitHub pull requests.
In some critical situations, such as when using versioned RPC (RFC 0012), we'd
like static assurance that types are versioned. By writing a suitable ppx, we
can add an annotation versioned to type deriving lists. That ppx will
generate a definition, perhaps called __versioned, if all types contributing
to a type also have the annotation. It can also check that the type is named
t, and occurs in the module hierarchy Stable.Vn.T (or in the module
hierarchy for versioned RPC types). The functors Make_version and
Make_latest_version can also require their argument to have the annotation, to
localize the error when the annotation is omitted.
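One way to picture the static check is to write by hand what such a ppx might generate. The sketch below is an assumption about the expansion, not the output of an existing tool: the marker __versioned is a unit value, a type that mentions another versioned type refers to that type's marker, and the functor signature Versioned_type is hypothetical.

```ocaml
(* Hand-written stand-in for what a hypothetical `versioned` ppx might emit. *)
module Inner = struct
  module Stable = struct
    module V1 = struct
      module T = struct
        type t = int
        (* Emitted because every type contributing to t is versioned. *)
        let __versioned = ()
      end
      include T
    end
  end
end

module Outer = struct
  module Stable = struct
    module V1 = struct
      module T = struct
        type t = { inner : Inner.Stable.V1.t }
        (* Referring to Inner's marker fails to compile if Inner lacks it. *)
        let __versioned = Inner.Stable.V1.__versioned
      end
      include T
    end
  end
end

(* A functor can demand the marker from its argument. *)
module type Versioned_type = sig
  type t
  val __versioned : unit
end

module Make_version (T : Versioned_type) = struct
  (* Touch the marker so that omitting it is reported at the functor call. *)
  let () = T.__versioned
end

module Checked = Make_version (Outer.Stable.V1.T)
```

If the marker were removed from Inner.Stable.V1.T, the definition in Outer would be rejected by the compiler; if it were removed from Outer's T, the application of Make_version would be rejected instead.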
Using versioned types in other versioned types
When mentioning a versioned type in the definition of another versioned type,
such as in a record, a specific version of the included type must be used. That
is, do not use version Latest, which may refer to different modules over time.
With this restriction in place, the serialization of the including type has a
fixed format. Otherwise, multiple, incompatible serializations of the including
type could be generated.
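A short sketch of the pinning rule follows. The modules Inner and Outer and their string formats are invented for illustration, with hand-written to_string functions standing in for derived serializers.

```ocaml
module Inner = struct
  module Stable = struct
    module V1 = struct
      type t = int
      let to_string = string_of_int
    end

    module V2 = struct
      type t = int * string
      let to_string (n, tag) = string_of_int n ^ ":" ^ tag
    end

    module Latest = V2
  end
end

module Outer = struct
  module Stable = struct
    module V1 = struct
      (* Pinned to Inner V1: this layout stays fixed as Inner gains versions. *)
      type t = { id : int; inner : Inner.Stable.V1.t }

      let to_string t =
        string_of_int t.id ^ "/" ^ Inner.Stable.V1.to_string t.inner

      (* Avoid `inner : Inner.Stable.Latest.t`: when Latest moved from V1 to
         V2, the format of Outer V1 would change without a version bump. *)
    end
  end
end
```

Because Outer.Stable.V1 names Inner.Stable.V1 explicitly, adding Inner.Stable.V2 leaves the output of Outer.Stable.V1.to_string unchanged.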
Type parameters
The mechanism described here does not handle the case where a module's type has parameters. Serialization and deserialization can be done only once the parameters have been instantiated. The solution is to define a type in which the parameters are known.
For example, if type my_type takes a type parameter:
    (* can use Make_version on T *)
    module T = struct
      type t = string my_type
    end
Drawbacks
We may not need the versioning for sexp and yojson, if these representations
are not, in fact, used by other software.
Rationale and alternatives
The main choice is between versioning and not versioning. If we don't use versioning, given a representation, it becomes unclear what type it's associated with.
We could extend the discipline here to string representations, though those are not automatically derivable.
Prior art
The file docs/style-guide.md mentions versioning of stable module types.
PR #1645, already merged, partially implements a module registration mechanism.
That implementation deals only with the bin_io representation, not yojson or
sexp.
PR #1653, already merged, more completely implements a module registration mechanism like the one described here.
PR #1633 added a checklist item for versioned modules to the PR template.