
RFC 0017: Module Versioning

Original source: 0017-module-versioning.md

Summary

We describe ways to version modules containing types with serialized representations.

Motivation

Within the Coda codebase, modules contain (ideally) one main type. As the software evolves, that type can change. When compiling, the OCaml type discipline enforces consistent use of such types. Sometimes we'd like to export representations of those types, perhaps by persisting them, or by sending them to other software. Such other software may be written in languages other than OCaml.

Detailed design

There are three main serialized representations of data used in the Coda codebase: bin_io, sexp, and yojson. The readers and writers of those representations are typically created using the derive annotation on types.

Explicit versioning

The bin_io representation and associated bin_prot library are claimed to be type-safe when used in OCaml (see Jane Street bin_prot). Moreover, other programming languages don't appear to be able to produce or consume this representation. Therefore, type and versioning information need not be stated explicitly in this representation. Nonetheless, it is still important to use versioning, so that we may convert older representations to the current module version's type.

Many programming languages can produce and consume the sexp and yojson representations. Given an OCaml type, the derived writers for those representations do not mention the type or a version. So non-OCaml consumers of these representations cannot easily detect the type distinctions in OCaml that gave rise to the representations.
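To make the ambiguity concrete, here is a minimal sketch that uses only the OCaml standard library. The two writers are hand-written stand-ins that imitate the shape a derived sexp writer gives a one-field record. The module names and the tagged envelope are hypothetical and are not taken from the Coda codebase.

```ocaml
(* Two distinct OCaml types that happen to have the same shape. *)
module Account_balance = struct
  type t = { amount : int }

  (* Hand-written stand-in for a derived sexp writer. *)
  let to_sexp_string { amount } = Printf.sprintf "((amount %d))" amount
end

module Transaction_fee = struct
  type t = { amount : int }

  (* Same stand-in writer; nothing in the output names the type. *)
  let to_sexp_string { amount } = Printf.sprintf "((amount %d))" amount
end

(* Hypothetical envelope that states the module name and version explicitly. *)
let tagged ~name ~version payload =
  Printf.sprintf "((module %s) (version %d) (data %s))" name version payload

let balance_plain = Account_balance.to_sexp_string { Account_balance.amount = 5 }

let fee_plain = Transaction_fee.to_sexp_string { Transaction_fee.amount = 5 }

let balance_tagged = tagged ~name:"account_balance" ~version:1 balance_plain

let fee_tagged = tagged ~name:"transaction_fee" ~version:1 fee_plain
```

The untagged strings are identical even though the OCaml types differ, so a consumer cannot tell them apart. Only the tagged forms carry enough information to distinguish them.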

When a new version of a type is created, it can be worthwhile to retain the older version and associated code. That allows producing and consuming representations for older versions of the software. We can also take older representations to produce a value of the type associated with the latest version of a module.

Here is a proposed discipline for creating a module with a name and a version:

module Some_data = struct
  module Stable = struct
    module V1 = struct
      module T = struct
        let version = 1

        type t = ... [@@deriving bin_io,...]
      end

      include T

      type latest = t

      let to_latest t = t

      include Make_version (T)
    end

    module Latest = V1

    (* declare module *)
    module Module_decl = struct
      let name = "some_data"

      type latest = Latest.t
    end

    (* register versions *)
    module Registrar = Make (Module_decl)
    module Registered_V1 = Registrar.Register (V1)
  end
end

The Make_version functor generates boilerplate code for bin_io that shadows the code generated by deriving. The shadowing code adds a version number to each serialization. The Registrar module maintains a set of registered modules, indexed by version number. A Registrar can take a serialization containing a version number and deserialize it to an instance of the type for that version number. By applying the to_latest function for that version, we get a value of the type for the latest module version.

Example:

let buf = ... in
ignore (Registered_Vn.bin_write_t ~pos:0 buf some_Vn_value);
match Registrar.deserialize_binary_opt buf with
| None -> failwith "shouldn't happen"
| Some _ -> printf "got latest version of some_Vn_value"
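The registration and upgrade flow can be pictured with a small standard-library sketch. It is not the Coda implementation: the Versioned signature, the Make_registrar functor, the "version:payload" string format, and the two example versions are all assumptions made for illustration, and strings stand in for bin_io buffers.

```ocaml
(* Hypothetical signature for one version of a module. *)
module type Versioned = sig
  type t
  type latest

  val version : int
  val to_string : t -> string (* payload writer *)
  val of_string : string -> t option (* payload reader *)
  val to_latest : t -> latest
end

module Make_registrar (M : sig
  type latest
end) =
struct
  (* version number -> reader that yields a value of the latest type *)
  let readers : (int, string -> M.latest option) Hashtbl.t = Hashtbl.create 4

  module Register (V : Versioned with type latest = M.latest) = struct
    (* Applying the functor records the version in the registry. *)
    let () =
      Hashtbl.replace readers V.version (fun payload ->
          Option.map V.to_latest (V.of_string payload))

    (* Prefix the payload with the version number. *)
    let serialize (v : V.t) = Printf.sprintf "%d:%s" V.version (V.to_string v)
  end

  (* Read the version prefix, pick the matching reader, upgrade the result. *)
  let deserialize_opt (s : string) : M.latest option =
    match String.index_opt s ':' with
    | None -> None
    | Some i -> (
        let payload = String.sub s (i + 1) (String.length s - i - 1) in
        match int_of_string_opt (String.sub s 0 i) with
        | None -> None
        | Some version -> (
            match Hashtbl.find_opt readers version with
            | None -> None
            | Some read -> read payload))
end

(* Two illustrative versions: V2 adds a memo field to V1's bare amount. *)
module V2 = struct
  type t = { amount : int; memo : string }
  type latest = t

  let version = 2
  let to_string { amount; memo } = Printf.sprintf "%d,%s" amount memo

  let of_string s =
    match String.index_opt s ',' with
    | None -> None
    | Some i ->
        let memo = String.sub s (i + 1) (String.length s - i - 1) in
        Option.map
          (fun amount -> { amount; memo })
          (int_of_string_opt (String.sub s 0 i))

  let to_latest t = t
end

module V1 = struct
  type t = int
  type latest = V2.t

  let version = 1
  let to_string = string_of_int
  let of_string = int_of_string_opt
  let to_latest amount = { V2.amount = amount; memo = "" }
end

module Registrar = Make_registrar (struct
  type latest = V2.t
end)

module Registered_V1 = Registrar.Register (V1)
module Registered_V2 = Registrar.Register (V2)

(* A value written as V1 comes back as a value of the latest type. *)
let upgraded = Registrar.deserialize_opt (Registered_V1.serialize 42)
```

In this sketch an unknown version number, or a payload the selected reader cannot parse, yields None rather than an exception, mirroring the option-returning deserializer in the example above.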

Adding a new module version V2 means defining a new version number and a new type, which becomes the type latest. Each older version must then update its latest type and its to_latest function to target the new type.

So we'd have:

module Some_data = struct
  module Stable = struct
    module V2 = struct
      module T = struct
        let version = 2

        type t = ... [@@deriving bin_io,...]
      end

      include T

      include Make_latest_version (T)
    end

    module Latest = V2 (* changed, was V1 *)

    module V1 = struct
      module T = struct
        let version = 1

        type t = ... [@@deriving bin_io,...]
      end

      include T

      type latest = Latest.t (* changed, was t *)

      let to_latest = ... (* changed, was the identity *)

      include Make_version (T)
    end

    module Module_decl = struct
      let name = "some_data"

      type latest = Latest.t
    end

    module Registrar = Make (Module_decl)
    module Registered_V1 = Registrar.Register (V1)
    module Registered_V2 = Registrar.Register (V2) (* new *)
    module Registered_Latest = Registered_V2 (* changed, was Registered_V1 *)
  end
end

The Make_latest_version functor is like Make_version, except that it also defines latest (as t) and to_latest (as the identity function on t).
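A minimal sketch of the part of Make_latest_version that supplies latest and to_latest is given below. It leaves out the bin_io shadowing entirely, and the Versioned_type signature and the concrete types are assumptions chosen for illustration.

```ocaml
(* Hypothetical minimal argument signature. *)
module type Versioned_type = sig
  type t

  val version : int
end

(* Stand-in functor: for the newest version, [latest] is [t] and
   [to_latest] is the identity. Serialization code is omitted. *)
module Make_latest_version (T : Versioned_type) = struct
  type latest = T.t

  let to_latest (t : T.t) : latest = t
end

module V2 = struct
  module T = struct
    let version = 2

    type t = { amount : int; memo : string }
  end

  include T
  include Make_latest_version (T)
end

module Latest = V2

module V1 = struct
  module T = struct
    let version = 1

    type t = int
  end

  include T

  (* An older version names the latest type and converts to it by hand. *)
  type latest = Latest.t

  let to_latest (amount : t) : latest = { V2.amount = amount; memo = "" }
end
```

With this layout, only the newest version uses the functor for latest and to_latest. Every older version keeps an explicit conversion, which is the piece that must be revisited whenever a new version is added.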

Clients of a versioned module should always use the Stable.Latest version.

Serialization restricted to Stable, versioned modules

Serialization should be allowed only for modules following this discipline, including the use of the Stable submodule containing versioned modules.

An exception can be carved out for using sexp serialization of data from modules without Stable and versioning. That kind of serialization is useful for logging and other printing from within Coda, and is less likely to be used with other software.

Coordination with RPC versioning

RFC 0012 indicates how to version types used with the Jane Street RPC mechanism. The types specified in versioned modules here can also be used in the query, response, and message types used for RPC. In that case, if a new version of a module is created, the RPC version should be updated at the same time.

Embracing this discipline; static enforcement

As of this writing, there are over 600 type definitions with deriving bin_io in the Coda codebase, even more with sexp. Only about 60 type definitions are versioned using the Stable.Vn naming discipline (and without the explicit version information suggested here). To embrace this discipline fully will take some effort.

We can make awareness of the existing recommendation, or of the extended discipline suggested here, a checklist item for GitHub pull requests.

In some critical situations, such as when using versioned RPC (RFC 0012), we'd like static assurance that types are versioned. By writing a suitable ppx, we can add an annotation versioned to type deriving lists. That ppx will generate a definition, perhaps called __versioned, if all types contributing to a type also have the annotation. It can also check that the type is named t, and occurs in the module hierarchy Stable.Vn.T (or in the module hierarchy for versioned RPC types). The functors Make_version and Make_latest_version can also require their argument to have the annotation, to localize the error when the annotation is omitted.
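The last point, having the functors demand the annotation, can be sketched without a ppx. In the sketch below the __versioned value is written by hand where the proposed ppx would generate it. The signature name and the functor body are assumptions.

```ocaml
(* Hypothetical argument signature: [__versioned] stands in for the
   definition the proposed ppx would generate. *)
module type Versioned_type = sig
  type t

  val version : int
  val __versioned : unit
end

(* Stand-in for Make_version; it only inspects the version here. *)
module Make_version (T : Versioned_type) = struct
  let version = T.version
  let describe () = Printf.sprintf "version %d" T.version
end

module Good = struct
  type t = int

  let version = 1
  let __versioned = () (* written by hand in place of the ppx output *)
end

module Checked = Make_version (Good)

(* A module without the marker is rejected where the functor is applied:

     module Bad = struct type t = int let version = 1 end
     module Fails = Make_version (Bad)

   The compiler reports that the required value __versioned is missing,
   which points at the module that omitted the annotation. *)
```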

Using versioned types in other versioned types

When mentioning a versioned type in the definition of another versioned type, such as in a record, a specific version of the included type must be used. That is, do not use version Latest, which may refer to different modules over time. With this restriction in place, the serialization of the including type has a fixed format. Otherwise, multiple, incompatible serializations of the including type could be generated.
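As an illustration of this rule, the sketch below pins a specific version of an included type. The Fee and Payment modules and their string writers are hypothetical, and the standard library is used in place of derived serializers.

```ocaml
module Fee = struct
  module Stable = struct
    module V1 = struct
      type t = int

      let to_string = string_of_int
    end

    module V2 = struct
      type t = { amount : int; token : string }

      let to_string { amount; token } = Printf.sprintf "%d %s" amount token
    end

    (* Moves over time: was V1, is now V2. *)
    module Latest = V2
  end
end

module Payment = struct
  module Stable = struct
    module V1 = struct
      (* Pin Fee.Stable.V1 rather than Fee.Stable.Latest, so that the
         format of Payment.Stable.V1 does not change when Fee gains V2. *)
      type t = { receiver : string; fee : Fee.Stable.V1.t }

      let to_string { receiver; fee } =
        Printf.sprintf "%s;%s" receiver (Fee.Stable.V1.to_string fee)
    end
  end
end

let serialized =
  Payment.Stable.V1.to_string
    { Payment.Stable.V1.receiver = "alice"; fee = 3 }
```

Had the fee field been declared with Fee.Stable.Latest.t, introducing Fee.Stable.V2 would have silently changed what Payment.Stable.V1 writes.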

Type parameters

The mechanism described here does not handle the case where a module's type has type parameters. Serialization and deserialization can only be done when the parameters have been instantiated. The solution is to define a type in which the parameters are instantiated.

For example, if the type my_type takes a type parameter, instantiate it inside the versioned module:

(* can use Make_version on T *)
module T = struct
  type t = string my_type
end
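A slightly fuller sketch shows why instantiation is needed. A writer for a parameterized type needs a writer for its parameter, much as derived writers for parameterized types take the parameter's writer as an argument. The definition of my_type and the string format are assumptions made for illustration.

```ocaml
(* Hypothetical parameterized type. *)
type 'a my_type = { label : string; payload : 'a }

(* A writer for ['a my_type] cannot exist without a writer for ['a]. *)
let my_type_to_string payload_to_string { label; payload } =
  Printf.sprintf "%s=%s" label (payload_to_string payload)

(* Once the parameter is fixed, the module has a single concrete type
   and a single serialization format, so it can be versioned. *)
module T = struct
  let version = 1

  type t = string my_type

  let to_string : t -> string = my_type_to_string (fun s -> s)
end

let serialized = T.to_string { label = "memo"; payload = "hello" }
```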

Drawbacks

We may not need the versioning for sexp and yojson, if these representations are not, in fact, used by other software.

Rationale and alternatives

The main choice is between versioning and not versioning. If we don't use versioning, given a representation, it becomes unclear what type it's associated with.

We could extend the discipline here to string representations, though those are not automatically derivable.

Prior art

The file docs/style-guide.md mentions versioning of stable module types.

PR #1645, already merged, partially implements a module registration mechanism. That implementation deals only with the bin_io representation, not yojson or sexp.

PR #1653, already merged, more completely implements a module registration mechanism like the one described here.

PR #1633 added a checklist item for versioned modules to the PR template.