Original source: 0004-style-guidelines.md

Summary

Commit to a semi-formal, extended set of OCaml style guidelines, with a focus on defining how to use modules scalably in our system.

Motivation

So far, we have been informally following ocamlformat and the Jane Street style guidelines. This has been working fairly well, but there is currently no standard for module/functor naming and usage. By defining one, we hope to refactor our current module/functor structure to be more consistent and eliminate some of the current headaches in the codebase.

Detailed design

See docs/style_guidelines.md for the style guidelines draft.

Rationale and alternatives

Most of the new style guide falls in line with what we have been doing informally as a team. However, a couple of new points represent changes in our codebase moving forward.

The first is the "no monkeypatching" rule. The rationale for this is explained already in the style guidelines, so I will not reiterate that here.

The second is the new functor patterns. Functors which abstract over types need some way to reconcile the equality of those types at different layers of the codebase. Specifically, if two separate functors return modules that operate on the same values, those values are incompatible unless you state that their types are equal. Even if you know which implementation is being passed into each functor, the type system requires hints. The tool for this is the with syntax, but it is very flexible, and it can be difficult to decide which form to use in which situation. Let's start by reviewing the ways that with can be used.

Each with declaration involves two decisions, each with two options, so there are 4 combinations in total. The first dimension is what the with states an equality on: either two types or two modules. The second dimension is how that equality should affect the newly composed signature. With =, the new signature still contains the type/module, now annotated with the equality. With :=, the new signature drops the type/module and instead replaces every reference to it in the signature with the right-hand side of the equality.

For clarity, take a look at this example, which contains all 4 possible combinations of a with statement.

(* Assumed for illustration: the original snippet leaves X.S undefined. *)
module X = struct
  module type S = sig
    type a
    type b
  end
end

(* The functors are written as declarations inside signatures so that the
 * snippet type-checks without functor bodies. *)
module type Type_equality = sig
  module type S = sig
    type a
    type b
    val f : a -> unit
    val g : b -> unit
  end

  module Inline (X : X.S) : S with type a = X.a and type b = X.b
  (* return signature becomes:
   * sig
   *   type a = X.a
   *   type b = X.b
   *   val f : a -> unit
   *   val g : b -> unit
   * end
   *)

  module Replace (X : X.S) : S with type a := X.a and type b := X.b
  (* return signature becomes:
   * sig
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)
end

module type Module_equality = sig
  module type S = sig
    module X : X.S
    val f : X.a -> unit
    val g : X.b -> unit
  end

  module Inline (X : X.S) : S with module X = X
  (* return signature becomes:
   * sig
   *   module X = X
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)

  module Replace (X : X.S) : S with module X := X
  (* return signature becomes:
   * sig
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)
end
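The example above shows signatures only. As a minimal runnable sketch of why an equality is needed in the first place, the following uses hypothetical names (Key, Producer, Consumer, Opaque_producer) that do not come from our codebase. It contrasts a functor whose result keeps its type abstract with one that exposes the equality through with type.

```ocaml
(* All names here are hypothetical and exist only for illustration. *)
module type Key_intf = sig
  type t
  val of_int : int -> t
  val to_int : t -> int
end

module Key : Key_intf = struct
  type t = int
  let of_int x = x
  let to_int x = x
end

module type Producer_intf = sig
  type key
  val make : int -> key
end

module type Consumer_intf = sig
  type key
  val use : key -> int
end

(* No [with] constraint: [key] stays abstract in the result. *)
module Opaque_producer (K : Key_intf) : Producer_intf = struct
  type key = K.t
  let make = K.of_int
end

(* With the constraint, callers can see that [key] is [K.t]. *)
module Producer (K : Key_intf) : Producer_intf with type key = K.t = struct
  type key = K.t
  let make = K.of_int
end

module Consumer (K : Key_intf) : Consumer_intf with type key = K.t = struct
  type key = K.t
  let use = K.to_int
end

module P = Producer (Key)
module C = Consumer (Key)
module O = Opaque_producer (Key)

(* Type-checks because P.key = Key.t = C.key. *)
let result = C.use (P.make 42)

(* Rejected by the type checker, because O.key is abstract:
   let _ = C.use (O.make 42) *)
```

Even though Opaque_producer and Consumer are applied to the same Key, the compiler cannot relate O.key to C.key without the constraint.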

Each of these options has its own tradeoffs. Let's begin by comparing in-place equality and replacement equality. In-place equality (=) has the advantage that the equalities are still observable from the newly generated signature. This can be really useful when you are composing many modules together with many common dependencies. If replacement equality is used, equality statements about a set of modules with common dependencies must be duplicated in all functors which reason about those module signatures. If the equalities are left in place, the common dependencies can be described as equalities in all locations without needing to pass around any of the common dependencies themselves.

To state this more concretely, take an example where there is a module X : X_intf and two functors, A (X : X_intf) : A_intf with module X = X and B (X : X_intf) : B_intf with module X = X. If there is a third functor which operates on modules generated by A and B from X, we can define it with an interface that maintains compatibility between A.X and B.X by declaring it as C (A : A_intf) (B : B_intf with module X = A.X) : C_intf with module X = A.X. If, instead, we had defined the with declarations on functors A and B with :=, the functor C would require a definition like C (X : X_intf) (A : A_intf with module X := X) (B : B_intf with module X := X) : C_intf with module X := X. As we continue to add more common dependencies between A, B, and C, we will have to keep adding functor arguments and with declarations to C.
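The A, B, C arrangement described above can be sketched as follows. The contents of the signatures (create, to_int, produce, consume, round_trip), the Int_x implementation, and the _repl names are assumptions made up for this illustration. Only the shape of the functor headers follows the text.

```ocaml
(* Signature contents are hypothetical; only the functor headers mirror the text. *)
module type X_intf = sig
  type t
  val create : int -> t
  val to_int : t -> int
end

module type A_intf = sig
  module X : X_intf
  val produce : int -> X.t
end

module type B_intf = sig
  module X : X_intf
  val consume : X.t -> int
end

module type C_intf = sig
  module X : X_intf
  val round_trip : int -> int
end

(* In-place equality: X stays visible in each result signature. *)
module A (X : X_intf) : A_intf with module X = X = struct
  module X = X
  let produce n = X.create (n + 1)
end

module B (X : X_intf) : B_intf with module X = X = struct
  module X = X
  let consume x = X.to_int x * 2
end

module C (A : A_intf) (B : B_intf with module X = A.X) :
  C_intf with module X = A.X = struct
  module X = A.X
  let round_trip n = B.consume (A.produce n)
end

(* Replacement equality: X is stripped, so C_repl must take X explicitly. *)
module A_repl (X : X_intf) : A_intf with module X := X = struct
  let produce n = X.create (n + 1)
end

module B_repl (X : X_intf) : B_intf with module X := X = struct
  let consume x = X.to_int x * 2
end

module C_repl
    (X : X_intf)
    (A : A_intf with module X := X)
    (B : B_intf with module X := X) : C_intf with module X := X = struct
  let round_trip n = B.consume (A.produce n)
end

(* Hypothetical implementation of the shared dependency. *)
module Int_x : X_intf = struct
  type t = int
  let create n = n
  let to_int t = t
end

module A1 = A (Int_x)
module B1 = B (Int_x)
module C1 = C (A1) (B1)
let answer = C1.round_trip 20

module C2 = C_repl (Int_x) (A_repl (Int_x)) (B_repl (Int_x))
let answer_repl = C2.round_trip 20
```

Both versions compute the same result. The difference is in the headers: C only mentions A and B, while C_repl has to thread X through as an extra argument, and would need another argument for every additional shared dependency.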

Replacement equality (:=) has the advantage that the generated signature is narrowed in scope. This can be useful whenever you want to compose a smaller signature from a larger one. In our codebase, however, we only ever need to do this when we call include on a module whose signature already defines a module we have in scope in the current module structure. Outside of this, = appears to be superior, especially if you always talk about a module signature at a level that still allows you to wire common dependencies together via equalities (which is how we consistently use modules and signatures throughout our codebase).
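The include case can be sketched like this. Base, Extended, and the members zero, succ, one, and two are hypothetical names chosen for the illustration. The point is that Extended already has a module X in scope, so including Base as-is would define X twice, and := removes the duplicate.

```ocaml
(* All names are hypothetical; this only illustrates include with := . *)
module type X_intf = sig
  type t
  val zero : t
  val succ : t -> t
  val to_int : t -> int
end

module type Base_intf = sig
  module X : X_intf
  val one : X.t
end

module Base (X : X_intf) : Base_intf with module X = X = struct
  module X = X
  let one = X.succ X.zero
end

(* Extended_intf declares X itself, so Base_intf's X is substituted away. *)
module type Extended_intf = sig
  module X : X_intf
  include Base_intf with module X := X
  val two : X.t
end

module Extended (X : X_intf) : Extended_intf with module X = X = struct
  module X = X
  (* A plain [include Base (X)] would define X a second time. *)
  include (Base (X) : Base_intf with module X := X)
  let two = X.succ one
end

module Int_x : X_intf = struct
  type t = int
  let zero = 0
  let succ n = n + 1
  let to_int n = n
end

module E = Extended (Int_x)
let two_as_int = E.X.to_int E.two
```

Note that Extended still uses = in its own result signature. The := appears only at the include site, which matches the narrow use described above.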

Now, let's take a look at type equalities vs module equalities. Type equalities provide a finer grain of atomicity. Module equalities typically require fewer equality statements, but at the cost that the grain of atomicity is defined on a per-signature basis. Let's think about an example to help make this clearer. Let's say we have 4 modules: A, B, C, and D. B depends on A, and D depends on all three modules A, B, and C. Using type equality, we would need to say that a functor creating D returns D_intf with type a = B.a and type b = B.t and type c = C.t (assuming B uses the same pattern for A as D does for A, B, and C). However, if we instead use module equalities, the returning interface of the D functor could be expressed as D_intf with module B = B and module C = C. We do not need to express an equality on D_intf.A, since D_intf.B contains D_intf.B.A and D_intf.B = B, therefore D_intf.B.A = B.A. This scales much better as more and more dependencies are nested. With type equality, every layer we go up in dependencies requires more and more equality declarations.
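A minimal sketch of the module-equality version of this A, B, C, D example follows. The signature members (of_int, to_int, wrap, unwrap, combine) and the Make_b and Make_d functor names are assumptions for illustration. The D functor states equalities on B and C only, and the equality on A follows through B.A.

```ocaml
(* Signature contents and functor names are hypothetical. *)
module type A_intf = sig
  type t
  val of_int : int -> t
  val to_int : t -> int
end

module type B_intf = sig
  module A : A_intf
  type t
  val wrap : A.t -> t
  val unwrap : t -> A.t
end

module type C_intf = sig
  type t
  val of_int : int -> t
  val to_int : t -> int
end

module type D_intf = sig
  module B : B_intf
  module C : C_intf
  val combine : B.t -> C.t -> B.A.t
end

module Make_b (A : A_intf) : B_intf with module A = A = struct
  module A = A
  type t = Wrapped of A.t
  let wrap a = Wrapped a
  let unwrap (Wrapped a) = a
end

(* Two module equalities; nothing needs to be said about A directly. *)
module Make_d (B : B_intf) (C : C_intf) :
  D_intf with module B = B and module C = C = struct
  module B = B
  module C = C
  let combine b c = B.A.of_int (B.A.to_int (B.unwrap b) + C.to_int c)
end

module A0 : A_intf = struct
  type t = int
  let of_int x = x
  let to_int x = x
end

module C0 : C_intf = struct
  type t = int
  let of_int x = x
  let to_int x = x
end

module B0 = Make_b (A0)
module D0 = Make_d (B0) (C0)

(* D0.B.A.t is known to equal A0.t, so A0.to_int accepts the result. *)
let total =
  let b = B0.wrap (A0.of_int 40) in
  let c = C0.of_int 2 in
  A0.to_int (D0.combine b c)
```

The last expression only type-checks because D0.B.A.t, B0.A.t, and A0.t are all known to be the same type, even though Make_d never mentions A in its with clause.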

As mentioned earlier, the disadvantage of using module equalities is that the grain of atomicity for a dependency becomes tied to the decomposition of its signatures. Right now, in our codebase, we don't do much signature decomposition. However, independent of this issue, we should move towards decomposing signatures anyway. In fact, it's part of the Jane Street style guide. Signatures should almost never be written entirely by hand, but should instead compose smaller, finer-grained signatures to build up the majority of what they require. Using this pattern, we can still limit the surfaced requirements of dependencies while gaining the advantages of module equality. When viewed from this perspective, tying the grain of atomicity to the signature decomposition of a dependency actually becomes a pro instead of a con, since we as developers get more control over the level of abstraction our signatures should have over the underlying structure.
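As a rough sketch of what such decomposition could look like, the following composes a larger signature from two small ones and lets a functor depend on only one of the fragments. Comparable, Printable, Key_intf, Max_intf, and Make_max are hypothetical names, not taken from the style guide draft.

```ocaml
(* Hypothetical fine-grained signatures. *)
module type Comparable = sig
  type t
  val compare : t -> t -> int
end

module type Printable = sig
  type t
  val to_string : t -> string
end

(* The larger signature is composed rather than written by hand. *)
module type Key_intf = sig
  type t
  include Comparable with type t := t
  include Printable with type t := t
end

module type Max_intf = sig
  module K : Comparable
  val max : K.t -> K.t -> K.t
end

(* Make_max requires only the Comparable fragment of its dependency,
   so the module equality surfaces exactly that fragment. *)
module Make_max (K : Comparable) : Max_intf with module K = K = struct
  module K = K
  let max a b = if K.compare a b >= 0 then a else b
end

module Int_key : Key_intf with type t = int = struct
  type t = int
  let compare (a : t) (b : t) = Stdlib.compare a b
  let to_string = string_of_int
end

module Int_max = Make_max (Int_key)

let biggest = Int_key.to_string (Int_max.max 3 7)
```

Here the grain of the module equality is Comparable rather than all of Key_intf, which is the kind of control over surfaced requirements that the paragraph above describes.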