Original source: 0004-style-guidelines.md
Summary
Commit to a semi-formal, extended set of OCaml style guidelines, with a focus on defining how to use modules scalably in our system.
Motivation
We have been informally following ocamlformat + the Jane Street style guidelines so far. This has been working fairly well; however, there is currently no definition of module/functor naming and usage standards. By defining one, we hope to be able to refactor our current module/functor structure to be more consistent, and hopefully eliminate some of the current headaches in the codebase.
Detailed design
See docs/style_guidelines.md for the style guidelines draft.
Rationale and alternatives
Most of the new style guide falls in line with what we have been doing informally as a team. However, there are a couple of new points which represent changes in our codebase moving forward.
The first is the "no monkeypatching" rule. The rationale for this is explained already in the style guidelines, so I will not reiterate that here.
The second is the new functor patterns. Functors which abstract over types need
to have some way to reconcile the equality of those types at different layers of
the codebase. Specifically, if you have two separate functors which return
modules that operate on the same values, the types of those values are
incompatible unless you state that they are equal. Even if you know which
implementation is being passed into each functor, the type system requires
hints. The tool for providing them is the with syntax, but this syntax is very
flexible, and it can be difficult to decide which form to use in which
situation. Let's start by reviewing the ways that with can be used.
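As a concrete illustration of the incompatibility described above, here is a minimal sketch. All names in it (X_intf, Producer_intf, Consumer_intf, Make_producer, Make_consumer, Int_x) are hypothetical and are not taken from our codebase. Two functors are applied to the same argument, and the result of one can only be fed to the other because each return signature carries a with constraint.

```ocaml
(* Hypothetical names throughout; this is an illustration only. *)
module type X_intf = sig
  type t
  val make : int -> t
  val to_int : t -> int
end

module type Producer_intf = sig
  type value
  val produce : int -> value
end

module type Consumer_intf = sig
  type value
  val consume : value -> int
end

(* If the "with type value = X.t" constraints were dropped, [value] would be
   abstract in both results and the last line of this example would be
   rejected by the type checker. *)
module Make_producer (X : X_intf) : Producer_intf with type value = X.t = struct
  type value = X.t
  let produce = X.make
end

module Make_consumer (X : X_intf) : Consumer_intf with type value = X.t = struct
  type value = X.t
  let consume = X.to_int
end

module Int_x = struct
  type t = int
  let make n = n
  let to_int n = n
end

module Producer = Make_producer (Int_x)
module Consumer = Make_consumer (Int_x)

(* Accepted because both [value] types are known to equal [Int_x.t]. *)
let round_trip = Consumer.consume (Producer.produce 42)
```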
Each with declaration involves two decisions, each of which is a binary
choice. In total, therefore, there are 4 combinations which can be used in a
with declaration. The first decision is what the with will state an equality
on: it can state an equality between two types or between two modules. The
second decision is how that equality should affect the newly composed
signature. The equality can either keep the module/type in the new signature
(written with =), or it can remove the module/type from the new signature and
instead replace every reference to it with the right-hand side of the equality
(written with :=).
For clarity, take a look at this example, which contains all 4 possible
combinations of a with statement.
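module X = struct
  (* Assumed shape of X.S, inferred from the X.a and X.b used below. *)
  module type S = sig
    type a
    type b
  end
end

module Type_equality = struct
  module type S = sig
    type a
    type b
    val f : a -> unit
    val g : b -> unit
  end

  (* The functor bodies are trivial and exist only so the example compiles. *)
  module Inline (X : X.S) : S with type a = X.a and type b = X.b = struct
    type a = X.a
    type b = X.b
    let f _ = ()
    let g _ = ()
  end
  (* return signature becomes:
   * sig
   *   type a = X.a
   *   type b = X.b
   *   val f : a -> unit
   *   val g : b -> unit
   * end
   *)

  module Replace (X : X.S) : S with type a := X.a and type b := X.b = struct
    let f _ = ()
    let g _ = ()
  end
  (* return signature becomes:
   * sig
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)
end

module Module_equality = struct
  module type S = sig
    module X : X.S
    val f : X.a -> unit
    val g : X.b -> unit
  end

  module Inline (X : X.S) : S with module X = X = struct
    module X = X
    let f _ = ()
    let g _ = ()
  end
  (* return signature becomes:
   * sig
   *   module X = X
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)

  module Replace (X : X.S) : S with module X := X = struct
    let f _ = ()
    let g _ = ()
  end
  (* return signature becomes:
   * sig
   *   val f : X.a -> unit
   *   val g : X.b -> unit
   * end
   *)
end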
Each option on these dimensions has its own tradeoffs to consider.
Let's begin by comparing in-place equality and replacement equality. In-place
equality (=) has the advantage that any equalities are still observable
from the newly generated signature. This can be really useful when you are
composing many modules together with many common dependencies. If replacement
equality is used, equality statements about a set of modules with common
dependencies must be duplicated in all functors which reason about those module
signatures. If the equalities are left in place, then the common dependencies
can be described as equalities in all locations without needing to pass around
any of the common dependencies. To make this concrete, take an example
where there is a module X : X_intf, and two functors
A (X : X_intf) : A_intf with module X = X and
B (X : X_intf) : B_intf with module X = X. If there is a third functor which
operates on modules generated by A and B from X, we can define it with
an interface that maintains compatibility between A.X and B.X by declaring it
as
C (A : A_intf) (B : B_intf with module X = A.X) : C_intf with module X = A.X.
If, instead, we had chosen to define the with declarations on functors A and
B with :=, the functor C would require a definition like
C (X : X_intf) (A : A_intf with module X := X) (B : B_intf with module X := X) : C_intf with module X := X.
As we continue to add more common dependencies between A, B, and C, we
will have to keep on adding functor arguments and with declarations to C.
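The in-place variant of the A, B, C arrangement can be sketched as follows. The signature contents (of_int, to_int, create, double, run) and the Int_x module are assumptions made up for this illustration; only the functor headers follow the text above. Note that C never takes X as an argument, since the equality B.X = A.X is enough to wire the two modules together.

```ocaml
(* Signature members and Int_x are hypothetical; the functor headers mirror
   the declarations discussed in the text. *)
module type X_intf = sig
  type t
  val of_int : int -> t
  val to_int : t -> int
end

module type A_intf = sig
  module X : X_intf
  val create : int -> X.t
end

module type B_intf = sig
  module X : X_intf
  val double : X.t -> X.t
end

module type C_intf = sig
  module X : X_intf
  val run : int -> X.t
end

module A (X : X_intf) : A_intf with module X = X = struct
  module X = X
  let create = X.of_int
end

module B (X : X_intf) : B_intf with module X = X = struct
  module X = X
  let double x = X.of_int (2 * X.to_int x)
end

(* C relies only on the equality between A.X and B.X. *)
module C (A : A_intf) (B : B_intf with module X = A.X) :
  C_intf with module X = A.X = struct
  module X = A.X
  let run n = B.double (A.create n)
end

module Int_x = struct
  type t = int
  let of_int n = n
  let to_int n = n
end

module A_int = A (Int_x)
module B_int = B (Int_x)
module C_int = C (A_int) (B_int)

let result = C_int.X.to_int (C_int.run 21)
```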
Replacement equality (:=) has the advantage that the generated signature is
narrowed in scope. This can be useful whenever you want to compose a smaller
signature from a larger one. In our codebase, however, we only ever need to do
this when we are calling include on a module whose signature already defines
a module that is in scope in the current module structure. Outside of this, =
appears to be superior, especially if you are always going to talk about a
module signature at a level that allows you to still wire common dependencies
together via equalities (which is how we consistently use modules and
signatures throughout our codebase).
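The include case mentioned above might look like the following sketch. Printer_intf, Labeled_printer_intf, Make_printer, Make_labeled, and Int_x are hypothetical names chosen for illustration. Because X is already defined in the enclosing signature and structure, including Printer_intf unchanged would define X twice, so the included copy is substituted away with :=.

```ocaml
(* All names here are hypothetical. *)
module type X_intf = sig
  type t
  val to_string : t -> string
end

module type Printer_intf = sig
  module X : X_intf
  val print : X.t -> string
end

(* X is declared once; the X inside Printer_intf is replaced by it. *)
module type Labeled_printer_intf = sig
  module X : X_intf
  include Printer_intf with module X := X
  val label : string
end

module Make_printer (X : X_intf) : Printer_intf with module X = X = struct
  module X = X
  let print x = "<" ^ X.to_string x ^ ">"
end

module Make_labeled (X : X_intf) : Labeled_printer_intf with module X = X =
struct
  module X = X
  (* Without the := constraint this include would redefine X. *)
  include (Make_printer (X) : Printer_intf with module X := X)
  let label = "labeled"
end

module Int_x = struct
  type t = int
  let to_string = string_of_int
end

module P = Make_labeled (Int_x)

let printed = P.print 7
```

The outward-facing signature of Make_labeled still uses =, so callers can continue to wire X together with other modules through equalities.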
Now, let's take a look at type equalities vs module equalities. Type equalities
provide a finer grain of atomicity. Module equalities typically require fewer
equality statements, but at the cost that the grain of atomicity is defined on a
per-signature basis. Let's think about an example to help make this clearer.
Let's say we have 4 modules: A, B, C, and D. B depends on A, and D
depends on all three modules A, B, and C. Using type equality, we would
need to say that a functor creating D returns
D_intf with type a = B.a and type b = B.t and type c = C.t (assuming B uses
the same pattern for A as D does for A, B, and C). However, if we
instead use module equalities, the returning interface of the D functor could
be expressed as D_intf with module B = B and module C = C. We do not need to
express an equality on D_intf.A since D_intf.B contains D_intf.B.A and
D_intf.B = B, therefore D_intf.B.A = B.A. This scales much better as more and
more dependencies are nested. With type equality, every layer we go up in
dependencies requires more and more equality declarations.
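A minimal sketch of the module-equality version of this example follows. The signature members (wrap, unwrap, pair) and the concrete modules A_int and C_string are assumptions invented for illustration, and in this sketch D reaches A through B. The functor creating D states equalities only on B and C, yet the type of A is still known through D.B.A.

```ocaml
(* Signature members and concrete modules are hypothetical. *)
module type A_intf = sig
  type t
end

module type B_intf = sig
  module A : A_intf
  type t
  val wrap : A.t -> t
  val unwrap : t -> A.t
end

module type C_intf = sig
  type t
end

module type D_intf = sig
  module B : B_intf
  module C : C_intf
  val pair : B.A.t -> C.t -> B.t * C.t
end

module Make_b (A : A_intf) : B_intf with module A = A = struct
  module A = A
  type t = Wrapped of A.t
  let wrap a = Wrapped a
  let unwrap (Wrapped a) = a
end

(* No equality on A is stated here: it follows from "module B = B". *)
module Make_d (B : B_intf) (C : C_intf) :
  D_intf with module B = B and module C = C = struct
  module B = B
  module C = C
  let pair a c = (B.wrap a, c)
end

module A_int = struct type t = int end
module C_string = struct type t = string end
module B_int = Make_b (A_int)
module D = Make_d (B_int) (C_string)

(* [3] is accepted as a [D.B.A.t] because D.B.A.t = B_int.A.t = int. *)
let wrapped, tag = D.pair 3 "three"
let recovered = B_int.unwrap wrapped
```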
As mentioned earlier, the disadvantage of using module equalities is that the grain of atomicity for a dependency becomes tied to the decomposition of its signatures. Right now, in our codebase, we don't do much signature decomposition. However, independent of this issue, we should move towards decomposing signatures anyway; in fact, it's part of the Jane Street style guide. Signatures should almost never be written entirely by hand, but should instead compose smaller, finer-grained signatures to build up the majority of what they require. Using this pattern, we can still limit the surfaced requirements of dependencies while gaining the advantages of module equality. Viewed from this perspective, tying the grain of atomicity to the signature decomposition of a dependency actually becomes a pro instead of a con, since we as developers get more control over the level of abstraction our signatures have over the underlying structure.
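One possible shape for this kind of signature decomposition is sketched below. The names Comparable, Stringable, Key_intf, Max, and Int_key are hypothetical. The larger signature is assembled from two small ones with include, and a functor that needs only comparison asks for the small fragment rather than the whole signature.

```ocaml
(* All names are hypothetical; this only illustrates the composition style. *)
module type Comparable = sig
  type t
  val compare : t -> t -> int
end

module type Stringable = sig
  type t
  val to_string : t -> string
end

(* Composed from the fragments above instead of being written out by hand. *)
module type Key_intf = sig
  type t
  include Comparable with type t := t
  include Stringable with type t := t
end

(* This functor surfaces only the requirement it actually uses. *)
module Max (K : Comparable) = struct
  let max a b = if K.compare a b >= 0 then a else b
end

module Int_key : Key_intf with type t = int = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
  let to_string = string_of_int
end

module Int_max = Max (Int_key)

let biggest = Int_key.to_string (Int_max.max 3 9)
```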