kivikakk.ee

Back before I had a clue what I was doing with OCaml, I read an article on Jane Street Capital’s OCaml blog about using phantom types for static access control—read: totally awesome. I stumbled across a situation that lead me to wonder if I could apply part of that knowledge here.

The short answer is: you can. What am I using it for? Storing the saved status of a record (namely, whether or not it has an id/given primary key associated with it) as a phantom type. I also use the type used for attaching the phantom types to store the primary key itself, which has an associated overhead (and makes the attachee type decidedly not phantom).

The relevant code follows. In my case, the primary keys are of type int32, so they’re what’s buried in the type.

(* pK.ml *)
type none
type some

type ('con,'a) t = int32 option * 'a

let none t = (None,t)
let some k t = (Some k,t)
let get = snd

let pk t = Option.get (fst t)
(* equiv to: match fst t with | Some n -> n | None -> raise Option.No_value *)

let pk_opt = fst

So far, this looks like a pretty standard way to attach a given piece of data to any other piece of data, except for those strange, phantom (not abstract, because there’s no underlying definition) types at the top which never seem to be used, and the similarly unused 'con type variable on t.

The magic happens, of course, in the .mli:

(* pK.mli *)
type none
type some

type ('con,'a) t

val none : 'a -> (none,'a) t
val some : int32 -> 'a -> (some,'a) t
val get : ('con,'a) t -> 'a
val pk : (some,'a) t -> int32
val pk_opt : ('con,'a) t -> int32 option

So, as we recall, the definition of none is let none t = (None,t). The right-hand side matches the definition of t faithfully, as it’s a int32 option * 'a, but we’ve also filled in the 'con type variable, by saying its type is none.

At compile-time, this gets optimised away, because there is no value for the none type. But during compile-time, the type is known.

some is defined as let some k t = (Some k,t), so we know the value k is being stored safely in the int32 option. But the 'con type variable is also specified as some.

Let’s skip ahead and look at pk. It takes a (some,'a) t and yields an int32. That means that, for example, PK.pk (PK.none 42) doesn’t typecheck:

Error: This expression has type (PK.none, int) PK.t
       but an expression was expected of type (PK.some, 'a) PK.t

If we omitted the .mli file, we’d get an exception at run-time instead.

It’s important to remember that this technique has nothing to do with the actual bundling of data and subsequent hiding—that’s a perfectly normal thing to do. The benefit here is in making certain assertions about whether or not that data is actually present; we can be sure that PK.pk will never throw an exception, because the object passed to it could only have been created by PK.some.

Note that we do experience some some additional runtime slowness due to the use of an int32 option (and subsequently Option.get or matching on that to retrieve its value). If you’re happy to sacrifice PK.pk_opt, you can do something like this instead:

(* pK.ml *)
type none
type some

type ('con,'a) t = int32 * 'a

let none t = (0l,t)
let some k t = (k,t)
let get = snd
let pk = fst

The .mli is the same, except with pk_opt removed.

This is much more concise, as the boolean status of “is the primary key present?” is not stored at all at runtime. The upshot is that you also have no way to tell at runtime. This isn’t a “problem” in and of itself, because the type-checking means you’ll never accidentally see the 0l value stored with the none type (we just need to put something there, without an option type)—but it means that there’s no way to do anything at runtime that differentiates between the two.

If you don’t need to, then this is great, too, because it’s exactly how you can achieve the static access control, as described in Yaron Minsky’s post referred above—you can also attach additional data if you want to drag information along with the typing data!

So, how am I using this? Here’s an example interface for a record type which has an associated database table:

(* world.mli *)
type base_t = { name: string;
                width: int; height: int;
                defaultTileId: int32;
                placements: placement_t array }
           
type 'con t = ('con,base_t) PK.t

val empty : PK.none t

val of_id : int32 -> PK.some t
val save : 'con t -> PK.some t

We can ignore the details of this—the base type, 'con t, will either have PK.none or PK.some filled in for 'con.

As an example, there’s a default “empty” record, empty. In the .ml, it’s defined like so:

let empty = PK.none { name="";
                      width=0; height=0;
                      defaultTileId=0l;
                      placements=[||] }

This means that we can’t accidentally treat this as a record that actually has a corresponding row in the database. This has already caught a bug in my application—as soon as I implemented this, the type-checker alerted me where I was allowing users to belong in unsaved worlds, which would cause a problem as soon as I tried to save the user out to the database (and serialise the database ID as zero).

There’s also a certain elegance in the type statement val save : 'con t -> PK.some t, in my opinion.