Pattern matching costs

15 October 2010

Actually, this post’s title is a misnomer. Pattern matching doesn’t cost, as such. Allocating costs.

In a series of blog posts, Richard WM Jones talks about OCaml internals, relating to how it manages memory. He says:

The OCaml compiler (ocamlopt) on the other hand compiles your code very literally. For example if you write:
let f (a,b) = (a,b)
then you are asking the compiler to pattern match (a,b) and construct (ie. allocate) another tuple (a, b), and that’s exactly what the compiler will do.

These sorts of “hidden” allocations are a common problem when people are trying to wring the last drop of performance from their OCaml inner loops. Even 5 instructions in an inner loop can be costly, particularly when it’s unnecessary as in the function above.

You are better off writing this function like so:
let f (x : ('a*'b)) = x

I decided to test this assertion. Two test cases:

(** atest.ml **)

let f (a,b) = (a,b);;

for i = 1 to 1000000000 do
    ignore (f (10,20))
done;;

(** btest.ml **)

let f (x : ('a * 'b)) = x;;

for i = 1 to 1000000000 do
    ignore (f (10,20))
done;;

I tested each type twice in a row (to account for potential cache issues):

celtic@azayaka:~$ time ocaml atest.ml; time ocaml atest.ml; time ocaml btest.ml; time ocaml btest.ml

real    0m31.846s
user    0m31.830s
sys     0m0.000s

real    0m31.840s
user    0m31.830s
sys     0m0.000s

real    0m23.753s
user    0m23.740s
sys     0m0.000s

real    0m23.750s
user    0m23.730s
sys     0m0.010s
celtic@azayaka:~$

And sure enough, btest.ml wins. It’s important to note that the difference was just over 8 nanoseconds per call for this example, so it’s probably not important unless it’s used in many places or in a tight loop, and is allocating things far more weighty than an int * int tuple.

Of course, if the type actually didn’t matter in the end (such as in our example), go ahead and make it more general, and let the compiler use a polymorphic type instead!

(** ctest.ml **)

let f x = x;;

for i = 1 to 1000000000 do
    ignore (f (10,20))
done;;

real    0m23.791s
user    0m23.730s
sys     0m0.030s

real    0m23.769s
user    0m23.750s
sys     0m0.000s

There’s a more interesting question from here: what about if we still match a tuple pattern, but don’t actually allocate a new tuple for our return value? There are two possible ways to do this:

(** dtest.ml **)

let f ((a,b) as x) = x;;

for i = 1 to 1000000000 do
    ignore (f (10,20))
done;;

(** etest.ml **)

let f (a,b) = a;;

for i = 1 to 1000000000 do
    ignore (f (10,20))
done;;

The results here are interesting. dtest.ml is of the same performance class as the previous fastest solutions:

real    0m23.811s
user    0m23.740s
sys     0m0.020s

real    0m23.754s
user    0m23.740s
sys     0m0.000s

I assume this means the compiler is asserting the type at compile-time (i.e. f‘s argument must be able to match (a,b), but as we do nothing with the matched definition, there is no cost).

Meanwhile, etest.ml takes more than a second longer (!).

real    0m25.099s
user    0m25.090s
sys     0m0.000s

real    0m25.099s
user    0m25.060s
sys     0m0.030s

I’m guessing this is because it’s actually having to allocate a new int to return, as it can’t return part of the (immutable) tuple. Or something.

Any comments? I’m no OCaml professor, nor indeed an academic in the field, so any comments on my tests and/or their reliability or impact would be most welcome.