Initially present only in functional languages such as OCaml and Haskell, Algebraic Data Types (ADTs) have now become pervasive in mainstream languages, providing nice data abstractions and an elegant way to express functions though pattern-matching. Unfortunately, ADTs remain seldom used in low-level programming. One of the reason is that their increased convenience come by abstracting away the exact memory layout of values. Even Rust, which tries to optimize data-layout, severely limits control over the representation.
In this article, we present a new approach to specify the data-layout of rich data types based on a dual view: a source type, providing the high-level description available in the rest of the code, along with a memory type, providing full control over the memory layout. This dual view allow to better reason about the memory layout, both for correctness, with a dedicated validity criteria linking the two views, and for optimizations that manipulate the memory view. We then provide algorithms to compile constructors and destructors, including pattern matching, to their low-level memory representation. We prove our compilation algorithms correct, implement them in a tool called ribbit that compiles to LLVM, and show some early experimental results.
This artifact contains our ribbit compiler, implemented in OCaml and using LLVM as a backend.