Kindness for Mean Girls: a discussion on type-level programming in Haskell

A primer on the cruel, tacit laws of type-level programming in Haskell

Haskell programs are constructed at the junction of two worlds: one of types, and the other of values. Values are operational run-time entities that are denoted syntactically by terms in the code. To this end, types classify terms. They provide a powerful abstraction to organize data, determine how it should be stored in memory, passed through various operations, and more. On a superficial level, the difference between these worlds is simple: types refer to datatypes, which are either built-in (such as Integer and String), or user-declared, for example:

data SpringFlingQueen = Cady | Regina

By comparison, values are denoted by terms those types categorize (such as 1 and "abc" described by the standard library’s Integer and String types, or Cady and Regina in the case of our user-declared type, SpringFlingQueen). Thus, the term 1 denotes a value that exists in memory only when the program is run.

In addition to these two worlds, a third, darker, and more elusive underworld of kinds also lurks in the shadows of Haskell programs. Haskell kinds can be as unpredictable and insidious as the soulless alpha-female of an elite high school clique. While daunting, understanding the coaction between these three worlds lays the foundation for type-level programming, a topic worth learning as it strengthens overall Haskell intuition.

the haskell type system: a machine of constant, unforgiving judgement and rigid classification

Kindness is a virtue

Much like counter-intuitive adolescent social dynamics, mastery of Haskell bears a steep learning curve because of its underlying complexity. It is difficult to wrap one’s head around the language’s typeclass hierarchies, category-theoretic roots, or the intricacies of compiler behavior. However, an intimate understanding of the type system lets programmers look beyond the chaos of code and apply a recognizable template by which to understand it.

Kindness takes patience

This topic is also difficult to master because it is both vast and subtle. Ideas underlying type-level programming are scattered across many disconnected resources focused on several corners of the Haskell ecosystem. Authoritative references on the topic (such as relevant library documentation) tend to assume familiarity with domain-specific terminology that may be unknown to non-experts. Due to the paucity of approachable materials on the subject, I’ve attempted to break down type-level programming into explanations of its constituent parts. While I don’t go into extensive depth on any individual section below, I hope to provide a valuable starting point with which readers can explore ideas in greater detail.

Table of contents
Types vs. Values
Kinds
Higher-kinded types
Constraint kinds
Kind polymorphism
DataKinds and type-level literals
Datatype promotion
Relationship between values, terms, types, and kinds
The difference between “type” and “type-level”
Type families
Associated types
GHC.TypeLits
Distinguishing between DataKinds and GHC.TypeLits
Use-case: Marshaling ASTs
The hype of dependently-typed languages

Types vs. Values

As mentioned above, Haskell programs are divided into two worlds: the type-level and the value-level. The type-level refers to the part of a program analyzed by the static type-checking phase during compilation. Since every expression is assigned a type, code is checked against Haskell’s type system to ensure it fits GHC’s specified correctness criteria.

types classify terms, just like hostile teens with identity crises classify one another

If the program is well-typed, the program will compile. However, this type-information evaporates once the compilation process terminates, leaving only values behind at run-time. In this way, types and values can be better distinguished by the phasing distinction, with types being compile-time entities and values being run-time entities.

For example, if a function takes a String, the type-checker doesn’t care if the String has a value denoted by "abc" or "123" or "get in loser"—so long as it’s a String. Type information gets discarded at run-time, leaving only the string’s value (ex.,"get in loser"). In some cases, however, we want the type system to care about what those values are and distinguish between them. The DataKinds extension, which we’ll explore below, lets us do that by allowing us to shove more information about our program into the type system. This lets us add meaning to the value of a String at the type-level, moving them from their usual, strictly run-time existence, into the compile-time phase. The technique allows us to use more information statically in the logical development and abstraction of the program’s behavior. We’re also able to add consequences that can halt compilation if the specific value of String doesn’t conform to the rules we’ve specified, or define classes over them.

Distinguishing types and values in GHCi

Let’s examine our aforementioned SpringFlingQueen datatype in GHCi. When we query the type of one of its constructors using :t (a handy shorthand for :type), we see that it is indeed of type SpringFlingQueen:

λ data SpringFlingQueen = Cady | Regina
λ :t Cady
Cady :: SpringFlingQueen

Now consider literals 1, 2, 3. These are values of a type that parameterizes the Num class:

λ :t 1
Num p => p

Typing :t 1 into GHCi gives us Num p => p, indicating that a literal such as 1 can be of any polymorphic type p as long as the Num type class has instances for that type (for example, Integer and Float both have Num instances and therefore are valid types that p could be instantiated with). This is possible due to type inference.

type inference can be used to deduce concrete types wherever they're obvious. If a type looks like a mouse, it's a mouse—duh!

Kinds

Just like types classify terms, kinds classify types, and therefore are frequently described as “types of types”, or referred to be “one level up.” The “star” syntax (i.e., *) denotes kinds. It is defiantly used in this post despite recent syntactic changes. A prerequisite for understanding the kind system necessitates comprehending the differences between three pairs of ideas:

Data constructors vs. type constructors: data constructors create values, whereas type constructors create types. Type constructors take one or more type arguments and produce a datatype when enough arguments are provided. This means that through currying, a type constructor can be partially applied. For example, the list type constructor [] may take a single type argument (ex. String) to denote the elements of the list (i.e., [String]). [String] is simply syntactic sugar for [] String, where the type [] is applied to String.
Full vs. partial application A partially-applied type, like a partially-applied function, is one that is missing some of its data constructors. For instance, consider the list type []. The kind of [] is * -> *. A fully-applied list has data constructors, such as [Int], and its kind is *.
Inhabited types vs. uninhabited types: An inhabitant of a type is precisely a value of that type. This means that inhabited types refer to the types that contain concrete values, such as the value denoted by the term 1 :: Int. This suggests that the type Int is inhabited by value denoted by 1. By contrast, uninhabited types refer to type constructors to which no values are abstracted. For instance, Void is uninhabited because it has no data constructors, and thus can not be used to construct a valid term. While Void may seem pointless at first, it can be a useful way to represent that a container is empty (ex., [Void], which is inhabited by the term [] and the value this term denotes).

How do these ideas relate to the kind system? Well, all fully-applied runtime values are of kind * and they are inhabited. You can confirm this by typing :k Int and :k String in GHCi. However, this doesn’t work the other way around—just because a type is inhabited, doesn’t necessarily mean it’s fully-applied (ex., [] is not fully-applied but it is inhabited, and its kind signature is * -> *). Conversely, all partially-applied types are uninhabited (since they don’t correspond to a value), but not all uninhabited types are partially applied. Void for instance is not partially-applied, but it is uninhabited.

To get the hang of this idea, consider the following examples:

Type	Inhabited?	Fully-applied?	Kind
`Int`	Yes	Yes	`*`
`String`	Yes	Yes	`*`
`[]`	Yes	No	`* -> *`
`[Int]`	Yes	Yes	`*`
`(,)`	Yes	No	`* -> * -> *`
`Either`	Yes	No	`* -> * -> *`
`Void`	No	N/A	`*`
`Either Void Void`	Yes	Yes	`*`

Kind signatures can be specified manually in GHCi using the XKindSignatures extension. Try extending the above table by investigating the kind signatures of various types.

Higher-kinded types

Just like there are higher-order functions (functions that take other functions as arguments), there are higher-kinded types (types constructors that take other type constructors as arguments). Type constructors such as [] are a first-class type, but also a higher-kinded type, given they take another type constructor to be reified:

λ :k []
[] :: * -> *

Let’s consider Functor:

λ :k Functor
Functor :: (* -> *) -> Constraint

We see that Functor takes a type constructor * -> * and returns a Constraint. Let’s use :info to examine Functor a bit more closely:

λ :info Functor
class Functor (f :: * -> *) where
  fmap :: (a -> b) -> f a -> f b
  (<$) :: a -> f b -> f a
  {-# MINIMAL fmap #-}
  	-- Defined in ‘GHC.Base’
instance Functor (Either a) -- Defined in ‘Data.Either’
instance Functor ((,,,) a b c) -- Defined in ‘Data.Orphans’
instance Functor ((,,) a b) -- Defined in ‘Data.Orphans’
instance Functor [] -- Defined in ‘GHC.Base’
instance Functor Maybe -- Defined in ‘GHC.Base’
instance Functor IO -- Defined in ‘GHC.Base’
instance Functor ((->) r) -- Defined in ‘GHC.Base’
instance Functor ((,) a) -- Defined in ‘GHC.Base’

We see that Functor allows type constructors like Maybe and [] to have Functor instances, but not Int or String. This is because Maybe and [] are * -> *, while Int has kind *. Similarly, Either has * -> * -> *, which is why its instance above is parameterized with a type parameter denoting something of kind * -> *: Either a.

Constraint kinds

We got a glimpse of constraint kinds above in the Functor example. Constraint kinds are another flavor of kinds, denoting typeclass constraints left of the => arrow. This allows the programmer to extend the power of constraints to more types, using the -XConstraintKinds extension.

I'm not a regular kind, I'm a constraint kind

Kind polymorphism

Parametric polymorphism is everywhere in Haskell; it allows us to abstract at the type level. Using type variables such as a that can be of any type instead of concrete types such as Int, we’re able to define more generic data and functions that offer greater reusability across our code. Kind polymorphism works similarly, but at the kind level. Using the PolyKinds extension, we’re able to define functions and types that work over a variety of kinds, as opposed to those whose implementations are tied to specific kind signatures. The classic example of Typeable is widely used to demonstrate why this is useful:

class Typeable (t :: *) where
  typeOf :: t -> TypeRep

class Typeable1 (t :: * -> *) where
  typeOf1 :: t a -> TypeRep

class Typeable2 (t :: * -> * -> *) where
  typeOf2 :: t a b -> TypeRep

The classes above are defined for multiple arities, with each class specifying a kind signature that caters to a specific arity: *, * -> * or * -> * -> *. The type variables a and b are both of kind * by default, which is why the Typeable1 has one type variable, and Typeable2 has two. Accommodating for arguments with varying kinds is cool, but this implementation necessitates having to type out separate definitions over and over again. It also obliges the programmer to consult the provided kind signatures to determine the appropriate type we can use with Typeable vs. Typeable1 vs. Typeable2 to ensure well-kindedness:

instance Typeable Int      where typeOf  _ = TypeRep
instance Typeable1 []      where typeOf1 _ = TypeRep
instance Typeable2 Either  where typeOf2 _ = TypeRep

Basic programming intuition dictates that not only is it unsustainable and tedious to have separate implementations for every kind, but it also increases the surface area of our code, thereby increasing the likelihood of errors. What if we could have one definition for all kinds? The PolyKinds extension lets us do exactly that. This lets us unify the three classes into a single representation:

{-# LANGUAGE PolyKinds #-}

data Proxy t = Proxy

class Typeable (t :: k) where
  typeOf :: Proxy t -> TypeRep

We now have the flexibility to pass any type-level entity to typeOf, not just one that is kind * like Int or * -> * like [] or * -> * -> * like Either. This is because Proxy has kind forall. k -> *. This allows us to instantiate a single Typeable class for a variety of kinds:

instance Typeable Int     where typeOf _ = TypeRep
instance Typeable []      where typeOf _ = TypeRep
instance Typeable Either  where typeOf _ = TypeRep

raise your hand if you've ever been personally victimized by the Haskell type system

DataKinds and type-level literals

The DataKinds language extension lets us reason about data at the type-level by providing type-level literals. Normally, using :k to query the kind of literals such as 1 or "foo" doesn’t work. That’s because these entities are at the value level:

λ :k 1

<interactive>:1:1: error:
    Illegal type: ‘1’ Perhaps you intended to use DataKinds

As the error suggests, to be able to inspect the kind of a value, we need to enable the DataKinds extension. Recall, 1 exists at the value-level. Haskell’s type system doesn’t care about values at compile-time, in that they are run-time entities only. However, with XDataKinds enabled, investigating the kinds of literals (such as 1 or "foo") gives us the following:

λ :set -XDataKinds
λ :k 1
1 :: GHC.Types.Nat

λ :k "foo"
"foo" :: GHC.Types.Symbol

The integer 1 is of kind Nat, and the string "foo" is of kind Symbol. Nat and Symbol are constructors defined by the GHC.TypeLits module (explored below, and are therefore out of scope unless we import it.

Datatype promotion

The DataKinds extension also enables datatype promotion. When we define datatypes, we create custom types. With DataKinds turned on, that type becomes a custom kind, and its data constructors become types, thus getting “promoted” through the ranks of the hierarchy of values -> types -> kinds.

Returning to our beloved SpringFlingQueen example, let’s consider what normally happens when we create a simple type constructor in the REPL:

λ data SpringFlingQueen = Cady | Regina
λ :t Cady
Cady :: SpringFlingQueen
λ :k SpringFlingQueen
SpringFlingQueen :: *
λ :k Cady
<interactive>:1:1: error:
    Not in scope: type constructor or class ‘Cady’
    A data constructor of that name is in scope; did you mean DataKinds?

We can inspect the type of its data constructors, and the kind of its type constructor. However, when we try to query for the kind of Cady, we get an error. However, when we turn on XDataKinds, we see the following:

λ :set -XDataKinds
λ :t Cady
Cady :: SpringFlingQueen
λ :k Cady
<interactive>:1:1: warning: [-Wunticked-promoted-constructors]
    Unticked promoted constructor: ‘Cady’.
    Use ‘'Cady’ instead of ‘Cady’.
Cady :: SpringFlingQueen

There are two things to note in the experiment with XDataKinds turned on:

We can query for the kind of Cady, since it is now a type-level value and SpringFlingQueen is now a kind-level type.
It compiles! However, if you have the -Wunticked-promoted-constructors warning on like I do, that will get triggered. While ticks are not required (your code compiles without them), you’ll typically want to prefix your data constructors with a tick to disambiguate them from regular, unpromoted data constructors:

data SpringFlingQueen = 'Cady | 'Regina

datatype promotion: the `DataKinds` extension takes your types to a new level

Here’s a quick summary of what you can and can’t promote:

Things that get promoted	Things that don’t get promoted
data types	type synonyms
newtypes	type/data families
GADTs	higher-kinded types (such as `data Fix f = In (f (Fix f))`
	datatypes whose kinds involve promoted types (such as `Vec :: * -> Nat -> *`)
	data constructors that are kind polymorphic, involve constraints, mention type or data families

Relationship between values, terms, types, and kinds

Drawing the distinction between these worlds can be confusing. While each category deals in different entities, they all use the same literal syntax. Recall that values only exist at run-time, and are denoted by terms which appear in the syntax (1). I made the following chart to demonstrate the relationship between a few simple values. Typing the leftmost entity into GHCi using :t if it’s a value and :k if it’s a type yields the following results:

value-level (run-time) constructs denoted by terms	type-level (compile-time) constructs	kind-level constructs
`1`	`Num p => p`	`Nat`
`1 :: Int`	`Int`	`*`
`True`	`Bool`	N/A
N/A	`Bool`	`*`
N/A	`'True`	`Bool`
`Just 1`	`Maybe`	`Maybe Nat`
`Just 1`	`Maybe`	N/A
`Just True`	`Maybe Bool`	`Maybe Bool`
N/A	`Just 'True`	`Maybe Bool`
N/A	`'Just True`	`Maybe Bool`
N/A	`'Just 'True`	`Maybe Bool`
N/A	`[]`	`* -> *`
N/A	`Either`	`* -> * -> *`
N/A	`Either Int`	`* -> *`
N/A	`Either Int Int`	`*`

The difference between “type” and “type-level”

There’s a subtle but essential difference between something being a type, vs. being at the type-level. Although datatype promotion allows us to represent constructors that otherwise denote values at the type-level (such as True being promoted to 'True'), this does not make them types. For something to be a type, it must have the ability to be inhabited, meaning that a value of its type exists. Bool, for example, is a type because it can have inhabitants (i.e., True and False), whereas 'True is not a type because it can not have inhabitants, despite having compile-time significance.

Type Families

Functions are to values what type families are to types. Enabled with the -XTypeFamilies flag in GHCi or the {-# LANGUAGE TypeFamilies #-} option in your program, they provide a way to specify how to map one type to another. For example, string concatenation would normally be accomplished through the ++ or <> functions, but those operators are specifically for values. To concatenate type-level strings, GHC.TypeLits defines a type family called AppendSymbol:

type family AppendSymbol (m :: Symbol) (n :: Symbol) :: Symbol

AppendSymbol works similarly to <>, but over types instead of values:

λ :kind! AppendSymbol "so " "fetch!"
= "so fetch!"

The type system can’t run regular Haskell functions. It can, however, evaluate type families. Since type families map types to other types, they behave a lot like functions. For example, the type family + (of kind Nat -> Nat -> Nat), when applied to the type-level naturals 5 and 2, evaluates to the Nat values 7 just as the value-level function (+) (of type Num p => p -> p -> p) would behave when applied to Int values.

There are two types of type families: closed and open.

Closed type families

Closed type families are defined all in one place. They’re closed in the sense that they can not be extended: you can’t add more cases to them somewhere else. These are defined using the where clause and a list of their cases, also known as “type instances”:

type family Wednesday a where
  Pink Int = [Int]
  Pink Char = (String, Int)

Open type families

Open type families are defined without the where clause followed by a list of instances, and can appear either at the top-level or in the body of a typeclass. Unlike closed type families, which are a black-box, open type families can be extended.

Here is an example of a type family declared at the top-level:

type family Skirts a

Below is another example of a type family declared at the top-level, this time with instances:

type family UnsignedVersionOf t

type instance UnsignedVersionOf Int   = Word
type instance UnsignedVersionOf Int32 = Word32
type instance UnsignedVersionOf Int64 = Word64

Notice that type families are parametric, in that they contain a polymorphic type variable t which can be instantiated with different parameters, allowing specialized representations. In the above example, Int, Int32 and Int64 all result in different assignments. This is powerful if you want to yield different results depending on the type.

When they appear in the body of a typeclass, they’re also called associated types. For example:

class IronicTees a where
  type Pink a

In this case, we don’t have to say type family, just type, because any type associated with a class makes an open type family. Another example of an associated type is Rep from Generic or Rep1 from Generic1.

Note, this looks like filling out cases of a function, and that’s because it is. We’re declaring that the UnsignedVersionOf type family maps types to other types, and that this mapping occurs at compile-time. This is how functions work on the type-level.

Associated types

Similarly, Rep is an associated type of the Generic class that maps a type to the generic information about that type available to the compiler. It is defined by the GHC.Generics class as:

type Rep a :: Type -> Type

When you invoke Rep Int in the type-level, GHC does some computation and gives you back some type describing Int#, an unboxed type. This type is usually based on one of the generic types denoting the shape of a data type, (for example, M1, K, etc.)

Unboxed types have the kind #. Usually, values are contained in a “box”, in that they are represented by a pointer to a heap-allocated object. This way of boxing values enables lazy evaluation, but can result in slow compile times. Conversely, unboxed values get at the raw values directly.

GHC.TypeLits

Importing GHC.TypeLits exposes the names of kinds Nat and Symbol (the ones we saw earlier when surveying the kinds for integer and string values). The Num class defines operations (such as +) that can be used on types that instantiate it (such as Integer). Functions like +, however, can only be used to perform computations over values (like 6+2 which yields 8). If we wanted to perform these computations at the type-level, we’d need type families. GHC.TypeLits provides the type families that can be used over type-level literals. For example, if we wanted binary addition via +, we have (type family (m :: Nat) + (n :: Nat) :: Nat), allowing us to perform these computations over type-level values:

import GHC.TypeLits

This module provides us with constructors and operations that add type information to the literals that otherwise hold no meaning at compile-time.

If we want the run-time value of a type-level number like 5, we can use natVal :: forall n proxy. KnownNat n => proxy n -> Integer. To do this, we will also need Data.Proxy:

import Data.Proxy

Proxy a allows us to use a value whose type us parameterized by some arbitrary thing of arbitrary kind. For example, Proxy a could be Proxy Int—Int :: * and so Proxy :: * -> *. Proxy Show is another valid example since Show :: * -> Constraint, so Proxy :: (* -> Constraint -> *). Proxy :: forall k (a :: k) . a -> * meaning that k and a are instantiated to the relevant types and kinds.

If we import Data.Proxy and type the following into the REPL, we’ll see that it returns the value, but it is able to perform the computation at the type-level.

> natVal (Proxy :: Proxy 5)
5

Proxy doesn’t hold any data, and it has one constructor (also known as Proxy). To drive home the relationship between value and type level paradigms, consider a simple addition operation. Addition using the type operator + provided by Prelude, (+) :: a -> a -> a , we know it is a function that works on values. However, what if we wanted to perform this not over values but over types? GHC.TypeLits gives us a series of type families that can be used in place of these.. In this case, we want to invoke +, not the function from Prelude but the type-family from TypeLits: type family (m :: Nat) + (n :: Nat) :: Nat which provides addition of type-level natural numbers. For example,

> natVal (Proxy :: Proxy (5+5))
10

This allows us to do 5+5 at the type-level, and return a value, 10.

Distinguishing between DataKinds and GHC.TypeLits

At a high level, one is a language extension while the other is a module. Modules never affect the meaning of syntax, only the availability of symbols and instances. Conversely, language extensions can add syntax and change semantics, but usually don’t affect the available symbols (with the exception of NoImplicitPrelude).

GHC.TypeLits gives us the identifiers associated with GHC’s type literal support: the Nat and Symbol kinds, the natVal and symbolVal functions, operations such as +, -, *, etc. for forth type families.

-XDataKinds gives us the language features we need to use these: the syntactic support for 1,2,3 literals for Nat values and "pink" literals for Symbols, as well as the 'Ticked constructor syntax for user-defined kinds.

You rarely ever use TypeLits without DataKinds, but you could use DataKinds without TypeLits (if you didn’t need type-level naturals or symbols). For example, here’s what we can get away with without importing GHC.TypeLits, assuming we have DataKinds and PolyKinds on:

class Pink a where
  pink :: proxy a -> String -- like Show, but doesn’t need to take a value, just a proxy to sort out the type

instance Pink 1 where
  pink _ = "1"

Let’s try the same, but for KnownNat:

class Pink a where
  pink :: proxy a -> String -- like Show, but doesn’t need to take a value, just a proxy to sort out the type

instance KnownNat n => Pink n where
  pink n = show $ natVal n

In GHCi when you turn on the language extension XDataKinds and look at its kind:

> :k "a"
"a" :: ghc-prim-0.5.3:GHC.Types.Symbol

The hype of dependently-typed languages

While this post won’t make you the Regina George of advanced type-level programming techniques overnight, I hope it provides readers with enough threads to pull on as they move in that direction. Moreover, it provides sufficient context to start watching a cool trend: dependently-typed languages are growing in popularity. While type systems let us validate programs with respect to a fixed set of criteria, dependent types are much more flexible because they extend the guarantees expected of regular type systems to other aspects of program behavior. Users can load up their type systems with more information about values, thereby introducing more constraints than the status quo which can provide stronger safety guarantees.

Type-level programming is so fetch! The end!

References

Special shout out to @patrickt and @robrix for meticulous edits, for being bottomless pits of Haskell knowledge, and a pleasure to work with daily.