The Simply Typed Lambda Calculus

Adding types to the Lambda Calculus

Abstract

This post is part of a series on how to reason about code. It describes the Simply Typed Lambda Calculus, a precursor to the formal language understood by Coq, an interactive proof assistant I am learning. More background can be found here, and the post that describes the λ Calculus itself can be found here.

This exposition follows closely that in "Type Theory & Formal Proof" by Nederpelt & Geuvers (abbreviated TTFP throughout).

Recap

In the last post, we saw that the λ Calculus is way to make precise the notion of computatation (see here for a capsule history as to why we would want to do this). As such, it should not be surprising that it is a very expressive formalism. It is Turing-complete, and in particular it allows us to write programs that never terminate. We saw the self-application function in the last post, but here's another example, once again from TTFP: let Δ be the term $λ x . x x x$ . Then $Δ Δ ↠_{β} Δ Δ Δ ↠_{β} Δ Δ Δ Δ ↠_{β} \dots$ .

In order to rule out such pathological terms, Church in his 1940 paper "A Formulation of the Simple Theory of Types" added a type system to the λ Calculus. We call the language that results from only permitting "well-typed" λ-terms the Simply Typed Lambda Calculus, or $λ \to$ .

$λ \to$

Recall the Untyped λ Calculus: we started with a set of variables $V = {a, b, c, \dots}$ . $V$ is infinite, but start with the lower-case Latin alphabet. We then built terms inductively through:

naming :: if $x \in V$ , then $x$ is a term (in the λ Calculus)
abstraction :: if $T$ is a term & $x \in V$ , then $λ x . T$ is a term, where $x$ may (or may not) occur in $T$
application :: if $T_{1}$ and $T_{2}$ are terms, then $T_{1} T_{2}$ is a term

giving us an infinite set of λ-terms like $x$ , $λ x . y$ , $λ x . y a$ , $λ z . (λ x . y z)$ and so on.

Call the elements of $V$ ordinary variables. We now give ourselves a second infinite set of type variables $V = {α, β, γ, \dots}$ & define the set of all simple types $T$ inductively:

if $α \in V$ , then $α \in T$
if $σ, τ \in T$ , then $(σ \to τ) \in T$

So $T$ contains terms like α, β, $α \to β$ , $(α \to β) \to γ$ and so on. Intuitively, type variables represent what we'd typically think of as a type (the natural numbers $N$ , or bool, for instance), and the arrow suggests mapping; a term with type $σ \to τ$ maps things of type σ to things of type τ.

In $λ \to$ , we assign types (i.e. members of $T$ ) to terms; we write $M : σ$ for "term $M$ has type $σ$ ." Let's start with the simplest λ terms: ordinary variables. We assume each ordinary variable $x$ has a unique type, say σ, so that $x : σ$ . In general, we assume that we have an infinite number of ordinary variables available for each type σ.

The types of applications & abstractions are built from those of their constituents. In the case of abstraction, if $x : σ$ , and $M : τ$ , then $λ x . M$ takes members of type σ and returns members of type τ, hence its type should be $σ \to τ$ . We denote this more fully by writing $λ x : σ . M : σ \to τ$ .

Analogously, the application $M N$ is taking something with the type of $N$ , and $M$ should be an abstraction, and hence have an arrow type; call it $σ \to τ$ . For $M N$ to make any sense, $N$ must have type σ, and hence $M N : τ$ .

At this point in our development of $λ \to$ , we see that there are many terms we can write down that make no sense from a typing perspective. Suppose, for instance, that $M : σ \to τ$ and $N : α$ , then the application $M N$ has no meaning.

We formalize this by defining the set of pre-typed λ-terms $Λ_{T}$ as

\begin{matrix} (1) & Λ_{T} = V | (Λ_{T} Λ_{T}) | (λ V : T . Λ_{T}) \end{matrix}

This is just the set of all terms we can possibly write according to our rules so far. Whether these terms belong in $λ \to$ is a different question. For any given term in $Λ_{T}$ , the issue of whether or not it makes any sense in terms of typing depends on the types of its constituents. We call a collection of type statements $x : σ, y : τ, z : υ, \dots$ where $x, y, z, \dots$ are all ordinary variables (i.e. not more complex terms) a context (generally denoted by an upper-case Greek letter). A judgement is a statement $Γ ⊢ M : σ$ where Γ is a context, $M$ is some term, and σ is a type; the meaning could be given as "given Γ, it can be shown that $M$ has type σ."

A pre-typed λ-term $M$ is called legal if there exist context Γ and type ρ such that $Γ ⊢ M : ρ$ . $λ \to$ is the set of legal pre-typed terms.

Derivations

I've used the phrase "can be shown", above. Let's make that precise.

We can validate a judgement $Γ ⊢ M : σ$ by giving something called a derivation. A derivation is a sequence of applications of derivation rules initialized with the contents of Γ and terminating with the conclusion that $M : σ$ . There are three derivation rules for $λ \to$ :

\begin{aligned} (2) & (var) & Γ ⊢ x : σ if x : σ \in Γ \\ (3) & (appl) & \frac{Γ ⊢ M : σ \to τ Γ ⊢ N : σ}{Γ ⊢ M N : τ} \\ (4) & (abst) & \frac{Γ, x : σ ⊢ M : τ}{Γ ⊢ λ x : σ . M : σ \to τ} \end{aligned}

Each rule tells us what we can conclude so long as we know certain things already. In the case of (3) & (4), the premises are written above the line & the conclusions below. In the case of (2), there are no premises: if $x : σ \in Γ$ , then we are allowed to conclude: $Γ ⊢ x : σ$ . A derivation is collection of applications of these rules, arranged in an inverted tree where the "root", at the bottom, is the statement we are justifying.

Let's work through an example (this is exercise 2.10 (d) in TTFP). Consider the term $λ x : α \to β . (y (x z)) z$ . What type might it have? Well, we know $x : α \to β$ , therefore, if this term is typable, $z$ must have type α due to the presence of the application $x z$ in the middle of the term. That means $x z : β$ , so $y$ must have type $β \to μ$ for some $μ$ . $y (x z)$ is then applied to $z$ , so μ must have type $α \to γ$ for some γ.

So we see that if we assume $y : β \to α \to γ$ and $z : α$ , this term type-checks, and has type $(α \to β) \to γ$ . Let's prove this. To simplify our derivation, let's define $Γ = {x : α \to β, y : β \to α \to γ, z : α}$ . Then:

Figure 1: $λ \to$ Example Derivation

We see that each leaf of this tree is an application of the (var) rule. Beginning in the upper right-hand corner, we've applied (var) twice to conclude that $Γ ⊢ x : α \to β$ and $Γ ⊢ z : α$ , both of which are valid because of our definition of Γ. Those two statements, in turn, justify concluding that $Γ ⊢ x z : β$ by the (appl) rule.

To the left, we state that $Γ ⊢ y : β \to β \to γ$ again by the (var) rule, and combine that with our prior conclusion via the (appl) rule to conclude $Γ ⊢ y (x z) : α \to γ$ . And so forth.

The reader may note a few things from this example. Firstly, derivations can quickly become large & complex. This is a fairly short judgement and it's already spread over the width of the page. Secondly, they entail repetition. In this judgement, we need to introduce $Γ ⊢ z : α$ twice, once for each application to $z$ . This is characteristic, and becomes pronounced in more complex judgements that involve repeated sub-terms.

Nederpelt & Geuvers introduce an alternative format for derivations that they call "flag format". They suggest that this is their invention, but don't quite say it, so I'm not sure where it originated. Here, we surround each type declaration with a box (a "flag") and mark the extent of that declaration's context by drawing a line below it (the "flag pole"); all steps to the right of the flag pole assume that declaration in their context. The flag notation for the above derivation is:

Figure 2: $λ \to$ Example Derivation: Flag Format

We see that this style contains the same information as Figure 1: three invocations of (appl) & one of (abst). We have, however, omitted the applications of the (var) rule in the service of compactness & readability.

Note also that the flag pole corresponding to $x : α \to β$ extends only through line (6)– in line (7) we abstract over $x$ changing it from free to bound (the context exists to type free variables). However, the flagpoles for the other two declarations extend to the bottom of the derivation, indicating that the context in the final judgement must be non-empty. Like Nederpelt & Geuvers, I'll be using the flag format going forward.

Questions We Can Ask

We are interested in three different problems pertaining to judgements in Type Theory:

Well-typedness/Typability: Consider $? ⊢ t e r m : ?$ (or $c o n t e x t ⊢ t e r m : ?$ ); given a term, is it legal in some context (or, is it legal in the given context)?
Type checking: Consider $c o n t e x t ⊢^{?} t e r m : t y p e$ ; given a judgement, is it true?
Term finding: Consider $c o n t e x t ⊢ ? : t y p e$ ; is there a term of the given type in the given context?

Typability

In this case, we are handed a term, perhaps along with a context, and we need to find a derivation. Conceptually, we begin with the desired conclusion & attempt to work backwards. Continuing with our example $λ x : α \to β . (y (x z)) z$ , we look at our goal, see that (abst) is the only rule that could possibly produce that goal, and write:

Figure 3: Typing step 1

We now have, on line (2), an application whose right-hand side is the variable $z$ , meaning that we'll need to declare a type:

Figure 4: Typing step 2

We see we'll again need to apply (appl), this time requiring a declaration of the type of $y$ :

Figure 5: Typing step 3

We see that the only feasible type for $z$ is $α$ , yielding a type of β for the application $x z$ on line (4):

Figure 6: Typing step 4

Since $y$ is being applied to a term of type β on line (5), $y$ must have type $β \to μ$ , for some type μ. But on line (6), a term of type μ is being applied to a type now known to be of type α, hence μ must be $α \to γ$ for some γ.

Hence, we determine that $λ x : α \to β . (y (x z)) z$ is well-typed within $Γ = {y : β \to α \to γ, z : α}$ and has type $(α \to β) \to γ$ :

Figure 7: Typing step 5

As an example of a term that is not typable, consider $λ x : α . x y$ . The reader will see that there is no way to satisfy the premises of (appl) for this term, and hence no way to produce a derivation.

Type Checking

To continue our running example, validating that $y : β \to α \to γ, z : α ⊢ λ x : α \to β . (y (x z)) z : (α \to β) \to γ$ type checks is a bit simpler in that we have the beginning & the end of the derivation handed to us: we just need to fill-in the middle:

Figure 8: Type-checking in $λ \to$

This is left as an exercise for the reader.

Term Finding

If I perhaps skimmed the last section, it was in my hurry to get to term finding. You see, in $λ \to$ , all three of the problems of typability, type checking & term finding are decidable in that an algorithm exists for each that is guaranteed to solve the problem. In the more powerful type systems to come, however, term finding becomes undecidable. This is a pity, since under Curry-Howard, term finding is equivalent to proof finding, but that's a story for a later post.

For now, let's consider the problem of finding a term that inhabits type $(α \to β) \to γ$ . This is an arrow type, so we immediately see that we're going to need a term of type $α \to β$ :

Figure 9: $λ \to$ Example Term Finding: Initial

But at this point, we just need some term of type γ. Now, we could just give ourselves one, at the cost of "injecting" it into our context. That is, we could do this:

Figure 10: $λ \to$ Example Term Finding: First Solution

This derivation only justifies the judgement ${y : γ} ⊢ λ x : α \to β . y : (α \to β) \to γ$ . Suppose instead we wanted to find a term inhabiting this type given the same context. That's a little tougher; we find ourselves here:

Figure 11: $λ \to$ Example Term Finding: Restarted

…and we now need to cast around for a term of type γ within these assumptions. We notice that $y$ has an arrow type that ultimately ends-up in γ, and that the type just to the left of γ is α. We have a term of type α laying around: $z$ :

Figure 12: $λ \to$ Example Term Finding: Next step

Now we need something of type β, and we see immediately that $x$ applied to $z$ will give us that:

Figure 13: $λ \to$ Example Term Finding: Solution

We see that, so long as we want to keep our context, we wound-up with the same derivation given above, but there may well be other terms inabiting any given type. The interesting question is whether the differences in terms inhabiting a type are interesting to us or not, but again, we'll come to that later in our story.

Where Are We?

We've refined our formal model for computation with the addition of types in order to rule-out misbehaving terms. We've also laid out the idea of derivation rules & their application in order to validate judgements regarding type. As we'll see, different derivation rules give rise to different type systems. In $λ \to$ the basic problems of typability, type-checking & term finding are all decidable, but as we move into more powerful type systems, term finding becomes undecidable.

And Church succeeded in his aim: every legal term in $λ \to$ is strongly normalizing. The problem with $λ \to$ is that its nice features come at a cost: the set of legal terms therein is not terribly expressive. One can encode the natural numbers (the typed Church numerals) and define addition & multiplication… but not much more. This has led, in the decades since Church's paper, to attempts to extend the type system applied to the λ Calculus in hopes of getting better expressiveness while maintaining the desirable normalization properties. We'll look at some of them in subsequent posts, because they culminate in the Calculus of Inductive Constructions: the formal language recognized by Coq.

Originally published 05/20/24 06:55. Updated 05/21/2024 to build-out the section on term finding to re-use the same context (which is much more interesting).

Unwound Stack