Monday, April 9, 2018

Deconstructing Lambdas, Closures and Application

New Core Grammar

I've removed App and Closure from the core grammar and in their place added SetEnv and Defer. I believe this change will make optimizations easier.

The new grammar is:
  • Zero
  • Pair
  • Env (was Var)
  • Abort
  • Defer (replacing and similar to Closure)
  • SetEnv (replacing and similar to App)
  • Gate
  • PLeft
  • PRight
  • Trace

A Naive Implementation of Lambda Calculus

SIL started off as an implementation of simply-typed lambda calculus, and to understand its current design, it's helpful to know what it was when it started.

Most naive implementations represent variable identifiers as strings in the AST, and the runtime keeps track of a stack of unbound identifiers as well as a map of bound identifiers and their values.

So for example, in
 (\y -> (\x -> x + y)) 1 2
 The runtime would
  1. process lambda x, pushing "x" to the stack and creating a closure
  2. process lambda y, pushing "y" to the stack and creating a closure
  3. take 1, pop "y" off the stack, add an entry for "y:1" in the bound variable map, and return the closure from step 1
  4. take 2, pop "x" off the stack, add an entry for "x:2" in the bound variable map, and evaluate the closure
I didn't want to have strings in my AST, though. My goal has always been to keep the core grammar and an interpreted runtime for SIL as simple as possible, so I don't want to store strings or integers in the AST. Since combinations of Pairs and Zeros is sufficient to represent strings, integers, and pretty much any other type of data, I stick with them.

So my initial implementation used De Bruijn indices so that variables could simply be represented by integers encoded in SIL Pair/Zero combinations.

Closure vs Lambda

When I first started SIL, the runtime representation was exactly the same as the input grammar, except it had Closures. In a fit of concision obsession, I wondered if could make the two representations the same.

What is the difference between a lambda and a closure with an empty environment? I decided there wasn't any compelling difference, and removed lambdas from the grammar.

Then I decided to represent closure environments in the core grammar. All the environment is, is a stack of values to be assigned to variables, and because the core grammar already had a Pair representation, I could nest Pairs to create a stack, terminated with a Zero.

Once a stack of bound variables is available, referring to a particular variable is simply a matter of how you traverse the stack. Going to the right is equivalent to popping a variable off the stack; going to the left is equivalent to selecting a variable. It's similar to De Bruijn indices, where the index number is the number of rights taken in the path through the stack.

So a lambda is equivalent to closure with a Zero for the stack, representing empty.

The Mechanics of Function Application

With the closure environment represented explicitly, I could see the rules of how function application changed the stack. I noticed that rearranging the stack is matter of traversing the existing stack and building a new stack, both of which could be largely accomplished with the existing Core Grammar.

So rather than have an instruction that both adds something to the closure stack and then evaluates the closure, why not have an instruction that simply evaluates a Pair of a closure and its environment? Then you can pass in any environment you want. This is what SetEnv does.

At this point the Closure representation is a pair of a function and an environment built for it. Since it's a pair, why not just make it a Pair and remove Closure from the grammar?

But what is the function part of what was a Closure? It turns out it can simply be something that isn't evaluated immediately, but only when later an environment is given to it. Thus we get Defer, which doesn't evaluate what is inside it, but returns it verbatim.

The Good Part

So what's the point? Why bother with these trivial changes?

Every step of the way through these changes, I made sure that the original lambda calculus could be represented in the new system, and still functioned correctly.

In the new grammar, a lambda is represented by a Pair containing grammar wrapped in Defer, and Env. When this Pair is evaluated, it "binds" the environment by evaluating Env.

But this also leaves open the possibility for a new type of lambda: one that does not bind the existing environment. By changing Env to Zero, we can guarantee that functions are self-contained.

In fact, using this new grammar, it's very easy to tell if a given expression has unbound variables: if the AST has Envs that aren't underneath a SetEnv.

Partial evaluation is now very simple - just traverse the AST and evaluate any sections where variables are fully bound.

Errors at Runtime, or Before?

The new Abort instruction allows exceptions, but also other types of errors. Runtime exceptions are the worst kind of errors, as they either can't be handled by the program, or are often handled in some sort of ad hoc way that may not be appropriate for the specific exception that occurred.

It's clear that catching errors as early as possible is desirable. Usually programmers use unit tests or possibly sophisticated type checking to catch errors.

With a partial evaluation pass in SIL, all AST sections containing Abort instructions that don't depend on unbound variables will be processed, thus giving a sort of "compile time" error catching mechanism.

We can also lift error checking sections of the code out of the rest of the AST, so that anything that would cause a runtime error can be processed after input but before any other calculations are made, possibly returning faster feedback to the user.

With SIL frames, if we save the old processing state before processing new input, if processing new input generates an error, we can return the user to the old state and inform that they need to make new input to attempt to avoid future errors.

Of course, there will still be situations where any possible user input will generate an error, and those situations are unsolvable (unless we want to go further back and save even older program state to restore to).