The Y-Combinator from Scratch


Symmetry is a complexity-reducing concept (co-routines include subroutines); seek it everywhere.
--Alan Perlis

Recursion has haunted me ever since I first encountered it. So a few days ago, I deliberately forced myself into implementing recursion in some non-conventional way. In some ways, I was trying to achieve the general outcome of recursion without having to rely on explicit recursion. After burning the midnight oil and countless keystrokes, what I had before me was akin to a slain dragon. The dragon was, as I confirmed later, nothing other than the famous fixed-point combinator, otherwise referred to as the Y-combinator. At this point, you don’t need to be able to understand all the words in the last sentence. That is exactly what we are here for.

Let’s figure out the underlying mechanism of the Y-combinator out of necessity. In other words, we will restrict our programming language to disallow explicit recursion, and yet try to achieve a recursive mechanism in the constrained language. Working programmers will find this approach a much more delightful experience than wading through definitions and syntax from the lambda calculus. Let’s begin by solving a simple challenge:

An Odd Problem

Write a function in Python (or any language for that matter) that calculates the factorial of a given number n. You may only use function calls, but you may not call a function from inside its body, i.e. no explicit recursion! In other words, your solution should be rewritable as a lambda expression which returns the factorial of n. Also, assume that your language doesn’t support looping mechanisms.

For instance, here’s a correct but invalid implementation of a function that calculates factorial.

factorial = lambda n: 1 if n == 0 else n * factorial(n-1)
factorial(5) # outputs 120

The above solution is invalid since we are calling the function factorial from inside its body. I am using lambda expressions instead of function definition(s) because I find them more convenient; but you may start out using def as long as you can later rewrite your valid solution into an equivalent lambda expression.

How can we rewrite factorial such that we don’t use any function name(s) inside its body? In other words, can you run a recursive algorithm in a general-purpose Turing-complete programming language that doesn’t allow calling functions by name from within their bodies? You may assume, of course, that the language supports passing functions as arguments.

Baby Steps

Let’s define part_factorial, an incomplete but valid implementation. It might seem like a useless step, but this is the best baby step we could take in the functional programming universe we are stuck in.

# For the case where n != 0, it returns n * f(n-1);
# note that we will have to pass in some function f.
part_factorial = lambda f: lambda n: 1 if n == 0 else n * f(n-1)

Now we can chain part_factorial to ‘manually’ achieve a pseudo-recursive solution.

part_factorial(part_factorial)(0) # 1
part_factorial(part_factorial(part_factorial))(1) # 1
part_factorial(part_factorial(part_factorial(part_factorial)))(2) # 2
part_factorial(part_factorial(part_factorial(part_factorial)))(3) # TypeError: we ran out of copies

Even though we didn’t achieve much in the above code, we did manage to write a correct (however incomplete) function definition which can be fully replaced with a lambda expression. For instance, we can rewrite part_factorial(part_factorial)(0) as

(lambda f: lambda n: 1 if n == 0 else n * f(n-1))(lambda f: lambda n: 1 if n == 0 else n * f(n-1))(0) # 1

Now if the above is clear to you, let’s return to the chained expression we arrived at, part_factorial(part_factorial(...))(n). Aren’t we close to a solution? We are, if we can somehow use a ‘meta’ function which will keep applying the part_factorial function for as long as needed.

Solution: A Function that Keeps Applying a Function

Let’s define meta_factorial(copy0, copy1), which takes two copies of some function that ‘does’ the factorial calculation. We pass two copies of the same underlying function definition so that meta_factorial can line up one copy after the other (and, hopefully, halt). More importantly, since the function definitions (of copy0 and copy1) will be present within the scope of meta_factorial, they will always be available, and meta_factorial can orchestrate applying the function without ever calling it by name, hence solving the problem.

meta_factorial = lambda copy0, copy1: lambda n: 1 if n==0 else n * copy0(copy1, copy1)(n-1)

copy0 and copy1 are placeholders for the very same function implementation. Don’t worry yet about what that function should be, but do pay attention to how we are planning to solve the problem. Instead of passing a function name, we are passing the function ‘mechanics’ (i.e. the function definition, also called an abstraction) in the form of lambda expressions, which, when evaluated (also called application), ‘progress’ the solution. But what lambda expression should we pass for copy0 and copy1? The answer is… meta_factorial itself!

You can confirm that meta_factorial(meta_factorial, meta_factorial)(n) is indeed the function that calculates the factorial for us recursively, and yet without relying on explicit recursion. In other words, even if your language stopped supporting recursion, you could still achieve it. And it works!
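A quick, runnable sanity check, using the two-copy definition from above:

```python
# Two-copy version: the body rebuilds the chain at every step
# by passing the copies along to copy0.
meta_factorial = lambda copy0, copy1: lambda n: 1 if n == 0 else n * copy0(copy1, copy1)(n-1)

print(meta_factorial(meta_factorial, meta_factorial)(5))   # 120
print(meta_factorial(meta_factorial, meta_factorial)(10))  # 3628800
```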

You can confirm that our solution is indeed valid by replacing every occurrence of the symbol meta_factorial with its equivalent lambda definition. The reason we can do this verbatim replacement is that meta_factorial is a combinator.

A combinator, in lambda calculus, is a lambda expression that has no free variables.

That is, any variable (or function) name that was used during the evaluation of the expression meta_factorial(meta_factorial, meta_factorial) was already contained within the body of meta_factorial itself.
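For illustration, here are a couple of one-liners of my own (not from the post’s derivation): a lambda expression is a combinator exactly when every name in its body is bound by one of its own parameters.

```python
# Combinators: every name in the body is one of the lambda's own parameters.
identity = lambda x: x
apply_twice = lambda f: lambda x: f(f(x))

# NOT a combinator: 'y' is a free variable, bound by the surrounding scope,
# so the expression cannot be copied verbatim into another context.
y = 10
add_y = lambda x: x + y

print(apply_twice(lambda n: n + 1)(0))  # 2
print(add_y(5))  # 15
```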

As to why feeding copies of meta_factorial to itself (as in meta_factorial(meta_factorial, meta_factorial)) works, and how I came up with the answer: it is difficult to explain, since it was a pure gut feeling. I could attempt an explanation, but I’d probably do you more harm than good; the proof is in the pudding. So sit down (or stand up), stare at everything you’ve fed into your IDLE by now, scribble a bit, and try to re-explain it all to yourself. The key insight is that what we generally call a function (say, in Python) has two meanings: 1. the actual definition (also called an abstraction in the lambda calculus), and 2. the application of it. Once you pick up that nuance, everything here should fall into place without too much trouble.
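To make that abstraction-vs-application nuance concrete, here is a small illustration of my own (not part of the original derivation):

```python
# Abstraction: the definition itself. No body code runs yet;
# 'square' is just a value that can be passed around.
square = lambda n: n * n

# Application: evaluating the definition at an argument.
print(square(4))  # 16

# Because a definition is a value, we can hand it to another function,
# which may then apply it -- exactly the trick meta_factorial plays on itself.
apply_to_three = lambda f: f(3)
print(apply_to_three(square))  # 9
```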

For the sake of completeness, you may confirm that the equivalent lambda expressions, when evaluated, will calculate the factorial:

(lambda copy0, copy1: lambda n: 1 if n==0 else n * copy0(copy1, copy1)(n-1))(
    lambda copy0, copy1: lambda n: 1 if n==0 else n * copy0(copy1, copy1)(n-1),
    lambda copy0, copy1: lambda n: 1 if n==0 else n * copy0(copy1, copy1)(n-1)) (5) #120

From Solution to the Y-Combinator

We have solved the original problem. But if you stare long enough at the above code, it is begging to be refactored. As a matter of fact, we can do it all with just one copy of meta_factorial.

meta_factorial = lambda copy0, n: 1 if n==0 else n*copy0(copy0, n-1)
meta_factorial(meta_factorial, 11) # 39916800 <--- correct but takes two args

If you want meta_factorial to take only one argument, that is easily fixed.

meta_factorial = lambda copy0: lambda n: 1 if n==0 else n*copy0(copy0)(n-1)
meta_factorial(meta_factorial)(11) # 39916800

In pure lambda expressions form,

(lambda copy0: lambda n: 1 if n==0 else n*copy0(copy0)(n-1))(
    lambda copy0: lambda n: 1 if n==0 else n*copy0(copy0)(n-1))(11) # 39916800

The problem is solved; and no, you do not need your programming language to support recursion in order to achieve recursion. What is going on here is due to a fixed-point combinator called the Y-combinator (discovered by Haskell Curry). A fixed-point combinator is a combinator (defined above) that returns a fixed point of its argument function, i.e. it returns a value x such that f(x) = x, where for our purposes x is itself a function. And this is what generalizes the concept of recursion without having to call a function from inside itself.

Right now, we do not see the Y-combinator because it is buried somewhere in the implementation above. Once we extract out the Y-combinator, we will be able to use it as a generalized higher-order function that converts any explicitly recursive problem into a non-recursive one! If the Y-combinator were callable as Y(f), we could have solved our original problem in one shot by calling Y(part_factorial)(n), where part_factorial = lambda f: lambda n: 1 if n == 0 else n * f(n-1).
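As a preview, here is a sketch of what such a Y could look like in Python. Note that in an eager language like Python the applicative-order variant (often called the Z-combinator) is needed, with an extra lambda v to delay evaluation; the plain Y would loop forever:

```python
# Applicative-order fixed-point combinator (the Z-combinator).
# The inner 'lambda v: ...' delays x(x) so Python doesn't recurse infinitely.
Y = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

part_factorial = lambda f: lambda n: 1 if n == 0 else n * f(n-1)

print(Y(part_factorial)(5))  # 120
```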

In the follow-up post, we will derive the Y-combinator from all the work that we did above and use it to build solutions for common recursion-based problems.

Where to go from here:

  1. Try solving the original problem, but without looking at the solutions in this post. I spent a good night on it.
  2. Try deriving the Y-combinator yourself. The Wikipedia entry has a lot of hints.
  3. Some good resources on lambda calculus: 1, 2, 3.
  4. The Y Combinator (Slight Return) by Mike Vanier is an excellent post. It’s so good that if I had found it earlier, I wouldn’t have written this very post.
  5. Read my follow-up post- the Y and Z combinators in Python.