Overview

The lisp-rs project implements an interpreter, in Rust, for a small subset of Scheme, a Lisp dialect. The main goal of this document is to make sure the reader understands the inner details of how the interpreter was implemented.

The project was inspired by Peter Norvig's article (How to Write a (Lisp) Interpreter (in Python)) and the book Writing An Interpreter In Go. This document serves as a commentary on the code that implements the interpreter. Rust's rich programming constructs such as enum, pattern matching, and error handling make it easy and a joy to implement this bare-bone interpreter.

Pre-requistes

To make the most out of this project, it is expected that the user is aware of the following Computer Science concepts

Rust is a non-trivial language, however, to implement the Lisp interpreter, the reader needs to have moderate experience with the language. Knowing the following four concepts should be enough for the user to understand the whole project

Lisp Dialect

In order to keep the interpreter simple and its implementation easy to understand, the number of features supported by it has been limited on purpose. Following are the data types and statements that will be supported by the interpreter.

Data types

  • integer
  • boolean

Statements

  • variable definition and assignment
  • if-else
  • function definition using lambdas
  • function calls

Keywords

  • define
  • if-else
  • lambda

Examples

Following are some of the sample programs that you will be able run using the interpreter

    (
        (define factorial (lambda (n) (if (< n 1) 1 (* n (factorial (- n 1))))))
        (factorial 5)
    )
    (
        (define pix 314)
        (define r 10)
        (define sqr (lambda (r) (* r r)))
        (define area (lambda (r) (* pix (sqr r))))
        (area r)
    )

Interpreter

The interpreter will be implemented from scratch and without the help of any tools such as nom or pest. The interpreter implementation is broken down into four parts

The best way to understand the implementation of the interpreter is to check out the code and walk through it while reading this document.

git clone https://github.com/vishpat/lisp-rs
git checkout 0.0.1

Once you thoroughly understand the implementation, you will be equipped to add new features to it, such as support for new data types like strings, floating-point numbers, lists, or functional programming constructs such as map, filter, reduce functions, etc.

REPL

To run the interpreter and get its REPL (Read-Eval-Print-Loop)

cargo run

Introduction

Lisp is an abbreviation for List Processor. Every Lisp program is simply a list. The elements of this list can either be atomic values such as an integer or a string or another list. Thus a Lisp program is a recursive list as shown below.

List Recursion

The Lisp interpreter is a software program that parses the text of a Lisp program and creates an in-memory List-based recursive data structure. Once the Lisp program is represented as a data structure, interpreting the program involves recursively evaluating these lists. This is the core of what a Lisp interpreter does, there is nothing more to it. Simple and Beautiful.

The following chapters in this book will walk you through the code that implements this interpreter. The interpreting process can be broken down into three phases.

  • Lexing: Converting the Lisp program text into a stream of tokens.
  • Parsing: Converting the stream of tokens into an in-memory recursive list data structure.
  • Evaluation: Walking the in-memory recursive list and producing the final result.

In addition, the final chapter will implement a simple REPL to evaluate the Lisp program.

Lexer

Source

lexer.rs

Code Walk Through

Lexer is a component of the interpreter that takes the program text and converts it to a stream of atomic units known as tokens. Every token has a type and may even have a value associated with it. In the case of our interpreter, there are four types of tokens. The four tokens can be represented using a Rust enum as follows.

pub enum Token {
    Integer(i64),   
    Symbol(String),                 
    LParen,     
    RParen,           
}
  • Integer: A signed 64-bit integer
  • Symbol: Any group of characters other than an integer or parenthesis
  • LParen: Left parenthesis
  • RParen: Right parenthesis

The first step in implementing a lexer is to replace the parenthesis with an extra space before and after it. For example,

(define sqr (* x x))

will get converted to

( define sqr ( * x x ) )

With this simple trick, the process of tokenization involves splitting the Lisp program with whitespace.

    let program2 = program.replace("(", " ( ")
                          .replace(")", " ) ");
    let words = program2.split_whitespace();

Once the words for the program are obtained, they can be converted into tokens using Rust's pattern matching as follows

let mut tokens: Vec<Token> = Vec::new();
for word in words {
    match word {
        "(" => tokens.push(Token::LParen),
        ")" => tokens.push(Token::RParen),
        _ => {
             let i = word.parse::<i64>();
                if i.is_ok() {
                  tokens.push(Token::Integer(
                  	i.unwrap()));
                } else {
                  tokens.push(Token::Symbol(
                  	word.to_string()));
                }        
            }
    }
}

At this point, we have a vector of tokens for the entire Lisp program. Note that a vector in Rust is a stack, hence the tokens are stored in the vector in the reverse order as shown below with an example.

List Recursion

Testing

The lexer code can be unit tested as shown below

let tokens = tokenize("(+ 1 2)");
assert_eq!(
    tokens,
    vec![
        Token::LParen,
        Token::Symbol("+".to_string()),
        Token::Integer(1),
        Token::Integer(2),
        Token::RParen,
    ]
);

To cement your understanding of the Lexing process please go through the remaining tests in lexer.rs

Parser

Source

parser.rs

Code Walk Through

The job of the parser is to take the vector of tokens and convert it into a recursive list structure (mentioned in the introduction). This recursive list structure is an in-memory representation of the Lisp program.

Object model

Before diving into the details of how the recursive list structure is created, we need to first define the objects that will make up the elements of the list. This is done in the object Rust module with an enum defined as follows

pub enum Object {
    Void,
    Integer(i64),
    Bool(bool),
    Symbol(String),
    Lambda(Vec<String>, Vec<Object>),
    List(Vec<Object>),
}
  • Void: An empty object
  • Integer: A signed 64-bit integer
  • Bool: A boolean value
  • Symbol: A Lisp symbol, similar to the Symbol token
  • Lambda: Function object, with the first vector representing the parameter labels and the second vector representing the body of the function. This object is not used during parsing but will be used during the evaluation phase of the interpreter.
  • List: List object

Parser

The parser for the Lisp dialect is a simple recursive function. It takes a vector of tokens (in reverse) generated by the lexer and generates a single List Object representing the entire Lisp program. The core logic of the parser is implemented by the recursive parse_list function as shown below. It expects a vector of tokens, with the first token of the vector being a left parenthesis indicating the start of the list. The function then proceeds to process the elements of the list one at a time. If it encounters atomic tokens such as an integer or symbol it creates corresponding atomic objects and adds them to the list. If it encounters another left parenthesis it recurses with the remaining tokens. Finally, the function returns with the list object when it encounters the right parenthesis.

let token = tokens.pop();
if token != Some(Token::LParen) {
    return Err(ParseError {
        err: format!("Expected LParen, found {:?}", 
                     token),
    });
}
   
let mut list: Vec<Object> = Vec::new(); 
while !tokens.is_empty() {
    let token = tokens.pop();
    if token == None {
        return Err(ParseError {
            err: format!("Insufficient tokens"),
        });
    }
    let t = token.unwrap();
    match t {
        Token::Integer(n) => 
            list.push(Object::Integer(n)),
        Token::Symbol(s) => 
            list.push(Object::Symbol(s)),
        Token::LParen => {
            tokens.push(Token::LParen);
            let sub_list = parse_list(tokens)?;
            list.push(sub_list);
        }
        Token::RParen => {
            return Ok(Object::List(list));
        }
    }
}

Testing

The above parsing code can be unit tested as follows

let list = parse("(+ 1 2)").unwrap();
assert_eq!(
    list,
    Object::List(vec![
        Object::Symbol("+".to_string()),
        Object::Integer(1),
        Object::Integer(2),
    ])
);

To cement your understanding of the parsing process please go through the remaining tests in parser.rs

Evaluator

Now comes the most exciting part of the project. Evaluation is the final step that will produce the result for the Lisp program. At a high level, the evaluator function recursively walks the List-based structure created by the parser and evaluates each atomic object and list (recursively), combines these intermediate values, and produces the final result.

Source

eval.rs

Code Walk Through

The evaluator is implemented using the recursive eval_obj function. The eval_obj function takes the List object representing the program and the global env variable (a simple hashmap) as the input. The function then starts processing the List object representing the program by iterating over each element of this list

fn eval_obj(obj: &Object, env: &mut Rc<RefCell<Env>>) 
	-> Result<Object, String> 
{
    match obj {
        Object::Void => Ok(Object::Void),
        Object::Lambda(_params, _body) => Ok(Object::Void),
        Object::Bool(_) => Ok(obj.clone()),
        Object::Integer(n) => Ok(Object::Integer(*n)),
        Object::Symbol(s) => eval_symbol(s, env),
        Object::List(list) => eval_list(list, env),
    }
}

In the case of the atomic objects such as an integer and boolean, the evaluator simply returns a copy of the object. In the case of the Void and Lambda (function objects), it returns the Void object. We will now walk through the eval_symbol and eval_list functions which implement most of the functionality of the evaluator.

eval_symbol

Before understanding the eval_symbol function, it is important to understand the design of how variables are implemented for the Lisp interpreter.

The variables are just string labels assigned to values and they are created using the define keyword. Note a variable can be assigned atomic values such as integer or a boolean or it can be assigned function objects

( 
  (define x 1) 
  (define sqr (lambda (r) (* r r))) 
)

The above Lisp code defines (or creates) two variables with the names x and sqr that represent an integer and function object respectively. Also, the scope of these variables lies within the list object that they are defined under. This is achieved by storing the mapping from the variable names (strings) to the objects in a map-like data structure called Env as shown below.

// env.rs
pub struct Env {
    parent: Option<Rc<RefCell<Env>>>,
    vars: HashMap<String, Object>,
}

The interpreter creates an instance of Env at the start of the program to store the global variable definitions. In addition, for every function call, the interpreter creates a new instance of env and uses the new instance to evaluate the function call. This new instance of env contains the function parameters as well as a back pointer to the parent env instance from where the function is called as shown below with an example

(
	(define m 10)
	(define n 12)
	(define K 100)
	
	(define func1 (lambda (x) (+ x K)))
	(define func2 (lambda (x) (- x K)))
	
	(func1 m)
	(func2 n)
)

Function Call

This concept will become clearer as we will walk through the code.

The job of eval_symbol is to look up the Object bound to the symbol. This is done by recursively looking up in the passed env variable or any of its parent env until the root env of the program.

let val = env.borrow_mut().get(s);
if val.is_none() {
    return Err(format!("Unbound symbol: {}", s));
}
Ok(val.unwrap().clone())

eval_list

The eval_list function is the core of the evaluator and is implemented as shown below.

let head = &list[0];
match head {
    Object::Symbol(s) => match s.as_str() {
        "+" | "-" | "*" | "/" | "<" | ">" | "=" | "!=" => {
            return eval_binary_op(&list, env);
        }
        "if" => eval_if(&list, env),
        "define" => eval_define(&list, env),
        "lambda" => eval_function_definition(&list),
        _ => eval_function_call(&s, &list, env),
    },
    _ => {
        let mut new_list = Vec::new();
        for obj in list {
            let result = eval_obj(obj, env)?;
            match result {
                Object::Void => {}
                _ => new_list.push(result),
            }
        }
        Ok(Object::List(new_list))
    }
}

This function peeks at the head of the list and if the head does not match the symbol object, it iterates all of the elements of the list recursively evaluating each element and returning a new list with the evaluated object values.

Variable definitions

If the head of the list in the eval_list function matches the define keyword, for example

(define sqr (lambda (x) (* x x)))

the eval_define function calls eval_obj on the third argument of the list and assigns the evaluated object value to the symbol defined by the second argument in the list. The symbol and its object value are then stored in the current env.

let sym = match &list[1] {
    Object::Symbol(s) => s.clone(),
    _ => return Err(format!("Invalid define")),
};
let val = eval_obj(&list[2], env)?;
env.borrow_mut().set(&sym, val);

In the example above the symbol sqr and the function object representing the lambda will be stored in the current env. Once the function sqr has been defined in this manner, any latter code can access the corresponding function object by looking up the symbol sqr in env.

Binary operations

If the head of the list in the eval_list function matches a binary operator, the list is evaluated on the basis of the type of the binary operator, for example

(+ x y)

the eval_binary_op function calls the eval_obj on the second and third element of the list and performs the binary sum operation on the evaluated values.

If statement

If the head of the list in the eval_list function matches the if keyword, for example

(if (> x y) (x) (y))

the eval_if function calls eval_obj on the second element of the list and depending upon whether the evaluated value is true or false, calls the eval_obj on either the third or fourth element of the list and returns the value

let cond_obj = eval_obj(&list[1], env)?;
let cond = match cond_obj {
    Object::Bool(b) => b,
    _ => return Err(format!("Condition must be a boolean")),
};

if cond == true {
    return eval_obj(&list[2], env);
} else {
    return eval_obj(&list[3], env);
}

Lambda

As mentioned earlier, the lambda (or function) object consists of two vectors

Lambda(Vec<String>, Vec<Object>)

If the head of the list in the eval_list function matches the lambda keyword, for example

(lambda (x) (* x x))

the eval_function_definition function evaluates the second element of the list as a vector of parameter names.

let params = match &list[1] {
    Object::List(list) => {
        let mut params = Vec::new();
        for param in list {
            match param {
                Object::Symbol(s) => params.push(s.clone()),
                _ => return Err(format!("Invalid lambda parameter")),
            }
        }
        params
    }
    _ => return Err(format!("Invalid lambda")),
};

The third element of the list is simply cloned as the function body.

let body = match &list[2] {
    Object::List(list) => list.clone(),
    _ => return Err(format!("Invalid lambda")),
};
Ok(Object::Lambda(params, body))

The evaluated parameter and body vector are returned as the lambda object

Function Call

If the head of the list is a Symbol object and it does not match any of the aforementioned keywords or binary operators, the interpreter assumes that the Symbol object maps to a Lambda (function object). An example of the function call in Lisp is as follows

(find_max a b c)

To evaluate this list the eval_function_call function is called. This function first performs the lookup for the function object using the function name, find_max in the case of this example.

let lamdba = env.borrow_mut().get(s);
if lamdba.is_none() {
    return Err(format!("Unbound symbol: {}", s));
}

If the function object is found, a new env object is created. This new env object has a pointer to the parent env object. This is required to get the values of the variables not defined in the scope of the function.

let mut new_env = Rc::new(
    			  RefCell::new(
    			  Env::extend(env.clone())));

The next step in evaluating the function call requires preparing the function parameters. This is done by iterating over the remainder of the list and evaluating each parameter. The parameter name and the evaluated object are then set in the new env object.

for (i, param) in params.iter().enumerate() {
    let val = eval_obj(&list[i + 1], env)?;
    new_env.borrow_mut().set(param, val);
}

Finally, the function body is evaluated by passing the new_env, which contains the parameters to the function

let new_body = body.clone();
return eval_obj(&Object::List(new_body), &mut new_env);

REPL

Source

main.rs

Code Walk Through

The REPL is the simplest part of the interpreter implementation. It uses the linefeed crate to implement the functionality.

The REPL first creates an instance of env to hold the global variables for the interpreter.

let mut env = Rc::new(RefCell::new(env::Env::new()));

The REPL is a simple while loop that takes one line as an input from the user, evaluates it, and prints out the evaluated object. The REPL is terminated if the user enters the exit keyword.

while let ReadResult::Input(input) = 
  			  reader.read_line().unwrap() {
    
    if input.eq("exit") {
        break;
    }
    let val = eval::eval(input.as_ref(), &mut env)?;
    match val {
        Object::Void => {}
        Object::Integer(n) => println!("{}", n),
        Object::Bool(b) => println!("{}", b),
        Object::Symbol(s) => println!("{}", s),
        Object::Lambda(params, body) => {
            println!("Lambda(");
            for param in params {
                println!("{} ", param);
            }
            println!(")");
            for expr in body {
                println!(" {}", expr);
            }
        }
        _ => println!("{}", val),
    }
}

What's next

Congratulations!!! if you have made it so far, you now have a grasp of how to implement a Lisp interpreter in Rust. A few easy things to try next would be to add support for float point numbers, unsigned integers, and strings. Once you are able to successfully make these changes the next thing to try would be to add support for the List datatype followed by support for the functional constructs of map, filter, and reduce.

With this knowledge, you should be then able to implement an interpreter for any non-trivial language.