Tuesday, September 22, 2009

A detailed view of the lisp interpreter

Ok, so here's the deal !!! Given seven primitive functions implemented in a way that can be called/evaluated in the standard lisp format (which I explain in the next paragraph), we can create an interpreter that is capable of understanding and running a few instructions which is equivalent in power to any other program that you can possibly write. When I say equivalent in power I mean that any program written in any other programming language can be re-written in this language, easily, with fewer bugs and more elegantly. The even more amazing thing about this is that the interpreter is itself written in the language which it recognizes.

The 7 primitive operators required are quote, atom, eq, car, cdr, cons and cond. You can look up their definitions in Paul Graham's complete article. The standard format of writing these functions in lisp is as follows,
  1. (quote a1)
  2. (atom a1)
  3. (eq a1 a2)
  4. (car a1)
  5. (cdr a1)
  6. (cons a1 a2)
  7. (cond (p1 e1) (p2 e2) ... (pn en))
The above functions may be implemented in any language, C or assembly or binary. It just does not matter. What does matter is that they should be callable in the above format.

Auxilliary Functions

The auxillary functions given below are used by our lisp interpreter. Notice that they require only the seven functions mentioned above. However, we have this problem of the "chicken and the egg" when defining new functions. In order to define new functions you need to have an interpreter. But, in order to create the interpreter we need to define some functions. I do not intend to discuss how to resolve this problem right now but if you are impatient please have a look at an excellent discussion on this topic at comp.lang.lisp and this discussion.

I have shown how the functions cxxxr may be implemented in addition to the other functions described in "The roots of lisp".

    (defun caaar (a) (car (caar a)))
    (defun caadr (a) (car (cadr a)))
    (defun cadar (a) (car (cdar a)))
    (defun caddr (a) (car (cddr a)))
    (defun cdaar (a) (cdr (caar a)))
    (defun cdadr (a) (cdr (cadr a)))
    (defun cddar (a) (cdr (cdar a)))
    (defun cdddr (a) (cdr (cddr a)))

    (defun caddar (a) (car (cddar a)))

    (defun null. (x)
        (eq x '()))

    (defun and. (x y)
      (cond (x (cond (y 't) ('t '())))
            ('t '())))

    (defun not. (x)
      (cond (x '())
            ('t 't)))

     (defun append. (x y)
      (cond ((null. x) y)
            ('t (cons (car x) (append. (cdr x) y)))))

    (defun pair. (x y)
      (cond ((and. (null. x) (null. y)) '())
            ((and. (not. (atom x)) (not. (atom y)))
                   (cons (list (car x) (car y))
                         (pair. (cdr x) (cdr y))))))

    (defun assoc. (x y)
      (cond ((eq (caar y) x) (cadar y))
            ('t  (assoc. x (cdr y)))))

    (defun evcon. (c a)
      (cond ((eval. (caar c)  a)
             (eval. (cadar c) a))
            ('t (evcon. (cdr c) a))))

    (defun evlis. (m a)
      (cond ((null. m) '())
            ('t (cons (eval.  (car m) a)
                      (evlis. (cdr m) a)))))


Finally, the interpreter

In order to explain its workings clearly I will be splitting the explanation into two posts. In this post, I discuss how the seven primitive functions are evaluated. In the next post, I will describe the syntax and semantics of the label and lambda functions. I also intend to explain and illustrate how the interpreter handles the label form, the lambda form as well as user-defined functions in the next post.

The complete code for the interpreter is as given below.

    (defun eval. (e a)
      (cond
        ((atom e) (assoc. e a))
        ((atom (car e))
             (cond
                 ((eq (car e) 'quote)  (cadr e))
                 ((eq (car e) 'atom)   (atom (eval. (cadr e) a)))
                 ((eq (car e) 'eq)     (eq   (eval. (cadr e) a) (eval. (caddr e) a)))
                 ((eq (car e) 'car)    (car  (eval. (cadr e) a)))
                 ((eq (car e) 'cdr)    (cdr  (eval. (cadr e) a)))
                 ((eq (car e) 'cons)   (cons (eval. (cadr e) a) (eval. (caddr e) a)))
                 ((eq (car e) 'cond)   (evcon. (cdr e) a))
                 ('t  (eval. (cons (assoc. (car e) a) (cdr e)) a))))
        ((eq (caar e) 'label)
             (eval. (cons (caddar e) (cdr e)) (cons (list (cadar e) (car e)) a)))
        ((eq (caar e) 'lambda)
             (eval. (caddar e) (append. (pair. (cadar e) (evlis. (cdr e) a)) a)))))

Note:- The eval. function takes as input two arguments 'e' and 'a'. The first argument is the expression to be evaluated while the second argument is the environment in which it is to be evaluated. To be more precise, the second argument is really an association list consisting of pairs of variables and their values.

Thus, the outermost skeletal structure of the interpreter is as follows,

    (defun eval. (e a)
      (cond
        ((atom e) (assoc. e a))
        ((atom (car e)) (expr-atom))
        ((eq (caar e) 'label)  (expr-label))   ;; For the label special form.
        ((eq (caar e) 'lambda) (expr-lmbda))   ;; For lambda special forms 
      )                                        ;; End of the outer cond form.
    )                                          ;; End of function definition.     


As you can see the outermost structure of eval. consists of a single cond form containing four conditions. In this post I describe and explian only the first two conditions. The other two will be explained in a later post.
  • The first condition checks if the expression is an atom. If it is, then the expression is just a variable and thus eval. simply returns the value bound to the variable in the environment/association list 'a' using the assoc. function.
  • If the expression 'e' is not an atom, then, it checks if the 'car' of expression is an atom. The car(first element) of an expression is an atom precisely when it is either a primitive function or any other user-defined function. If this condition is true then the interpreter evaluates "expr-atom" that has been represented in italics in the code above. It is a placeholder for the actual code.
  • The third condition checks if the expression 'e' is a label form.
  • The final condition check is for the lambda form.


Now, let me give you a small example to familiarize you with the way in which the interpreter works. Please note that all the functions are being evaluated inside emacs.

    (eval. 'x '((x  "Hello World!")))   ==> "Hello World!"


;; or another way to obtain same results

    (setq  e  'x)
    (setq  a  '((x "Hello World!")))
    (eval. e   a)    ==>    "Hello World!"
 

In the above code the expression was set to 'x and the environment of the expression was set to the association list '((x "Hello World")). In the first piece of code it was set implicitly. However, in the last three lines the settings were provided explicitly and the function call was made as (eval. e a) . The reason for providing settings explicitly is that it really helps in breaking up the code to get a clearer insight into how the code really works.

    (setq  e  'x)
    (setq  a  '((x "Hello World!")))

    (atom e)      ==>  t        
    (assoc. e a)  ==>  "Hello World!"
 
Thus we can clearly observe that the assoc. function is used in eval. to look up the values of the variables.

Now, let's have a look at the code for expr-atom when the first element of the expression is an atom.

    (defun eval. (e a)
      (cond
        ((atom e) (assoc. e a))
        ((atom (car e))
                 (cond
                     ((eq (car e) 'quote)  (cadr e))
                     ((eq (car e) 'atom)   (atom (eval. (cadr e) a)))
                     ((eq (car e) 'eq)     (eq   (eval. (cadr e) a) (eval. (caddr e) a)))
                     ((eq (car e) 'car)    (car  (eval. (cadr e) a)))
                     ((eq (car e) 'cdr)    (cdr  (eval. (cadr e) a)))
                     ((eq (car e) 'cons)   (cons (eval. (cadr e) a) (eval. (caddr e) a)))
                     ((eq (car e) 'cond)   (evcon. (cdr e) a))
                     ('t  (eval. (cons (assoc. (car e) a) (cdr e)) a))
                 )       ;; End of inner cond form and expr-atom
         )               ;; End of the second check of outer cond 
        (third-condition)       ;; The label check 
        (fourth-condition)      ;; The check for the lambda form 
       )
    )
    

Now, for the remainder of this post I will be talking about expressions of the form
 e = (a1 a2 ... an)
where the first argument of the expression, a1 = (car e), is an atom. Let's see each case one by one, with examples.

  1. quote
        (setq e '(quote Works!!!)) 
        (setq a '())    ;;We don't need an environment for quote 
        (eval. e a)     ==> Works!!!
        
         ;; Works!!! But, how?    
        (atom e)        ==> nil   ;;The first condition (in outermost cond) fails 
        (atom (car e))  ==> t     ;;The second condition (in outermost cond) succeeds 
        
        (car e)                ==> quote
        (eq (car e) 'quote)    ==> t
        (cadr e)               ==> Works!!! ;;The value of the expression (cadr e) is returned 
    
  2. atom
        (setq e  '(atom x))
        (setq a '((x 1)))
        (atom (car e))     ==> t    ;; Is atom an atom? Yes, it is !
        (eval. e a)        ==> t    ;; And x is an atom too
    
        ;; How?
        ;;Since expression for atom is (atom (eval. (cadr e) a))
        (cadr e)                   ==> x
        (eval. (cadr e) a)         ==> 1
        (atom (eval. (cadr e) a))  ==> (atom 1)    ==> t
    
  3. eq
        (setq e '(eq x y))
        (eval. e '((x 1) (y 1)))   ==> t
        (eval. e '((x 1) (y 2)))   ==> nil
    
        ;; Expression for eq is (eq   (eval. (cadr e) a) (eval. (caddr e) a))
        (cadr e)      ==> x
        (caddr e)     ==> y
        (setq expr1 (eval. (cadr e) '((x 1) (y 1))))  ==> 1
        (setq expr2 (eval. (caddr e) '((x 1) (y 1)))) ==> 1 
    
        ;;Thus,
        (eq  expr1 expr2)  ==> t
    
        ;;While,
        (setq expr3 (eval. (cadr e) '((x 1) (y 2))))  ==> 1
        (setq expr4 (eval. (caddr e) '((x 1) (y 2)))) ==> 2
    
        ;;And hence,
        (eq  expr3 expr4)  ==> nil
    
    
  4. car

    Notice the recursive call to eval. where the remainder of the expression is evaluated in the environment 'a' and the car of the resulting list is returned.
        (car  (eval. (cadr e) a))
    
  5. cdr

    Similar to car but the value returned is the rest of the list excluding the first element.
        (cdr  (eval. (cadr e) a))
    
  6. cons

    The first input to cons is evaluated followed by the evaluation of the second input and then both are consed together.
        (cons (eval. (cadr e) a) (eval. (caddr e) a))
    
  7. cond

    Let's start with an example,
        (setq e '(cond ((atom x) (quote x-is-an-atom))
                       ((atom y) (quote y-is-an-atom))))
        (setq a '((x (1 2 3))
                  (y 1)))
    
        (eval. e a)     ==> y-is-an-atom
    

    Looking at the code for eval. , the result returned by a cond form is the result of evaluating
    (evcon. (cdr e) a)
    
    ;; The input for evcon. is (cdr e) and the environment 'a'
    (cdr e) ==> (((atom x) (quote x-is-an-atom)) ((atom y) (quote y-is-an-atom)))
    
    ;; Thus for the cond form, (cdr e) is the association list ((p1 e1 ) (p2 e2) ... (pn en))
    

    Thus the evcon. function takes as input (cdr e) which is an association list of the form, ((p1 e1 ) (p2 e2) ... (pn en)), where each of the pi's denote a condition or predicate and for any i, the expression ei, is the expression whose result is to be returned when the corresponding pi evaluates to true.

    Let's have a closer look at evcon. ,
        (defun evcon. (c a)
          (cond ((eval. (caar c)  a) (eval. (cadar c) a))
                ('t (evcon. (cdr c) a))))
    

    All the evcon. function does is evaluate the first element in the head of the list namely the first condition and if it happens to evaluate to true then it evaluates the expression associated with that condition. If, however the conditions returns nil then the function recursively calls evcon. this time with the cdr of the current cond list.


This completes the explanation of the first half of the interpreter in which I explained how the interpreter handles the seven functions that were assumed as primitive for the functioning of the interpreter itself.

0 comments:

Post a Comment