Comment by danilor a day ago
I appreciate this as someone learning lisp for the first time!
While I'm here I'll ask: does anybody have a recommendation for resources on learning lisp that focus on how the interpreter/compiler works? Here's what I tried so far: 1) Started Practical Common Lisp, but I found it lacking in the specifics. Then I 2) tried going to the specification, but that was way too much. Then I 3) tried Common Lisp The Language and, while I liked the precise definitions a lot, it was a lot to keep in mind at the same time as learning the fundamentals. Right now I'm 4) back at Practical Common Lisp, skipping most of the text and reading the surrounding paragraphs whenever I find a piece of code I don't understand.
I want something that doesn't just explain the standard library; I'd rather read the documentation for that. I want to know more precisely what happens throughout the program's life. Like, is behavior when running a script in "interpreter" mode guaranteed to be the same as when running in "native compiled code"? At what point does the compiler create an intermediate representation? Why does SBCL not use LLVM? At what point are #-starting words (reader macros?) evaluated, and how is that different from interpreting to compiling? How do I control the amount of optimization the compiler will do? Will the interpreter ever try to optimize? Is garbage collection implementation-defined? How will implementations change in behavior?
The essence of Common Lisp (and other Lisps) is the metacircular evaluator from John McCarthy's Lisp 1.5 Manual:
https://michaelnielsen.org/ddi/wp-content/uploads/2012/04/Li...
This is the half-page of code that Alan Kay famously described as "the Maxwell's equations of software."
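If you want a feel for it in modern notation, here is a heavily abbreviated sketch of that eval/apply pair written as ordinary Common Lisp. TINY-EVAL and TINY-APPLY are names I just made up, it handles only a handful of special forms, and it is not the manual's code, just an illustration of the idea:

    ;; A minimal sketch in the spirit of the Lisp 1.5 eval/apply pair.
    ;; TINY-EVAL and TINY-APPLY are made-up names; not the manual's code.
    (defun tiny-eval (expr env)
      (cond ((symbolp expr) (cdr (assoc expr env)))          ; variable lookup
            ((atom expr) expr)                               ; self-evaluating data
            ((eq (first expr) 'quote) (second expr))
            ((eq (first expr) 'if)
             (if (tiny-eval (second expr) env)
                 (tiny-eval (third expr) env)
                 (tiny-eval (fourth expr) env)))
            ((eq (first expr) 'lambda) (list 'closure expr env))
            (t (tiny-apply (tiny-eval (first expr) env)
                           (mapcar (lambda (e) (tiny-eval e env)) (rest expr))))))

    (defun tiny-apply (fn args)
      (if (and (consp fn) (eq (first fn) 'closure))
          (destructuring-bind (lambda-expr closed-env) (rest fn)
            (tiny-eval (third lambda-expr)                   ; the lambda body
                       (append (mapcar #'cons (second lambda-expr) args)
                               closed-env)))
          (apply fn args)))                                  ; host functions, e.g. #'+

    ;; e.g. (tiny-eval '((lambda (x) (+ x 1)) 41) (list (cons '+ #'+)))  =>  42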
Common Lisp, MACLISP before it, and Lisp 1.5 before that all worked by providing a running program that implements the functions given in that half page and applies them, and their effects, iteratively to its own environment.
In other words, the basic model is that Lisp is a running program that you modify interactively by feeding expressions to it one after another.
Common Lisp's model still works like that: when you give it a source file it reads one expression after another from the file, evaluating the expressions as it goes, and altering the running environment to reflect any changes that they make, such as adding new definitions, updating data structures, and so on.
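A toy version of that loop looks something like this (TOY-LOAD is a made-up name; the standard LOAD function does considerably more, but the shape is the same):

    ;; Hypothetical TOY-LOAD: read forms one at a time and evaluate each,
    ;; mutating the running image as it goes.
    (defun toy-load (pathname)
      (with-open-file (in pathname)
        (loop for form = (read in nil 'eof)
              until (eq form 'eof)
              do (eval form))))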
The Lisp reads the next well-formed expression, converting it into Lisp data structures. It then evaluates the resulting data structure, yielding some number of values and updating the environment at the same time. (In Common Lisp even pure functions have side effects if evaluated by the read-eval-print loop because the standard defines a globally-accessible variable named * that contains the value returned by the most recently-evaluated expression).
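For example, at a typical repl (the CL-USER> prompt is just a common convention):

    CL-USER> (+ 1 2)
    3
    CL-USER> (* 10 *)   ; the second * is the variable holding the previous value, 3
    30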
Common Lisp's design reflects this basic model, and much of its standard library is concerned with making it convenient to work this way. The ANSI standard likewise reflects this basic model of computation, including features specifically designed to support this style of interactive programming.
The process of evaluation allows, but does not require, compilation. Common Lisp interpreters (usually called "evaluators") do exist, but most practical implementations provide compilers that compile each expression in order to evaluate it. Some implementations are "compiler only": that is, every expression is always compiled in order to be evaluated, whether it's in a source file or typed at a repl prompt.
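SBCL is an example of the compiler-only style: by default, even a function typed at the repl is compiled to native code, which the standard DISASSEMBLE function will show you.

    ;; Typed at an SBCL repl; ADD1 is just a throwaway example name.
    (defun add1 (x) (+ x 1))
    (disassemble 'add1)   ; prints native machine code for ADD1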
To answer some of your specific questions more directly:
> is behavior when running a script in "interpreter" mode guaranteed to be the same as when running in "native compiled code"?
"Guaranteed" is a strong word. ANSI specifies the behavior, and it's generally the same whether code is interpreted or compiled. That may be hard to achieve for some expressions, but if you find an example of different behavior, that's most likely a bug in ANSI compliance.
(I should allow for the possibility that the ANSI spec has some corner somewhere that allows for a difference, but I can't think of any off the top of my head. Of course, the spec is large and my memory is imperfect.)
> At what point does the compiler create an intermediate representation?
Depends on what you mean. Under some interpretations and in some implementations there may not be an intermediate form.
The normal lifecycle of an evaluation is:
- read: ingest text and convert it to Lisp data
- macroexpand: rewrite any macro expressions in the data
- eval: use eval and apply on the data to yield some result values
- print: convert the result values to text in a standard format and print it to *standard-output*
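You can drive the same pipeline by hand with the standard functions, which makes the stages easy to see (the WHEN form is just an arbitrary example, and the exact expansion shown is implementation-dependent):

    (let* ((form (read-from-string "(when t (+ 1 2))")) ; read: text -> Lisp data
           (expansion (macroexpand form))               ; macroexpand: roughly (IF T (PROGN (+ 1 2)))
           (result (eval expansion)))                   ; eval: 3
      (print result))                                   ; print: write the result to *standard-output*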
You might regard the result of the read function as an intermediate form, but I would say that what read produces is Lisp source code. In my view, the original text is not Lisp source code; it's a serialization of Lisp source code. Read converts the text into Lisp source code.
ANSI does not specify in detail exactly what eval has to do in order to yield results. It specifies abstractly what each function, macro, and special form is supposed to produce, and what side effects they are supposed to have on the running environment, but the details of how the code is generated and executed are up to the implementation. An implementation can simply walk the source code interpreting forms and producing the required effects; or it can convert them to some intermediate form such as bytecode and then let a bytecode interpreter do the evaluating; or it can compile them directly to native code that does the evaluating when executed. These are implementation details not specified by ANSI.
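As one example of how implementation-specific this is: SBCL normally compiles every form, but it also ships an interpreter you can switch to through an extension variable (an SBCL extension, not part of ANSI):

    ;; SBCL-specific: choose how EVAL handles forms.
    ;; :compile is the default; :interpret uses SBCL's interpreter instead.
    #+sbcl (setf sb-ext:*evaluator-mode* :interpret)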
One detail that is specified by ANSI is the expansion of macros. Macros are expressions that look like function calls or special forms, but which are implemented by applying a macro-expander function to the expression to produce a new expression that is then given to the evaluator. Macros may be defined by the system or by the user. They are expanded after read and before eval. The spec goes into some detail about how this process is supposed to work and what features of the language affect it.
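For example, here is a user-defined macro and its expansion (MY-UNLESS, RAININGP, and GO-OUTSIDE are made-up names used only for illustration):

    (defmacro my-unless (test &body body)
      "Behaves like the standard UNLESS; defined here only to show expansion."
      `(if ,test nil (progn ,@body)))

    (macroexpand-1 '(my-unless (rainingp) (go-outside)))
    ;; first return value: (IF (RAININGP) NIL (PROGN (GO-OUTSIDE)))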
> Why does SBCL not use LLVM?
LLVM's first release was 2003. SBCL's first release was 1999. Moreover, SBCL is a fork of CMUCL, which was first released in 1980, though that was before Common Lisp existed. It was called SPICE Lisp at that time, and the compiler rewrite that would turn it into Common Lisp happened about five years later.
The point is that one important reason that SBCL doesn't use LLVM is that it's a large mature codebase that predates the existence of LLVM by about 23 years. I'm not saying it would be impossible to port SBCL onto LLVM, but if you did it would be so much changed that it wouldn't really be SBCL anymore. It would probably also be easier to just write a new Common Lisp on LLVM (which is what the CLASP project has done).
> At what point are #-starting words (reader macros?) evaluated and how is that different from interpreting to compiling?
An expression that starts with "#" is, as you alluded to, a reader macro. What that means is that when the read function starts to read the expression it encounters the "#" and that triggers it to look up the next character in a table of special read functions. If it doesn't find anything it signals a READER-ERROR. If it does find something then it applies the found function to the input to read it.
For example, if you feed it "#'+" then it looks up the quote character in the table of reader macros and finds a function that converts the expression to "(function +)". When "(function +)" is evaluated it returns the function that is globally bound to the symbol named "+".
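If you want to define your own dispatch reader macro, the pattern looks like this. The #[low high] range syntax and the READ-RANGE name are made up for illustration; the structure follows the READ-DELIMITED-LIST example in the spec:

    ;; Hypothetical reader macro: the text #[3 7] reads as the list (3 4 5 6 7).
    (defun read-range (stream subchar arg)
      (declare (ignore subchar arg))
      (destructuring-bind (low high) (read-delimited-list #\] stream t)
        (loop for i from low to high collect i)))

    (set-dispatch-macro-character #\# #\[ #'read-range)
    ;; Make ] a terminating character so READ-DELIMITED-LIST can stop on it.
    (set-macro-character #\] (get-macro-character #\)))

    ;; After that, READ turns "#[3 7]" into (3 4 5 6 7) before macroexpansion
    ;; or evaluation ever sees it, so '#[3 7] at the repl evaluates to that list.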
So the sequence is: READ hits "#", dispatches on the next character, runs the reader-macro function, and whatever that function returns becomes part of the Lisp data that then gets macroexpanded and evaluated (or compiled) like anything else. In other words, reader macros run at read time, so the interpreted-versus-compiled question doesn't really enter into it; by the time either an interpreter or a compiler sees the code, the reader macro has already done its work.
The details of what happens when a reader macro is executed depend on the reader macro bound to the specific dispatch character. A bunch of them are defined by the ANSI standard, but you're free to define your own (as in the sketch above), and it's up to you what those do. You just have to know that they get called when the reader encounters "#" followed by whatever dispatch character you specify, and that their output is handed to the rest of the pipeline like anything else READ produces.
> How do I control the amount of optimization the compiler will do?
With the declaration OPTIMIZE: http://clhs.lisp.se/Body/d_optimi.htm#optimize
For example, (declare (optimize (speed 3) (safety 0)))
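In practice you usually put the declaration at the top of a function, or set it image-wide with DECLAIM; how much each quality level actually changes the generated code is up to the implementation. SUM-FIXNUMS here is just a made-up example:

    ;; Per-function settings; each quality ranges from 0 to 3.
    (defun sum-fixnums (xs)
      (declare (optimize (speed 3) (safety 0) (debug 0)))
      (let ((total 0))
        (declare (fixnum total))
        (dolist (x xs total)
          (incf total (the fixnum x)))))

    ;; Or image-wide:
    (declaim (optimize (speed 1) (safety 3)))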
> Will the interpreter ever try to optimize?
The answer is implementation-specific, and the question may not even apply (for example, if you are using a compiler-only implementation, there is no interpreter to speak of).
> Is garbage collection implementation-defined?
To a large extent, yes. The spec assumes that implementations provide automatic memory management, but I think the only directly-related feature specified by ANSI is the ROOM function. Every implementation I know of also has a GC function to trigger a collection, but I don't think it's in the spec. Different implementations also use different GC strategies (some offer more than one), and they provide various implementation-specific tools for inspecting and controlling GC behavior.
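For example (ROOM is standard; the GC call is an implementation extension, shown here for SBCL):

    (room)                       ; standard: print information about memory use
    #+sbcl (sb-ext:gc :full t)   ; SBCL extension: force a full collection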