Comment by mikelevins

The essence of Common Lisp (and other Lisps) is the metacircular evaluator from John McCarthy's Lisp 1.5 Manual:

https://michaelnielsen.org/ddi/wp-content/uploads/2012/04/Li...

This is the half-page of code that Alan Kay famously described as "the Maxwell's equations of software."

Common Lisp and MACLISP before it and Lisp 1.5 before that worked by providing a running program that implements the functions given in that half page and applies them and their effects iteratively to their own environment.

In other words, the basic model is that Lisp is a running program that you modify interactively by feeding expressions to it one after another.

Common Lisp's model still works like that: when you give it a source file it reads one expression after another from the file, evaluating the expressions as it goes, and altering the running environment to reflect any changes that they make, such as adding new definitions, updating data structures, and so on.

The Lisp reads the next well-formed expression, converting it into Lisp data structures. It then evaluates the resulting data structure, yielding some number of values and updating the environment at the same time. (In Common Lisp even pure functions have side effects if evaluated by the read-eval-print loop because the standard defines a globally-accessible variable named * that contains the value returned by the most recently-evaluated expression).

Common Lisp's design reflects this basic model, and much of its standard library is concerned with making it convenient to work this way. The ANSI standard likewise reflects this basic model of computation, including features specifically designed to support this style of interactive programming.

The process of evaluation allows, but does not require, compilation. Common Lisp interpreters (usually called "evaluators") do exist, but most practical implementations provide compilers that compile each expression in order to evaluate it. Some implementations are "compiler only": that is, every expression is always compiled in order to be evaluated, whether it's in a source file or typed at a repl prompt.

To answer some of your specific questions more directly:

> is behavior when running a script in "interpreter" mode guaranteed to > be the same as when running in "native compiled code"?

"Guaranteed" is a strong word. ANSI specifies the behavior, and it's generally the same whether code is interpreted or compiled. That may be hard to achieve for some expressions, but if you find an example of different behavior, that's most likely a bug in ANSI compliance.

(I should allow for the possibility that the ANSI spec has some corner somewhere that allows for a difference, but I can't think of any off the top of my head. Of course, the spec is large and my memory is imperfect.)

> At what point does the compiler create an intermediate representation?

Depends on what you mean. Under some interpretations and in some implementations there may not be an intermediate form.

The normal lifecycle of an evaluation is:

- read: ingest text and convert it to Lisp data

- macroexpand: rewrite any macro expressions in the data

- eval: use eval and apply on the data to yield some result values

- print: convert the result values to text in a standard format and print it to *standard-output*

You might regard the result of the read function as an intermediate form, but I would say that what read produces is Lisp source code. In my view, the original text is not Lisp source code; it's a serialization of Lisp source code. Read converts the text into Lisp source code.

ANSI does not specify in detail exactly what eval has to do in order to yield results. It specifies abstractly what each function, macro, and special form is supposed to produce, and what side effects they are supposed to have on the running environment, but the details of how the code is generated and executed are up to the implementation. An implementation can simply walk the source code interpreting forms and producing the required effects; or it can convert them to some intermediate form such as bytecode and then let a byecode interpreter do the evaluating; or it can compile them directly to native code that does the evaluating when executed. These are implementation details not specified by ANSI.

One detail that is specified by ANSI is the expansion of macros. Macros are expressions that look like function calls or special forms, but which are implemented by applying a macro-expander function to the expression to produce a new expression that is then given to the evaluator. Macros may be defined by the system or by the user. They are expanded after read and before eval. The spec goes into some detail about how this process is supposed to work and what features of the language affect it.

> Why does SBCL not use LLVM?

LLVM's first release was 2003. SBCL's first release was 1999. Moreover, SBCL is a fork of CMUCL, which was first released in 1980, though that was before Common Lisp existed. It was called SPICE Lisp at that time, and the compiler rewrite that would turn it into Common Lisp happened about five years later.

The point is that one important reason that SBCL doesn't use LLVM is that it's a large mature codebase that predates the existence of LLVM by about 23 years. I'm not saying it would be impossible to port SBCL onto LLVM, but if you did it would be so much changed that it wouldn't really be SBCL anymore. It would probably also be easier to just write a new Common Lisp on LLVM (which is what the CLASP project has done).

> At what point are #-starting words(read macros?) evaluated and > how is that different from interpreting to compiling.

An expression that starts with "#" is, as you alluded to, a reader macro. What that means is that when the read function starts to read the expression it encounters the "#" and that triggers it to look up the next character in a table of special read functions. If it doesn't find anything it signals a READER-ERROR. If it does find something then it applies the found function to the input to read it.

For example, if you feed it "#'+" then it looks up the quote character in the table of reader macros and finds a function that converts the expression to "(function +)". When "(function +)" is evaluated it returns the function that is globally bound to the symbol named "+".

So the sequence is:

  "#'+" -> READ -> reader macro -> EVAL (function +) -> return the function bound to +

The details of what happens when a reader macro is executed depend on the reader macro bound to the specific dispatch character. A bunch of them are defined by the ANSI standard, but you're free to define your own, and it's up to you what those do. You just have to know that they will get called when the reader encounters "#" followed by whatever dispatch character you specify, and the output of the function will get passed to EVAL after you're done with it.

> How do I control the amount of optimization the compiler will do?

With the declaration OPTIMIZE: http://clhs.lisp.se/Body/d_optimi.htm#optimize

For example, (declare (optimize (speed 3)(safety 0)))

> Will the interpreter ever try to optimize?

The answer is implementation specific, and may not even make sense (for example if you are using a compiler-only implementation).

> Is garbage collection implementation-defined?

To a large extent, yes. The spec assumes that implementations provide automatic memory management, but I think the only directly-related feature specified by ANSI is the ROOM function. Every implementation I know of also has the GC function to trigger a collection, but I don't think it's in the spec. Different implementations also have different GC strategies, they may have more than one GC implementation, and they provide various different implementation-specific tools for inspecting and controlling GC behavior.

(Not answerer) Slight clarification, because I had the same questions and the previous answer is already really good:

Rainer Joswig pointed out on StackOverflow that the evaluation order is as follows (even though CLHS seems non-explicit about this):

For a compiler, for every "lisp form" (see below):

1. Read time

2. Macroexpansion time

3. Compile time

4. Load time

5 Run time

Read time maps strings to symbolic expressions (lists and atoms). A Common Lisp form is a combination of lists and atoms that is valid Common Lisp, for example

  (+ 1 2)

  (format t "Hello World")

Note, that the following is a list but not valid code, a.k.a. a form:

  ("abc" 2 thingamagig)

Macroexpansion is when user-defined forms are turned into Lisp forms. This is how WITH-OPEN-FILE can be defined by a user or non-language-implementor. It cannot be a function, because it doesn’t really evaluate its first argument, the stream variable name (a symbol) and the arguments to open. Furthermore, it also cleans up the file handle with UNWIND-PROTECT.

Special forms and macros are "unlike function calls". The difference is, that special forms are provided by the language and macros can be provided by the user. The conditional "if" is a special form for this reason. Haskell sidesteps this particular problem by making "if" (and others) evaluate lazily. In lisp you can get a similar effect by wrapping stuff in lambdas with zero arguments as something called a thunk and funcalling that later.

Compile time takes lisp forms and turns them into "machine code" whatever that happens to be.

Compile time and load time are more easily understood when you look at parenscript. Valid parenscript forms are not always valid lisp forms and vice versa. But they are both symbolic expressions, so basically lisp lists of lists and atoms. Parenscript compiles to JavaScript strings. Those get loaded by the browser when it loads the HTML or JS page that contains the JavaScript strings. Then the browser runs that code.

Look at the differences between PS:defpsmacro and CL:defmacro and PS:defmacro+ps .

In Common Lisp:

Read macros are executed at read time. Macros are evaluated at macroexpansion time. Compiler macros are evaluated at compile time. Load-time-value is evaluated at load time. Funcall, apply, lambda, and technically eval are evaluated at run time.

The slight wrinkle is that eval applies all if these rules recursively. So LIST is a lisp run-time function, but you can also call it to create the output in a DEFMACRO form. All run-time functions already defined can be used for all later forms (and strings) when they are being processed. Doug Hoyte called this "run-time at read-time" [1]. And by extensions it is also "run-time at macroexpansion time" and "run-time at compile time" and "run-time at load time" and "run-time at run time". (This last one is functions as first-class values and having access to a compiler and EVAL.)

LOAD-TIME-VALUE is executed at load time. Technically, the following top-level form is, too:

  (Eval-when (:load-toplevel) …)

For ideas on read macros look at Doug Hoyte's [1] SHARP-DOUBLE-QUOTE #" and SHARP-GREATER-THAN #> .

[1] https://letoverlambda.com/index.cl/guest/chap4.html