Comment by danilor a day ago

I appreciate this as someone learning lisp for the first time!

While I'm here I'll ask: does anybody have a recommendation for resources on learning Lisp that focus on how the interpreter/compiler works? Here's what I've tried so far: 1) I started Practical Common Lisp, but found it lacking in specifics. 2) I tried going to the specification, but that was way too much. 3) I tried Common Lisp The Language and, while I liked the precise definitions a lot, it was a lot to keep in mind while also learning the fundamentals. Right now I'm 4) back at Practical Common Lisp, skipping most of the text and reading the surrounding paragraphs whenever I find a piece of code I don't understand.

I wanted something that doesn't explain the standard library; I'd rather read the documentation for that. I wanted to know more precisely what happens throughout the program's life. Like: is behavior when running a script in "interpreter" mode guaranteed to be the same as when running "native compiled code"? At what point does the compiler create an intermediate representation? Why does SBCL not use LLVM? At what point are #-starting words (reader macros?) evaluated, and how does that differ between interpreting and compiling? How do I control the amount of optimization the compiler will do? Will the interpreter ever try to optimize? Is garbage collection implementation-defined? How will implementations change in behavior?

mikelevins a day ago

The essence of Common Lisp (and other Lisps) is the metacircular evaluator from John McCarthy's Lisp 1.5 Manual:

https://michaelnielsen.org/ddi/wp-content/uploads/2012/04/Li...

This is the half-page of code that Alan Kay famously described as "the Maxwell's equations of software."

Common Lisp and MACLISP before it and Lisp 1.5 before that worked by providing a running program that implements the functions given in that half page and applies them and their effects iteratively to their own environment.

In other words, the basic model is that Lisp is a running program that you modify interactively by feeding expressions to it one after another.

Common Lisp's model still works like that: when you give it a source file it reads one expression after another from the file, evaluating the expressions as it goes, and altering the running environment to reflect any changes that they make, such as adding new definitions, updating data structures, and so on.

The Lisp reads the next well-formed expression, converting it into Lisp data structures. It then evaluates the resulting data structure, yielding some number of values and updating the environment at the same time. (In Common Lisp even pure functions have side effects if evaluated by the read-eval-print loop because the standard defines a globally-accessible variable named * that contains the value returned by the most recently-evaluated expression).

Common Lisp's design reflects this basic model, and much of its standard library is concerned with making it convenient to work this way. The ANSI standard likewise reflects this basic model of computation, including features specifically designed to support this style of interactive programming.

The process of evaluation allows, but does not require, compilation. Common Lisp interpreters (usually called "evaluators") do exist, but most practical implementations provide compilers that compile each expression in order to evaluate it. Some implementations are "compiler only": that is, every expression is always compiled in order to be evaluated, whether it's in a source file or typed at a repl prompt.

To answer some of your specific questions more directly:

> is behavior when running a script in "interpreter" mode guaranteed to be the same as when running in "native compiled code"?

"Guaranteed" is a strong word. ANSI specifies the behavior, and it's generally the same whether code is interpreted or compiled. That may be hard to achieve for some expressions, but if you find an example of different behavior, that's most likely a bug in ANSI compliance.

(I should allow for the possibility that the ANSI spec has some corner somewhere that allows for a difference, but I can't think of any off the top of my head. Of course, the spec is large and my memory is imperfect.)

> At what point does the compiler create an intermediate representation?

Depends on what you mean. Under some interpretations and in some implementations there may not be an intermediate form.

The normal lifecycle of an evaluation is:

- read: ingest text and convert it to Lisp data

- macroexpand: rewrite any macro expressions in the data

- eval: use eval and apply on the data to yield some result values

- print: convert the result values to text in a standard format and print it to *standard-output*

You might regard the result of the read function as an intermediate form, but I would say that what read produces is Lisp source code. In my view, the original text is not Lisp source code; it's a serialization of Lisp source code. Read converts the text into Lisp source code.
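This distinction is easy to see at the REPL: READ produces ordinary Lisp data (lists and symbols) that you can inspect before it is ever evaluated. A small sketch:

```lisp
;; READ-FROM-STRING converts text into Lisp data. It returns two values:
;; the object read and the index of the next unread character.
(read-from-string "(+ 1 (* 2 3))")
;; => (+ 1 (* 2 3)), 13

;; The result is plain data -- a cons whose first element is the symbol +
;; -- which you can take apart with list operations before EVAL sees it:
(let ((form (read-from-string "(+ 1 (* 2 3))")))
  (values (first form)      ; => +
          (type-of form)))  ; => CONS
```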

ANSI does not specify in detail exactly what eval has to do in order to yield results. It specifies abstractly what each function, macro, and special form is supposed to produce, and what side effects they are supposed to have on the running environment, but the details of how the code is generated and executed are up to the implementation. An implementation can simply walk the source code interpreting forms and producing the required effects; or it can convert them to some intermediate form such as bytecode and then let a bytecode interpreter do the evaluating; or it can compile them directly to native code that does the evaluating when executed. These are implementation details not specified by ANSI.

One detail that is specified by ANSI is the expansion of macros. Macros are expressions that look like function calls or special forms, but which are implemented by applying a macro-expander function to the expression to produce a new expression that is then given to the evaluator. Macros may be defined by the system or by the user. They are expanded after read and before eval. The spec goes into some detail about how this process is supposed to work and what features of the language affect it.
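A tiny sketch of that expansion step, using a toy macro (the name my-unless is invented for illustration):

```lisp
;; A toy macro. MACROEXPAND-1 shows the rewrite that happens after
;; READ and before EVAL.
(defmacro my-unless (test &body body)
  `(if ,test nil (progn ,@body)))

(macroexpand-1 '(my-unless (> 1 2) (print "ran")))
;; => (IF (> 1 2) NIL (PROGN (PRINT "ran"))), T

;; Evaluating the macro call runs the expansion:
(my-unless (> 1 2) "ran")  ; => "ran"
```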

> Why does SBCL not use LLVM?

LLVM's first release was 2003. SBCL's first release was 1999. Moreover, SBCL is a fork of CMUCL, which was first released in 1980, though that was before Common Lisp existed. It was called SPICE Lisp at that time, and the compiler rewrite that would turn it into Common Lisp happened about five years later.

The point is that one important reason that SBCL doesn't use LLVM is that it's a large mature codebase that predates the existence of LLVM by about 23 years. I'm not saying it would be impossible to port SBCL onto LLVM, but if you did it would be so much changed that it wouldn't really be SBCL anymore. It would probably also be easier to just write a new Common Lisp on LLVM (which is what the CLASP project has done).

> At what point are #-starting words(read macros?) evaluated and how is that different from interpreting to compiling.

An expression that starts with "#" is, as you alluded to, a reader macro. What that means is that when the read function starts to read the expression it encounters the "#" and that triggers it to look up the next character in a table of special read functions. If it doesn't find anything it signals a READER-ERROR. If it does find something then it applies the found function to the input to read it.

For example, if you feed it "#'+" then it looks up the quote character in the table of reader macros and finds a function that converts the expression to "(function +)". When "(function +)" is evaluated it returns the function that is globally bound to the symbol named "+".

So the sequence is:

  "#'+" -> READ -> reader macro -> EVAL (function +) -> return the function bound to +

The details of what happens when a reader macro is executed depend on the reader macro bound to the specific dispatch character. A bunch of them are defined by the ANSI standard, but you're free to define your own, and it's up to you what those do. You just have to know that they will be called when the reader encounters "#" followed by whatever dispatch character you specify, and the output of the function will be passed to EVAL after you're done with it.
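Defining your own looks like this. A hedged sketch: the #@ dispatch character and the wrapping behavior here are invented for illustration (#@ is not defined by the standard, which is what makes it safe to claim):

```lisp
;; Install a reader function for #@ that reads the next form and wraps
;; it so the evaluated result is paired with a timestamp.
(set-dispatch-macro-character
 #\# #\@
 (lambda (stream subchar arg)
   (declare (ignore subchar arg))
   (let ((form (read stream t nil t)))   ; recursive read of the next form
     `(list (get-universal-time) ,form))))

;; The rewrite happens at read time, before EVAL ever sees the text:
(read-from-string "#@(+ 1 2)")
;; => (LIST (GET-UNIVERSAL-TIME) (+ 1 2))
```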

> How do I control the amount of optimization the compiler will do?

With the declaration OPTIMIZE: http://clhs.lisp.se/Body/d_optimi.htm#optimize

For example, (declare (optimize (speed 3) (safety 0)))
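In practice there are two usual scopes: a global default via DECLAIM and a per-function override via DECLARE. A sketch (the function name and type declarations are invented for illustration):

```lisp
;; Global default for everything compiled after this point.
(declaim (optimize (speed 1) (safety 1)))

(defun sum-fixnums (xs)
  ;; Override the qualities for this one function, and give the
  ;; compiler type information it can use at high SPEED settings.
  (declare (optimize (speed 3) (safety 0))
           (list xs))
  (let ((total 0))
    (declare (fixnum total))
    (dolist (x xs total)
      (incf total (the fixnum x)))))
```

In SBCL, high SPEED settings also make the compiler print efficiency notes when it can't optimize something, which is a useful feedback loop.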

> Will the interpreter ever try to optimize?

The answer is implementation specific, and may not even make sense (for example if you are using a compiler-only implementation).

> Is garbage collection implementation-defined?

To a large extent, yes. The spec assumes that implementations provide automatic memory management, but I think the only directly-related feature specified by ANSI is the ROOM function. Every implementation I know of also has the GC function to trigger a collection, but I don't think it's in the spec. Different implementations also have different GC strategies, they may have more than one GC implementation, and they provide various different implementation-specific tools for inspecting and controlling GC behavior.
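Concretely (the ROOM call is standard; the rest is SBCL-specific and shown only as an example of the kind of implementation-specific control you get):

```lisp
;; Standard: print a brief report of memory usage. Passing NIL gives the
;; most abbreviated report; the exact format is implementation-defined.
(room nil)

;; Implementation-specific: SBCL exports a function to force a collection.
;; Other implementations spell this differently (or not at all).
#+sbcl (sb-ext:gc :full t)
```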

  • dapperdrake a day ago

    (Not the answerer) A slight clarification, because I had the same questions and the previous answer is already really good:

    Rainer Joswig pointed out on StackOverflow that the evaluation order is as follows (even though CLHS seems non-explicit about this):

    For a compiler, for every "lisp form" (see below):

    1. Read time

    2. Macroexpansion time

    3. Compile time

    4. Load time

    5. Run time

    Read time maps strings to symbolic expressions (lists and atoms). A Common Lisp form is a combination of lists and atoms that is valid Common Lisp, for example

      (+ 1 2)
    
    or

      (format t "Hello World")
    
    
    Note that the following is a list, but not valid code, i.e. not a form:

      ("abc" 2 thingamagig)
    
    Macroexpansion is when user-defined forms are turned into Lisp forms. This is how WITH-OPEN-FILE can be defined by a user or non-language-implementor. It cannot be a function, because it doesn’t really evaluate its first argument, the stream variable name (a symbol) and the arguments to open. Furthermore, it also cleans up the file handle with UNWIND-PROTECT.

    Special forms and macros are "unlike function calls". The difference is that special forms are provided by the language, while macros can be provided by the user. The conditional IF is a special form for this reason. Haskell sidesteps this particular problem by making "if" (and everything else) evaluate lazily. In Lisp you can get a similar effect by wrapping code in a zero-argument lambda, called a thunk, and funcalling it later.
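    A minimal sketch of the thunk trick (my-if is an invented name; it's an ordinary function, unlike the real IF):

```lisp
;; Because MY-IF is a plain function, both arguments would normally be
;; evaluated before the call. Wrapping the branches in zero-argument
;; lambdas delays them; FUNCALL forces only the one we pick.
(defun my-if (test then-thunk else-thunk)
  (if test (funcall then-thunk) (funcall else-thunk)))

(my-if (> 2 1)
       (lambda () :then-branch)
       (lambda () (error "never runs")))
;; => :THEN-BRANCH
```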

    Compile time takes lisp forms and turns them into "machine code" whatever that happens to be.

    Compile time and load time are more easily understood when you look at parenscript. Valid parenscript forms are not always valid lisp forms and vice versa. But they are both symbolic expressions, so basically lisp lists of lists and atoms. Parenscript compiles to JavaScript strings. Those get loaded by the browser when it loads the HTML or JS page that contains the JavaScript strings. Then the browser runs that code.

    Look at the differences between PS:defpsmacro and CL:defmacro and PS:defmacro+ps .

    In Common Lisp:

    Read macros are executed at read time. Macros are evaluated at macroexpansion time. Compiler macros are evaluated at compile time. Load-time-value is evaluated at load time. Funcall, apply, lambda, and technically eval are evaluated at run time.

    The slight wrinkle is that eval applies all of these rules recursively. So LIST is a Lisp run-time function, but you can also call it to create the output in a DEFMACRO form. All run-time functions already defined can be used for all later forms (and strings) when they are being processed. Doug Hoyte called this "run-time at read-time" [1]. And by extension it is also "run-time at macroexpansion time" and "run-time at compile time" and "run-time at load time" and "run-time at run time". (This last one is functions as first-class values and having access to a compiler and EVAL.)
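    "Run-time at macroexpansion time" looks like this in practice (shout is an invented name for illustration):

```lisp
;; STRING-UPCASE is an ordinary run-time function, but here it runs
;; while the macro expands; only its *result* is spliced into the
;; code the macro returns.
(defmacro shout (name)
  (let ((msg (string-upcase (string name))))
    `(format t "~a!~%" ,msg)))

(macroexpand-1 '(shout hello))
;; => (FORMAT T "~a!~%" "HELLO")
```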

    LOAD-TIME-VALUE is executed at load time. Technically, the following top-level form is, too:

      (eval-when (:load-toplevel) …)
    
    For ideas on read macros look at Doug Hoyte's [1] SHARP-DOUBLE-QUOTE #" and SHARP-GREATER-THAN #> .

    [1] https://letoverlambda.com/index.cl/guest/chap4.html

stassats 22 minutes ago

> Why does SBCL not use LLVM?

Why doesn't GCC use LLVM? Why is Zig writing their own backend?

drob518 a day ago

I suggest “Lisp in Small Pieces” as a great introduction to implementing a Lisp. It focuses on Scheme rather than CL, but all of the knowledge is transferable. In the book, you go through multiple basic implementations, each successively more sophisticated, starting with basic interpretation and moving through more optimizations toward a compiler.

  • cryptonector 21 hours ago

    I concur. LiSP is one of my favorite books in my library. I highly recommend it.

kevindamm a day ago

My first suggestion would be Norvig's text "Paradigms of Artificial Intelligence Programming" (note: not Russell and Norvig's "Modern Approach" but the "Case Studies" one with the teal and orange cover). Start with chapter 3 for a straightforward description of the core set of routines (more than a minimal implementation, but not everything you'd see described in CL), and choose a few from any of the other chapters as demo programs that you can use as a first target for your implementation. Yeah, it's GOFAI and might feel dated, but it worked well for me to get a stronger grasp of Lisp implementations.

This won't hold your hand through implementing Lisp within another language like C or Rust, but it does show the representation of Lisp within Lisp (and a few chapters on implementing a logic programming language in Lisp as well). Most of the task of transpiling into another language reduces to how you represent functions and function results, and the extent to which your compiler reduces specific built-ins. There's a range of implementation here that you can find covered in PL research papers, but you may be better off looking through compiler books and articles for that.

This book also has a couple chapters on efficiency concerns for implementations of Lisp, chapters 9 and 10, which look at both dev-side and compiler-side approaches to more efficient Lisps.

https://www.goodreads.com/book/show/15534706

Despite the focus on GOFAI, I think the lessons apply to good Lisp programs in general from a practitioner's PoV. If you want a more abstract vantage then I'd say the classic "Structure and Interpretation of Computer Programs" by Abelson & Sussman should be mentioned. It includes a lot of diagrams of CONS structures to keep things focused but it doesn't have the kind of full case study applications of the Norvig text.

https://www.goodreads.com/book/show/43713

It's funny how all the good books on Lisp implementations assume that first you start with a Lisp. I wonder if this goes back to there being Lisp machines on which the language was already deeply situated.

bigdict a day ago

Lisp In Small Pieces

guenthert 20 hours ago

> Like, is behavior when running a script in "interpreter" mode guaranteed to be the same as when running in "native compiled code"?

The ANSI Common Lisp standard applies to all compliant implementations, regardless of whether they interpret or compile the language. In earlier Lisps there might have been a difference; in Common Lisp that would be a bug.

> At what point does the compiler create an intermediate representation?

??? Are you referring to macro-expansion time?

> Why does SBCL not use LLVM?

Other than for historic reasons, why would it?

> At what point are #-starting words(read macros?) evaluated and how is that different from interpreting to compiling

You're answering your own question. Read macros are evaluated at read time.

I'm not sure whether those questions are really all that relevant to someone just starting to learn the language.

Practical Common Lisp is a fine starting point and Common Lisp The Language (2nd edition) will keep you busy for a looong time.

zeraholladay 18 hours ago

I am in a similar situation as you, but in a lot of ways you're asking more sophisticated questions than me. FWIW, I've been building a little toy lisp-scheme like scripting language as a learning tool: https://github.com/zeraholladay/lispm/tree/main. It's been a lot of fun and I've really enjoyed exploring parts of CS that my career hasn't really taken me to.

cryptonector 21 hours ago

> While I'm here I'll ask: does anybody have a recommendation for resources on learning lisp that focus on how the interpreter/compiler works?

Yes: Lisp in Small Pieces, by Christian Queinnec.

I've read a fair number of books on Lisp, including Paul Graham's On Lisp and ANSI Common Lisp, and IMO Lisp in Small Pieces is the best for understanding how to implement a Lisp. LiSP covers interpreters and compilers, and it covers how to implement call/cc. Like Paul Graham's books on Lisp, LiSP is highly enjoyable -- these are books I read for fun.

monkeyelite 16 hours ago

> focus on how the interpreter/compiler works

This is the subject of the classic book "Structure and interpretation of computer programs"

robert-brown a day ago

Folks on the #clschool, #commonlisp, and #sbcl IRC channels can answer your questions.

anthk a day ago

ECL will run everything SBCL can, but far more slowly, although the gap is closing.

On Quicklisp, use Ultralisp: an updated McCLIM with an up-to-date CLX is like night and day on speed.

If you use Portacle, just run:

(ql-dist:install-dist "http://dist.ultralisp.org/" :prompt nil)

Otherwise, install the Ultralisp dist and then eval that form to get an updated QL base.

Recommended book: https://www.cs.cmu.edu/~dst/LispBook/

Portacle which bundles Emacs + SBCL: https://portacle.github.io/