Comment by newzino 2 days ago

The "aliases not pointers" approach for memory safety is interesting. Curious how you handle the performance implications - traditional aliasing analysis in compilers is expensive because determining whether two aliases point to the same memory is hard.

Are you doing this at runtime (reference counting or similar), or have you found a way to make the static analysis tractable by restricting what aliasing patterns are allowed?

The 250kB size is impressive for a language with inheritance and N-dimensional arrays. For comparison, Lua's VM is around 200-300kB and doesn't include some of those features. What did you have to leave out to hit that size? I assume no JIT, but what about things like regex, IO libraries, etc?

Also - calling back into C functions from the script is a key feature for embeddability. How do you handle type marshalling between the script's type system and C's? Do you expose a C API where I register callbacks with type signatures, or is there reflection/dynamic typing on the boundary?

briancr 2 days ago

Good questions! The short answer to the first is that the language is interpreted, not compiled, so optimizations are moot.

Aliases are strongly typed, which helps avoid some issues. Memory mods come with the territory: if ‘a’ and ‘b’ point to the same array and ‘a’ resizes that array, then the array behind ‘b’ gets resized too. The one tricky situation is when ‘a’ and ‘b’ each reference a range of elements, not the whole array, because a resize of ‘a’ would force a resize of the width of ‘b’. Resizing in this case is usually not allowed.
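
A minimal C sketch of the behaviour described above, assuming aliases are handles onto one shared storage record. The names `Storage`, `Alias`, `alias_len`, and `alias_resize` are hypothetical and are not taken from Cicada's source; the refusal rule for range aliases is a simplification of "usually not allowed".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Cicada's internals. */
typedef struct {
    double *data;
    size_t  len;
} Storage;

typedef struct {
    Storage *target;   /* storage shared by every alias of the array */
    int      whole;    /* nonzero: the alias covers the entire array */
    size_t   count;    /* width of the range; used only when whole == 0 */
} Alias;

/* Number of elements visible through an alias. */
static size_t alias_len(Alias a) {
    assert(a.target != NULL);
    return a.whole ? a.target->len : a.count;
}

/* Resize through an alias. Returns 0 on success, -1 if refused or out of memory. */
static int alias_resize(Alias a, size_t new_len) {
    assert(a.target != NULL);
    if (!a.whole || new_len == 0) return -1;   /* range aliases may not resize */
    double *p = realloc(a.target->data, new_len * sizeof *p);
    if (p == NULL) return -1;
    for (size_t i = a.target->len; i < new_len; i++) p[i] = 0.0;  /* zero new slots */
    a.target->data = p;
    a.target->len  = new_len;
    return 0;
}
```

Because both whole-array aliases hold the same `Storage *`, a resize through one is immediately visible through the other, with no alias analysis needed.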

Garbage collection is indeed done (poorly) by reference counting, and also (very well) by a tracing function that Cicada’s command line script runs after every command.

You’re exactly right, the library is lean because I figure it’s easy to add a C function interface for any capability you want. There’s a bit of personal bias as to what I did include - for example all the basic calculator functions are in, right down to atan(), but no regex. Basic IO (save, load, input, print) is included.

Type marshaling — the Cicada int/float types are defined by cicada.h and can be changed! You just have to use the same types in your C code.

When you run Cicada you pass a list of C functions paired with their Cicada names: { "myCfunction", &myCfunction }. Then, in Cicada, $myCfunction() runs the callback.
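
A rough sketch of what such a name-to-function table might look like on the C side. The `ccInt` and `argsType` definitions below are stand-ins so the block compiles on its own (the real ones come from cicada.h), and `CallbackEntry`, `callbacks`, and `find_callback` are hypothetical names, not Cicada's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int ccInt;                               /* stand-in for the cicada.h type */
typedef struct { int num; void **p; } argsType;  /* stand-in: only the fields used here */

typedef ccInt (*callbackFn)(argsType);

typedef struct {
    const char *name;   /* name the script uses, e.g. $myCfunction() */
    callbackFn  fn;     /* C function to run */
} CallbackEntry;

/* Example callback: reports how many arguments it received. */
static ccInt myCfunction(argsType args) { return (ccInt) args.num; }

static const CallbackEntry callbacks[] = {
    { "myCfunction", &myCfunction },
};

/* Linear lookup by script-side name; returns NULL if nothing is registered. */
static callbackFn find_callback(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof callbacks / sizeof callbacks[0]; i++)
        if (strcmp(callbacks[i].name, name) == 0) return callbacks[i].fn;
    return NULL;
}
```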

Thanks for the questions! This is exactly the sort of feedback that helps me learn more about the landscape.

  • newzino 2 days ago

    Thanks for the detailed response. The interpreted approach makes sense for the use case - when you're embedding a scripting layer, you usually want simplicity and portability over raw speed anyway.

    The aliasing semantics you describe (resizes propagating through aliases) is an interesting choice. It's closer to how references work in languages like Python than to the "borrow checker" approach Rust takes. Probably more intuitive for users coming from dynamic languages, even if it means some operations need runtime checks.

    The hybrid GC approach (reference counting + periodic tracing) is pragmatic. Reference counting handles the common case cheaply, and the tracing pass catches cycles. That's similar to how CPython handles it.
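
To illustrate why a tracing pass is needed on top of reference counting, here is a toy C sketch. The object model (`Obj`, `link_obj`, `mark_from`, `count_unreachable`) is hypothetical and far simpler than what Cicada or CPython actually do: each object has at most one outgoing reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object; not Cicada's or CPython's representation. */
typedef struct Obj {
    int         refcount;  /* number of incoming references */
    int         marked;    /* set by the tracing pass */
    struct Obj *child;     /* at most one outgoing reference */
} Obj;

/* Point `from` at `to`, bumping the target's reference count. */
static void link_obj(Obj *from, Obj *to) {
    assert(from != NULL);
    from->child = to;
    if (to != NULL) to->refcount++;
}

/* Tracing pass: mark everything reachable from one root. */
static void mark_from(Obj *root) {
    for (Obj *o = root; o != NULL && !o->marked; o = o->child)
        o->marked = 1;
}

/* Objects left unmarked after tracing are garbage, whatever their refcount. */
static size_t count_unreachable(Obj *const *all, size_t n) {
    size_t garbage = 0;
    for (size_t i = 0; i < n; i++)
        if (!all[i]->marked) garbage++;
    return garbage;
}
```

Two objects that point at each other each keep a reference count of 1, so counting alone never frees them; the mark pass finds them because nothing reachable from a root leads to them.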

    The C registration API sounds clean - explicit pairing of names to function pointers is about as simple as it gets. Do you handle varargs on the Cicada side, or does each registered function have a fixed arity that the interpreter enforces?

    • briancr 2 days ago

      Yes, there are lots of runtime checks, unfortunately, but I always push the time-consuming calculations into C anyway, so those checks don’t really affect overall performance much.

      Scripted functions have no set arity, and the same applies to callback C functions. Scripted functions collect their arguments inside an ‘args’ variable. Likewise, each C function has a single ‘argsType’ argument which collects the argument pointers & type info. There are macros to help unpack them, but you can also do the unpacking manually, which lets the function handle any number of arguments:

      ccInt myCfunction(argsType args)
      { for (int a = 0; a < args.num; a++) printf("%p\n", args.p[a]); return 0; }

      So all functions are automatically variadic.

      It’s good to know that these GC/etc. solutions are even used by the big languages.

      • newzino a day ago

        The "all functions are automatically variadic" design is a nice simplicity win. No overloading, no arity mismatches at call sites - just a uniform calling convention.

        The argsType struct with pointer array and count makes explicit what C varargs leaves implicit: a plain va_list carries neither a count nor types, so the callee has to learn them some other way. Having the type info alongside the pointers means you get runtime type checking without the caller needing to pass format strings or sentinel values like traditional C varargs.
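
A small C sketch of that runtime check, assuming the struct carries one type tag per argument. The `ArgKind` tags, this cut-down `argsType`, and `sumArgs` are hypothetical; Cicada's real type codes and unpacking macros are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef int ccInt;                              /* stand-in for the cicada.h type */
typedef enum { ARG_INT, ARG_FLOAT } ArgKind;    /* hypothetical type tags */

typedef struct {
    int            num;    /* argument count */
    void         **p;      /* one pointer per argument */
    const ArgKind *type;   /* one tag per argument */
} argsType;

/* Sum all numeric arguments into *out. Returns 0 on success, 1 on an unknown tag. */
static ccInt sumArgs(argsType args, double *out) {
    assert(out != NULL);
    double total = 0.0;
    for (int a = 0; a < args.num; a++) {
        switch (args.type[a]) {
        case ARG_INT:   total += *(const int *) args.p[a];    break;
        case ARG_FLOAT: total += *(const double *) args.p[a]; break;
        default:        return 1;   /* refuse anything we cannot interpret */
        }
    }
    *out = total;
    return 0;
}
```

The callee decides how to read each pointer from the tag next to it, so the caller never has to supply a format string or a terminating sentinel.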

        The tradeoff is you lose static arity checking at parse time, but for an embedded scripting use case that's probably fine - you're validating at runtime anyway and the error messages can be more helpful than "wrong number of arguments."

        Do you have plans for optional/default arguments, or is that outside the scope? With variadic-by-default it'd be natural to just check args.num and use defaults for missing ones.
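
The idea of checking args.num and falling back to a default could look roughly like this in C. The cut-down `argsType` and the helper `optionalString` are hypothetical, written only to show the pattern.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int num; void **p; } argsType;  /* stand-in: only the fields used here */

/* Return the string argument at `index` if the caller passed one, else `fallback`. */
static const char *optionalString(argsType args, int index, const char *fallback) {
    assert(index >= 0 && fallback != NULL);
    return (index < args.num) ? (const char *) args.p[index] : fallback;
}
```

A callback taking two mandatory ints and one optional string would call `optionalString(args, 2, "default")` after confirming `args.num >= 2`.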

        • briancr a day ago

          Yes, and the simplicity extends to function definitions too, since you don’t have to specify any type info. E.g.

          f :: { ; print(args) }

          Brevity is especially nice for inline/anonymous functions.

          You can definitely use args.num, args.type[], and args.indices[] to figure out which optional parameters were passed, but I’ve decided that it’s usually easier to pass a full set of parameters into C and have the scripted wrapper handle the optional params. This is easy in Cicada because of ‘code substitution’ (one of the innovations I’m proudest of and if you’ve seen this elsewhere please let me know!). Example:

          callC :: {
              mandatoryArgs :: { int, int }
              optionalArgs :: { str :: string; str = "default" }

              code

              mandatoryArgs = args
              optionalArgs(), (optionalArgs<<args)()    | set default, then allow user to change it
              $Cfunction(mandatoryArgs, optionalArgs)
          }

          Then you can call it with or without modifying the optional parameters from their default values.

          callC(2, 3) | uses the default string

          callC(2, 3; str = "modified param")

          callC() runs its arguments as a function, substituted into the params variable, allowing the arguments to modify params. This is weird and I haven’t seen it elsewhere, but it’s very useful.
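
For readers more at home in C, a loose analogue of the pattern is "fill in defaults, then let the caller run code against them". This is only an analogy under that assumption, not how Cicada implements code substitution; `OptionalArgs`, `Override`, `useModified`, and this C `callC` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *str; } OptionalArgs;  /* mirrors optionalArgs in the script */
typedef void (*Override)(OptionalArgs *);          /* caller-supplied code run on the params */

/* Plays the role of "str = ..." written at the call site. */
static void useModified(OptionalArgs *o) { o->str = "modified param"; }

/* Set defaults, let the caller modify them, then return the string that would be passed on. */
static const char *callC(int a, int b, Override override) {
    OptionalArgs opts = { "default" };        /* set default */
    if (override != NULL) override(&opts);    /* then allow the caller to change it */
    assert(opts.str != NULL);                 /* an override must leave a usable string */
    (void) a; (void) b;                       /* mandatory args are unused in this sketch */
    return opts.str;
}
```

In C the override has to be a separately defined function, whereas the Cicada version lets the caller write the assignment inline in the argument list.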