A high-performance, zero-overhead, extensible Python compiler using LLVM

241 points by wspeirs 10 months ago

haberman 10 months ago

> Non-goals: Drop-in replacement for CPython: Codon is not a drop-in replacement for CPython. There are some aspects of Python that are not suitable for static compilation — we don't support these in Codon.

This is targeting a Python subset, not Python itself.

For example, something as simple as this will not compile, because lists cannot mix types in Codon (https://docs.exaloop.io/codon/language/collections#strong-ty...):

    l = [1, 's']

It's confusing to call this a "Python compiler" when the constraints it imposes pretty fundamentally change the nature of the language.
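For contrast, a minimal sketch of what that line does in stock CPython (ordinary Python, not Codon; the variable names are only illustrative):

```python
# In CPython a list stores references to arbitrary objects,
# so mixing an int and a str in one list is legal.
mixed = [1, "s"]

# Inspect the runtime type of each element.
element_types = [type(x).__name__ for x in mixed]
print(element_types)  # ['int', 'str']
```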


quotemstr 10 months ago

It's not even a subset. They break foundational contracts of the Python language without technical necessity. For example,
> Dictionaries: Codon's dictionary type does not preserve insertion order, unlike Python's as of 3.6.
That's a gratuitous break. Nothing about preserving insertion order interferes with compilation, AOT or otherwise. The authors of Codon broke dict ordering because they felt like it, not because they had to.
At least Mojo merely claims to be Python-like. Unlike Codon, it doesn't claim to be Python then note in the fine print that it doesn't uphold Python contractual language semantics.
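To make the contract concrete, here is a small sketch of the CPython behaviour being described (insertion order is a language guarantee from Python 3.7 and a CPython implementation detail in 3.6). It only demonstrates CPython and makes no claim about what Codon does with the same code; the key names are made up:

```python
# Insert keys in a deliberately non-sorted order.
d = {}
for key in ["zebra", "apple", "mango"]:
    d[key] = len(key)

# Iteration follows insertion order in CPython 3.7+.
keys_in_order = list(d)
print(keys_in_order)  # ['zebra', 'apple', 'mango']

# Code like this silently depends on that ordering.
first_key = next(iter(d))
print(first_key)  # zebra
```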

- orf 10 months ago
  
  Try not to throw around statements like “they broke dict ordering because they felt like it”.
  Obviously they didn’t do that. There are trade-offs when preserving dictionary ordering.
  
  
  baq 10 months ago
  
  dicts ordering keys in insertion order isn't an implementation detail anymore and hasn't been for years.
  
  
  dathinab 10 months ago
  
  if you claim
  > high-performance Python implementation
  then no, these aren't trade-offs but breaking the standard without it truly being necessary
  most importantly, this will break code in a subtle and potentially very surprising way
  they could just claim they are python-like and then no one would hold them to the standard
  but if you are misleading about your product people will take offense even if it isn't intentional
  
  
  actionfromafar 10 months ago
  
  The trade-off is a bit of speed.
  
- adammarples 10 months ago
  
  Well would you claim that Python 3.5 isn't python?
  
  
  stoperaticless 10 months ago
  
  All versions of python are python.
  If lang is not compatible with any of python versions, then the lang isn’t python.
  False advertising is not nice. (even if the fineprint clarifies)
  
wpietri 10 months ago

Yeah, this right here would kill it for me:
> Strings: Codon currently uses ASCII strings unlike Python's unicode strings.
That rules out almost anything web-ish for me.
The use case I could imagine is places where you have a bunch of python programmers who don't really want to learn another language but you have modest amounts of very speed-sensitive work.
E.g., you're a financial trading company who has hired a lot of PhDs with data science experience. In that context, I could imagine saying, "Ok, quants, all of your production code has to work in Codon". It's not like they're programming masters anyhow, and having it be pretty Python-ish will be good enough for them.

- Retr0id 10 months ago
  
  >> Strings: Codon currently uses ASCII strings unlike Python's unicode strings.
  Yikes. These days I wouldn't even call those strings, just bytes. I can live with static/strong typing (I prefer it, even), but not having support for actual strings is a huge blow.
  
- wpietri 10 months ago
  
  Ah, looking further, I find this about the company: "Their focus lies in bridging the gap between these two aspects across various domains, with a particular emphasis on life sciences and bioinformatics."
  That makes sense as a sales pitch. "Hey, company with a lot of money! Want your nerds to go faster and need less expensive hardware? Pay us for magic speed-ups!" So it's less a product for programmers than it is for executives.
  
bpshaver 10 months ago

Who is out here mixing types in a list anyway?

- dathinab 10 months ago
  
  parsing json is roughly of the type:
  type Json = None | bool | float | str | dict[str, Json] | list[Json]
  you might have similar situations for configs e.g. float | str for time in seconds or a human readable time string like "30s" etc.
  given how fundamental such things are I'm not sure if there will be any larger projects (especially wrt. web servers and similar) which are compatible with this
  also many commonly used features for libraries/classes etc. are not very likely to work (but idk. for sure, they just are very dynamic in nature)
  so IMHO this seems to be more like a python-like language you can use for idk. some science computations and similar than a general purpose faster python
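A rough sketch of the two cases mentioned above, in plain CPython. The alias name `Json`, the sample document, and the helper `timeout_seconds` (with its "30s" format) are illustrative assumptions, not anything taken from Codon or a real config schema:

```python
import json
from typing import Dict, List, Union

# Recursive alias mirroring the type sketched in the comment (int added,
# since the json module also yields ints).
Json = Union[None, bool, int, float, str, Dict[str, "Json"], List["Json"]]

# json.loads returns containers whose elements have mixed types.
doc = json.loads('{"timeout": "30s", "retries": 3, "tags": ["a", 1, null, 2.5]}')
tag_types = [type(t).__name__ for t in doc["tags"]]
print(tag_types)  # ['str', 'int', 'NoneType', 'float']


def timeout_seconds(value: Union[float, int, str]) -> float:
    """Accept a number of seconds or a string such as '30s' (hypothetical format)."""
    if isinstance(value, str):
        return float(value.rstrip("s"))
    return float(value)


print(timeout_seconds(doc["timeout"]))  # 30.0
print(timeout_seconds(1.5))  # 1.5
```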
  
  
  bpshaver 10 months ago
  
  Agreed, I was just joking. I understand heterogenous lists are possible with Python, but with the use of static type checking I feel like it's pretty rare for me to have heterogenous lists unless it's duck typing.
  
- orf 10 months ago
  
  It’s common to have a list of objects with different types, but which implement the same interface. Duck typing of this kind is core to Python.
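A minimal sketch of that pattern in plain CPython; the class names `Duck` and `Robot` and the `speak` method are invented for illustration:

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    def speak(self) -> str:
        return "Beep"


# The two classes share no base class, only a method name,
# yet they can sit in the same list and be used uniformly.
speakers = [Duck(), Robot()]
sounds = [s.speak() for s in speakers]
print(sounds)  # ['Quack', 'Beep']
```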
  
  
  bpshaver 10 months ago
  
  Good point.
  
- CaptainNegative 10 months ago
  
  I often find myself mixing Nones into lists containing built-in types when the former would indicate some kind of error. I could wrap them all into a nullable-style type, but why shouldn't the interpreter implicitly handle that for me?
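Roughly what that looks like in plain CPython; `parse_int` is a hypothetical helper written just for this sketch:

```python
from typing import List, Optional


def parse_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None to signal a parse error."""
    try:
        return int(text)
    except ValueError:
        return None


# The result list mixes ints with None markers for failed inputs.
results: List[Optional[int]] = [parse_int(t) for t in ["1", "x", "3"]]
print(results)  # [1, None, 3]

# Filtering out the error markers gives back a homogeneous list.
valid = [r for r in results if r is not None]
print(valid)  # [1, 3]
```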
  
  
  bpshaver 10 months ago
  
  Yeah, that seems fair.
  
- itishappy 10 months ago
  
  The json module returns heterogenous dicts.
  https://docs.python.org/3/library/json.html
  
  
  bpshaver 10 months ago
  
  Yeah, just because it can do that doesn't mean that it is good design.
  
- dekhn 10 months ago
  
  I've been mixing types in Python lists for several decades now. Why wouldn't you? it's a list of PyObjects.
  
- gwking 10 months ago
  
  An example related to JSON content is HTML content. I have a Python library that represents all of the standard HTML tags as a family of classes. It is like a lightweight DOM on the server side, and has resulted in a web server that does not use string based templating at all. It lets me construct trees of HTML completely in Python and then render them out with everything correctly escaped. I can also parse HTML into trees and manipulate them as I please (for e.g. scraping tasks and document transforms). It is all strongly typed using mypy and I adhere to the strictest generic typing I can manage.
  Each node has a list of children, and the element type is `str|HtmlNode`. I find this vastly easier to use than the LXML ETree api, where nodes have `text` and `tail` attributes to represent interleaved text.
  Interestingly, the LXML docs promote their design as follows:
  > The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, that tend to get in the way fairly often (as you might know from classic DOM APIs).
  https://lxml.de/tutorial.html#elements-contain-text
  It could be a simple matter of taste! But I suspect that the difference between what they are describing as "classic DOM" vs what I am doing is that they are referring to experience with C/C++/Java libraries circa 2009 that had much less convenient dynamic type introspection. The "get in the way fairly often" reminds me of how verbose it is to deal with heterogenous data in C/C++/ObjC. In ObjC for example, you could have an array mixing NSString with other NSObject subclasses, but you had to do work to type it correctly. If you wanted numbers in there you had to use NSNumber which is an annoying box type that you never otherwise use. And ObjC was considered very dynamic in its day!
  I have long felt that the root of much evil was the overbearing distinction between primitive and object types in C++/Java/Objective-C.
  All of this is a long way of saying, I think "how to deal with heterogenous lists of stuff" is a huge question in language design, library design, and the daily work of programming. Modern languages have by no means converged on a single way to represent varying types of elements. If you want to create trees of stuff, at some level that is "mixing types in a list" no matter how you might try to encode it. Just food for thought!
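A stripped-down sketch of the kind of node described above, assuming only what the comment states (children typed as `str|HtmlNode`, escaped on render). The `tag` field, the `render` method, and the sample tree are guesses for illustration, not the actual library:

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class HtmlNode:
    tag: str
    # Children interleave raw text and nested nodes in document order.
    children: List[Union[str, "HtmlNode"]] = field(default_factory=list)

    def render(self) -> str:
        # Text children are escaped; node children render recursively.
        inner = "".join(
            escape(child) if isinstance(child, str) else child.render()
            for child in self.children
        )
        return f"<{self.tag}>{inner}</{self.tag}>"


page = HtmlNode("p", ["a < b, ", HtmlNode("b", ["bold"]), " tail"])
rendered = page.render()
print(rendered)  # <p>a &lt; b, <b>bold</b> tail</p>
```

The heterogeneous `children` list is what removes the need for separate `text` and `tail` attributes in this sketch.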
  
- nicce 10 months ago
  
  Everyone who chooses Python in the first place.
  
  
  bpshaver 10 months ago
  
  Well, I'm one of those people, and I feel that I rarely do this. Except if I have a list of different objects that implement the same interface, as another commenter mentioned.
  
- RogerL 10 months ago
  
  return [key, value]
  
  
  Myrmornis 10 months ago
  
  You should use a tuple there: it's a collection of fixed size where each slot has an identity. (There's a common confusion in Python circles that the main point of tuples is immutability; that's not so).
  
  
  ghxst 10 months ago
  
  Why would you do this over `return key, value` which produces a tuple? Just curious.
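For anyone comparing the two forms, a small sketch in plain CPython (the function names are made up):

```python
def as_list(key, value):
    # Builds a mutable list whose two slots have different types.
    return [key, value]


def as_tuple(key, value):
    # A bare comma-separated return packs the values into a tuple.
    return key, value


pair = as_tuple("age", 32)
print(type(pair).__name__)  # tuple
print(type(as_list("age", 32)).__name__)  # list

# Both forms unpack the same way at the call site.
name, number = pair
```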
  
- __mharrison__ 10 months ago
  
  Someone who is using Python the wrong way.
  
BiteCode_dev 10 months ago

For a real compiler try nuitka.

odo1242 10 months ago

Yeah, it feels closer to something like Cython without the python part.

jjk7 10 months ago

The differences seem relatively minor. Your specific example can be worked around by using a tuple; which in most cases does what you want.

- itishappy 10 months ago
  
  Altering python's core datatypes is not what I'd call minor.
  They don't even mention the changes to `list`.
  > Integers: Codon's int is a 64-bit signed integer, whereas Python's (after version 3) can be arbitrarily large. However Codon does support larger integers via Int[N] where N is the bit width.
  > Strings: Codon currently uses ASCII strings unlike Python's unicode strings.
  > Dictionaries: Codon's dictionary type does not preserve insertion order, unlike Python's as of 3.6.
  > Tuples: Since tuples compile down to structs, tuple lengths must be known at compile time, meaning you can't convert an arbitrarily-sized list to a tuple, for instance.
  https://docs.exaloop.io/codon/general/differences
  Pretty sure this means the following doesn't work either:
  config = { "name": "John Doe", "age": 32 }
  Note: It looks like you can get around this via Python interop, but that further supports the point that this isn't really Python.
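As a reference point, here is how the quoted items behave in stock CPython. This sketch does not run anything under Codon, so it shows only the Python side of each difference; variable names are illustrative:

```python
# Integers: Python ints grow past 64 bits without overflow.
big = 2 ** 64
int64_max = 2 ** 63 - 1
print(big)  # 18446744073709551616
print(big > int64_max)  # True

# Dictionaries: values of different types in one dict are fine.
config = {"name": "John Doe", "age": 32}
value_types = sorted({type(v).__name__ for v in config.values()})
print(value_types)  # ['int', 'str']

# Tuples: a list whose length is only known at runtime converts freely.
items = list(range(3))
converted = tuple(items)
print(converted)  # (0, 1, 2)
```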
  
  
  dathinab 10 months ago
  
  > Strings: Codon currently uses ASCII strings unlike Python's unicode strings.
  wtf this is a super big issue making this basically unusable for anything handling text (and potentially even just fixed idents). If you aren't limited to EU+US, having non us-ascii idents in code or text is common, i.e. while EU companies most times code in english this is much less likely in Asia, especially China and Japan.
  it isn't even really a performance benefit compared to utf-8 as utf-8 only using us-ascii letters _is_ us-ascii and you don't have to use unicode aware string operations
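The encoding claim is easy to check in plain CPython. A short sketch (the sample strings are arbitrary):

```python
# ASCII-only text has byte-for-byte identical UTF-8 and ASCII encodings.
text = "hello"
same_bytes = text.encode("utf-8") == text.encode("ascii")
print(same_bytes)  # True

# Non-ASCII text is where the two diverge: one code point, two bytes.
word = "na\u00efve"  # "naive" with U+00EF in the middle
code_points = len(word)
utf8_bytes = len(word.encode("utf-8"))
print(code_points, utf8_bytes)  # 5 6

# An ASCII-only string type cannot represent this word at all.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```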
  

Lucasoato 10 months ago

> Is Codon free? Codon is and always will be free for non-production use. That means you can use Codon freely for personal, academic, or other non-commercial applications.

I hope it is released under a truly open-source license in the future; this seems like a promising technology. I'm also wondering how it would match C++ performance if it is still garbage collected.


troymc 10 months ago

The license is the "Business Source License 1.1" [1].
The Business Source License (BSL) 1.1 is a software license created by MariaDB Corporation. It's designed as a middle ground between fully open-source licenses and traditional proprietary software licenses. It's kind of neat because it's a parametric license, in that you can change some parameters while leaving the text of the license unchanged.
For codon, the "Change Date" is 2028-03-01 and the "Change License" is "Apache License, Version 2.0", meaning that the license will change to Apache2 in March of 2028. Until then, I guess you need to make a deal with Exaloop to use codon in production.
[1] https://github.com/exaloop/codon?tab=License-1-ov-file#readm...

- axit 10 months ago
  
  From what I've seen, the "Change Date" is usually updated with each release, so software that is a few years old is always under the Apache License and the latest software is under the BSL
  
  
  actionfromafar 10 months ago
  
  Just to make it clear - the cutoff date on previously released software remains the same. So if you download it now and wait a few years, your software will have matured into its final form, the Apache 2 license.
  
  
  troymc 10 months ago
  
  That makes sense. Thanks for clarifying.
  

actionfromafar 10 months ago

I immediately wonder how it compares to Shedskin¹

I can say one thing - Shedskin compiles to C++, which was very compelling to me for integrating into existing C++ products. Actually another thing too, Shedskin is Open Source under GPLv3. (Like GCC.)

1: https://github.com/shedskin/shedskin/


crorella 10 months ago

It looks like Codon has fewer restrictions compared to Shedskin.

- actionfromafar 10 months ago
  
  I suppose that's right, I don't think shedskin can call numpy yet, for instance. On the other hand it seems easier to put shedskin on an embedded device, for instance.
  

amelius 10 months ago

The challenge is not just to make Python faster, it's to make Python faster __and__ port the ecosystem of Python modules to your new environment.


eigenspace 10 months ago

It’s also just simply not python. It’s a separate language with a confusingly close syntax to python, but quite different semantics.

Myrmornis 10 months ago

This should be top comment. If I don't get the ecosystem then I'd just use Rust.


veber-alex 10 months ago

What's up with their benchmarks[1], it just shows benchmark names and I don't see any numbers or graphs. Tried Safari and Chrome.

[1]: https://exaloop.io/benchmarks/


sdmike1 10 months ago

The benchmark page looks to be broken, the JS console is showing some 404'd JS libs and a bad function call.

pizlonator 10 months ago

Also those are some bullshit benchmarks.
It’s not surprising that you can make a static compiler that makes tiny little programs written in a dynamic language into fast executables.
The hard part is making that scale to >=10,000 LoC programs. I dunno which static reasoning approaches codon uses, but all the ones I’m familiar with fall apart when you try to scale to large code.
That’s why JS benchmarking focused on larger and larger programs over time. Even the small programs that JS JIT writers use tend to have a lot of subtle idioms that break static reasoning, to model what happens in larger programs.
If you want to get in the business of making dynamic languages fast then the best advice I can give you is don’t use any of the benchmarks that these folks cite for your perf tuning. If you really do have to start with small programs then something like Richards or deltablue are ok, but you’ll want to diversify to larger programs if you really want to keep it real.
(Source: I was a combatant in the JS perf wars for a decade as a webkitten.)


w10-1 10 months ago

Unclear if this has been in the works longer than the GraalVM LLVM build of Python discussed yesterday [1]. The first HN discussion is from 2022 [3].

Any relation? Any comparisons?

Funny I can't find the license for graalvm python in their docs [2]. That could be a differentiator.

- [1] GraalVM Python on HN https://news.ycombinator.com/item?id=41570708

- [2] GraalVM Python site https://www.graalvm.org/python/

- [3] HN Dec 2022 https://news.ycombinator.com/item?id=33908576


vamega 10 months ago

GraalPy license on GitHub - https://github.com/oracle/graalpython/blob/master/LICENSE.tx...

mech422 10 months ago

Might want to look at PyPy too: https://pypy.org/features.html


codethief 10 months ago

Reminds me of these two projects which were presented at EuroPython 2024 this summer:

https://ep2024.europython.eu/session/spy-static-python-lang-...

https://ep2024.europython.eu/session/how-to-build-a-python-t...

(The talks were fantastic but they have yet to upload the recordings to YouTube.)


timwaagh 10 months ago

It's a really expensive piece of software, which is why they do not publish their prices. I don't think it's reasonable to market such products to your average dev. Anyhow, Cython and a bunch of others provide a free and open source alternative.


albertzeyer 10 months ago

There is also RPython (used by PyPy) (https://rpython.readthedocs.io/), which is a strict subset of Python, allowing for static analysis, specifically for the translation logic needed by PyPy. I was told that RPython is not really intended as a general purpose language/compiler but only specifically to implement something like PyPy.

But it's anyway maybe an interesting comparison to Codon.


jay-barronville 10 months ago

Instead of building their GPU support atop CUDA/NVIDIA [0], I’m wondering why they didn’t instead go with WebGPU [1] via something like wgpu [2]. Using wgpu, they could offer cross-platform compatibility across several graphics APIs, covering a wide range of hardware including NVIDIA GeForce and Quadro, AMD Radeon, Intel Iris and Arc, ARM Mali, and Apple’s integrated GPUs.

They note the following [0]:

> The GPU module is under active development. APIs and semantics might change between Codon releases.

The thing is, based on the current syntax and semantics I see, it’ll almost certainly need to change to support non-NVIDIA devices, so I think it might be a better idea to just go with WebGPU compute pipelines sooner rather than later.

Just my two pennies…

[0]: https://docs.exaloop.io/codon/advanced/gpu

[1]: https://www.w3.org/TR/webgpu

[2]: https://wgpu.rs


MadnessASAP 10 months ago

Well for better or worse CUDA is the GPU programming API. If you're doing high performance GPU workloads you're almost certainly doing it in CUDA.
WebGPU, while stating that compute is within its design, is I would imagine focused on presentation/rendering and probably not on large demanding workloads.

pjmlp 10 months ago

Because WebGPU is an API designed for browsers, targeting hardware designs from 2016.


GTP 10 months ago

People that landed here may be interested in Mojo [0] as well.

[0] https://www.modular.com/mojo


ipsum2 10 months ago

Previous discussion (2022): https://news.ycombinator.com/item?id=33908576


big-chungus4 10 months ago

so, assuming I don't get integers bigger than int64, and don't rely on the order of built-in dicts, can I just take arbitrary python code and use it with codon? Can I use external libraries? Numpy, PyTorch? Also noticed that it isn't supported on windows


shikon7 10 months ago

From the documentation of the differences with Python:

> Strings: Codon currently uses ASCII strings unlike Python's unicode strings.

That seems really odd to me. Who would use a framework nowadays that doesn't support unicode?


Sparkenstein 10 months ago

Biggest problem at the moment is async support, I guess

https://github.com/exaloop/codon/issues/71


zamazan4ik 10 months ago

I hope one day the compiler itself will be optimized even more: https://github.com/exaloop/codon/issues/137


tony-allan 10 months ago

I would love to see LLVM/WebAssembly as a supported and documented backend!


xiaodai 10 months ago

Please stop trying to make python fast. Move over to Julia already.


jitl 10 months ago

What’s the difference between this and Cython? I think another comment already asks about shedskin.


rich_sasha 10 months ago

Cython relies heavily on the Python runtime. You cannot, for example, make a standalone binary with it. A lot of unoptimized Cython binary is just Python wrapped in C.
From a quick glance this seems to genuinely translate into native execution.

- edscho 10 months ago
  
  You absolutely can create a standalone binary with Cython: see the `--embed` option [1].
  [1] https://cython.readthedocs.io/en/stable/src/tutorial/embeddi...
  

mgaunard 10 months ago

aren't there like a dozen of those already?

numba, cython, pypy...
