Comment by EMM_386 2 days ago

Isn't this due to the 100M+ line C++ multi-threaded dependency being a potential nightmare when you are dealing with images in browsers/emails/etc. as an attack surface?

I think both Mozilla and Google are OK with this - if it is written in Rust in order to avoid that situation.

I know the linked post mentions this but isn't that the crux of the whole thing? The standard itself is clearly an improvement over what we've had since forever.

tensegrist 2 days ago

100M+ is a bit more than i would expect for an image format. have i not been paying attention

aw1621107 2 days ago

According to tokei, the lib/ directory from the reference implementation [0] has 93821 lines of C++ code and 22164 lines of "C Header" (which seems to be a mix of C++ headers, C headers, and headers that are compatible with both C and C++). The tools/ directory adds 16314 lines of C++ code and 1952 lines of "C Header".
So at least if GP was talking about libjxl "100K+" would be more accurate.
[0]: https://github.com/libjxl/libjxl
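
For reference, a quick sketch of the arithmetic behind that estimate. The dictionary labels are illustrative names only; the numbers are the tokei figures quoted above.

```python
# Line counts as reported by tokei in the comment above.
# The keys are illustrative labels, not tokei output fields.
counts = {
    "lib C++": 93821,
    "lib C Header": 22164,
    "tools C++": 16314,
    "tools C Header": 1952,
}

total = sum(counts.values())
claimed = 100_000_000  # the "100M+" figure from the top-level comment

print(total)                    # 134251
print(round(claimed / total))   # 745
```

So lib/ plus tools/ comes to roughly 134K lines, which is about 745 times smaller than 100M.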

- jiggawatts 2 days ago
  
  One of the best ways to measure code complexity is to zip up the source code. This eliminates a lot of the redundancies and is a more direct measure of entropy/complexity than almost anything else.
  By that metric, jpeg-xl is about 4x the size of the jpeg or png codebase.
  
  account42 a day ago
  
  Your method would still judge well-documented code with lots of intermediate variables as more complex than undocumented code golf soup.
  
  tkfoss 2 days ago
  
  Interesting approach
  
  jiggawatts 2 days ago
  
  It comes from the "intelligence is a form of compression" hypothesis that has been floating around in the ML space. Also, with a good compression algorithm it is a fairly direct measure of entropy, which is quite well correlated with what a developer might consider code size and/or complexity.
  
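A minimal sketch of the zip-the-source measurement jiggawatts describes, assuming zlib at its maximum level stands in for "zip" and that only files with common C/C++ extensions count as source. The function name `compressed_source_size` and the extension list are illustrative choices, not anything taken from the thread or from libjxl.

```python
import os
import sys
import zlib

# Assumption: these extensions approximate "the source code" of a C/C++ project.
SOURCE_EXTENSIONS = (".c", ".cc", ".cpp", ".h", ".hpp")


def compressed_source_size(root, extensions=SOURCE_EXTENSIONS, level=9):
    """Return (raw_bytes, compressed_bytes) for matching files under root."""
    compressor = zlib.compressobj(level)
    raw = 0
    compressed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            raw += len(data)
            # Feed every file into one stream instead of compressing each alone.
            compressed += len(compressor.compress(data))
    compressed += len(compressor.flush())
    return raw, compressed


if __name__ == "__main__":
    # Usage: python measure.py path/to/codebase [another/codebase ...]
    for path in sys.argv[1:]:
        raw, comp = compressed_source_size(path)
        print(f"{path}: {raw} raw bytes, {comp} compressed bytes")
```

Running it on two checkouts and comparing the compressed byte counts gives the kind of ratio quoted above. Note that DEFLATE only looks back 32 KiB, so redundancy between distant files is not captured; a compressor with a larger window would give different numbers.
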
- palmotea 2 days ago
  
  >> 100M+ is a bit more than i would expect for an image format. have i not been paying attention
  > So at least if GP was talking about libjxl "100K+" would be more accurate.
  M can mean thousands, and I think it's common to see it used that way in finance and finance-adjacent areas: https://www.chicagomanualofstyle.org/qanda/data/faq/topics/A...:
  > A. You’ve identified two commonly used conventions in finance, one derived from Greek and the other from Latin, but neither one is standard.
  Starting with the second convention, M is used for amounts in the thousands and MM for amounts in the millions (usually without a space between the number and the abbreviation—e.g., $150M for $150,000 and $150MM for $150 million). This convention overlaps with the conventions for writing roman numerals, according to which a thousand is represented by M (from mille, the Latin word for “thousand”). Any similarity with roman numerals ends there, however, because MM in roman numerals means two thousand, not a thousand thousands, or one million, as in financial contexts...
  https://www.accountingcoach.com/blog/what-does-m-and-mm-stan...:
  > An expense of $60,000 could be written as $60M. Internet advertisers are familiar with CPM which is the cost per thousand impressions.
  > The letter k is also used represent one thousand. For example, an annual salary of $60,000 might appear as $60k instead of $60M.
  
  WheatMillington 2 days ago
  
  I assume this is regional... I work in accounting and finance in New Zealand (generally following ordinary Western/Commonwealth standards) and I've never heard of using M for thousands. If I used that I would confuse the hell out of everyone around me.
  
  dataflow 2 days ago
  
  Okay, but this is... not finance? And the article itself wrote 100K. Rewriting that as 100M does nobody a favor.
  
  sealeck 2 days ago
  
  I don't think many (if any) programmers would imagine 100M lines of code to mean 100,000 lines of code and not 100,000,000...
  
  uselesswords 2 days ago
  
  Technically right is the worst kind of right
  
  palmotea a day ago
  
  I'm surprised at the negative reaction to having it pointed out that the OP may not be wrong, just using a dialect.
  
munificent 2 days ago

The article says 100K, not 100M. I'm guessing that's what the parent comment meant.
100MLOC for an image format would be bananas. You could fit the entire codebases of a couple of modern operating systems, a handful of AAA videogames, and still have room for several web apps and command line utilities in 100MLOC.

- JyrkiAlakuijala 2 days ago
  
  the article's count includes test code and encoder code, which is not how we compute the decoder size
  the decoder is around 30 kloc
  
crooked-v 2 days ago

It's a container format that does about a bajillion things - lossy, lossless, multiple modes optimized for different image types (photography vs digital design), modern encode/decode algorithms, perceptual color space, adaptive quantization, efficient ultra-high-resolution decoding and display, partial and complete animation, tile handling, everything JPEG does, and a bunch more.

- furyofantares 2 days ago
  
  The Linux kernel is 40M lines of code after 34 years of development.
  OP might as well have said "infinite lines of code" for JPEG XL and wouldn't have been much less accurate. Although I'm guessing they meant 100k.
  
EMM_386 19 hours ago

You are correct: it should be "K", not "M". That was my typo.

GaggiX 2 days ago

They wanted to say 100K instead of 100M

- EMM_386 19 hours ago
  
  They did indeed.
  
JyrkiAlakuijala 2 days ago

This is some strange misinformation.

The C++ JPEG XL decoder is ~30'000 lines, i.e., 3000x smaller than you claim. A single-threaded, non-SIMD implementation would be much simpler, around 8000 to 10000 lines of code.

It is not difficult to measure from the repository. The compiled compressed binary for an APK is 5x smaller than that of full AVIF. The complete specification at under 100 pages is ~13x more compact than that of full AVIF.

charleslmunger a day ago

>The compiled compressed binary for an APK
This doesn't undermine your argument at all, but we should not be compressing native libs in APKs.
https://developer.android.com/guide/topics/manifest/applicat...

dataflow 2 days ago

You mean 100K+? A large chunk of which they say is testing code?

bmicraft 2 days ago

Google is one of the parties involved in creating jxl. It's their own fault they didn't write a decoder in a memory-safe language sooner.

cornstalks 2 days ago

libjxl is <112,888 lines of code, about 3 orders of magnitude less than your 100M+ claim.

sunaookami 2 days ago

Do people really not know what a hyperbole is?

- cornstalks 2 days ago
  
  100M+ lines of code isn't a hyperbole for some codebases, though. google3 is estimated at about 2 billion lines of code, for example.
  Maybe it was hyperbole. But if it was it wasn't obvious to me, unfortunately.
  
ajcp 2 days ago

> They were concerned about the increased attack surface resulting from including the current 100K+ lines C++ libjxl reference decoder, even though most of those lines are testing code.

Seems like Google has created a memory-safe decoder for it in Rust or something.

theoldgreybeard 2 days ago

because memory safety is the only attack vector, as we all know

UltraSane 2 days ago

It is a very big one and eliminating it is a huge improvement in security. You can then spend more time fixing all the other sources of security problems.

- LtWorf a day ago
  
  https://lwn.net/Articles/1048446/
  
MaxBarraclough 2 days ago

> I think both Mozilla and Google are OK with this - if it is written in Rust in order to avoid that situation.

It would need to be written in the Safe Rust subset to give safety assurances. It's an important distinction.

dgacmu 2 days ago

99% safe with 1% unsafe mixed in is far, far better than 100k loc of c++ -- look at Google's experience with rust in Android. It's not perfect and they had one "almost vulnerability" but the rate of vulnerabilities is much, much lower even with a bit of unsafe mixed in.

- MaxBarraclough 2 days ago
  
  Agreed, and Google developers can probably be trusted to 'act responsibly', but too often people forget the distinction. Some Rust codebases are wildly unsafe, and too often people see "written in Rust" and falsely conclude it's a memory-safe codebase.
  
otabdeveloper4 2 days ago

> ...but now in le Rust!!1

I look forward to the next generation of rubes rewriting this all in some newer ""safe"" language in three decades.

UltraSane 2 days ago

Because a language happily letting you try to access an array index far past its end isn't stupid at all.

- otabdeveloper4 a day ago
  
  If this was a real problem then you could have just `s/[]/at()/g` across your codebase and called it a day.
  But you all don't even bother to do that, so I guess it's not actually a problem in practice.
  
  UltraSane a day ago
  
  C doesn't have any protection for accessing out of bounds. It does zero bounds checking behind the scenes. Which is actually really, REALLY stupid. And when all computers are connected to the internet this is disastrous.
  