Comment by carbocation

Comment by carbocation 19 hours ago

19 replies

View on Hacker News

That repo is throwing up a 404 for me.

Question - did you consider tradeoffs between duckdb (or other columnar stores) and SQLite?

keepamovin 19 hours ago

No, I just went straight to sqlite. What is duckdb?

Reply View 6 replies

simonw 18 hours ago

One interesting feature of DuckDB is that it can run queries against HTTP ranges of a static file hosted via HTTPS, and there's an official WebAssembly build of it that can do that same trick.
So you can dump e.g. all of Hacker News in a single multi-GB Parquet file somewhere and build a client-side JavaScript application that can run queries against that without having to fetch the whole thing.
You can run searches on https://lil.law.harvard.edu/data-gov-archive/ and watch the network panel to see DuckDB in action.

Reply View | 1 reply
- keepamovin 9 hours ago
  
  In that case, then using duckdb might be even more performant than using what we’re doing here.
  It would be an interesting experiment to add the duckdb hackend
  
  Reply View | 0 replies
fsiefken 19 hours ago

DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It's designed to provide high performance on complex queries against large databases in embedded configuration.
It has transparent compression built-in and has support for natural language queries. https://buckenhofer.com/2025/11/agentic-ai-with-duckdb-and-s...
"DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST. This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance." https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...

Reply View | 0 replies
cess11 19 hours ago

It is very similar to SQLite in that it can run in-process and store its data as a file.
It's different in that it is tailored to analytics, among other things storage is columnar, and it can run off some common data analytics file formats.

Reply View | 0 replies
1vuio0pswjnm7 15 hours ago

"What is duckdb?"
duckdb is a 45M dynamically-linked binary (amd64)
sqlite3 1.7M static binary (amd64)
DuckDB is a 6yr-old project
SQLite is a 25yr-old project

Reply View | 1 reply
- 1vuio0pswjnm7 12 hours ago
  
  I like SQLite
  
  Reply View | 0 replies

jacquesm 15 hours ago

Maybe it got nuked by MS? The rest of their repo's are up.

Reply View 3 replies

keepamovin 7 hours ago
Hey jacquesm! No, I just forgot to make it public.
BUT I did try to push the entire 10GB of shards to GitHub (no LFS, no thanks, money), and after the 20 minutes compressing objects etc, "remote hang up unexpectedly"
To be expected I guess. I did not think GH Pages would be able to do this. So have been repeating:
wrangler pages deploy docs --project-name static-news --commit-dirty=true
on changes and first time CF Pages user here, much impressed!
Reply View | 2 replies
- jacquesm 5 hours ago
  
  Pretty neat project. I never thought you could do this in the first place, very much inspiring. I've made a little project that stores all of its data locally but still runs in the browser to protect against take downs and because I don't think you should store your precious data online more than you have to, eventually it all rots away. Your project takes this to the next level.
  
  Reply View | 1 reply
  
  keepamovin 4 hours ago
  
  Thanks, bud, that means a lot! Would like to see your versions of the data stored offline idea, it's very cool.
  
  Reply View | 0 replies

keepamovin 9 hours ago

i forgot to set repo to public. Fixed now

Reply View 0 replies

3eb7988a1663 19 hours ago

While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.

Reply View 1 reply

peheje 5 hours ago

the data is dominated by big unique TEXT columns, unsure how that can much compress better when grouped - but would be interesting to know

Reply View | 0 replies

linhns 19 hours ago

Not the author here. I’m not sure about DuckDB, but SQLite allows you to simply use a file as a database and for archiving, it’s really helpful. One file, that’s it.

Reply View 4 replies

cobolcomesback 19 hours ago

DuckDB does as well. A super simplified explanation of duckdb is that it’s sqlite but columnar, and so is better for analytics of large datasets.

Reply View | 3 replies
- formerly_proven 19 hours ago
  
  The schema is this: items(id INTEGER PRIMARY KEY, type TEXT, time INTEGER, by TEXT, title TEXT, text TEXT, url TEXT
  Doesn't scream columnar database to me.
  
  Reply View | 2 replies
  
  embedding-shape 19 hours ago
  
  At a glance, that is missing (at least) a `parent` or `parent_id` attribute which items in HN can have (and you kind of need if you want to render comments), see http://hn.algolia.com/api/v1/items/46436741
  
  Reply View | 1 reply
  
  agolliver 18 hours ago
  
  Edges are a separate table
  
  Reply View | 0 replies