Show HN: 22 GB of Hacker News in SQLite

yread 15 hours ago

I wonder how much smaller it could get with some compression. You could probably encode "This website hijacks the scrollbar and I don't like it" comments into just a few bits.

Reply View 5 replies

Rendello 14 hours ago

The hard-coded dictionary wouldn't be much stranger than Brotli's:
https://news.ycombinator.com/item?id=27160590

Reply View | 1 reply
- maxbond 6 hours ago
  
  You can use a BPE variant like SentencePiece to identify these patterns rather than hard coding them.
  
  Reply View | 0 replies
jacquesm 14 hours ago

That's at least 45%, then you can leave out all of my comments and you're left with only 5!

Reply View | 0 replies
hamburglar 9 hours ago

It might be a neat experiment to use ai to produce canonicalized paraphrasings of HN arguments so they could be compared directly and compress well.

Reply View | 0 replies
rossant 5 hours ago

Guilty.

Reply View | 0 replies

kamranjon 11 hours ago

It'd be great if you could add it to Kiwix[1] somehow (not sure what the process is for that but 100rabbits figured it out for their site) - I use it all the time now that I have a dumb phone - I have the entirety of wikipedia, wiktionary and 100rabbits all offline.

https://kiwix.org/en/

Reply View 3 replies

codazoda 8 hours ago

I love that you have 100r.ca on that short list.

Reply View | 0 replies
endofreach 7 hours ago

what dumb phone do you use?
and why do you want wikipedia in your pocket, but not a smartphone? where do you draw the line?
(doing a lot of work in that area, so i am asking to learn from someone who might think alike)

Reply View | 1 reply
- kamranjon 6 hours ago
  
  I use the Mudita Kompakt specifically cause it allows sideloading so I can still have a few extras. Right now I have Kiwix and Libby. It works really well.
  I have a $10 a month plan from US cellular with only 2gigs so I try to keep everything offline that I can.
  Honestly it's mostly the news... so I draw the line at browser, I'll never install a browser, that's basically something I can do when I sit down at a PC. I read quite a bit and I like to have the ability to look up a word or a historical event or some reference from something I read using Kiwix and it's been great for that, just needed to add a 512gb micro sd card. And Libby I just use at the gym when I'm on the treadmill.
  
  Reply View | 0 replies

zkmon 16 hours ago

Similar to Single-page applications (SPA), single-table application (STA) might become a thing. Just a shard a table on multiple keys and serve the shards as static files, provided that the data is Ok to share, similar to sharing static html content.

Reply View 4 replies

jhd3 15 hours ago

[The Baked Data architectural pattern](https://simonwillison.net/2021/Jul/28/baked-data/)

Reply View | 0 replies
jesprenj 16 hours ago

do you mean single database? it'd be quite hard if not impossible to make applications using a single table (no relations). reddit did it though, they have a huge table of "things" iirc.

Reply View | 2 replies
- mburns 15 hours ago
  
  That is a common misconception.
  > Next, we've got more than just two tables. The quote/paraphrase doesn't make it clear, but we've got two tables per thing. That means Accounts have an "account_thing" and an "account_data" table, Subreddits have a "subreddit_thing" and "subreddit_data" table, etc.
  https://www.reddit.com/r/programming/comments/z9sm8/comment/...
  
  Reply View | 1 reply
  
  rplnt 14 hours ago
  
  And the important lesson from that the k/v-like aspect of it. That the "schema" is horizontal (is that a thing?) and not column-based. But I actually only read it on their blog IIRC and never even got the full details - that there's still a third ID column. Thanks for the link.
  
  Reply View | 0 replies

kristianp 13 hours ago

I tried "select * from items limit 10" and it is slowly iterating through the shards without returning. I got up to 60 shards before I stopped. Selecting just one shard makes that query return instantly. As mentioned elsewhere I think duckdb can work faster by only reading the part of a parquet file it needs over http.

I was getting an error that the users and user_domains tables aren't available, but you just need to change the shard filter to the user stats shard.

Reply View 6 replies

piperswe 10 hours ago

Doesn't `LIMIT` just limit the amount of rows returned, rather than the amount read & processed?

Reply View | 3 replies
- SQLite an hour ago
  
  That depends on the query. SQLite tries to use LIMIT to restrict the amount of reading that it does. It is often successful at that. But some queries, by their very nature, logically require reading the whole input in order to compute the correct answer, regardless of whether or not there is a LIMIT clause.
  
  Reply View | 0 replies
- lucb1e 9 hours ago
  
  That's what it does, but if I'm not mistaken (at least in my experience with MariaDB) it'll also return immediately once it ran up to the limit and not try to process further rows. If you have an expensive subquery in the SELECT (...) AS `column_name`, it won't run that for every row before returning the first 10 (when using LIMIT 10) unless you ORDERed BY that column_name. Other components like the WHERE clause might also require that it reads every row before finding the ten matches. So mostly yes but not necessarily
  
  Reply View | 0 replies
- faxmeyourcode 6 hours ago
  
  The limit clause isn't official/standard ansi sql, so it's up to the rdbms to implement. Your assumption is true for bigquery (infamously) but not true for things like snowflake, duckdb, etc.
  
  Reply View | 0 replies
ncruces 12 hours ago

That's odd. If it was a VFS, that's not what I'd expect would happen. Maybe it's not a VFS?

Reply View | 1 reply
- keepamovin 5 hours ago
  
  What is a VFS?
  
  Reply View | 0 replies

Xyra 5 hours ago

Similar in spirit to a recent tool I recently posted Show HN on, https://exopriors.com/scry. You can use Claude Code to SQL+vector query HackerNews and many other high quality public commons sites, exceptionally well-indexed and usually 5+ minute query timeout limits, so you can run seriously large research queries, to rapidly refine your worldview (particular because you can do easily to EXHAUSTIVE exploration).

visarga 6 minutes ago

I like your concept of indexing high quality sources for RAG. For many queries we might not need the usual search engines.

Reply View | 0 replies

carbocation 17 hours ago

That repo is throwing up a 404 for me.

Question - did you consider tradeoffs between duckdb (or other columnar stores) and SQLite?

Reply View 19 replies

keepamovin 17 hours ago

No, I just went straight to sqlite. What is duckdb?

Reply View | 6 replies
- simonw 16 hours ago
  
  One interesting feature of DuckDB is that it can run queries against HTTP ranges of a static file hosted via HTTPS, and there's an official WebAssembly build of it that can do that same trick.
  So you can dump e.g. all of Hacker News in a single multi-GB Parquet file somewhere and build a client-side JavaScript application that can run queries against that without having to fetch the whole thing.
  You can run searches on https://lil.law.harvard.edu/data-gov-archive/ and watch the network panel to see DuckDB in action.
  
  Reply View | 1 reply
  
  keepamovin 8 hours ago
  
  In that case, then using duckdb might be even more performant than using what we’re doing here.
  It would be an interesting experiment to add the duckdb hackend
  
  Reply View | 0 replies
- fsiefken 17 hours ago
  
  DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It's designed to provide high performance on complex queries against large databases in embedded configuration.
  It has transparent compression built-in and has support for natural language queries. https://buckenhofer.com/2025/11/agentic-ai-with-duckdb-and-s...
  "DICT FSST (Dictionary FSST) represents a hybrid compression technique that combines the benefits of Dictionary Encoding with the string-level compression capabilities of FSST. This approach was implemented and integrated into DuckDB as part of ongoing efforts to optimize string storage and processing performance." https://homepages.cwi.nl/~boncz/msc/2025-YanLannaAlexandre.p...
  
  Reply View | 0 replies
- cess11 17 hours ago
  
  It is very similar to SQLite in that it can run in-process and store its data as a file.
  It's different in that it is tailored to analytics, among other things storage is columnar, and it can run off some common data analytics file formats.
  
  Reply View | 0 replies
- 1vuio0pswjnm7 13 hours ago
  
  "What is duckdb?"
  duckdb is a 45M dynamically-linked binary (amd64)
  sqlite3 1.7M static binary (amd64)
  DuckDB is a 6yr-old project
  SQLite is a 25yr-old project
  
  Reply View | 1 reply
  
  1vuio0pswjnm7 10 hours ago
  
  I like SQLite
  
  Reply View | 0 replies
jacquesm 14 hours ago

Maybe it got nuked by MS? The rest of their repo's are up.

Reply View | 3 replies
- keepamovin 5 hours ago
  
  Hey jacquesm! No, I just forgot to make it public.
  BUT I did try to push the entire 10GB of shards to GitHub (no LFS, no thanks, money), and after the 20 minutes compressing objects etc, "remote hang up unexpectedly"
  To be expected I guess. I did not think GH Pages would be able to do this. So have been repeating:
  wrangler pages deploy docs --project-name static-news --commit-dirty=true
  on changes and first time CF Pages user here, much impressed!
  
  Reply View | 2 replies
  
  jacquesm 4 hours ago
  
  Pretty neat project. I never thought you could do this in the first place, very much inspiring. I've made a little project that stores all of its data locally but still runs in the browser to protect against take downs and because I don't think you should store your precious data online more than you have to, eventually it all rots away. Your project takes this to the next level.
  
  Reply View | 1 reply
  
  keepamovin 2 hours ago
  
  Thanks, bud, that means a lot! Would like to see your versions of the data stored offline idea, it's very cool.
  
  Reply View | 0 replies
keepamovin 8 hours ago

i forgot to set repo to public. Fixed now

Reply View | 0 replies
3eb7988a1663 17 hours ago

While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.

Reply View | 1 reply
- peheje 3 hours ago
  
  the data is dominated by big unique TEXT columns, unsure how that can much compress better when grouped - but would be interesting to know
  
  Reply View | 0 replies
linhns 17 hours ago

Not the author here. I’m not sure about DuckDB, but SQLite allows you to simply use a file as a database and for archiving, it’s really helpful. One file, that’s it.

Reply View | 4 replies
- cobolcomesback 17 hours ago
  
  DuckDB does as well. A super simplified explanation of duckdb is that it’s sqlite but columnar, and so is better for analytics of large datasets.
  
  Reply View | 3 replies
  
  formerly_proven 17 hours ago
  
  The schema is this: items(id INTEGER PRIMARY KEY, type TEXT, time INTEGER, by TEXT, title TEXT, text TEXT, url TEXT
  Doesn't scream columnar database to me.
  
  Reply View | 2 replies

WadeGrimridge an hour ago

threw some heatmaps together of post volume and average score by day and time (15min intervals)

story volume (all time): https://ibb.co/pBTTRznP

average score (all time): https://ibb.co/KcvVjx8p

story volume (since 2020): https://ibb.co/cKC5d7Pp

average score (since 2020): https://ibb.co/WpN20kfh

[1] https://github.com/Paul-E/Pushshift-Importer

WadeGrimridge an hour ago

added median too.
median score (all time): https://ibb.co/gZV5QVMG
median score (since 2020): https://ibb.co/Gfv8T7k8

Reply View | 0 replies
setnone an hour ago

can confirm, i'm usually very generous with upvotes on sundays at noon

Reply View | 0 replies

Paul-E 17 hours ago

That's pretty neat!

I did something similar. I build a tool[1] to import the Project Arctic Shift dumps[2] of reddit into sqlite. It was mostly an exercise to experiment with Rust and SQLite (HN's two favorite topics). If you don't build a FTS5 index and import without WAL (--unsafe-mode), import of every reddit comment and submission takes a bit over 24 hours and produces a ~10TB DB.

SQLite offers a lot of cool json features that would let you store the raw json and operate on that, but I eschewed them in favor of parsing only once at load time. THat also lets me normalize the data a bit.

I find that building the DB is pretty "fast", but queries run much faster if I immediately vacuum the DB after building it. The vacuum operation is actually slower than the original import, taking a few days to finish.

[2] https://github.com/ArthurHeitmann/arctic_shift/blob/master/d...

Reply View 3 replies

Xyra 3 hours ago

Holy cow, I didn't know getting reddit was that straightforward. I am building public readonly-SQL+vector databases optimized for exploring high-quality public commons with Claude Code (https://exopriors.com/scry), I so cannot wait until some funding source comes in and I can upgrade to a $1500/month Hetzner server and pay the ~$1k to embed all that.

Reply View | 0 replies
s_ting765 16 hours ago

You could check out SQLite's auto_vacuum which reclaims space without rebuilding the entire db https://sqlite.org/pragma.html#pragma_auto_vacuum

Reply View | 1 reply
- Paul-E 12 hours ago
  
  I haven't tested that, so I'm not sure if it would work. The import only inserts rows, it doesn't delete, so I don't think that is the cause of fragmentation. I suspect this line in the vacuum docs:
  > The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.
  means SQLite does something to organize by rowid and that this is doing most of the work.
  Reddit post/comment IDs are 1:1 with integers, though expressed in a different base that is more friendly to URLs. I map decoded post/comment IDs to INTEGER PRIMARY KEYs on their respective tables. I suspect the vacuum operation sorts the tables by their reddit post ID and something about this sorting improves tables scans, which in turn helps building indices quickly after standing up the DB.
  
  Reply View | 0 replies

m-p-3 13 hours ago

Looks like the repo was taken down (404).

That's too bad, I'd like to see the inner-working with a subset of data, even with placeholders for the posts and comments.

Reply View 7 replies

3abiton 13 hours ago

That was fast. I was looking into recent HN datasets, and they are impossible find.

Reply View | 4 replies
- xnx 13 hours ago
  
  Complete and continuously updated: https://play.clickhouse.com/play?user=play#U0VMRUNUIG1heCh0a...
  
  Reply View | 2 replies
  
  gettingoverit 9 hours ago
  
  If the last story on HN was at December 26, that is.
  
  Reply View | 1 reply
  
  rolymath 5 hours ago
  
  Continuously updated != instantly updated
  
  Reply View | 0 replies
- scsh 9 hours ago
  
  It's available on BigQuery and is updated frequently enough(daily I think).
  
  Reply View | 0 replies
octoberfranklin 8 hours ago

But why would they take it down?

Reply View | 1 reply
- keepamovin 8 hours ago
  
  Sorry i just forgot to set it to public! It’s there now
  
  Reply View | 0 replies

RyJones 2 hours ago

Neat. I keep wanting to build something like this for GitHub audit logs, but at ~5 tb, probably a little much

Sn0wCoder 14 hours ago

Site does not load on Firefox console error says 'Uncaught (in promise) TypeError: can't access property "wasm", sqlite3 is null'

Guess its common knowledge that SharedArrayBuffer (SQLite wasm) does not work with FF due to Cross-Origin Attacks (i just found out ;).

Once the initial chunk of data loads the rest load almost instantly on Chrome. Can you please fix the GitHub link (current 404) would like to peak at the code. Thank you!

keepamovin 8 hours ago

Damn. Will try to fix for FF.
edit: I just tested with FF latest, seems to be working.

Reply View | 0 replies
[removed] 14 hours ago

[deleted]

Reply View | 0 replies

sieep 16 hours ago

What a reminder on how text is so much more efficient than video, its crazy! Could you imagine the same amount of knowledge (or dribble) but in video form? I wonder how large that would be.

Reply View 9 replies

jacquesm 14 hours ago

That's what's so sad about youtube. 20 minute videos to encode a hundred words of usable content to get you to click on a link. The inefficiency is just staggering.

Reply View | 1 reply
- Rendello 13 hours ago
  
  Youtube can be excellent for explanations. A picture's worth a thousand words, and you can fit a lot of decent pictures in a 20 minute video. The signal-to-noise can be high, of course.
  
  Reply View | 0 replies
ivanjermakov 16 hours ago

Average high quality 1080p60 video has bitrate of 5Mbps, which is equivalent to 120k English words per second. With average English speech being 150wpm, we end up with text being 50 thousand times more space efficient.
Converting 22GB of uncompressed text into video essay lands us at ~1PB or 1000TB.

Reply View | 0 replies
keepamovin 8 hours ago

Right? 20 years, probably 10s millions of human hours of interactions, and it’s only as much as a couple DVDs.

Reply View | 0 replies
fsiefken 15 hours ago

one could use a video llm to generate the video, diagrams or the stills automatically based on the text. except when it's boardgames playthroughs or programming i just transcribe to text, summarise and read youtube video's.

Reply View | 4 replies
- deskamess 15 hours ago
  
  How do you read youtube videos? Very curious as I have been wanting to watch PDF's scroll by slowly on a large TV. I am interested in the workflow of getting a pdf/document into a scrolling video format. These days NotebookLM may be an option but I am curious if there is something custom. If I can get it into video form (mp4) then I can even deliver it via plex.
  
  Reply View | 1 reply
  
  fsiefken 11 hours ago
  
  I use yt-dlp to download the transcript, and if it's not available i can get the audio file and run it through parakeet locally. Then I have the plain text, which could be read out loud (kind of defeating the purpose), but perhaps at triple speed with a computer voice that's still understandble at that speed. I could also summarize it with an llm. With pandoc or typst I can convert to single column or mult column pdf to print or watch on tv or my smart glasses. If I strip the vowels and make the font smaller I can fit more!
  One could convert the Markdown/PDF to a very long image first with pandoc+wkhtml, then use ffmpeg to crop and move the viewport slowly over the image, this scrolls at 20 pixels per second for 30s - with the mpv player one could change speed dynamically through keys.
  ffmpeg -loop 1 -i long_image.png -vf "crop=iw:ih/10:0:t*20" -t 30 -pix_fmt yuv420p output.mp4
  Alternatively one could use a Rapid Serial Visual Presentation / Speedreading / Spritz technique to output to mp4 or use dedicated rsvp program where one can change speed.
  One could also output to a braille 'screen'.
  Scrolling mp4 text on the the TV or Laptop to read is a good idea for my mother and her macula degeneration, or perhaps I should make use of an easier to see/read magnification browser plugin tool.
  
  Reply View | 0 replies
- Barbing 15 hours ago
  
  Can be nice to pull a raw transcript and have it formatted as HTML (formatting/punctuation fixes applied).
  Best locally of course to avoid “I burned a lake for this?” guilt.
  
  Reply View | 1 reply
  
  fsiefken 11 hours ago
  
  yes, yt-dlp can download the transcript, and if it's not available i can get the audio file and run it through parakeet locally.
  
  Reply View | 0 replies

zX41ZdbW 17 hours ago

The query tab looks quite complex with all these content shards: https://hackerbook.dosaygo.com/?view=query

I have a much simpler database: https://play.clickhouse.com/play?user=play#U0VMRUNUIHRpbWUsI...

[0] https://kiwix.org/en/the-new-kiwix-library-is-available/

embedding-shape 17 hours ago

Does your database also runs offline/locally in the browser? Seems to be the reason for the large number of shards.

Reply View | 1 reply
- zX41ZdbW 8 hours ago
  
  You can run it locally, but it is a client-server architecture, which means that something has to run behind the browser.
  
  Reply View | 0 replies

abixb 17 hours ago

Wonder if you could turn this into a .zim file for offline browsing with an offline browser like Kiwix, etc. [0]

I've been taking frequent "offline-only-day" breaks to consolidate whatever I've been learning, and Kiwix has been a great tool for reference (offline Wikipedia, StackOverflow and whatnot).

keepamovin 7 hours ago

Oh that's a cool idea. If you want to take a crack at writing the script, the repo is open!

Reply View | 0 replies
Barbing 15 hours ago

Oh this should TOTALLY be available to those who are scrolling through sources on the Kiwix app!

Reply View | 0 replies

diyseguy 13 hours ago

link no workie: https://github.com/DOSAYGO-STUDIO/HackerBook

keepamovin 7 hours ago
Fixed now. Forgot to make public. I also added a script:
./toool/download-site.mjs --help
To let you download the entire site over HTTPS so you don't need to "build it" by running the pipeline.
That way it's truly offline.
Reply View | 0 replies

tevon 16 hours ago

The link seems to be down, was it taken down?

scsh 16 hours ago

Probably just forgot to make it public.

Reply View | 0 replies

modeless 10 hours ago

It's really a shame that comment scores are hidden forever. Would the admins consider publishing them after stories are old enough that voting is closed? It would be great to have them for archives and search indices and projects like this.

Reply View 4 replies

pilingual 8 hours ago

I wrote to hn@ and asked for this as a feature request:
"1. Delayed Karma Display. I understand why comment karma was hidden. I don't see the harm in un-hiding karma after some time. If not 24 hours, then 72-168 hours. This would help me read through threads with 1300 comments."
This was last January. While I asked for a few more features, it is the only one that seems essential as HN grows with massive threads.

Reply View | 0 replies
keepamovin 8 hours ago

Fear not. I have a collaborative project designed to address this.

Reply View | 2 replies
- vunderba 7 hours ago
  
  They're referring to scores on individual COMMENTS - this information isn't available via the HN Firebase API.
  The only way you could theoretically extract everyone's comment scores (at least the top level ones) would be like this if you're a complete madman:
  1. Wait 48 hours so the article is effectively dead
  2. Post a new comment using an account called ThePresident
  3. Create a swarm of a thousand shill user accounts called Voter1, Voter2, etc.
  4. Use a single account at a time and upvote ThePresident
  5. Recheck the page to see if ThePresident has moved above a user(s) post
  6. Record the score for that user and assign it to the tracked story's history
  7. Repeat from (4)
  
  Reply View | 1 reply
  
  keepamovin 7 hours ago
  
  I know that! I have a collaborative project to make it sort of available.
  But the idea I have is not like that at all - it's much nicer on everyone's ethics. Stay tuned! :)
  
  Reply View | 0 replies

3eb7988a1663 10 hours ago

Did anyone get a copy of this before it was pulled? If GitHub is not keen, could it be uploaded to HuggingFace or some other service which hosts large assets?

I have always known I could scrape HN, but I would much rather take a neat little package.

dspillett 14 hours ago

Is there a public dump of the data anywhere that this is based upon, or have they scraped it themselves?

Such as DB might be entertaining to play with, and the threadedness of comments would be useful for beginners to practise efficient recursive queries (more so than the StackExchange dumps, for instance).

thomasmarton 14 hours ago

While not a dump per se, there is an API where you can get HN data programmatically, no scraping needed.
https://github.com/HackerNews/API

Reply View | 0 replies
keepamovin 8 hours ago

Yes, you can see the download HN bash script in the repository now that simply extract the data to your local machine from BigQuery and saves it as a series of gzip JSON files

Reply View | 0 replies

spit2wind 14 hours ago

This is pretty neat! The calendar didn't work well for me. I could only seem to navigate by month. And when I selected the earliest day (after much tapping), nothing seemed to be updated.

Nonetheless, random access history is cool.

keepamovin 7 hours ago

Cna you let me know? I'm sure there's some weirdness lurking there and I want to smooth it out. Calendar is essential.

Reply View | 0 replies

yupyupyups 17 hours ago

1 hour passed and it's already nuked?

Thank you btw

fouc 9 hours ago

Suddenly occurs to me that it would be neat to pair a small LLM (3-7B) with an HN dataset

codazoda 8 hours ago

Does the SQLite version of this already exist somewhere? The github link on the footer of the page fails for me.

Reply View | 0 replies

dmarwicke 15 hours ago

22gb for mostly text? tried loading the site, it's pretty slow. curious how the query performance is with this much data in sqlite

layer8 14 hours ago

Apparently the comment counts are only the top-level comments?

It would be nice for the thread pages to show a comment count.

keepamovin 8 hours ago

Yes, because comments in a thread can span shards. It’s just a bit too heavy to add comment counts of an entire thread. So I give a low bound ha ha

Reply View | 0 replies

joshcsimmons 13 hours ago

Link appears broken

ra 12 hours ago

confirmed - I wonder what happened?

Reply View | 0 replies

wslh 17 hours ago

Is this updated regularly? 404 on GitHub as the other comment.

With all due respect it would be great if there is an official HN public dump available (and not requiring stuff such as BigQuery which is expensive).

scsh 9 hours ago

The BQ dataset is only ~17GB and the free tier of BQ lets you query 1TB per month. If you're not doing select * on every query you should be able to do a lot with that.

Reply View | 0 replies

KomoD 13 hours ago

How do I download it? That repo is a 404.

sirjaz 16 hours ago

This would be awesome as a cross platform app.

keepamovin 7 hours ago

Good idea. HN.exe

Reply View | 0 replies

DenisDolya 4 hours ago

Hahaha, now you can be prepared for the apocalypse when the internet disappears. ;)

solarized 13 hours ago

Beautiful !

2026 prayer: for all you AI junkies—please don’t pollute H/N with your dirty AI gaming.

Don’t bot posts, comments, or upvote/downvote just to maximize karma. Please.

We can’t identify anymore who’s a bot and who’s human. I just want to hang out with real humans here.

[removed] 19 hours ago

[deleted]

asdefghyk 18 hours ago

How much space is needed? ...for the data .... Im wondering if it would work on a tablet? ....

[0] https://www.ycombinator.com/legal/#tou

asdefghyk 9 hours ago

FYI I did NOT see the size info in the title. Impossible to edit / delete my comment now ........

Reply View | 0 replies
keepamovin 18 hours ago

~9GB gzipped.

Reply View | 0 replies

abetusk 13 hours ago

Alas, HN does not belong to us, and the existence of projects like this are subject to the whims of the legal owners of HN.

From the terms of use [0]:

"""

Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site. The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.

"""