Show HN: I used Claude Code to discover connections between 100 books
(trails.pieterma.es)
469 points by pmaze a day ago
I think LLMs are overused to summarise and underused to help us read deeper.
I built a system for Claude Code to browse 100 non-fiction books and find interesting connections between them.
I started out with a staged pipeline, chaining together LLM calls to build up context about the library. I was mainly getting back the insights I had baked into the prompts, and the results weren't particularly surprising.
On a whim, I gave Claude Code access to my debug CLI tools and found that it wiped the floor with that approach: actually interesting results, and very little orchestration required in comparison.
One of my favourite trails of excerpts goes from Jobs’ reality distortion field to Theranos’ fake demos, to Thiel on startup cults, to Hoffer on mass movement charlatans (https://trails.pieterma.es/trail/useful-lies/). A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems, as if the task itself summoned a Foucault’s Pendulum mindset.
Details:
* The books are picked from HN’s favourites (which I collected before: https://hnbooks.pieterma.es/).
* Chunks are indexed by topic using Gemini Flash Lite (first sketch below). Indexing the whole library cost about £10.
* Topics are organised into a tree structure using recursive Leiden partitioning and LLM labels (second sketch below). This gives a high-level sense of the themes.
* There are several ways to browse. The most useful are embedding similarity, topic-tree siblings, and topics co-occurring within a chunk window (third sketch below).
* Everything is stored in SQLite and manipulated using a set of CLI tools.
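For the curious, the per-chunk indexing step looks roughly like this. A minimal sketch: the model id, prompt, and table layout here are illustrative, not the exact code.

    # Rough sketch of per-chunk topic indexing. Model id, prompt, and
    # table layout are illustrative, not the exact production code.
    import json
    import sqlite3
    import google.generativeai as genai

    genai.configure(api_key="...")
    model = genai.GenerativeModel("gemini-2.0-flash-lite")

    def index_chunk(db: sqlite3.Connection, chunk_id: int, text: str):
        prompt = ("List the 3-6 main topics of this passage as a JSON array "
                  "of short noun phrases.\n\n" + text)
        resp = model.generate_content(prompt)
        topics = json.loads(resp.text)  # assumes the model returns bare JSON
        db.executemany(
            "INSERT INTO chunk_topics (chunk_id, topic) VALUES (?, ?)",
            [(chunk_id, t) for t in topics])
        db.commit()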
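The topic tree can be approximated by running Leiden recursively over a topic co-occurrence graph. Again a sketch, with the graph construction and stopping rule simplified:

    # Sketch of recursive Leiden partitioning over a topic graph whose
    # vertices are topics (with a "name" attribute) and whose edges are
    # co-occurrences. The stopping rule here is a simplification.
    import igraph as ig
    import leidenalg

    def build_tree(graph: ig.Graph, depth: int = 0, max_depth: int = 3) -> dict:
        part = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)
        node = {"topics": graph.vs["name"], "children": []}
        if depth >= max_depth or len(part) <= 1:
            return node  # leaf: small or indivisible community
        for community in part:
            sub = graph.subgraph(community)
            node["children"].append(build_tree(sub, depth + 1, max_depth))
        return node  # each node then gets an LLM-generated label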
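And the window co-occurrence browse comes down to a single SQLite query, assuming a chunk_topics(chunk_id, topic) table with sequential chunk ids (a simplification of the real schema):

    # Sketch: topics co-occurring with a given topic within a +/- N chunk
    # window. Assumes sequential chunk ids; ignores book boundaries for brevity.
    import sqlite3

    def cooccurring_topics(db: sqlite3.Connection, topic: str, window: int = 2):
        return db.execute(
            """
            SELECT b.topic, COUNT(*) AS n
            FROM chunk_topics a
            JOIN chunk_topics b
              ON b.chunk_id BETWEEN a.chunk_id - ? AND a.chunk_id + ?
             AND b.topic <> a.topic
            WHERE a.topic = ?
            GROUP BY b.topic
            ORDER BY n DESC
            LIMIT 20
            """,
            (window, window, topic),
        ).fetchall()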
I wrote more about the process here: https://pieterma.es/syntopic-reading-claude/
I’m curious whether this way of reading resonates with anyone else, LLM-mediated or not.
I did something similar: I used pdfplumber to extract text from my PDF book collection, dumped it into PostgreSQL, then chunked the text into 100-char chunks with a 10-char overlap. The chunks were embedded directly into a 384D space using Python sentence_transformers; I then averaged all chunk vectors for a doc and wrote that single vector back to PostgreSQL. Finally I used UMAP + HDBSCAN to perform dimensionality reduction and clustering, ending up with a 2D data set that I can plot with plotly to see my clusters (sketched below). It is very cool to play with. Importing 100 PDF files takes hours, but I can take one folder that contains a mix of programming titles, self-help, math, science fiction, etc., and after the fully automated analysis you can clearly see the different topic clusters.
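The whole flow fits in a short script, roughly like this; the model and parameters are illustrative guesses on my part, though all-MiniLM-L6-v2 does match the 384 dimensions:

    # Sketch of the pipeline: PDF -> chunks -> mean embedding -> UMAP -> HDBSCAN.
    # Model choice and clustering parameters are illustrative guesses.
    from pathlib import Path
    import numpy as np
    import pdfplumber
    import umap
    import hdbscan
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

    def doc_vector(pdf_path: Path, size: int = 100, overlap: int = 10) -> np.ndarray:
        with pdfplumber.open(pdf_path) as pdf:
            text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        # Fixed-size character chunks with a small overlap.
        step = size - overlap
        chunks = [text[i:i + size]
                  for i in range(0, max(len(text) - overlap, 1), step)]
        # One vector per document: the mean of all chunk embeddings.
        return model.encode(chunks).mean(axis=0)

    vectors = np.stack([doc_vector(p) for p in Path("books").glob("*.pdf")])
    coords = umap.UMAP(n_components=2).fit_transform(vectors)  # 384D -> 2D
    labels = hdbscan.HDBSCAN(min_cluster_size=3).fit_predict(coords)
    # coords + labels can now go straight into a plotly scatter.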
I just spent time getting it all running on Docker Compose and moved my web UI from Express to Flask. I want to clean up the code and open source it at some point.