Comment by masterphai 8 days ago

4 replies

Interesting project - it’s rare to see news-flow tracking done in real time at this scale. One thing you may want to stress-test is how stable the clustering remains when stories evolve semantically over a few hours. Embeddings tend to drift as outlets rewrite or localize a piece, and HNSW can sometimes over-merge when the centroid shifts.

A trick that helped in a similar system I built was doing a second-pass “temporal coherence” check: if two articles are close in embedding space but far apart in publish time or share no common entities, keep them in adjacent clusters rather than forcing a merge. It reduced false positives significantly.
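A minimal sketch of what such a second-pass check could look like, under stated assumptions: the `Article` fields, the `should_merge` helper, the 0.85 similarity threshold, and the 12-hour window are all hypothetical values chosen for illustration, not the settings of any real system.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class Article:
    # Hypothetical record; field names are assumptions for this sketch.
    embedding: List[float]
    published: datetime
    entities: Set[str] = field(default_factory=set)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_merge(
    a: Article,
    b: Article,
    min_similarity: float = 0.85,            # assumed threshold
    max_gap: timedelta = timedelta(hours=12),  # assumed time window
) -> bool:
    """Second-pass check: embedding closeness alone is not enough to merge."""
    if cosine_similarity(a.embedding, b.embedding) < min_similarity:
        return False
    far_in_time = abs(a.published - b.published) > max_gap
    no_shared_entities = a.entities.isdisjoint(b.entities)
    # Block the merge if the pair is far apart in time OR shares no entities.
    return not (far_in_time or no_shared_entities)
```

Pairs that fail this check would stay in separate, adjacent clusters; how "adjacent" is represented is left open here.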

Also curious how you handle deduping syndicated content - AP/Reuters can dominate the embedding space unless you weight publisher identity or canonical URLs.
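For illustration only, one way to collapse syndicated copies before clustering is to key each article on a normalized canonical URL. This is a rough sketch: the `canonical_url` and `url` dictionary keys, the helper names, and the normalization rules (drop `www.`, query string, fragment, and trailing slash) are assumptions, not a description of how the project handles it.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Normalize a URL to host + path so trivially different copies match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Query string and fragment are ignored; trailing slash is dropped.
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def dedupe_by_canonical(articles):
    """Keep the first article seen for each canonical key.

    `articles` is an iterable of dicts with a required 'url' and an
    optional 'canonical_url' (hypothetical field names).
    """
    seen = {}
    for article in articles:
        key = canonical_key(article.get("canonical_url") or article["url"])
        seen.setdefault(key, article)
    return list(seen.values())
```

Keeping the first copy is an arbitrary choice; weighting by publisher identity instead would need a different selection rule.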

Overall, really nice work. The propagation timeline is especially useful.

supriyo-biswas 3 days ago

Thanks for your comment. Unfortunately, it seems that your comments are primarily LLM-generated (for people looking for evidence, this user’s first comments should provide enough, although they’re getting better by fine-tuning the prompt). As HN is primarily a place for humans, please do not do this here. Thanks.

  • nextaccountic 3 days ago

    this specific comment shows no sign of LLM authorship

    maybe the author uses LLMs in some comments and not others. that is, it's not a bot, just someone manually using LLM tools sometimes

  • yieldcrv 3 days ago

    How can I bait this bot?

    • alchemist1e9 3 days ago

      The style of the account’s comments and “about” definitely gives off LLM vibes, but it’s not a particularly active account, so I feel it’s not a true bot. It’s also possible the account owner just runs their own comments through an LLM before posting them. I do that for most business emails I send these days, but they still reflect my own thoughts and details.