Search tool that only returns content created before ChatGPT's public release

878 points by dmitrygr 2 days ago

> This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.

The problem is that Google's search engine (and, oddly enough, ALL search engines) had already gotten worse before that. I noticed that search engines got worse several years before 2022. So AI further decreased the quality, but the quality was already on a downward trend. There are some attempts to analyse this on YouTube (also owned by Google - Google ruins our digital world); some explanations made sense to me, but even then I am not 100% certain why Google decided to ruin google search.

One key observation I made was that the YouTube search behaviour was copied onto Google's regular search, which makes no sense there. If I casually search for a video on YouTube, I may be semi-interested in unrelated videos. But if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information. This is not the only example; Google made the search results worse here and tries to confuse the user into clicking on things. Plus placement of ads. The quality really worsened.


justinclift 2 days ago

Are you aware of Kagi (kagi.com)?
With them, at least the AI stuff can be turned off.
Membership is presently about 61k, and seems to be growing by about 2k per month: https://kagi.com/stats

- amelius 2 days ago
  
  Be aware of:
  https://www.reddit.com/r/SearchKagi/comments/1gvlqhm/disappo...
  
  
  phantasmish 2 days ago
  
  I directly use Yandex sometimes, because there are huge blind spots for all the US-based engines I'm aware of, and it fills some of them in.
  If someone can point me to a better index for that purpose, I'd love to avoid Yandex. Please inform me.
  
  
  smusamashah 2 days ago
  
  There are a few other powerful countries, with countless Web services, that freely wage war(s) on other countries and support wars in many different ways. Is there a way to avoid their products?
  
  
  Ferret7446 2 days ago
  
  I find this amusing, because it seems like Kagi's target audience dislikes this (politically polarized), and I, as someone who is not Kagi's target audience, like this (politically neutral).
  
  
  xzjis 21 hours ago
  
  I don't like defending Russia, which is a horrible country, but I find it hypocritical to only talk about their imperialism and pretend not to see that the most imperialist country in the world, the one that has started, financed, and participated in the most wars, is the United States, and yet the question of boycotting American companies is never brought up. Google has been intentionally sabotaged in terms of image search and reverse image search; Yandex is literally the best on the market, but Kagi should boycott them because their headquarters are in the wrong country?
  
  
  super256 2 days ago
  
  Yandex has the best image search, and others are years behind it. Furthermore, Nebius has sold all of the group’s businesses in Russia and certain international markets. They have been completely divested from Russia for 1.5 years already: https://nebius.com/newsroom/ynv-announces-successful-complet...
  The post you linked was posted when the divestment was already underway, so it is at least dishonest if not malicious.
  
  
  duxup 2 days ago
  
  Yeah I kept thinking "man I should try kagi" and then that :(
  
  
  justinclift 2 days ago
  
  Damn. I didn't know that.
  Now we need a 2nd Kagi, so we can switch to that one instead. :(
  
  
  eirini1 2 days ago
  
  I don't agree with this logic. It implies that people who use Google, Bing and a million other products made by US-based companies are supportive of the huge amount of atrocities committed or aided by the United States. Or other countries. It feels very odd to single out Russia's invasion of Ukraine but to minimize the Israeli genocide of Palestinians in Gaza, the multiple unjust wars waged by the United States all over the world etc.
  
  
  buellerbueller 2 days ago
  
  Imo, Kagi is still the better option, because it isn't supporting the global surveillance mechanism we call advertising. All these people, missing the forest for the single yandex tree.
  
  
  troyvit 2 days ago
  
  So if America invades Venezuela should we all stop using google? Should we have stopped using google when the U.S. invaded Iraq and killed 150,000 people[1]?
  Should we stop using products imported from China for the cultural genocide they've perpetrated against the Uyghurs?[2]
  Is Yandex Russia?
  [1] https://en.wikipedia.org/wiki/Casualties_of_the_Iraq_War
  [2] https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
  
  
  scotty79 2 days ago
  
  > "We do not discriminate based on current geopolitical issues."
  That's one way of phrasing it.
  
  
  spIrr 2 days ago
  
  Thank you. Didn't know that and was, until now, considering paying for a Kagi subscription.
  
  
  Seattle3503 2 days ago
  
  I'm surprised this is possible given the sanctions on Russia.
  
  
  immibis 2 days ago
  
  Why's that something to be aware of? Yandex is actually a good search engine, so I'm told, as long as you don't search for things related to Russian politics. Kagi presumably knows this and won't use their results related to Russian politics.
  Feels more like a scare campaign to me - someone doesn't want you to use Kagi, and points to Yandex as a reason for that.
  
  
  devmor 2 days ago
  
  Kagi is based in the United States, as is YC.
  If you are concerned about heinous war crimes and the slaughter of civilians to the point that you don't want to use private services from countries that conduct such acts, you should avoid both already.
  
  
  stronglikedan 2 days ago
  
  Meh. Most people, including myself, couldn't care less, and Yandex image search is very capable.
  
  
  DontForgetMe a day ago
  
  I remain amazed by the lack of attention given to this.
  Regardless of one's position on the 'everything online is Russian propaganda, Russian bots or misinformation - invest in sickles and hammers, comrade / wtf just use basic common sense and the internet is as safe as it ever was' continuum, such universal enthusiasm for a Russian-owned, Russian-controlled search engine should generate a little more counter-argument, at the very least.
  Almost no mention of Google, Bing, Startpage, DDG, or even Mojeek passes online without somebody detailing the problems, flaws, or why they're not as good as the alternatives. Usually, at least 20% of the comments will be overtly critical, with at least 1 person passionately arguing that this search engine is going to destroy life as we know it / funds genocide / is an abomination unto God.
  On open forums and spaces where a variety of users and tastes are represented, that minimum level of criticism usually applies to absolutely everything from movies to toothbrushing techniques to kids' TV to low-carb breakfasts. If more than 3 people care enough about something to discuss it, at least 1 of those people will hate it and feel the need to enunciate why.
  Except Kagi. Kagi must enjoy the highest praise-criticism ratio of anything I've ever seen on the web, including concepts like sunshine and heaven and the eradication of polio.
  Seriously. The only 'real' criticism I ever see of Kagi is like 'I personally don't like it because I don't think a search engine is worth more than $19.99' or 'unfortunately I need x feature', and it's always followed by a reply saying 'Ah, well Kagi is now available for $19.50' or 'you'll be thrilled to know that x feature can be enabled in Kagi by following these steps'.
  And the occasional 'I don't use it because it seemed a bit weird and wasn't worth it' comment languishing on the outskirts of the discussion.
  So yeah. I do not expect this comment to stir much discussion, mainly because it's like 24 hours after the main debate and is on a pretty low-impact thread on Hacker News from an uninspiring new-ish account. But also because Kagi critical comments are written in sand, whatever the discussion or authority or audience.
  That should make people more suspicious.
  
  
  justinclift 15 hours ago
  
  > But also because Kagi critical comments are written in sand, whatever the discussion or authority or audience.
  Maybe people just turn up too late and their comments generally aren't seen?
  
  
  artursapek 2 days ago
  
  based Vlad tbh
  
- tempacct2cmmnt 2 days ago
  
  I’ve had much better results with Kagi than with Google in the past few months. I’d trialed them a couple times in the past and been disappointed, but that’s no longer the case.
  
- PaulDavisThe1st 2 days ago
  
  The AI stuff in google search can be turned off.
  https://www.google.com/search?udm=14&q=kagi
  My default browser search tool is set to google with ?udm=14 automatically appended.
  
  
  nailer 2 days ago
  
  What is UDM? Presumably the U is Urchin but what’s the rest?
  
  
  PaulDavisThe1st 2 days ago
  
  Never seen it documented.
  
- mebizzle 2 days ago
  
  Haven't looked back since I signed up.
  
- dncornholio 2 days ago
  
  How does Kagi know what is AI stuff? I don't see how they can 'just turn it off'
  
  
  justinclift 2 days ago
  
  By "turn it off" I mostly mean that Kagi have their own AI-driven tools available, but a toggle in your user settings disables them completely.
  ie it's not forced down your throat, nor mysteriously/accidentally/etc turned back on occasionally
  
  
  Zambyte 2 days ago
  
  It's driven by community ratings.
  https://news.ycombinator.com/item?id=45919067
  
- vivzkestrel a day ago
  
  what if there was an open source search engine that contributors kept making better but it was a paid subscription tool?
  
  Reply View | 0 replies
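
The `udm=14` trick PaulDavisThe1st describes above amounts to adding one query parameter to the search URL. A minimal sketch in Python follows. The helper name `google_web_only_url` is made up for illustration, and since the parameter is reportedly undocumented, its behaviour could change at any time.

```python
from urllib.parse import urlencode


def google_web_only_url(query):
    """Build a Google search URL with udm=14 added, as in the comment above."""
    # udm=14 is taken from the thread; its exact meaning is not documented.
    params = {"udm": "14", "q": query}
    return "https://www.google.com/search?" + urlencode(params)


print(google_web_only_url("kagi"))
```

In a browser that accepts a custom search engine template with a `%s` placeholder, the equivalent setting would be something like `https://www.google.com/search?udm=14&q=%s`.
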
Maken 2 days ago

There is also the fact that automatically generated content predates ChatGPT by a lot. By around 2020 most Google searches already returned lots of SEO-optimized pages made from scraped content or keyword soups made by rudimentary language models or Markov chains.
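
For anyone unfamiliar with the technique, here is a minimal sketch of how a Markov chain can stitch existing text into keyword soup. The function names, the order-1 word model, and the sample corpus are all illustrative assumptions, not a description of what any particular spam site actually ran.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, picking a random successor each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        successors = chain.get(word)
        if not successors:  # dead end: no word ever followed this one
            break
        word = rng.choice(successors)
        output.append(word)
    return " ".join(output)


# Hypothetical scraped snippet, used only for illustration.
corpus = "best credit card for travel points best credit card for cash back"
chain = build_chain(corpus)
print(generate(chain, "best", 8))
```

The output is locally plausible (every adjacent word pair occurred in the source) but has no global meaning, which is roughly why such pages were easy for a human to spot.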

- black3r 2 days ago
  
  Well there's also the fact that the GPT-3 API was released in June 2020 and its writing capabilities were essentially on par with ChatGPT's initial release. It was just a bit harder to use, because it wasn't yet trained to follow instructions, it only worked as a very good "autocomplete" model, so prompting was a bit "different" and you couldn't do stuff like "rewrite this existing article in your own words" at all, but if you just wanted to write some bullshit SEO spam from scratch it was already as good as ChatGPT would be 2 years later.
  
  
  wongarsu 2 days ago
  
  Also the full release of GPT-2 in late 2019. While GPT-2 wasn't really "good" at writing, it was more than good enough to make SEO spam
  
  
  Maken 2 days ago
  
  I didn't remember that, but it would explain the exponential growth in spam back then.
  
- gield 2 days ago
  
  And 10 years ago, Reddit was already experimenting with auto-generated subreddits: https://www.reddit.com/r/SubredditSimulator.
  
- PunchyHamster 2 days ago
  
  It was popular way before 2020, but Google managed to keep up with SEO tricks for a good decade+ before that. Guess it got to a breaking point.
  
robot-wrangler 2 days ago

> Google made the search results worse here
Did you mean:
worse results near me
are worse results worth it
worse results net worth
best worse results
worse results reddit

- d-lisp 2 days ago
  
  search: Emacs
  Did you mean vim ? (vice-versa)
  
  
  ganzsz 2 days ago
  
  Tbh, this sounds like a Google Easter egg.
  
  
  mghackerlady 2 days ago
  
  Because it is
  
- [removed] 2 days ago
  
  [deleted]
  
benterix 2 days ago

> if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information
You assume the aim here is for you to find relevant information, not increase user retention time. (I just love the corporate speak for making people's lives worse in various ways.)

- mcv 2 days ago
  
  You finding relevant information used to be the aim. Enshittification started when they let go of that aim.
  
master-lincoln 2 days ago

I think this is about trustworthy content, not about a good search engine per se

- trinix912 2 days ago
  
  But it's not necessarily trustworthy content, we had autogenerated listicles and keyword list sites before ChatGPT.
  
  
  GTP 2 days ago
  
  Sure, but I think that the underlying assumption is that, after the public release of ChatGPT, the amount of autogenerated content on the web became significantly bigger. Plus, the auto-generated content was easier to spot before.
  
zipy124 2 days ago

Honestly the biggest failing is just that SEO spam sites got too good at defeating the algorithm. The amount of bloody listicles or Quora nonsense or backlink farming websites that come up in search is crazy.

- duxup 2 days ago
  
  I feel like google gave up the fight at some point. I think HN had some good articles that indicated that.
  
  
  strbean 2 days ago
  
  Certainly seems that way if you observed the waves of usability Google search underwent in the first 15 years. There were several distinct cycles where the results were great, then garbage, then great again. They would be flooded with SEO spam, then they would tweak and penalize the SEO spam heavily, then SEO would catch up.
  The funny thing is that it seems like when they gave up it wasn't because of some new advancement in the arms race. It was well before LLMs hit the scene. The SEO spam was still incredibly obvious to a human reader. Really seems like some data-driven approach demonstrated that surrendering on this front led to increased ad revenue.
  
- AznHisoka 2 days ago
  
  For most commercial-related terms, I suspect if you got rid of all “spammy” results you would be left with almost nothing. No independent blogger is gonna write about the best credit card with travel points.
  
  
  eszed 2 days ago
  
  I agree with your point, but you picked a poor example. Have you met any credit reward min-maxers?
  
  
  strbean 2 days ago
  
  Sites like Credit Karma / NerdWallet exist. While I think they are rife with affiliate link nonsense and paid promotion masquerading as advice, I'm also pretty sure they have paid researchers and writers generating genuine content. Not sure that quite falls into the bucket of SEO blogspam.
  
  
  asdff 2 days ago
  
  It still counts because they would only ever recommend affiliate partnered products.
  
  
  baconbrand 2 days ago
  
  I had a coworker who kept up a blog about random purchases she’d made, where she would earn some money via affiliate links. I thought it was horrendously boring and weird, and the money made was basically pocket change, but she seemed to enjoy it. You might be surprised, people write about all sorts of things.
  
  
  asdff 2 days ago
  
  People used to do it in the early internet, before affiliate marketing really took it over. Certainly it was more genuine and products were bemoaned for their compromises in one dimension as much as praised for their performance in another. Everything is a glowing review now and comparisons are therefore meaningless.
  
- Nextgrid 2 days ago
  
  This is bullshit the search engines want you to believe. It's trivial to detect sites that "defeat" the algorithm; you simply detect their incentives (ads/affiliate links) instead.
  Problem is that no mainstream search engine will do it because they happen to also be in the ad business and wouldn't want to reduce their own revenue stream.
  
- watwut 2 days ago
  
  Afaik they did not lose the fight. They stopped trying, because it was good for short-term earnings.
  
  
  masfuerte 2 days ago
  
  Yes, this is true. It was revealed in Google emails released during antitrust hearings. Google absolutely made a deliberate decision to enshittify their search results for short term gains.
  Though maybe it's a long term gain. I know many normal (i.e. non-IT) people who've noticed the poor search results, yet they continue to use Google search.
  
- [removed] 2 days ago
  
  [deleted]
  
  Reply View | 0 replies
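
Nextgrid's suggestion above (detect the incentives, not the content) could be sketched roughly as follows. This is only an illustration under stated assumptions: the parameter names in `AFFILIATE_PARAMS` are guesses at what an affiliate link might carry, the names `LinkCollector` and `affiliate_ratio` are hypothetical, and nothing here reflects how any real search engine scores pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Illustrative guesses at query parameters that mark an affiliate link;
# a real detector would need a much larger, maintained list.
AFFILIATE_PARAMS = {"tag", "affid", "aff_id"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def affiliate_ratio(html):
    """Return the fraction of links carrying an affiliate-style parameter."""
    collector = LinkCollector()
    collector.feed(html)
    if not collector.hrefs:
        return 0.0
    flagged = sum(
        1
        for href in collector.hrefs
        if AFFILIATE_PARAMS & set(parse_qs(urlparse(href).query))
    )
    return flagged / len(collector.hrefs)


# Made-up page with one affiliate-style link and one plain link.
page = (
    '<a href="https://shop.example/item?tag=blog-20">Buy</a>'
    '<a href="https://docs.example/guide">Docs</a>'
)
print(affiliate_ratio(page))
```

A ranking system could treat a high ratio as one weak signal among many; on its own it would also penalize honest reviewers who happen to use affiliate links, which is the objection raised elsewhere in this subthread.
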
codyb 2 days ago

I've been using DuckDuckGo for the last... decade or so. And it still seems to return fairly relevant documentation towards the top.
To be fair, most of what I use search for these days is "<<Programming Language | Tool | Library | or whatever>> <<keyword | function | package>>" then navigate to the documentation, double check the versions align with what I'm writing software in, read... move on.
Sometimes I also search for "movie showtimes nyc" or for a specific venue or something.
So maybe my use cases are too specific to screw up, who knows. If not, maybe DDG is worth a try.

- geldedus 2 days ago
  
  DuckDuckGo uses Bing search results.
  
123malware321 2 days ago

ML and AI killed it somewhere between 2011 and 2016. https://en.wikipedia.org/wiki/Dead_Internet_theory

jollyllama 2 days ago

> The problem
That's a separate problem. The search algorithm applied on top of the underlying content is a separate problem from the quality or origin of the underlying content, in aggregate.

groundzeros2015 2 days ago

Significant changes were made to Google and YouTube in 2016 and 2017 in response to the US election. The changes provided more editorial and reputation based filtering, over best content matching.

xnx 2 days ago

Counterpoint: The experience of quickly finding succinct accurate responses to queries has never been better.
Years ago, I would consider a search "failed" if the page with related information wasn't somewhere in the top 10. Now a search is "failed" if the AI answer doesn't give me exactly what I'm looking for directly.

0xEF 2 days ago

> I am not 100% certain why Google decided to ruin google search.
Ask Prabhakar Raghavan. Bet he knows.

ForHackernews 2 days ago

Goodhart's law applies to links, too. Google monetized them and destroyed their value as a signal.

juujian 2 days ago

The problem is that before Nov 30, 2022 we also had plenty of human-generated slop bearing down on the web. SEO content specifically.

bratwurst3000 2 days ago

The main theory is that with bad results you have to search more and engage more with ads, so more revenue for Google. It's enshittification.

salemh 2 days ago

[dead]


swyx 2 days ago

somebody said once we are mining "low-background tokens" like we are mining low-background (radiation) steel post WW2 and i couldnt shake the concept out of my head

(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)


mwidell 2 days ago

Low background steel is no longer necessary.
"...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."
https://en.wikipedia.org/wiki/Low-background_steel

- juvoly 2 days ago
  
  Interesting. I guess that analogously, we might find that X years after some future AI content production ban, we could similarly start ignoring the low background token issue?
  
  
  actionfromafar 2 days ago
  
  We used a rather low number of atmospheric bombs, while we are carpet bombing the internet every day with AI marketing copy.
  
  
  piker 2 days ago
  
  What’s the half-life of a viral meme?
  
  
  huflungdung 2 days ago
  
  [dead]
  
- doe88 2 days ago
  
  Can't wait, in fifty years we will have our data clean again.
  
alansaber 2 days ago

Since synthetic data for training is pretty ubiquitous, this seems like a novelty.

jeffchuber 2 days ago

that was me swyx

- rollulus 2 days ago
  
  Multiple people have coined the idea repeatedly, way before you. The oldest comment on HN I could find was in December 2022 by user spawarotti: https://news.ycombinator.com/item?id=33856172
  
  
  threeducks 2 days ago
  
  Here is an even older comment chain about it from 2020: https://news.ycombinator.com/item?id=23895706
  Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.
  
  
  jeffchuber 2 days ago
  
  i didnt claim to invent it.
  i claimed swyx heard it through me - which he did
  
  
  swyx 2 days ago
  
  you did!!
  
jrjfjgkrj 2 days ago

every human generation built upon the slop of the previous one
but we appreciated that, we called it "standing on the shoulders of giants"

- bigiain 2 days ago
  
  > we called it "standing on the shoulders of giants"
  We do not see nearly so far though.
  Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.
  
  
  _kb 2 days ago
  
  The secret is you then have to heat up that goo. When the temperature gets high enough things get interesting again.
  
- shevy-java 2 days ago
  
  This sounds like an Alan Kay quote. He meant that in regards to useful inventions. AI-generated spam just decreases the quality. We'd need a real alternative to this garbage from Google but all the other search engines are also bad. And their UI is also horrible - not as bad as Google, but also bad. Qwant just tries to copy/paste Google for instance (though interestingly enough, sometimes it has better results than Google - but also fewer in general, even ignoring false positive results).
  
  
  visarga 2 days ago
  
  Deep Research reports I think are above average internet quality, they collect hundreds of sources, synthesize and contrast them & provide backlinks. Almost like a generative wikipedia.
  I think all we can expect from internet information is a good description of the distribution of materials out there, not truth. This is totally within the capabilities of LLMs. For additional confidence run 3 reports on different models.
  
- groestl 2 days ago
  
  We have two optimization mechanisms though which reduce noise with respect to their optimization functions: evolution and science. They are implicitly part of "standing on the shoulders of giants", you pick the giant to stand on (or it is picked for you).
  Whether or not the optimization functions align with human survival, and thus our whole existence is not a slop, we're about to find out.
  
- rebuilder 2 days ago
  
  That's because the things we built on weren't slop
  
- kgwgk 2 days ago
  
  Nothing conveys better the idea of a solid foundation to build upon than the word ‘slop’.
  
  
  DeepSeaTortoise a day ago
  
  Every foundation needs some time to settle.
  - Sir, this is an elevator.
  
- [removed] 2 days ago
  
  [deleted]
  
- pseidemann 2 days ago
  
  You may have one point.
  The industrial age was built on dinosaur slop, and they were giant.
  
- ben_w 2 days ago
  
  There's a reason this is comedy:
  Listen, lad. I built this kingdom up from nothing. When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands.
  While this is religious:
  [24] “Everyone then who hears these words of mine and does them will be like a wise man who built his house on the rock. [25] And the rain fell, and the floods came, and the winds blew and beat on that house, but it did not fall, because it had been founded on the rock. [26] And everyone who hears these words of mine and does not do them will be like a foolish man who built his house on the sand. [27] And the rain fell, and the floods came, and the winds blew and beat against that house, and it fell, and great was the fall of it.”
  Humans build not on each other's slop, but on each other's success.
  Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.
  The "slop" of capitalism is polluted skies, soil and water, wage slaves and fast fashion that barely lasts one use, and the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding their own un-informed rambling be given the same weight as the considered opinions of experts.
  None of these things contributed to our social, technological, or economic advancement; they are simply things which happened at the same time.
  AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. As for the flood of low-effort GenAI stuff filling feeds and drowning signal with noise, as others have said: just give us your prompt.
  
- hoppp 2 days ago
  
  You can't build on slop because slop is a slippery slope
  
  
  Dilettante_ 2 days ago
  
  Maybe we'll have to slurp the slop so we don't slip on the slope.
  
  
  cindyllm 2 days ago
  
  [dead]
  
- walrusted 2 days ago
  
  the only structure you can build with slop is a burial mound
  
  
  Dilettante_ 2 days ago
  
  What is unhardened concrete but slop?
  
- Mistletoe 2 days ago
  
  How to make fire or kill a woolly mammoth was not slop, come on.
  
- teiferer 2 days ago
  
  Because the pyramids, the theory of general relativity and the Linux kernel are all totally comparable to ChatGPT output. /s
  Why is anybody still surprised that the AI bubble made it that big?
  
  
  jrjfjgkrj 2 days ago
  
  for every theory of relativity there is the religious nonsense and superstition of the medieval ages or today
  

tkgally 2 days ago

Somewhat related, the leaderboard of em-dash users on HN before ChatGPT:

https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...


Ajakks a day ago

I have used a dash - like that for almost 20 years, 100% of the time I ought to use a semi-colon and about half of the time for commas - it lets me just keep talking about things, the comma is a harder pause. I've recently started seriously writing at a literary level, and I have fallen in love with the em dash - it has a fantastic function within established professional writing, where it is used often - it's why the AI uses it so much.

maplethorpe 2 days ago

They should include users who used a double hyphen, too -- not everyone has easy access to em dashes.

- bigiain 2 days ago
  
  That would false positive me. I have used double dashes to delimit quote attribution for decades.
  Like this:
  "You can't believe everything you read on the internet." -- Abraham Lincoln, personal correspondence, 1863
  
  
  dragonwriter 2 days ago
  
  That's literally a standard use of em-dash being approximated by a double hyphen, though.
  
- gblargg 2 days ago
  
  Does AI use double hyphens? I thought the point was to find who wasn't AI that used proper em dashes.
  
  
  jader201 2 days ago
  
  Anytime I do this — and I did it long before AI did — they are always em dashes, because iOS/macOS translates double dashes to em dashes.
  I think there may be a way to disable this, but I don’t care enough to bother.
  If people want to think my posts are AI generated, oh well.
  
- venturecruelty 2 days ago
  
  Oof, I feel like you'll accidentally capture a lot of getopt_long() fans. ;)
  
  
  Kinrany 2 days ago
  
  Excluding those with asymmetrical whitespace around might be enough
  
- SoftTalker 2 days ago
  
  Double-hyphen is an en-dash. Triple-hyphen is an em-dash.
  
  
  dragonwriter 2 days ago
  
  Double hyphen is replaced in some software with an en-dash (and in those, a triple hyphen is often replaced with an em-dash), and in some with an em-dash; it's usually used (other than as input to one of those pieces of software) in places where an em-dash would be appropriate, but in contexts where both an em-dash set closed and an en-dash set open might be used, it is often set open.
  So, it’s not unambiguously a substitute for either; it is essentially its own punctuation mark used in ASCII-only environments, with some influence from both the use of em-dashes and that of en-dashes in more formal environments.
  
  Reply View | 0 replies
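
The suggestions in this subthread (count double hyphens too, but skip ones with asymmetric whitespace such as command-line flags) could be sketched as follows. The regex and the names `count_dashes` and `is_space_or_edge` are assumptions made for illustration; this is not how the linked leaderboard is actually computed.

```python
import re

# A double hyphen that is not part of a longer run of hyphens ("---").
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")
EM_DASH = "\u2014"


def is_space_or_edge(text, index):
    """True if index falls outside the text or points at whitespace."""
    return index < 0 or index >= len(text) or text[index].isspace()


def count_dashes(text):
    """Count em dashes plus double hyphens with symmetric whitespace."""
    count = text.count(EM_DASH)
    for match in DOUBLE_HYPHEN.finditer(text):
        before = is_space_or_edge(text, match.start() - 1)
        after = is_space_or_edge(text, match.end())
        # Keep "a -- b" and "a--b"; skip flag-like "--verbose".
        if before == after:
            count += 1
    return count


print(count_dashes("They should -- maybe -- skip flags like --verbose"))
```

Under this heuristic a quote attribution such as `"..." -- Abraham Lincoln` still counts, which matches dragonwriter's point that it is a standard em-dash use.
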
a5c11 2 days ago

Apparently, it's not only the em-dash that's distinctive. I went through the leader's comments and spotted that he also uses the backtick "’" instead of the apostrophe.

- baiwl 2 days ago
  
  Just to be clear this is done automatically by macOS or iOS browsers when configured properly.
  
  a5c11 a day ago
  
  Never happened to me. And I'm using Mac and iPhone.
  
- kuschku 2 days ago
  
  I (~100 in the leaderboard, regardless of how you sort) also frequently use ’ (unicode apostrophe) instead of ' :D
  
lxgr 2 days ago

Amazing! But no love for en dashes?

keiferski 2 days ago

Projects like this remind me of a plot point in the Cyberpunk 2077 game universe. The "first internet" got too infected with dangerous AIs, so much so that a massive firewall needed to be built, and a "new" internet was built that specifically kept out the harmful AIs.

(Or something like that: it's been a while since I played the game, and I don't remember the specific details of the story.)

It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.

jascha_eng 2 days ago

The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program. And at some point probably all content is at least influenced by AI. The real issue is also not that I used ChatGPT to look up a synonym or asked a question before writing an article; the problem is when I copy paste the content and claim I wrote it.

- tiborsaas a day ago
  
  The solution is not to be able to upload content. Extremely dumb services, basic trusted information sharing. Just like a newspaper.
  
- Ylpertnodi 2 days ago
  
  > The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program.
  Er...digital id.
  
  _heimdall 2 days ago
  
  Ignoring the privacy and security issues for a moment, how would having a digital ID prove that the blog post I put on my site came only out of my own mind and I didn't use an LLM for it?
  
- visarga 2 days ago
  
  > the problem is when I copy paste the content and claim I wrote it
  Why is this the problem and not the reverse - using AI without adding anything original into the soup? I could paraphrase an AI response in my own words and it will be no better. But even if I used AI, if it writes my ideas, then it would not be AI slop.
  
- fao_ 2 days ago
  
  > And at some point probably all content is at least influenced by AI.
  [citation needed]
  (I see absolutely no reason why that should be the case)
  
  asdff 2 days ago
  
  The issue is that most things are derivative, and AI now represents an increasing share of the "most things" from which to derive.
  
- immibis 2 days ago
  
  There doesn't need to be any difference in treatment between AI slop and human slop. The point isn't to keep AI out - it's to keep spam and slop out. It doesn't matter whether it's produced by a being made of carbon or silicon.
  If someone can consistently produce high-quality content with AI assistance, so be it. Let them. Most don't, though.
  
  jascha_eng 2 days ago
  
  I think the main issue is that when content is handwritten you can be certain someone put at least the effort it takes to write into it. And while some people write fast, I would assume that at least means they have read their own writing once.
  AI slop you can produce faster than you're able to read it. This makes it incredibly costly to filter out in comparison. It just messes so much with the signal-to-noise ratio on the web.
  
  immibis a day ago
  
  Being AI-written is a proxy for being spam. Almost all AI-written content is spam and that's why it's bad.
  
SonnyTark 2 days ago

I share an opinion with Nick Bostrom: once a civilization-disrupting idea (like LLMs) is pulled out of the bag, there is no putting it back. People in isolation will recreate it simply because it's now possible. All we can do is adapt.
That being said, the idea of a new, freer internet is already a reality. Mastodon is a great example. I think private havens like Discord/Matrix/Telegram are an important step on the way.

- ionwake 2 days ago
  
  how does one keep ai out of private havens? thorough verification? is that the future? private havens on platforms?
  
  embedding-shape 2 days ago
  
  In person web of trust in order to join any private community. It'll suck and be hard in the beginning, but once you reach a threshold, it'll be OK. Ban entire trees of users when you discover bots/puppets, to set an example.
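One way to picture the "ban entire trees" part is an invite tree in which every member is vouched for by an existing member. The Python sketch below is purely illustrative: the `InviteTree` class and its method names are hypothetical, and it says nothing about how the in-person verification itself would work.

```python
from collections import defaultdict


class InviteTree:
    """Hypothetical invite-only membership list with subtree bans."""

    def __init__(self):
        self.invited_by = {}                # member -> inviter (None for founders)
        self.invitees = defaultdict(list)   # inviter -> members they vouched for
        self.banned = set()

    def add_member(self, member, inviter=None):
        """Register a member; non-founders need an active (unbanned) inviter."""
        if member in self.invited_by:
            raise ValueError("already a member")
        if inviter is not None:
            if inviter not in self.invited_by or inviter in self.banned:
                raise ValueError("inviter must be an active member")
            self.invitees[inviter].append(member)
        self.invited_by[member] = inviter

    def ban_subtree(self, root):
        """Ban root and everyone invited, directly or indirectly, by root."""
        if root not in self.invited_by:
            raise KeyError(root)
        newly_banned = set()
        stack = [root]
        while stack:
            member = stack.pop()
            if member in self.banned:
                continue  # this branch was already banned earlier
            self.banned.add(member)
            newly_banned.add(member)
            stack.extend(self.invitees.get(member, []))
        return newly_banned
```

Banning from the discovered bot downwards removes everyone it vouched for; whether the ban should also climb to the inviter is a policy choice this sketch leaves out.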
  
tiborsaas a day ago

If you play it again, make sure not to miss the Blackwall gateway quickhack:
https://cyberpunk.fandom.com/wiki/Blackwall_Gateway
Absolutely brutal: https://www.youtube.com/watch?v=LD5z3GmQRXQ
---
I also noticed how simple the "new web" is when interacting with it. Of course, that's a game mechanic, but also kinda makes sense.

- ndsipa_pomu a day ago
  
  It's a shame that the Militech Canto only has 4 quickhack slots
  
lukebuehler 2 days ago

Arguably this is already happening, with much human-to-human interaction moving to private groups on Signal, WhatsApp, Telegram, etc.

pavel_lishin 2 days ago

There were also similar plot points mentioned in Peter Watts' Starfish trilogy, and Neal Stephenson's Anathem.

visarga 2 days ago

> a new human-only internet
Only if those humans don't take their leads from AI. If they read AI and write, not much benefit.

permo-w 2 days ago

besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway. the same stuff that any half-awares person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". if you're consistently being caught out by this stuff then likely you need to improve your search hygiene, nothing so drastic as this

the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine

Cadwhisker 2 days ago

In the past, I'd find one wrong answer and I could easily spot the copies. Now there's a dozen different sites with the same wrong answer, just with better formatting and nicer text.

- finaard 2 days ago
  
  The trick is to only search for topics where there are no answers, or only one answer leading to that blog post you wrote 10 years ago and forgot about.
  
never_inline 2 days ago

A colleague sent me a confident, ChatGPT-formatted bug report.
It misidentified what the actual bug was.
But the tone was so confident, and he replied to my later messages using ChatGPT itself, which insisted I was wrong.
I don't like this future.

- artursapek 2 days ago
  
  Did you call his ass out for being lazy and wasting your time?
  
- blitzar 2 days ago
  
  I have dozens of these over the years - many of the people responsible have "Head of ..." or "Chief ..." job titles now.
  
- crazygringo 2 days ago
  
  It's not the future. Tell him not to do that. If it happens again, bring it to the attention of his manager. Because that's not what he's being paid for. If he continues to do it, that's grounds for firing.
  What you're describing is not the future. It's a fireable offense.
  
Aurornis 2 days ago

> the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently
Some of the science, energy, and technology subreddits receive a lot of ChatGPT repost comments. There are a lot of people who think they’ve made a scientific or philosophical breakthrough with ChatGPT and need to share it with the world.
Even the /r/localllama subreddit gets constant AI spam from people who think they’ve vibecoded some new AI breakthrough. There have been some recent incidents where someone posted something convincing and then others wasted a lot of time until realizing the code didn’t accomplish what the post claimed it did.
Even on HN some of the “Show HN” posts are AI garbage from people trying to build portfolios. I wasted too much time trying to understand one of them until I realized they had (unknowingly?) duplicated some commits from an upstream project and then let the LLM vibe code a README that sounded like an amazing breakthrough. It was actually good work, but it wasn’t theirs. It was just some vibecoding tool eventually arriving at the same code as upstream and then putting the classic LLM-written, emoji-filled bullet points in the README.

zwnow 2 days ago

Yes, it is a big deal. I can't find new artists without fearing that their art is AI generated; same for books and music. I also can't post my stuff to the internet anymore because I know it's going to be fed into LLM training data. The internet is mostly dead to me, and thankfully I have lost almost all interest in being on my computer as much as I used to be.

darkwater 2 days ago

> besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway.
Yes, it is, because of the other side of the coin. If you were writing human-generated, curated content, previously you would just do it in your small patch of the Internet, and SEs (Google...) would probably pick it up anyway because it was good-quality content. You just didn't care about SEO-driven shit. Now your nicely hand-written content is going to be fed into LLM training and it's going to be used - whether you want it or not - in the next generation of AI slop content.

- visarga 2 days ago
  
  It's not slop if it is inspired by good content. Basically you need to add your original spices into the soup to make it not slop, or have the LLM do deep-research kind of work to contrast among hundreds of sources.
  Slop did not originate from AI itself, but from the feed-ranking algorithm, which sets the criteria for visibility. Platforms "prompt" humans to write slop.
  AI slop is just an extension of this process, and it started long before LLMs. Platforms optimizing for their own interest at the expense of both users and creators is the source of slop.
  
- permo-w a day ago
  
  this is basically the equivalent of saying that content-farm writers might read your content and bastardise it into seo slop. okay, sure, it's true, but it was always true and AI doesn't change it significantly
  
pajamasam 2 days ago

SEO-spam was often at least somewhat factual and not complete generated garbage. Recipe sites, for example, usually have a button that lets you skip the SEO stuff and get to the actual recipe.
Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.

system2 2 days ago

Yes indeed, it is a problem. Now the old good sites have turned into AI-slop sites because they can't fight the spammers by writing slowly with humans.

- permo-w a day ago
  
  if a potential defense is to simply the spammers, then the site was previously just as likely to start hiring content-farm human slop writers as they are now likely to use AI, i.e. the site probably wasn't that great in the first place and had equal potential to deteriorate, AI or no
  
themanmaran 2 days ago

The low-background steel of the internet

https://en.wikipedia.org/wiki/Low-background_steel

HelloUsername 2 days ago

As mentioned half a year ago at https://news.ycombinator.com/item?id=44239481

- thm 2 days ago
  
  As mentioned 7 months ago https://news.ycombinator.com/item?id=43811732
  
  Ginger-Pickles 2 days ago
  
  As mentioned in this thread :P https://news.ycombinator.com/item?id=46103662
  
potato-peeler 2 days ago

You don’t need an extension to do this. Simply add a “before:” filter to your search query, e.g. https://www.google.com/search?q=Happiness+before%3A2022
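
For anyone who wants to generate such links programmatically, here is a minimal Python sketch. The helper name `google_search_url` is made up for illustration; the only thing it relies on is the `before:` operator shown in the link above, plus the assumption that a full date such as 2022-11-30 (the ChatGPT release date the tool uses as its cutoff) is accepted in the same position.

```python
from urllib.parse import urlencode


def google_search_url(terms: str, before: str) -> str:
    """Build a Google search URL restricted with the before: operator.

    `before` is passed through unchanged, e.g. "2022" or "2022-11-30".
    """
    query = f"{terms} before:{before}"
    # urlencode escapes the colon as %3A and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("Happiness", "2022"))
```

Called with "Happiness" and "2022", this reproduces the example link exactly.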