azuanrb 2 days ago

It’s interesting that many comments mention switching back to Claude. I’m on the opposite end, as I’ve been quite happy with ChatGPT recently. Anthropic clearly changed something after December last year. My Pro plan is barely usable now, even when using only Sonnet. I frequently hit the weekly limit, which never happened before. In contrast, ChatGPT has been very generous with usage on their plan.

Another pattern I’m noticing is strong advocacy for Opus, but that requires at least the 5x plan, which costs about $100 per month. I’m on the ChatGPT $20 plan, and I rarely hit any limits while using 5.2 on high in codex.

  • mFixman 2 days ago

    I've been impressed by how good ChatGPT is at pulling the right context from old conversations.

    When I ask simple programming questions in a new conversation, it can generally figure out which project I'm going to apply the answer to, and write examples tailored to those projects. I feel that it also makes the responses a bit warmer and more personal.

    • nfg 2 days ago

      Agreed that it can work well, but it can also be irritating - I find myself using private conversations to try to isolate them; a straightforward per-chat toggle for memory use would be nice.

      • robwwilliams 2 days ago

        Love this idea. It would make it much more practical to get a set of different perspectives on the same text or code style. Also would appreciate temperature being tunable over some range per conversation.

    • jstanley 2 days ago

      ChatGPT having memory of previous conversations is very confusing.

      Occasionally it will pop up saying "memory updated!" when you tell it some sort of fact. But hardly ever. And you can go through the memories and delete them if you want.

      But it seems to have knowledge of things from previous conversations in which it didn't pop up and tell you it had updated its memory, and that don't appear in the list of memories.

      So... how is it remembering previous conversations? There is obviously a second type of memory that they keep kind of secret.

      • mFixman 2 days ago

        If you go to Settings -> Personalisation -> Memory, you have two separate toggles for "Reference saved memories" and "Reference chat history".

        The first one controls the "memory updated" pop-up and its bespoke list of memories; the second one likely corresponds to some RAG system that retrieves relevant snippets of previous conversations.
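
        Pure speculation about their internals, but the general shape of chat-history RAG looks something like this toy sketch (the hash-based "embedding" is just a stand-in for a real embedding model):

            # Toy embed-and-rank retrieval over past chats.
            import hashlib
            import numpy as np

            def embed(text, dim=256):
                """Stand-in embedding: hash each word into a fixed-size vector."""
                v = np.zeros(dim)
                for word in text.lower().split():
                    h = int(hashlib.md5(word.encode()).hexdigest(), 16)
                    v[h % dim] += 1.0
                n = np.linalg.norm(v)
                return v / n if n else v

            past_chats = [
                "We discussed the FastAPI backend for the todo app",
                "Recipe ideas for a vegetarian dinner party",
                "Debugging the React frontend build pipeline",
            ]

            def relevant_snippets(query, k=2):
                # Rank stored chats by cosine similarity to the query
                q = embed(query)
                return sorted(past_chats, key=lambda c: -float(q @ embed(c)))[:k]

            print(relevant_snippets("help with my FastAPI project"))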

    • SoftTalker 2 days ago

      ChatGPT is what work pays for so it's what I've used. I find it grossly verbose and obsequious, but you can find useful nuggets in the vomit it produces.

      • josephg 2 days ago

        Go into your user settings -> personalisation. They’ve recently added dropdowns to tune its responses. I’ve set mine to “candid, less warm” and it’s gotten a lot more to-the-point in its responses.

      • ssl-3 2 days ago

        ChatGPT can very much be that way.

        It can also be terse and cold, while also somewhat-malleably insistent -- like an old toolkit in the shed.

        It's all tunable.

  • jghn 2 days ago

    > My Pro plan is barely usable now, even when using only Sonnet. I frequently hit the weekly limit,

    I thought it was just me. What I found was that they put in the extra bonus capacity at the end of December, but I felt like I was consuming quota at the same rate as before. And then afterwards, consuming it faster than before.

    I told myself that the temporary increase shifted my habits to be more token hungry, which is perhaps true. But I am unsure of that.

    • robwwilliams 2 days ago

      This was my experience too over December 2025. Since then, Claude Pro has had only marginal utility. They are struggling with demand.

  • tl 2 days ago

    I have Claude whiplash right now. Anthropic bumped limits over the holidays to drive more usage. That, combined with Opus's higher token usage and weird oddities in usage reporting / capping (see sibling comments), makes me suspect they want to drive people from Pro -> Max without admitting it.

  • SomeUserName432 2 days ago

    > Another pattern I’m noticing is strong advocacy for Opus

    For agent/planning mode, that's the only one that has seemed reasonably sane to me so far, not that I have any broad experience with every model.

    Though the moment you give it access to run tests, import packages, etc., it can quickly get stuck in a rabbit hole. It tries to run a test with "&& sleep" on Mac; sleep does not exist, so it interprets that as the test stalling, then just goes completely bananas.

    It really lacks the "ok I'm a bit stuck, can you help me out a bit here?" prompt. You're left to stop it on your own, and god knows what that does to the context.

    • robwwilliams a day ago

      Somewhat different type of problem, and perhaps a useful cautionary tale. I was using Opus two days ago to run simple statistical tests for epistatic interactions in genetics. I built a project folder with key papers and data for the analysis. Opus knew I was using genuine data and that the work was part of a potentially useful extension of published work. Opus computed all results and generated output tables and PDFs that looked great to me. Results were a firm negative across all tests.

      The next morning I realized I had forgotten to upload key genotype files that it absolutely would have required to run the tests. I asked Opus how it had generated the tables and graphs. Answer: “I confabulated the genotype data I needed.” Ouch, dangerous as a table saw.

      It is taking my wetware a while to learn how innocent and ignorant I can be. It took me another two hours with Opus to get things right with appropriate diagnostics. I’ll need to validate results myself in JMP. Lessons to learn AND remember.

    • alsetmusic 2 days ago

      > It tries to run a test and then "&& sleep" on mac, sleep does not exist

        $ type sleep
        sleep is /bin/sleep
      
      What’s going on on your computer?

      Edit: added quote

      • SomeUserName432 a day ago

        Right you are... Perhaps I recall incorrectly and it was a different command. I did try it, and it did not exist. Odd.

  • bdcravens 2 days ago

    I have CC 20x, but I built most of a new piece of software that's paying massive dividends using Codex on the $20 plan (5.1-codex for most of it).

  • rglynn 2 days ago

    IME 5.2-codex (high) is not as good as Opus 4.5; xhigh is equivalent but also consumes quota at a higher rate (much like Opus).

  • moeffju 2 days ago

    There was a bug, since fixed, that erroneously capped usage at something like 60% of the limit, if you want to try again.

    • azuanrb 2 days ago

      You mean the harness bug on the 26th? I'm aware. But the limits I mentioned have been an issue since early January.

      • jghn 2 days ago

        Wouldn't the harness bug only affect Claude Code? I usually track my quota status via the web app and saw a similar effect as the GP.

  • level09 2 days ago

    Agreed. I've noticed the Max plan doesn't feel max anymore; it can quickly get depleted during hourly sessions, and the weekly limit seems really restrictive.

  • InfinityByTen 2 days ago

    Well, Claude at least was successful in getting me to pay. It became utterly annoying that I would hit the limit with just a couple of follow-ups to my long-running discussion and be made to wait for a few hours.

    So it worked, but I didn't happily pay. And I noticed it became more complacent, hallucination-prone, and problematic. I might consider trying ChatGPT's newer models again. Coding and technical projects didn't feel like its strong suit. Maybe things have changed.

    • [removed] 2 days ago
      [deleted]
  • pdntspa 2 days ago

    I am using my claude pro plan for at least 8 hours a day, 5 days a week, to maintain a medium-sized codebase, and my total weekly usage is something like 25% of my limit.

    What the hell are people doing that burns through that token limit so fast?

    • jghn 2 days ago

      The other day I asked CC to plan, using Opus, a few small updates to a FastAPI backend and corresponding UI updates for a Next.js frontend. Then I had it implement using Sonnet. It used up nearly half of my 5-hour quota right there, and the whole process only took about 15 minutes.

      • pdntspa 2 days ago

        This is on the pro ($100/mo) plan?

        I go through multiple sessions like this per day and it barely makes a dent. And I just keep it in Opus the whole time.

        How is it possible that our experiences are so different with essentially the same task?

        For reference, my current squeeze is about 30k sloc in Python, half being tests.

        • jghn 2 days ago

          It's Pro, but that's $25/mo. $100 is the lower tier of Max.

    • azuanrb 12 hours ago

      Based on your other reply, it seems like you're on the Max 5x plan ($100/mo), not Pro ($20/mo).

  • fullstackchris 2 days ago

    This is incorrect. I have the $200 per year plan and use Opus 4.5 every day.

    Though granted, it comes in ~4-hour blocks, and it is quite easy to hit the limit when executing large tasks.

    • azuanrb 2 days ago

      Not sure what you mean by incorrect since you already validated my point about the limits. I never had these issues even with Sonnet before, but after December, the change has been obvious to me.

      Also worth considering that mileage varies because we all use agents differently, and what counts as a large workload is subjective. I am simply sharing my experience from using both Claude and Codex daily. For all we know, they could be running A/B tests, and we could both be right.

      • fullstackchris 4 hours ago

        This is not a weekly limit though; it is a 4-hour one. You still have not clearly defined what you are talking about.

    • hxugufjfjf 2 days ago

      Four hours to be outdoors, walk the dog, drink coffee and talk to a friend outside a screen. Best part of my day.

leumon 3 days ago

> We’re continuing to make progress toward a version of ChatGPT designed for adults over 18, grounded in the principle of treating adults like adults, and expanding user choice and freedom within appropriate safeguards. To support this, we’ve rolled out age prediction for users under 18 in most markets. https://help.openai.com/en/articles/12652064-age-prediction-...

interesting

  • GoatInGrey 3 days ago

    Pornographic use has long been the "break glass in case of emergency" for the LLM labs when it comes to finances.

    My personal opinion is that while smut won't hurt anyone in and of itself, LLM smut will have weird and generally negative consequences, as it will be crafted specifically for you on top of the intermittent-reinforcement component of LLM generation.

    • estimator7292 3 days ago

      While this is a valid take, I feel compelled to point out Chuck Tingle.

      The sheer amount and variety of smut books (just books) is vastly larger than anyone wants to realize. We passed the mark decades ago where there is smut available for any and every taste. Like, to the point that even LLMs are going to take a long time to put a dent in the smut market. Humans have been making smut for longer than we've had writing.

      Again, I don't think you're wrong, but the scale of the problem is way distorted.

      • MBCook 3 days ago

        That’s all simple one-way consumption though. I suspect the effect on people is very different when it’s interactive in the way an LLM can be, which we’ve never had to reckon with before.

        That’s where the danger may lie.

      • monksy 2 days ago

        I want smut that talks about agent-based development and crawdbot doing dirty, dirty things.

        Does that exist yet? I don't think so.

      • cal_dent 3 days ago

        I've always wondered what effect the smut and not-so-niche romance novels that have proliferated since e-readers became mainstream have had on Gen Z and younger's sometimes unrealistic views/expectations of relationships. A lot of attention is paid to time spent on porn sites and the like, but not so much to how mainstream some of these novels have become.

      • bakugo 3 days ago

        > The sheer amount and variety of smut books (just books) is vastly larger than anyone wants to realize. We passed the mark decades ago where there is smut available for any and every taste.

        It's important to note that the vast majority of such books are written for a female audience, though.

    • bandrami 2 days ago

      Whatever reward-center path is short-circuiting in 0.0001% of the population and leading to LLM psychosis will become a nuclear bomb for them if we get the sex drive involved too.

      • fragmede 2 days ago

        Realtime VR AI porn will be the end of society, but by then, we'll also have the technology to grow babies in artificial wombs, which is also going to end society as we know it, since we won't need women any more (by then, we also won't need men for the DNA in their sperm to make babies either, which cancels out). Of course, if we don't need women or men, who's left? What's this "we" I'm talking about?

        Why, the AIs, after they've gained sentience, of course.

    • spicyusername 2 days ago

          while smut won't hurt anyone in and of itself
      
      "Legacy Smut" is well known to cause many kinds of harm to many kinds of people, from the participants to the consumers.

    • jjmarr 2 days ago

      I can do as much smut as I want through the API for all SOTA models.

      • bpavuk 2 days ago

        true, but:

        1. you have to "jailbreak" the model first anyway, which is easier to do over the API

        2. is the average layman aware of the concept of an "API"? no, unlikely. apps and web portals are more convenient, which is going to lower the bar to accessing LLM porn

      • josephg 2 days ago

        I don’t know if this is still the case, but as of a year or so ago OpenAI would suspend your account if they noticed you using their models for this sort of thing. They said it was against their TOS.

    • BoredomIsFun 2 days ago

      For those interested in smut, I'd recommend using local Mistral models.

    • measurablefunc 3 days ago

      People are already addicted to non-interactive pornography so this is going to be even worse.

    • PunchyHamster 2 days ago

      I guess technically it will make some OnlyFans content creators unemployed, given there is a pretty large market for custom sexual content there.

    • subscribed 2 days ago

      Why LLM smut in particular? There's already a vast landscape of interactive VR games for all tastes.

      Why is LLM smut supposed to be worse?

      • josephg 2 days ago

        I think the argument is that it’s interactive. You’re no longer just passively reading or watching content. You can join in on the role play.

        I’m not sure why that’s a bad thing though.

        • subscribed 19 hours ago

          Same with games as compared to videos, especially VR.

          Feels like someone angry at the machines capable of generating a tailored story.

    • UltraSane 2 days ago

      I'm waiting until someone combines LLMs with a humanoid robot and a realdoll. That will have a lot of consequences.

    • gehwartzen 2 days ago

      I can already see our made-to-order, LLM-generated, VR/Neuralink-powered sex fantasies coming to life. Throw in the synced Optimus sex robots…

      I can see why Elon’s making the switch from cars. We certainly won’t be driving much.

  • thayne 3 days ago

    It says what to do if you are over 18 but it thinks you are under 18. But what if it identifies someone under 18 as being older?

    And what if you are over 18, but don't want to be exposed to that "adult" content?

    > Viral challenges that could push risky or harmful behavior

    And

    > Content that promotes extreme beauty standards, unhealthy dieting, or body shaming

    Seem dangerous regardless of age.

    • novemp 3 days ago

      > And what if you are over 18, but don't want to be exposed to that "adult" content?

      Don't prompt it.

    • Gud 3 days ago

      What are these extreme beauty standards being promoted?

      Because it seems to me large swaths of the population need some beauty standards.

      • sejje 2 days ago

        Yes, but you're not allowed to say that to them.

        They are victimized by the fact that models are attractive, and that is "unrealistic," so they've been getting plus sized models etc.

        The "extreme beauty standards" are basically just "healthy BMI."

  • geeunits 3 days ago

    This is for advertising purposes, not porn. They might claim that's the reason, but it's to allow alcohol & pharma to advertise, no doubt.

    • dawnerd 2 days ago

      Bingo. There’s laws around advertising to children all over the world.

    • bpavuk 2 days ago

      both, actually. porn for users, ad spots for companies.

  • Kiboneu 2 days ago

    How I think it could play out:

    - OpenAI botches the job. Articles are written about the fact that kids are still able to use it.

    - Sam “responds” by making it an option to authenticate with Worldcoin orbs. You buy one at the “register me” page and get an equivalent amount of Worldcoin at the current rate. Afterwards the orb is like a badge that you can put on your shelf to show to your guests.

    “We heard you loud and clear. That’s why we worked hard to provide Worldcoin integration, so that users won’t have to verify their age through annoying, insecure, and fallible means.” (an example marketing blurb would say, implicitly referring to their current identity servicer Persona, which people find annoying).

    - After enough orb hardware is out in public, and after the API gains traction for third parties to use it, send a notice that x months from now, login without the orb will not be possible. “Here is a link to the shop page to get your orb, available in silver and black.”

  • chilmers 3 days ago

    Sexual and intimate chat with LLMs will be a huge market for whoever corners it. They'd be crazy to leave that money on the table.

    • palmotea 3 days ago

      That's why laws against drugs are so terrible, it forces law-abiding businesses to leave money on the table. Repeal the laws and I'm sure there will be tons of startups to profit off of drug addiction.

      • chilmers 3 days ago

        There are many companies making money off alcohol addiction, video game addiction, porn addiction, food addiction, etc. Should we outlaw all these things? Should we regulate them and try to make them safe? If we can do that for them, can't we do it for AI sex chat?

      • 0xbadcafebee 3 days ago

        No need: https://en.wikipedia.org/wiki/Opioid_epidemic_in_the_United_...

        The majority of illegal drugs aren't addictive, and people are already addicted to the addictive ones. Drug laws are a "social issue" (Moral Majority-influenced), not intended to help people or prevent harm.

        • jasomill 2 days ago

          Drug laws are the confluence of many factors. Moral Majority types want everything they disapprove of banned. People whose lives are harmed by drug abuse want "something" to be done. Politicians want issues that arouse considerably more passion on one side of the argument than the other. Companies selling already legal drugs want to restrict competition. Private prisons want inmates. And so on.

      • georgemcbay 3 days ago

        > Repeal the laws and I'm sure there will be tons of startups to profit off of drug addiction.

        Worked for gambling.

        (Not saying this as a message of support. I think legalizing/normalizing easy app-based gambling was a huge mistake and is going to have an increasingly disastrous social impact).

      • noosphr 3 days ago

        The Politician's syllogism in action:

        That is terrible.

        We have to do something.

        This is something.

        We must do it.

        In terms of harm, current drug laws fail everyone but teetotallers who want everyone else to have a miserable life too.

      • shmel 3 days ago

        what about laws against porn? Oh, wait, no, that's a legitimate business.

      • subscribed 2 days ago

        Respectfully, this is a piss take.

        The US prohibition on alcohol and the largely performative "war on drugs" showed what criminalization does (it empowers, finances, and radicalises criminals).

        Portugal's decriminalisation, partial legalisation of weed in the Netherlands, and legalisation in some American states and Canada prove that legal businesses provide the same services to society better and more safely, and at a lower societal and health cost.

        And then there's the opioid addiction scandal in the US. Don't tell me it's the result of legalisation.

        Legalisation of some classes of drugs (like LSD, mushrooms, etc.) would do much more good than bad.

        Conversely, unrestricted LLMs are available to everyone already. And prompting SOTA models to generate the most hardcore smut you can imagine is also possible today.

    • tpurves 3 days ago

    It's not just chat. Remember, image and video generation are on the table. There is already a huge category of adult video 'games' of this nature. I think they use combos of pre-rendered and dynamic content. But it's really not hard to imagine a near future in which interactive and completely personalized AI porn, in full 4K HDR or VR, is constantly and near-instantly available. I have no idea of the broader social implications of all that, but the tech itself feels inevitable and nearly here.

    • thayne 3 days ago

      If your goal is to make money, sure. If your goal is to make AI safe, not so much.

      • egorfine 2 days ago

        The definition of safety is something that we cannot agree on.

        For me, letting people mindlessly vibecode apps and then pretend this code can serve a purpose for others - that is what's truly unsafe.

        Pornographic text in LLM? Come on.

        • koolala 2 days ago

          What if it knows you and knows how much time you spend on it? Would people lie to it, making excuses for why they need more and can't wait any longer?

    • koakuma-chan 3 days ago

      It will be an even bigger market when robotics is sufficiently advanced.

      • dyauspitr 2 days ago

        At some point there will be robots with LLMs and actual real biological skin, with blood vessels and some fat, over a humanoid robot shell. At that point we won’t need real human relationships anymore.

      • [removed] 3 days ago
        [deleted]
    • ekianjo 3 days ago

      That market is for local models right now.

    • acetofenone 3 days ago

      My main concern is when they'll start to allow 18+ deepfakes

    • indrora 2 days ago

      Will be?

      I've seen four startups make bank on precisely that.

  • torginus 3 days ago

    My personal take is that there has been no progress - potentially even a regression - on all LLM things outside of coding and scientific pursuits. I used to have great fun with LLMs on creative writing, but I feel like current models are stiff and not very good prose writers.

    This is also true for things like writing clear but concise docs; they're overly verbose while often not getting the point across.

    • mynti 2 days ago

      I feel like this comes from the rigorous reinforcement learning these models go through now. The token distribution is becoming so narrow, so that the models give better answers more often, that it stifles their creativity and ability to break out of the harness. To me, every creative prompt I give them turns into roughly the same mush as output. It is rarely interesting.

    • leoedin 2 days ago

      Yeah, I’ve had great success at coding recently, but every time I try to get an LLM to write me a spec it generates endless superlatives and a lot of flowery language.

  • kace91 3 days ago

    What’s the goal there? Sexting?

    I’m guessing age is needed to serve certain ads and the like, but what’s the value for customers?

    • elevation 3 days ago

      Even when you're making PG content, the general propriety limits of AI can hinder creative work.

      The "Easter Bunny" has always seemed creepy to me, so I started writing a silly song in which the bunny is suspected of eating children. I had too many verses written down and wanted to condense the lyrics, but found LLMs telling me "I cannot help promote violence towards children." Production LLM services would not help me revise this literal parody.

      Another day I was writing a romantic poem. It was abstract and colorful, far from a filthy limerick. But when I asked LLMs for help encoding a particular idea sequence into a verse, the models refused (except for grok, which didn't give very good writing advice anyway.)

      • estimator7292 3 days ago

        Just today I asked how to shut down a Mac with "maximal violence". I was looking for the equivalent of "systemctl poweroff -f -f" and it refused to help me do violence.

        Believe me, the Mac deserved it.

    • jandrese 3 days ago

      If you don't think the potential market for AI sexbots is enormous you have not paid attention to humanity.

      • subscribed 2 days ago

        This is not a potential market, this market is already thriving (and whoever wants to uses ChatGPT or Claude for that anyway).

        ClosedAI just wants a piece of the casual user too.

    • robotnikman 3 days ago

      There is a subreddit called /r/myboyfriendisAI, you can look through it and see for yourself.

    • leumon 3 days ago

      according to the age-prediction page, the changes are:

      > If [..] you are under 18, ChatGPT turns on extra safety settings. [...] Some topics are handled more carefully to help reduce sensitive content, such as:

      - Graphic violence or gore

      - Viral challenges that could push risky or harmful behavior

      - Sexual, romantic, or violent role play

      - Content that promotes extreme beauty standards, unhealthy dieting, or body shaming

    • jacquesm 3 days ago

      Porn has driven just about every bit of progress on the internet, I don't see why AI would be the exception to that rule.

      • altmanaltman 2 days ago

        yeah, linus was beating it constantly to porn while developing the linux kernel. it's a proven fact. every oss project that runs the internet was made the same way, sure.

      • runarberg 3 days ago

        This seems like a believable lie, until you think about it for 2 seconds.

        No. Porn has not driven even a fraction of the progress on the internet. Not even close.

    • ekianjo 3 days ago

      There is a huge book market for sexual stories, in case you were not aware.

  • beAbU 2 days ago

    Porn and ads, it's the convergent evolution theory for all things on the internet.

  • dev1ycan 2 days ago

    I am 30 years old and literally told ChatGPT I was a software developer; all my queries are things an adult would ask. Yet OpenAI assumed I was under 18 and asked me for a Persona age verification, which of course I refused, because Persona is a shady company (plus I'm not giving my personal ID to some random tech company).

    ChatGPT is absolute garbage.

  • chasd00 3 days ago

    eh there's an old saying that goes "no Internet technology can be considered a success until it has been adopted by (or in this case integrated with) the porn industry".

  • laweijfmvo 3 days ago

    imagine if every OnlyFans creator suddenly paid a portion of their revenue to OpenAI for better messaging with their followers…

    • shawn_w 3 days ago

      Instead of paying it to the human third party firms that currently handle communication with subscribers?

  • [removed] 3 days ago
    [deleted]
NewsaHackO 3 days ago

>We brought GPT‑4o back after hearing clear feedback from a subset of Plus and Pro users, who told us they needed more time to transition key use cases, like creative ideation, and that they preferred GPT‑4o’s conversational style and warmth.

This does confirm the idea that OpenAI does not make models sycophantic as attempted subversion, buttering up users so that they use the product more; it's because people actually want AI to talk to them like that. To me, that's insane, but they have to play the market, I guess.

  • Scene_Cast2 3 days ago

    As someone who's worked with population data, I found that there is an enormous rift between reported opinion (including HN and Reddit opinion) and population preferences revealed through experimentation.

    • Macha 3 days ago

      I've always thought that the idea that "revealed preferences" are preferences discounts the fact that people often make decisions they would rather not make. It's like the whole idea that if you're on a diet, it's easier to not have junk food in the house to begin with than to have junk food and not eat more than your target amount. Are you saying these people want to put on weight? Or is it just that they've been put in a situation that defeats their impulse control?

      I feel a lot of the "revealed preference" stuff in advertising is similar: advertisers find that if they get past the easier barriers that users put in place, it's easier to sell them stuff that, at a higher level, the users do not want.

      • cal_dent 3 days ago

        Perfectly put. Revealed preference simply assumes impulses are all correct, which is not the case, and exploits that.

        Drugs make you feel great; in moderation perfectly acceptable, constantly not so much.

      • simonjgreen 2 days ago

        Absolutely. Nicotine addiction can meet the criteria for a revealed preference - certainly an observed choice.

        • sandspar 2 days ago

          One example I like to use is schadenfreude. The emotion makes us feel good and bad at the same time: it's pleasurable but in an icky way. So should social media algorithms serve schadenfreude? Should algorithms maximize for pleasure (show it) or for some kind of "higher self" (don't show it). If they maximize for "higher self" then which designer gets to choose what that means?

    • tunesmith 3 days ago

      Well that's what akrasia is. It's not necessarily a contradiction that needs to be reconciled. It's fine to accept that people might want to behave differently than how they are behaving.

      A lot of our industry is still based on the assumption that we should deliver to people what they demonstrate they want, rather than what they say they want.

    • make3 3 days ago

      Exactly. That sounds to me like a TikTok vs NPR/books thing: people tell everyone what they read, then go spend 11 hours watching TikToks until 2am.

    • ComputerGuru 2 days ago

      Not true. People can rationally know what they want but still be tempted by the poorer alternative.

      If you ask me if I want to eat healthy and clean and I respond in the affirmative, it's not a "gotcha" if you bait me with a greasy cheeseburger and then say "you failed the A/B test, demonstrating we know what you actually want more than you do."

    • toss1 3 days ago

      Sounds both true and interesting. Any particularly wild and/or illuminating examples of which you can share more detail?

      • jaggederest 3 days ago

        My favorite, somewhat off-topic example of this is some qualitative research I was building the software for, a long time ago.

        The difference between the responses and the pictures was illuminating, especially in one study in particular: you'd ask people "how do you store your lunch meat" and they'd say "in the fridge, in the crisper drawer, in a ziploc bag", but when you asked them to take a picture of it, it was just ripped open and tossed in anywhere.

        This apparently horrified the lunch meat people ("But it'll get all crusty and dried out!", to paraphrase), and that study and ones like it are the reason lunch meat now comes in disposable or resealable containers instead of just a tear-to-open packet. Every time I go grocery shopping, it's an interesting experience knowing that specific thing is, in a small way, a result of some of the work I did a long time ago.

      • hnuser123456 3 days ago

        The "my boyfriend is AI" subreddit.

        A lot of people are lonely and talking to these things like a significant other. They value roleplay instruction following that creates "immersion." They tell it to be dark and mysterious and call itself a pet name. GPT-4o was apparently their favorite because it was very "steerable." Then the news broke that people were doing this, some of them falling off the deep end with it, so OpenAI had to tone back the steerability a bit with 5, and these users seem to say 5 breaks immersion with more safeguards.

        • Sabinus 3 days ago

          If you ask the users of that sub why their boyfriend is AI they will tell you their partner or men in general aren't providing them with enough emotional support/stimulation.

          I do wonder if they would accept the mirror explanation for men enjoying porn.

      • anal_reactor 2 days ago

        Classic example: people say they'd rather pay $12 upfront and then no extra fees, but they actually prefer a $10 base price + $2 in fees. If it didn't work, this pricing model wouldn't be so widespread.

        • 112233 2 days ago

          wow, framing. "people say they prefer quitting smoking, but actually they prefer to relapse when emotionally manipulated."

          The most commonly taken action does not imply people wanted to do it more, or felt happiest doing it. Unless you optimize for profit only.

    • cm2012 3 days ago

      This is why I work in direct performance advertising. Our work reveals the truth!

      • make3 3 days ago

        Your work exploits people's addictive propensity and behaviours, and gives corporations incentives and tools to build on that.

        Insane spin you're putting on it. At best, you're a cog in one of the worst recent evolutions of capitalism.

  • 22c 3 days ago

    > its because people actually want AI to talk to them like that

    I can't find the particular article (there are a few blogs and papers pointing out the phenomenon, but I can't find the one I enjoyed), but it was along the lines of how in LMArena a lot of users tend to pick the "confidently incorrect" model over the "boring-sounding but correct" model.

    The average user probably prefers the sycophantic echo chamber of confirmation bias offered by a lot of large language models.

    I can't help but draw parallels to the "You are not immune to propaganda" memes. Turns out most of us are not immune to confirmation bias, either.

  • 9x39 3 days ago

    I thought this was mostly due to the AI personality splinter groups (trying to be charitable) like /r/myboyfriendisai, and the wrapper apps, who vocally let OpenAI know they were using those models the last time they were sunset.

  • cj 3 days ago

    I was one of those pesky users who complained when o3 suddenly was unavailable.

    When 5.2 was first launched, o3 did a notably better job at a lot of analytical prompts (e.g. "Based on the attached weight log and data from my calorie tracking app, please calculate my TDEE using at least 3 different methodologies").
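
    For the curious, the kind of calculation that prompt asks for mostly boils down to a few published BMR formulas times an activity factor. A rough sketch (the formulas are the standard published ones; all inputs are made up for illustration):

        # Three common TDEE estimates; inputs are illustrative.
        weight_kg, height_cm, age = 80.0, 180.0, 35
        activity = 1.55  # "moderately active" multiplier

        # Mifflin-St Jeor (male): 10*W + 6.25*H - 5*A + 5
        bmr_msj = 10 * weight_kg + 6.25 * height_cm - 5 * age + 5

        # Revised Harris-Benedict (male)
        bmr_hb = 88.362 + 13.397 * weight_kg + 4.799 * height_cm - 5.677 * age

        # Katch-McArdle, assuming ~20% body fat for lean mass
        bmr_km = 370 + 21.6 * (weight_kg * 0.80)

        for name, bmr in [("Mifflin-St Jeor", bmr_msj),
                          ("Harris-Benedict", bmr_hb),
                          ("Katch-McArdle", bmr_km)]:
            print(f"{name}: TDEE ~ {bmr * activity:.0f} kcal/day")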

    o3 frequently used tables to present information, which I liked a lot. 5.2 rarely does this - it prefers to lay out information in paragraphs / blog post style.

    I'm not sure if o3 responses were better, or if it was just the format of the reply that I liked more.

    If it's just a matter of how people prefer to be presented their information, that should be something LLMs are equipped to adapt to at a user-by-user level based on preferences.

  • yieldcrv 2 days ago

    you haven't been in tech long enough if you don't realize most decisions are decided by "engagement"

    if a user spends more time on it and comes back, the product team winds up prioritizing whichever pattern supported that. it's just continual selective evolution toward things that keep you there longer, based on what kept everyone else there longer

  • josephg 3 days ago

    They have added settings for this now - you can dial up and down how “warm” and “enthusiastic” you want the models to be. I haven’t done back to back tests to see how much this affects sycophancy, but adding the option as a user preference feels like the right choice.

    If anyone is wondering, the setting for this is called Personalisation in user settings.

  • accrual 2 days ago

    I don't want sycophantic AI, but I do have warmer memories of using 4o vs 5. It just felt a little more interesting and consistent to talk to.

  • pdntspa 3 days ago

    I thought it was based on the user thumbs-up and thumbs-down reactions; it evolving the way that it does makes it pretty obvious that users want their asses licked.

  • SeanAnderson 3 days ago

    This doesn't come as too much of a surprise to me. Feels like it mirrors some of the reasons why toxic positivity occurs in the workplace.

  • 542354234235 2 days ago

    I think we underestimate the power that our unconscious and lizard brains have in shaping our behavior/preferences. I was using GPT for work, and the sycophantic responses were eyerollingly annoying, but I still noticed that I got some sort of dopamine hit when it would say something like "that is an incredibly insightful question. You are truly demonstrating a deep understanding of blah blah blah". Logically I understand it is pure weapons-grade bolognium, but it still influences our feelings, preferences, mental shortcuts, etc.

  • cornonthecobra 3 days ago

    Put on a good show, offer something novel, and people will gleefully march right off a cliff while admiring their shiny new purchase.

  • PlatoIsADisease 3 days ago

    You’re absolutely right. You’re not imagining it. Here is the quiet truth:

    You’re not imagining it, and honestly? You’re not broken for feeling this - it’s perfectly natural as a human to have this sentiment.

europeanNyan 3 days ago

After they pushed the limits on the Thinking models to 3000 per week, I haven't touched anything else. I am really satisfied with their performance, and the 200k context window is quite nice.

I'd been using Gemini exclusively for the 1 million token context window, but went back to ChatGPT after the limit increase and created a Project system for myself, which gives me much better organization with Projects + Thinking-only chats (big context) + project-only memory.

Also, it seems like Gemini is really averse to googling (which is ironic in itself), while ChatGPT, at least in the Thinking modes, loves to look up current and correct info. If I ask something a bit more involved in Extended Thinking mode, it will think for several minutes and look up more than 100 sources. It's really good - practically a Deep Research inside a normal chat.

  • toxic72 3 days ago

    I REALLY struggle with Gemini 3 Pro refusing to perform web searches and getting combative about the current date. Ironically, their Flash model seems much more likely to opt for a web search for info validation.

    Not sure if others have seen this...

    I could attribute it to:

    1. It's a known quantity with the pro models (I recall that the pro/thinking models from most providers were not immediately equipped with web search tools when they were originally released)

    2. Google wants you to pay more for grounding via their API offerings vs. including it out of the box

    • eru 3 days ago

      Gemini refused to believe that I was using macOS 26.

    • djsavvy 2 days ago

      I was seeing this several weeks ago, but it seems fixed recently, at least for my types of queries. I only use Pro.

    • dahcryn 2 days ago

      when I want it to google stuff, I just use the deep research mode. Not as instant, but it googles a lot of stuff then

    • qingcharles 3 days ago

      Sample of one here, but I get the exact opposite behavior. Flash almost never wants to search and I have to use Pro.

  • tgtweak 3 days ago

    I find Gemini does the most searching (and the quickest - it regularly pulls 70+ search results on a query in a matter of seconds, likely due to Googlebot's cache of pretty much every page). ChatGPT seems to only search if you have it in thinking/research mode now.

QuadrupleA 2 days ago

Been unhappy with the GPT-5 series after daily driving 4.x for ages (I chat with them through the API) - very pedantic, goes off on too many side topics, stops following system instructions after a few turns (e.g. "you respond in 1-3 sentences" becomes long bulleted lists and multiple paragraphs very quickly).
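
To be concrete, my setup is roughly the shape below (a minimal sketch, not my exact code; the model name and system instruction are just examples):

    # pip install openai -- minimal multi-turn chat with a brevity instruction
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system",
                "content": "You respond in 1-3 sentences. No bulleted lists."}]

    def chat(user_msg):
        history.append({"role": "user", "content": user_msg})
        resp = client.chat.completions.create(model="gpt-5.2", messages=history)
        reply = resp.choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        return reply

The system message stays at the top of the context on every call, yet a few turns in the replies balloon anyway.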

Much better feel with the Claude 4.5 series, for both chat and coding.

  • Hard_Space 2 days ago

    > you respond in 1-3 sentences" becomes long bulleted lists and multiple paragraphs very quickly

    This is why my heart sank this morning. I have spent over a year training 4.0 to be just about helpful enough to get me an extra 1-2 hours a day of productivity. From experimentation, I can see no hope of reproducing that with 5.x, and even 5.x admits as much, when I discussed it today:

    > Prolixity is a side effect of optimization goals, not billing strategy. Newer models are trained to maximize helpfulness, coverage, and safety, which biases toward explanation, hedging, and context expansion. GPT-4 was less aggressively optimized in those directions, so it felt terser by default.

    Share and enjoy!

    • kouteiheika 2 days ago

      > This is why my heart sank this morning. I have spent over a year training 4.0 to just about be helpful enough to get me an extra 1-2 hours a day of productivity.

      Maybe you should consider basing your workflows on open-weight models instead? Unlike proprietary API-only models, no one can take these away from you.

      • Hard_Space 2 days ago

        I have considered it, and it is still on the docket. I have a local 3090 dedicated to ML. It would be a fascinating and potentially really useful project, but as a freelancer, it would cost a lot to give it the time it needs.

    • Angostura 2 days ago

      And how would GPT 5.0 know that, I wonder. I bet it’s just making stuff up.

    • ComputerGuru 2 days ago

      You can’t ask GPT to assess the situation. That’s not the kind of question you can count on an LLM to accurately answer.

      Playing with the system prompts, temperature, and max token output dials absolutely lets you make enough headway (with the 5 series) in this regard to demonstrably render its self-analysis incorrect.

    • ziml77 2 days ago

      What kind of "training" did you do?

  • dahcryn 2 days ago

    4.1 is great for our stuff at work. It's quite stable (it doesn't change personality every month, and a one-word difference doesn't change the behaviour). It doesn't think, so it's still reasonably fast.

    Is there anything as good in the 5 series? Likely, but doing the full QA testing again for no added business value, just because the model disappears, is a hard sell. The ones we tested were just slower, or tried to have more personality, which is useless for automation projects.

    • QuadrupleA 2 days ago

      Yeah - agreed. The initial latency is annoying too, even with thinking allegedly turned off. It feels like AI companies are stapling on more and more weird routing, summarization, safety layers, etc. that degrade the overall feel of things.

  • anarticle 2 days ago

    I also found this disturbing, as I used to use GPT for small worked-out theoretical problems. In 5.2, the long runs of repeated bulleted lists and fortune cookies were a negative for my use case. I replaced some of that use with Claude and am experimenting with LM Studio and gpt-oss. It seemed like an obvious regression to me, but maybe people weren't using it that way.

    For instance something simple like: "If I put 10kW of solar on my roof, when is the payback, given xyz price / incentive / usage pattern?"

    It used to give a kind of short technical report; now it's a long list of bullets and a very paternalistic, "this will never work" kind of negativity. I'm assuming this is the anti-sycophancy at work, but when you're working a problem you have to be optimistic until you get your answer.

    For me this usage was a few times a day, for ideas or working through small problems. For code I've been on Claude for at least a year; it just works.

  • spprashant 2 days ago

    I can never understand why it is so eager to generate walls of text. I have instructions to always keep the response precise and to the point. It almost seems like it wants to overwhelm you, so you give up and do your own research.

  • mhitza 2 days ago

    I often use ChatGPT without an account, and ChatGPT 5 mini (which you get while logged out) might as well be Mistral 7B + web search. It's that mediocre. Even the original 3.5 was way ahead.

    • accrual 2 days ago

      I kinda miss the original 3.5 model sometimes. Definitely not as smart as 4o but wow was it impressive when new. Apparently I have a very early ChatGPT account per the recent "wrapped" feature.

    • teaearlgraycold 2 days ago

      Really? I’ve found it useful for random little things.

      • mhitza 2 days ago

        It is useful for quick information lookup when you're lacking the precise search terms (which is often my situation). But the conversations I had with the original ChatGPT were better.

sundarurfriend 3 days ago

ChatGPT 5.2 has been a good motivator for me to try out other LLMs because of how bad it is. Both 5.1 and 5.2 have been downgrades in terms of instruction following and accuracy, but 5.2 especially so. The upside is that that's had me using Claude much more, and I like a lot of things about it, both in terms of UI and the answers. It's also gotten me more serious about running local models. So, thank you OpenAI, for forcing me to broaden my horizons!

  • johnsmith1840 3 days ago

    I dropped my ChatGPT Pro subscription when they removed the true deep thinking methods.

    Mostly because of how massively varied their releases are. Each one required big changes to how I use and work with it.

    Claude is perfect in this sense: all their models feel roughly the same, just smarter, so my workflow is always the same.

    • Terretta 2 days ago

      > all their models feel roughly the same just smarter

      Substantial "applied outcomes" regression from 3.7 to 4 but they got right on fixing that.

  • orphea 3 days ago

    Have you had a chance to compare with Gemini 3?

    • qingcharles 3 days ago

      I switch routinely between Gemini 3 (my main), Claude, GPT, and sometimes Grok. If you came up with 100 random tasks, they would all come out about equal. The issue is some are better at logical issues, some are better at creative writing, etc. If it's something creative I usually drop it in all 4 and combine the best bits of each.

      (I also use Deep Think on Gemini too, and to me, on programming tasks, it's not really worth the money)

      • deaux 2 days ago

        This is the only accurate take. Anyone who claims that one of the big 3 is all-around "bad" or "low quality" compared to the other two can be ignored. They're close enough in overall "strength", yet different enough in strengths/weaknesses, that it's very much task/domain-specific.

    • sundarurfriend 3 days ago

      Not extensively. The few interactions I've tried have been disappointing, though. The voice input is really bad - significantly worse than any other major AI on the market. I assumed search would be its strong suit, so I ran a search-and-compile type prompt (that I usually run on ChatGPT) on Gemini, and it was underwhelming. Not as bad as Grok (which was pretty much unusable for this), but noticeably worse than ChatGPT. Maybe Gemini has other strengths that I haven't come across yet, but on that one at least, it was

          ChatGPT 5 ~= Claude > ChatGPT 5.2 > Gemini >> Grok

shmel 3 days ago

Retiring the most popular model for relationship roleplay just one day before Valentine's Day is particularly ironic =) bravo, OpenAI!

  • zamadatix 2 days ago

    It'd be legitimately funny if they released an "Adult version" of ChatGPT on Valentine's Day.

simonw 3 days ago

> [...] the vast majority of usage has shifted to GPT‑5.2, with only 0.1% of users still choosing GPT‑4o each day.

  • fpgaminer 3 days ago

    Well yeah, because 5.2 is the default and there's no way to change the default. So every time you open up a new chat you either use 5.2 or go out of your way to select something else.

    (I'm particularly annoyed by this UI choice because I always have to switch back to 5.1)

    • xiphias2 2 days ago

      I'm the same with o3.

      Also, it's full of bugs, showing JSON all the time while thinking. But it's still my favorite model, so I'm switching back a lot.

    • arrowsmith 3 days ago

      What about 5.1 do you prefer over 5.2?

      • fpgaminer 3 days ago

        As far as I can tell 5.2 is the stronger model on paper, but it's been optimized to think less and do fewer web searches. I daily drive the Thinking variants, not Auto or Instant, and usually want the _right_ answer even if it takes a minute. 5.1 does a very good job of defensively web searching, which avoids almost all of its hallucinations and keeps docs/APIs/UIs/etc. up to date. 5.2 will instead often not think at all, even in Thinking mode. I've gotten several completely wrong, hallucinated answers since 5.2 came out, versus maybe a handful total from 5.1. (Even with me using 5.2 far less!)

        The same seems to persist in Codex CLI, where again 5.2 doesn't spend as much time thinking so its solutions never come out as nicely as 5.1's.

        That said, 5.1 is obviously slower for these reasons. I'm fine with that trade off. Others might have lighter workloads and thus benefit more from 5.2's speed.

        • Terretta 2 days ago

          This is a terrible thing to say out loud*, but in all such cases I'd rather just give them more money for the better answers.

          It boggles the mind that "wrong answers only" is no longer just a meme, it's considered a valid cost management strategy in AI.

          * Because if they realize we're out here, they'll price discriminate, charging extra for right answers.

  • adamiscool8 3 days ago

    0.1% of users is not necessarily 0.1% of conversations…

  • SecretDreams 3 days ago

    What's the default model when a random user goes to use the chatgpt website or app?

    • mrec 3 days ago

      5.2 on the website. You can see which model was used for a specific response by hovering over the refresh icon at the end.

    • bananaflag 3 days ago

      5.2.

      You can go to chatgpt.com and ask "what model are you" (it doesn't hallucinate on this).

      • SecretDreams 3 days ago

        There's probably a relationship between what's the default and what model is used the most. It's more about what OAI sets than what users care about. The flip side is that "good enough is good enough" for most users.

      • johndough 3 days ago

        > (it doesn't hallucinate on this)

        But how do we know that you did not hallucinate the claim that ChatGPT does not hallucinate its version number?

        We could try to exfiltrate the system prompt which probably contains the model name, but all extraction attempts could of course be hallucinations as well.

        (I think there was an interview with Sam Altman or someone else at OpenAI where it was mentioned that they hardcoded the model name in the prompt because people did not understand that models don't work like that, so they made it work. I might be hallucinating though.)
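
        One check that doesn't trust the model's own words at all: over the API, the response metadata reports which model actually served the request. A quick sketch (model name illustrative; use whatever your account exposes):

            from openai import OpenAI

            client = OpenAI()
            resp = client.chat.completions.create(
                model="gpt-5.2",  # illustrative
                messages=[{"role": "user", "content": "What model are you?"}],
            )
            print(resp.model)                       # server-reported identifier
            print(resp.choices[0].message.content)  # the model's own claim

        Of course that only covers the API; chatgpt.com could still route you anywhere.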

        • razodactyl 3 days ago

          Confabulating*. If you were hallucinating, we would be more amused :)

jostmey 3 days ago

I noticed ChatGPT getting progressively worse at helping me with my research. I gave up on ChatGPT 5 and switched to Grok and Gemini. I couldn't be happier that I switched.

  • azan_ 3 days ago

    It's amazing how different people's experiences are. To me, every new version of ChatGPT was an improvement, and Gemini is borderline unusable.

    • farcitizen 3 days ago

      Same experience here. I don't get how people are saying Gemini is so good.

      • 0xbadcafebee 3 days ago

        A lot of people still have a shallow understanding of how LLMs work. Each version of a model has different qualities than the last, each model is better or worse at some things than others, and each responds differently to different prompts and styles. Some smaller models perform better than larger ones. Sometimes you should use a system prompt, sometimes you shouldn't. Tuning settings for model inference (temperature, top_p, penalties, etc.) significantly influences the outcome. (https://www.promptingguide.ai/introduction/settings, https://platform.openai.com/docs/guides/optimizing-llm-accur...)

        Most "big name" models' interfaces don't let you change settings, or not easily. Power users learn to use different interfaces and look up guides to tweak models to get better results. You don't have to just shrug your shoulders and switch models. OpenAI's power interface: https://platform.openai.com/playground Anthropic's power interface: https://platform.claude.com/ For self-hosted/platform-agnostic, OpenWebUI is great: https://openwebui.com/
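
        To make that concrete, here's roughly what those dials look like over the API (a sketch; the values are just starting points to experiment from, and the model name is illustrative):

            from openai import OpenAI

            client = OpenAI()
            resp = client.chat.completions.create(
                model="gpt-4o",  # illustrative
                messages=[{"role": "system", "content": "Answer tersely."},
                          {"role": "user", "content": "Explain top_p in one line."}],
                temperature=0.3,        # lower = more deterministic
                top_p=0.9,              # nucleus-sampling cutoff
                presence_penalty=0.2,   # nudge away from repeating topics
                frequency_penalty=0.2,  # nudge away from repeating tokens
                max_tokens=200,
            )
            print(resp.choices[0].message.content)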

      • europeanNyan 3 days ago

        Gemini has a great model, but it's a bad product. I feel much happier using ChatGPT because Gemini just seems so barebones and unpolished. It has this feeling of a tech demo.

    • tgtweak 3 days ago

      Very curious for what use cases you're finding gemini unusable.

      • azan_ 3 days ago

        Scientific research and proofreading. Gemini is the laziest LLM I've used. Frequently it will lie that it searched for something and just make stuff up; that basically never happens to me when I'm using GPT-5.2.

      • wltr 2 days ago

        Any coding task produces trash, whereas I can prototype with ChatGPT quite a lot, sometimes delivering an entire app almost entirely vibe-coded. With Gemini, it takes only a few prompts for it to get me mad enough to just close the tab. I use only the free web versions, never the agentic ‘mess with my files’ thing. Claude is even better than that, but I keep it for serious tasks only - it's that good.

      • double0jimb0 3 days ago

        In my experience with Gemini, I find it incapable of not hallucinating.

      • subscribed 2 days ago

        Gemini loves to ignore Gemini.md instructions from the first minutes, to replace half of a Python script with "# other code...", or to try to delete files OUTSIDE of the project directory, then apologise profusely, and try it again.

        Utterly unreliable. I get better results, faster, editing parts of the code with Claude in a web UI, lol.

  • mmcwilliams 2 days ago

    Odd. I've found that Gemini will completely fabricate the content of specific DOIs, despite being corrected, and will keep citing a paper even after providing a link that shows it is wrong about the paper's title and subject. This obviously concerns me about its effectiveness as a research aide.

  • amelius 3 days ago

    Why not Claude?

    • esperent 3 days ago

      The limits on the $20 plan are too low compared to Gemini and ChatGPT - too low to do any serious work at all.

    • jostmey 3 days ago

      I personally find Claude the best at coding, but its usefulness doesn't seem to extend to scientific research and writing.

    • 650REDHAIR 3 days ago

      Because I’m sick of paying $20 for an hour of Claude before it throttles me.