Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT
(openai.com) | 296 points by rd 3 days ago
/r/MyBoyfriendIsAI https://www.reddit.com/r/MyBoyfriendIsAI/ is a whole thing. It's not a joke subreddit.
The range of attitudes in there is interesting. There are a lot of people who take a fairly sensible "this is interactive fiction" kind of attitude, and there are others who bristle at any claim or reminder that these relationships are fictitious. There are even people with human partners who have "married" one or more AIs.
IIRC you'll get modded or banned for being critical of the use case. Which is their "right", but it's freaking weird.
Yes. My experience is that it doesn't require scolding, mocking, or criticizing anyone to get permabanned. Just being up front about the fact that you have concerns about the use case is enough for a permaban, even if you only bring that up in order to demonstrate that such a position does not stem from contempt for LLM-as-companion users. :-\
Do you think they know they're just one context reset away from the LLM not recognizing them at all and treating them like a stranger off the street? For someone mentally ill and somehow emotionally attached to the context it would be... jarring to say the least.
And it's a pity that this highly prevalent phenomenon (to exaggerate a bit: probably the way tech in general will become most influential over the next couple of years) is barely mentioned on HN.
I dunno. Tbf that subreddit has a combination of
- a large number of incredibly fragile users
- extremely "protective" mods
- a regular stream of drive-by posts that regulars there see as derogatory or insulting
- a fair amount of internal diversity and disagreement
I think discussion on forums larger than it, like HN or popular subreddits, is likely to drive traffic that will ultimately fuel a backfiring effect for the members. It's inevitable, and it's already happening, but I'm not sure it needs to increase. I do think the phenomenon is a matter of legitimate public concern, but idk how that can best be addressed. Maybe high-quality, long-form journalism? But probably not just cross-posting the sub in larger fora.
Part of me thinks maybe I erred bringing this up, but there are discussions worth having about continued access to software that's working for people, regardless of what it is, and about whether this is healthy. I'm probably live-and-let-live on this, but there have been cases of suicide and murder where chatbots were involved, and these people are potentially vulnerable to manipulation by the company.
Any sub that is based on storytelling or reposting memes, videos, etc. is a karma farm full of lies.
Most subs that are based on politics or current events are at best biased, at worst completely astroturf.
The only subs that I think still have mostly legit users are municipal subs (which still get targeted by bots when anything political comes up) and hobby subs where people show their works or discuss things.
I sometimes envy the illiterate.
At least they cannot read this.
I wonder if they have run the analytics on how many users are doing that. I would love to see that number.
> only 0.1% of users still choosing GPT‑4o each day.
If the 800M MAU figure still holds, that's 800k people.
It's a growing market, although it might be because of shifting goalposts. I had a friend whose son was placed in French immersion (a language he doesn't speak at all). From what I understood, he was getting up and walking around in kindergarten and was labelled as mentally divergent; his teachers apparently suggested to his mother that he see a doctor.
(Strangely these "mental illnesses" and school problems went away after he switched to an English language school, must be a miracle)
I assume the loneliness epidemic is producing similar cases.
> I had a friend whose son was placed in French immersion (a language he doesn't speak at all).
In my entire French immersion kindergarten class, there was a total of one child who already spoke French. I don't think the fact that he didn't speak the language is the concern.
They control reddit and used to control twitter.
There was an interesting period where "normies" were joining Twitter en masse and adopted many of the denizens' ideas as normal, widespread ideas. Kinda like going on a camping trip at "the lake" because you heard it's fun and not realizing that everyone else on the trip is part of a semi-deranged cult.
The outsized effect of this was journalists thinking these people on twitter were accurate representations of what society on the whole was thinking.
wasn't there a trend on twitter to have a bio/signature with a bunch of mental illness acronyms?
There will be a lot of mentally unwell people unhappy with this, but this is a huge net positive decision, thank goodness.
I actually tried GPT-4.1 for the first time a few hours ago (1).
I spent about half an hour trying to coax it in "plan mode" in IntelliJ, and it kept spitting out these generic ideas of what it was going to do, not really planning at all.
And when I asked it to execute the plan... it just created some generic DTO and said "now all that remains is <the entire plan>".
Absolutely worst experience with an AI agent so far, not to say that my overall experience has been terrific.
1) Our plan for Claude Opus 4.5 "ran out" or something.
> In the API, there are no changes at this time
Curious where this is going to go.
One of the big arguments for local models is that you can't trust providers to maintain ongoing access to the models you validated and put into production. Even if you run hosted models, running open ones means you can switch providers.
If they were to retire the GPT-4.1 series from the API, that would be a major deal breaker. For structured outputs it is more predictable and significantly better because it does not have a reasoning step baked in.
I've heard great things about Mixtral's structured-output capabilities but haven't had a chance to run my evals on them.
If 4.1 is dropped from the API, that's the first course of action.
Also, the 5 series doesn't have fine-tuning capabilities, and it's unclear how that would work if a reasoning step is involved.
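For anyone who hasn't used it, a structured-outputs call against gpt-4.1 looks roughly like this; the schema and field names below are just an example I made up, only the response_format shape is the documented API:

    # Rough sketch of a structured-outputs call against gpt-4.1 (Python SDK).
    # The schema and field names are illustrative, not from this thread.
    from openai import OpenAI

    client = OpenAI()

    schema = {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "confidence": {"type": "number"},
        },
        "required": ["sentiment", "confidence"],
        "additionalProperties": False,
    }

    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Classify: 'The keynote was underwhelming.'"}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "classification", "schema": schema, "strict": True},
        },
    )

    # With strict=True the reply is guaranteed to parse against the schema, and with
    # no reasoning step in the way the latency and output are very predictable.
    print(resp.choices[0].message.content)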
will this nuke my old convos?
opus 4.5 is better than gpt on everything except code execution (but with pro you get a lot of claude code usage), and if they nuke all my old convos I'll prob downgrade from pro to free
Not that I’m aware. Models can be fairly seamlessly switched even mid-conversation, so this is unlikely to affect history.
I wish they would keep 4.1 around for a bit longer. One of the downsides of the current reasoning based training regimens is a significant decrease in creativity. And chat trained AIs were already quite "meh" at creative writing to begin with. 4.1 was the last of its breed.
So we'll have to wait until "creativity" is solved.
Side note: I've been wondering lately about a way to bring creativity back to these thinking models. For creative writing tasks you could add the original, pretrained model as a tool call. So the thinking model could ask for its completions and/or query it and get back N variations. The pretrained model's completions will be much more creative and wild, though often incoherent (think back to the GPT-3 days). The thinking model can then review these and use them to synthesize a coherent, useful result. Essentially giving us the best of both worlds. All the benefits of a thinking model, while still giving it access to "contained" creativity.
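Sketched out, the idea looks something like the following; the tool name, prompts, and model names (a tool-calling chat model plus davinci-002 standing in for a "raw" base model) are placeholders for illustration, not anything OpenAI actually ships for this purpose:

    # Sketch of the idea above: a thinking model that can call a base model
    # as a tool to fetch N wild, unfiltered draft completions.
    # Tool name, prompts, and model names are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()
    REASONER = "gpt-5.1"   # any tool-calling chat model; name is a placeholder
    BASE = "davinci-002"   # a base (non-chat) completion model

    tools = [{
        "type": "function",
        "function": {
            "name": "wild_completions",
            "description": "Sample n raw continuations from a base model for creative material.",
            "parameters": {
                "type": "object",
                "properties": {"prompt": {"type": "string"}, "n": {"type": "integer"}},
                "required": ["prompt", "n"],
            },
        },
    }]

    def wild_completions(prompt: str, n: int) -> list[str]:
        # High temperature, no chat tuning: less coherent, much less "averaged out".
        out = client.completions.create(model=BASE, prompt=prompt, n=n,
                                        max_tokens=200, temperature=1.2)
        return [c.text for c in out.choices]

    messages = [{"role": "user",
                 "content": "Write the opening paragraph of a weird-west novella."}]
    resp = client.chat.completions.create(model=REASONER, messages=messages, tools=tools)

    # Standard tool-call loop: hand the raw drafts back and let the thinking
    # model review them and synthesize something coherent.
    while resp.choices[0].message.tool_calls:
        messages.append(resp.choices[0].message)
        for call in resp.choices[0].message.tool_calls:
            args = json.loads(call.function.arguments)
            drafts = wild_completions(args["prompt"], args["n"])
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(drafts)})
        resp = client.chat.completions.create(model=REASONER, messages=messages, tools=tools)

    print(resp.choices[0].message.content)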
My theory, based on what I would see with non-thinking models, is that as soon as you start detailing something too much (i.e. not just "speak in the style of X" but more like "speak in the style of X with [a list of adjectives detailing the style of X]"), they would lose creativity, would not fit the style very well anymore, etc. I don't know how things have evolved with new training techniques, but I suspected that overthinking their tasks by detailing too much what they have to do can lower quality in some models for creative tasks.
I also terribly regret the retirement of 4.1. From my own personal usage, for code or normal tasks, I clearly noticed a huge drop in performance going from 4.1 to 5.1/5.2.
4.1 was the best so far, with straight-to-the-point answers that were correct most of the time, especially for code-related questions. 5.1/5.2, for their part, would far more easily hallucinate stupid responses or stupid code snippets that were totally not what was expected.
From the blog post (twice):
> creative ideation
At first I had no idea what this meant! So I asked my friend Miss Chatty [1] and we had an interesting conversation about it:
https://chatgpt.com/share/697bf761-990c-8012-9dd1-6ca1d5cc34...
[1] You may know her as ChatGPT, but I figure all the other AIs have fun human-sounding names, so she deserves one too.
I do find it interesting to see how people interact with AI as I think it is quite a personal preference. Is this how you use AI all the time? Do you appreciate the sycophancy, does it bother you, do you not notice it? From your question it seems you would prefer a blog post in plainer language, avoiding "marketing speak", but if a person spoke to me like Miss Chatty spoke to you I would be convinced I'm talking to a salesperson or marketing agent.
That is a great question!
You are absolutely right to ask about it!
(How did I do with channeling Miss Chatty's natural sycophancy?)
Anyway, I do use AI for other things, such as...
• Coding (where I mostly use Claude)
• General research
• Looking up the California Vehicle Code about recording video while driving
• Gift ideas for a young friend who is into astronomy (Team Pluto!)
• Why "Realtor" is pronounced one way in the radio ads, another way by the general public
• Tools and techniques for I18n and L10n
• Identifying AI-generated text and photos (takes one to know one!)
• Why spaghetti softens and is bendable when you first put it into the boiling water
• Burma-Shave sign examples
• Analytics plugins for Rails
• Maritime right-of-way rules
• The Uniform Code of Military Justice and the duty to disobey illegal orders
• Why, in a practical sense, the Earth really once *was* flat
• How de-alcoholized wine gets that way
• California law on recording phone conversations
• Why the toilet runs water every 20 minutes or so (when it shouldn't)
• How guy wires got that name
• Where the "he took too much LDS" scene from Star Trek IV was filmed
• When did Tim Berners-Lee demo the World Wide Web at SLAC
• What "ogr" means in "ogr2ogr"
• Why my Kia EV6 ultrasonic sensors freaked out when I stopped behind a Lucid Air
• The smartest dog breeds (in different ways of "smart")
• The Sputnik 1 sighting in *October Sky*
• Could I possibly be related to John White Geary?
And that's just from the last few weeks. In other words, pretty much anything someone might interact with an AI - or a fellow human - about.
About the last one (John White Geary), that discussion started with my question about actresses in the "Pick a little, talk a little" song from The Music Man movie, and then went on to how John White Geary bridged the transition from Mexican to US rule, as did others like José Antonio Carrillo:
https://chatgpt.com/share/697c5f28-7c18-8012-96fc-219b7c6961...
If I could sum it all up, this is the kind of freewheeling conversation with ChatGPT and other AIs that I value.
It's really an interesting insight into people's personalities. Far more than their Google search history. Which is why everyone wants their GPT chats burned to the ground after they die.
Oh good. Not in the API. 4o-mini is super cheap and useful for a bunch of things I do (evaluating vector-search results for relevancy).
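That kind of check is roughly the following; the prompt wording, 0-10 scale, and cutoff here are just illustrative choices, not exactly what I run:

    # Minimal sketch of using gpt-4o-mini to grade vector-search hits for relevance.
    # Prompt wording, the 0-10 scale, and the cutoff are illustrative.
    from openai import OpenAI

    client = OpenAI()

    def relevance(query: str, chunk: str) -> int:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,
            messages=[{
                "role": "user",
                "content": (
                    "Rate from 0 to 10 how well the passage answers the query. "
                    "Reply with a single integer.\n"
                    f"Query: {query}\nPassage: {chunk}"
                ),
            }],
        )
        return int(resp.choices[0].message.content.strip())

    query = "how did guy wires get that name?"
    hits = ["...chunk 1 from the vector store...", "...chunk 2..."]  # stand-ins for real results
    kept = [h for h in hits if relevance(query, h) >= 6]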
Does anyone know if fine-tuned models based on GPT-4 are getting retired as well?
What about the Advanced Voice feature, has this been updated to 5.x models yet?
I think this kind of thing is a pretty strong argument for the entire open source model ecosystem, not just open weights but open data and the whole gamut.
I hope they won't chop gpt-4o-mini soon because it's fast and accurate for API usage.
I have stopped using ChatGPT in favor of Gemini. Mostly you need LLMs for factual stuff and sometimes to draft bits of code here and there. I use Google with Gemini for the first part and I am a huge fan of codex for the second part.
OK, everyone is (rightly) bringing up that relatively small but really glaringly prominent AI boyfriend subreddit.
But I think a lot more people are using LLMs as relationship surrogates than that (pretty bonkers) subreddit would suggest. Character AI (https://en.wikipedia.org/wiki/Character.ai) seems quite popular, as do the weird fake-friend things in Meta products, and Grok's various personality modes and very creepy AI girlfriends.
I find this utterly bizarre. LLMs are peer coders in a box for me. I care about Claude Code, and that’s about it. But I realize I am probably in the vast minority.
We're very echo-chambered here. That graph OpenAI released had coding at 4% or something.
I remembered it being higher, but you are correct. All “technical help” (coding + data analysis + math) is 7.5%, with coding only being 4.2% as of June 2025 [0]. Note that there is a separate (sub)category of seeking information -> specific information that’s at 18.3%, I presume this could include design and architecture questions that don’t involve code, but I could be wrong.
[0]: https://www.nber.org/system/files/working_papers/w34255/w342...
Why would someone want to spend half a million dollars on GPUs and components (if not more) to run one year old models that genuinely aren't useful? You can't self host trillion parameter models unless you own a datacenter lol (or want to just light money on fire).
Everyone keeps saying that but I’ve found it to be incredibly weak in the real world every single time I’ve reached for it. I think it’s benchmaxxed to an extent.
> with only 0.1% of users still choosing GPT‑4o each day.
LOL WHAT?! I'm in the 0.1% of users? I'm certain part of the issue is that it takes three clicks to switch to GPT-4o, and it has to be done each time the page is loaded.
> that they preferred GPT‑4o’s conversational style and warmth.
Uh.. yeah maybe. But more importantly, GPT-4o gave better answers.
Zero acknowledgement about how terrible GPT-5 was when it was first released. It has since improved but it's not clear to me it's on-par with GPT-4o. Thinking mode is just too slow to be useful and so GPT-4o still seems better and faster.
Oh well, it'll be missed.
I agree - I use 4o via the API, simply because it answers so quickly. Its answers are usually pretty good on programming topics. I don't engage in chit-chat with AI models, so it's not really about the personality (which seems to be the main framing people are talking about), just the speed.
Does this mean they're also retiring Standard Voice Mode?
5.2 is back to being a sycophantic hallucinating mess for most use cases - I've anecdotally caught it out on many of the sessions I've had where it apologizes "You're absolutely right... that used to be the case but as of the latest version as you pointed out, it no longer is." when it never existed in the first place. It's just not good.
On the other hand - 5.0-nano has been great for fast (and cheap) quick requests and there doesn't seem to be a viable alternative today if they're sunsetting 5.0 models.
I really don't know how they're measuring improvements in the model since things seem to have been getting progressively worse with each release since 4o/o4 - Gemini and Opus still show the occasional hallucination or lack of grounding but both readily spend time fact-checking/searching before making an educated guess.
I've had ChatGPT blatantly lie to me and say there are several community posts and Reddit threads about an issue; then, after failing to find them, I asked it where it found those and it flat-out said "oh yeah, it looks like those don't exist".
That's been my experience and it has led to hours of wasted time. It's faster for me to read through docs and watch YouTube.
Even if I submit the documentation or reference links they are completely ignored.
That’s really going to upset the crazies.
Despite 4o being one of the worst models on the market, they loved it. Probably because it was the most insane and delusional. You could get it to talk about really fucked up shit. It would happily tell you that you are the messiah.
The reaction to its original removal on Instagram Reels, r/ChatGPT, etc., was genuinely so weird and creepy. I didn't realise before this how many people had genuine parasocial (?) relationships with these LLMs.
I was mostly using 4o for academic searches and planning. It was the best model for me. Based on the context I was giving and the questions I was asking, 4o was the most consistent model.
It used to get things wrong for sure but it was predictable. Also I liked the tone like everyone else. I stopped using ChatGPT after they removed 4o. Recently, I have started using the newer GPT-5 models (got free one month). Better than before but not quite. Acts way over smart haha
I wonder if it will still be up on Azure? How much do you think I could make if I set up 4o under a domain like yourgirlfriendis.ai or w/e?
Note: I wouldn't actually, I find it terrible to prey on people.
ChatGPT Made Me Delusional: https://www.youtube.com/watch?v=VRjgNgJms3Q
Should be essential watching for anyone that uses these things.
I still don't know how OpenAI thought it was a good idea to have a model named "4o" AND a model named "o4", unless the goal was intentional confusion.
Even ChatGPT (and certainly Google) confuses the names.
I'm sure there is some internal/academic reason for them, but to an outside observer it's simply horrible.
How many times have you noticed people confusing the name itself: ChatGBT, ChatGTP etc.
We're the technical crowd cursed and blinded by knowledge.
I still don't like how French people don't call it "chat j'ai pété".
It's almost always marketing and some stupid idea someone there had. I don't know why non-technical people try and claim so much ownership over versioning. You nearly always end up with these ridiculous outcomes.
"I know! Let's restart the version numbering for no good reason!" becomes DOOM (2016), Mortal Kombat 1 (2025), Battlefield 1 (2016), Xbox One (not to be confused with the original Xbox 1)
As another example, look at how much of a trainwreck USB 3 has become
Or how Nvidia restarted Geforce card numbering
They will have to update the openai.com footer, I guess:
Latest Advancements
GPT-5
OpenAI o3
OpenAI o4-mini
GPT-4o
GPT-4o mini
Sora
Last time they tried to do this they got huge pushback from the AI boyfriend people lol