Comment by bambax

Comment by bambax 5 months ago

This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

cardanome 5 months ago

> Sabine Hossenfelder

She really needs to stop commenting on topics outside of theoretical physics.

Even in physics she does not represent the scientific consensus but has some very questionable fringe beliefs like labeling whole sub-fields as "scams to get funding".

She regularly speaks with "scientific authority" about topics she barely knows anything about.

Her video on autism is considered super harmful and misleading by actual autistic people. She also thinks she is an expert on trans-issues and climate change. And I doubt she knows enough about artificial intelligence and computer science to comment on LLMs.

Mekoloto 5 months ago

Your statement is misleading.
She doesn't say she is an expert on trans-issues at all! She analyzed the studies, looked at the data, and stated that there is no real "trans pandemic" but highlighted a statistical increase in numbers among young women, without stating a clear opinion on this finding.
The climate change videos do the same thing. She evaluates these studies and discusses them to clarify that for her, certain numbers are unspecific and she also is not coming to a clear conclusion in the sense of climate change yes, no, bad, good.
She is for sure not an expert in all fields, but her way of discussing these topics is based on studies and numbers, and is a good viewpoint.
The funding scam you mention is a reference to "these people get billions for particle research but the outcome for us as a society is way too small".

- cardanome 5 months ago
  
  Having studied physics does not allow you to evaluate studies in a completely unrelated field in any meaningful way.
  Especially not in such politically charged fields that require deeper knowledge about the historical context, the different interest groups and their biases and so on.
  Her video on trans-issues labels people who advocate for the rights of trans people as "extremists" and presents transphobic talking points as a valid part of the scientific discussion.
  Her trying to appear "neutral" and "just presenting the science" is exactly the issue. Using her authority as a scientist when talking about topics she has no expertise in.
  Here is a debunking of her video on trans-issues: https://www.youtube.com/watch?v=r6Kau7bO3Fw
  Here is a longer criticism of her video on autism: https://www.youtube.com/watch?v=vaZZiX0veFY
  
  bflesch 5 months ago
  
  So where does your "scientific authority" come from, which is needed before criticizing someone according to your own logic?
  You're not even using your real name here. Nobody knows if you have any scientific qualifications, or a university degree at all.
  
  mistercheph 5 months ago
  
  > Having studied physics does not allow you to evaluate studies in a completely unrelated field in any meaningful way.
  I agree! Before one may touch the pink sceptre, they must be permitted through the gate, and kissed by the doddling sheep, Harry, who will endow them with permission to pass and comment on many a great manor of thing which are simply out of reach of the natural human mind without these great blessings which we bestow. And, amen.
  
  kordlessagain 5 months ago
  
  Looking at this HN commentator's behavior, we can see the early stages of a troubling pattern:
  They start by attacking a physicist for being "neutral" and "just presenting the science", exactly the kind of delegitimization of objectivity we see in early stages of information control. Notice how they frame staying neutral as actively harmful: it's not just "wrong," but presented as dangerous because it doesn't take a strong enough stance against what they view as "extremist" positions. Most tellingly, they're not arguing that her analysis is incorrect. Their complaint is that she's even allowing certain viewpoints to be examined objectively at all.
  This maps directly to historical patterns where:
  1. First you attack individuals for being neutral
  2. Then you establish that certain topics are "beyond" neutral analysis
  3. Finally you create an environment where examining data objectively becomes seen as suspicious or harmful
  This HN comment is a perfect micro-example of this - it's not even sophisticated gatekeeping, it's raw "how dare you look at this objectively when you should be taking my side." This kind of thinking, multiplied across society and amplified by modern media, is exactly how larger patterns of information control take hold.
  
- jiggawatts 5 months ago
  
  > The funding scam you mention is a reference to "these people get billions for particle research but the outcome for us as a society is way too small"
  More specifically, even particle physicists admit that a 2x or even a 10x bigger accelerator is not expected to find anything fundamentally new.
  The core criticism is that it has become a self-licking ice cream cone that serves no real purpose other than keeping physicists employed.
  
- bccdee 5 months ago
  
  > for her, certain numbers are unspecific and she also is not coming to a clear conclusion in the sense of climate change yes, no, bad, good.
  Climate change is settled science. To claim that "certain numbers are unspecific, so I can't say whether climate change is real or not, or whether it's good or bad" (which, based on your paraphrasing, is what it sounds like she said) is an unacceptable position. It's muddying the waters.
  I'm not going to go watch her content about trans people, but it sounds like the same thing: Muddying the waters by Just Asking Questions about anti-trans "social contagion" talking points.
  ---
  EDIT: Okay I went back and watched some clips of her anti-trans video. She takes a pseudoscientific theory based on an opinion poll of parents active on an anti-trans web forum and suggests we take it seriously because "there is no conclusive evidence for or against it," as if the burden of proof weren't on the party making the positive claim, and as if the preponderance of evidence and academic consensus didn't overwhelmingly weigh against it. It's textbook laundering of pseudoscience. You've significantly misrepresented her position.
  
  toolz 5 months ago
  
  There's no such thing as "settled science". You can not prove that any scientific consensus has no flaws in the same way you can't prove the absence of bugs in any software. It's unproductive to treat science as anything more than an ongoing, constantly improving process.
  
  pyinstallwoes 5 months ago
  
  It’s not settled.
  
dimal 5 months ago

> Her video on autism is considered super harmful and misleading by actual autistic people.
I’m autistic and I just watched her video. I found it to be one of the best primers on autism I’ve seen. Not complete, of course, and there’s a lot more nuance to it, but very even handed. She doesn’t make any judgements. She just gives the history and talks about the controversies without choosing sides, except to say that the idea of neurodiversity seems reasonable to her. When compared to most of the discourse about autism, it stands up pretty well. Of course, there’s a lot more I want people to know about autism, but it’s an ok start.
Actually, many autistic people (myself included) would find your statement far more harmful. You assume that all autistic people think alike and believe the same thing. This is false. You try to defend us without understanding us.
Don’t do that.
I suppose there’s a possibility that you’re autistic and found it harmful to you. If so, don’t speak for me.
And she was commenting on an AI’s knowledge of Bell’s Inequality, which is PHYSICS. If she can’t comment on that, who can?

- cardanome 5 months ago
  
  There is a misunderstanding: I did not specify that ALL autistic people think like this. Just that autistic people found it harmful and misleading. There are quite a lot of autistic content creators criticizing the video. It does not mean every autistic person needs to feel the same way.
  I am neurodivergent myself but (probably) not autistic. The first time I watched the video I actually didn't think that it was that bad and had a similar reaction to you. But once I started to think about it and educate myself more on the topic I realized how bad it is.
  Sure it is not the worst video on autism but it still promotes some really bad ableist views.
  Autism Speaks is a horrible hate organization. I don't think there is a spoiler tag here, so please skip this paragraph if ableism is triggering to you, but there is a video of the Autism Speaks founder where she talks about how she at one point wanted to kill herself and her autistic child because she couldn't cope with having an autistic child, and only didn't do it because of her other, non-autistic child. She says that while her autistic child is in the background.
  I also didn't know about "aspie supremacy" and why people still use the term "Asperger" despite it being outdated. Hans Asperger was a Nazi scientist who is responsible for killing thousands of children. He thought some autistic children might be useful as future scientists for the Nazi regime, so he assigned them the diagnosis "Asperger syndrome" while the other autistic children were to be murdered.
  I recommend you watch Ember Green on this topic: https://www.youtube.com/watch?v=vaZZiX0veFY
  
  dimal 5 months ago
  
  You said "Her video on autism is considered super harmful and misleading by actual autistic people".
  While you didn't say ALL, you didn't clarify, so your wording says that autistic people categorically think her video is super harmful and misleading. That's simply not true.
  I'm in a bit of a minority in that I see autism through the neurodiversity lens, but I also think this tribal us-vs-them mentality is doing us more harm than good. Tarring and feathering people for not getting everything exactly "right" by my standards isn't helping anyone. It just causes people to throw up their hands and vote Trump.
  So, while I disagree with Autism Speaks on pretty much every point and I think they're extremely harmful, labeling them as a hate group is self-defeating. Parents with high-needs autistic children look to Autism Speaks. These parents love their children, but are being misled. When you push people, they push back. If we shout "You're a hate group!" then all dialogue stops, and we can't help their children. And helping those children is the important thing.
  Ironically, I think the problem for Sabine is simply miscommunication, which is due to her probably being autistic and not communicating according to neurotypical standards. She ended the video by showing how she scored high on an online test (which isn't definitive, of course), but then she dismisses it out of hand.
  I dismissed those tests too when I first took them, but that's because I still didn't really know what autism is, even after I did tons of research. I couldn't really know that by reading studies or talking to a psychologist. I didn't really know until I met other autistic people and realized that they actually "got" me in a way that no one else ever has.
  Her behavior is VERY typical of autistic people. Monotone voice, hyperlogical, hypermoral. She quit a successful career in physics for moral reasons. No neurotypical person would do that. An autistic person would. She wears the same velour shirt in every video. Sensory issues(?), repetitive behavior.
  It's up to her to make the call as to whether she's actually autistic or not, but I see her as one of us.
  And... I'm going to need a TLDR on that Ember Green video. It's a 2 hour commentary on a 25 minute video. Instead of asking me to unpack the arguments, make them.
  
dauertewigkeit 5 months ago

I agree with you that Sabine often talks about matters far outside of her expertise, but as somebody with a foot in academia, I would bet that a very large number of academics have at least one academic research direction in mind that they would categorize as a "scam to get funding".

hitekker 5 months ago

I was nodding along to your comment, at first. But then I read your follow-ups, which look like you're covering something up that you fear could be true.
I don't know what that something is, so I think I should go listen to Sabine Hossenfelder.

- cardanome 5 months ago
  
  You got me, I am one of those rabble rousing extremists that believe in human rights and climate change and trans people having the right to exist. Shocking!
  Seriously, let me know once you've figured out what I am "covering up".
  
netbioserror 5 months ago

Based on what I've seen of Sabine, virtually all of this post is lies. She regularly positions herself as an outside skeptic and critic. Do you have any examples of her claiming authority or representing consensus?

me_me_me 5 months ago

But Bell's theorem IS physics! So, according to you, she absolutely can comment on LLMs' understanding of physics, or lack of it.
So your whole rant makes no sense.

- dambi0 5 months ago
  
  There is a difference between the question “LLMs don’t understand Bell’s theorem; what does this tell us about physics?” and “LLMs don’t understand Bell’s theorem; what does this tell us about LLMs?”.
  
Der_Einzige 5 months ago

She’s very full of shit and feels a lot like a Lex Fridman for women.
I can’t wait for others to call her out further for being the biggest grifter of them all herself, bigger than most of those she tries to take down.

idiotsecant 5 months ago

Yes, she is the worst type of 'vibes based' science communicator and mainly just says edgy things to improve click rate and drive engagement.

ttoinou 5 months ago

> Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks

That's something I always wondered about. Goodhart's law so obviously applies to each new AI release. Even the fact that writers and journalists don't mention that possibility makes me instantly skeptical about the quality of the article I'm reading.

NitpickLawyer 5 months ago

> Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks
2 anecdotes here:
- just before grok2 was released, they put it on livearena under a pseudonym. If you read the topics (reddit, x, etc.) when that hit, everyone was raving about the model. People were saying it's the next 4o, that it's so good, hyped, and so on. Then it launched, and they revealed the pseudonym, and everyone started shitting on it. There is a lot of bias in this area, especially with anything touching bad spaceman, so take "many people doubt" with a huge grain of salt. People be salty.
- there are benchmarks that seem to correlate very well with end-to-end results on a variety of tasks. Livebench is one of them. Models scoring highly there have proven to perform well on general tasks, and don't feel like they cheated. This is supported by that paper which found models like phi and qwen lose ~10-20% of their benchmark scores when checked against newly built, unseen but similar tasks. Models scoring strongly on livebench didn't see that big of a gap.
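
To make the second point concrete, here is a minimal sketch of the kind of gap being described: the relative score lost when a model moves from a public benchmark to freshly written, similar tasks. The model names and scores are made up for illustration and are not taken from the paper or from Livebench.

```python
def relative_drop(public_score: float, fresh_score: float) -> float:
    """Fraction of the public-benchmark score lost on fresh, unseen tasks."""
    if public_score <= 0:
        raise ValueError("public_score must be positive")
    return (public_score - fresh_score) / public_score


# Hypothetical (public, fresh) scores in percent correct; not real measurements.
scores = {
    "model_a": (80.0, 68.0),
    "model_b": (75.0, 73.5),
}

drops = {name: relative_drop(pub, fresh) for name, (pub, fresh) in scores.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.1%} relative drop")
```

A large drop is only a hint of benchmark overfitting, not proof; the fresh tasks could simply be harder.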

- staticman2 5 months ago
  
  I found arena was a place with a 2000 token limit on inputs.
  I think it even quietly eliminates the input without telling you. Nobody is putting serious work tasks in 2000 tokens on Arena.
  The lesson you should have learned is Arena is a dumb metric, not that people have unfounded biases against Grok 2. (Which I've used on Perplexity and found to be unimpressive.)
  The other thing is dumb, low quality statements are all over reddit and Twitter about any "hype" topic, including mysterious new models on arena. So it isn't surprising you encountered that for Grok 2, but you could have said the same thing for Gemini models.
  If reddit can be believed, Wizard LM 2 was so much better than OpenAI models that Microsoft had to cancel it so OpenAI wouldn't be driven out of business.
  People say all sorts of dumb stuff on social media.
  
- Mekoloto 5 months ago
  
  I've been following AI news and models for a few years now and I have not read about your Grok2 controversy.
  Nonetheless, I do not use Grok and I will not try it out, because it belongs to Musk.
  I'm also not aware that Grok2 was communicated as the top model for any relevant timespan at all. Perhaps it just didn't deliver? Or a lot more people are not aware of how to use it, or boycott Musk.
  After all, he clearly doesn't care about any rules or laws, so sending anything to Grok is probably a very high risk.
  
- ttoinou 5 months ago
  
  Interesting, thank you !
  
aubanel 5 months ago

> Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

Do you have any data to support 1. That grok is not more intelligent than previous models (you gave one anecdotal datapoint), and 2. That it was trained on more data than other models like o1 and Claude-3.5 Sonnet?

All datapoints I have support the opposite: scaling actually increases intelligence of models. (agreed, calling this "intelligence" might be a stretch, but alternative definitions like "scope, maybe, or flexibility, or coverage, or something" seem to me like beating around the bush to avoid saying that machines have intelligence)

Check out the technical report of Llama 3 for instance, with nice figures on how scaling up model training does increase performance on intelligence tests (might as well call that intelligence): https://arxiv.org/abs/2407.21783

melodyogonna 5 months ago

How can it be specifically trained on benchmarks when it is leading on blind chatbot tests?

The post you quoted is not a Grok problem if other LLMs are also failing; it seems to me to be a fundamental failure in the current approach to AI model development.

bearjaws 5 months ago

Any LLM that is uncensored does well on chatbot tests because a refusal is an automatic loss.
And since 30% of people using chatbots are Gooning it up, there are far more refusals...
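
A back-of-the-envelope sketch of that argument, assuming a refusal always counts as a loss, the opponent never refuses, and ties are ignored. The refusal rate below is hypothetical, and the logistic Elo formula is the textbook one; whether any particular arena scores votes exactly this way is an assumption.

```python
import math


def arena_win_rate(quality_win_prob: float, refusal_rate: float) -> float:
    """Win rate when every refusal is scored as a loss.

    quality_win_prob: chance of winning a vote when the model does answer.
    refusal_rate: fraction of prompts the model refuses (hypothetical input).
    """
    return (1.0 - refusal_rate) * quality_win_prob


def elo_gap(win_rate: float) -> float:
    """Rating difference implied by a win rate under the standard Elo curve."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Two equally good models (50/50 when both answer); one refuses 30% of prompts.
censored = arena_win_rate(0.5, 0.3)
print(f"win rate: {censored:.2f}, implied Elo gap: {elo_gap(censored):.0f}")
```

Under these assumptions, a model that is no worse when it answers still ends up rated noticeably lower purely because of its refusals.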

- pyinstallwoes 5 months ago
  
  Gooning?
  
  bearjaws 5 months ago
  
  https://www.urbandictionary.com/define.php?term=gooning
  
nycdatasci 5 months ago

I think a more plausible path to gaming benchmarks would be to use watermarks in text output to identify your model, then unleash bots to consistently rank your model over opponents.

aucisson_masque 5 months ago

Last time I used Chatbot Arena, I was the one asking the LLM questions, so I made my own benchmark. There weren't any predefined questions.

How could Musk's LLM train on data that does not yet exist?

HenryBemis 5 months ago

That. I have used only ChatGPT and I remember asking 4 legacy to write some code. I asked o3 the same question when it came out, and then I compared the code. o3 was 'better': more precise, more detailed, less 'crude'. Now, don't get me wrong, crude worked fine. But when I wanted to do the v1.1 and v1.2, o3 nailed it every time, while 4 legacy was simply bad and full of errors.
With that said, I assume that every 'next' version of each engine is using my 'prompts' to train, so each new version has the benefit of having already processed my initial v1.0 and then v1.1 and then v1.2. So it is somewhat 'unfair', because for "ChatGPT v2024" my v1.0 is brand new, while for "ChatGPT v2027" my v1.0, v1.1, and v1.2 are already in the training dataset.
I haven't used Grok yet, perhaps it's time to pause that OpenAI payment and give Elon some $$$ and see how it works 'for me'.

JKCalhoun 5 months ago

That's true. You can head over to lmarena.ai and pit it against other LLMs yourself. I only tried two prompts but was surprised at how well it did.
There are "leaderboards" there that provide more anecdotal data points than my two.

jiggawatts 5 months ago

People have called LLMs a "blurry picture of the Internet". Improving the focus won't change the subject of the picture, it just makes it sharper. Every photographer knows this!

A fundamentally new approach is needed, such as training AIs in phases, where instead of merely training them to learn to parrot their inputs, the first AI is used to critique and analyse the inputs, which is then used to train another model in a second pass, which is used to critique the data again, and so on, probably for half a dozen or more iterations. On each round, the model can learn not just what it heard, but also an analysis of the veracity, validity, and consistency.
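
The loop I have in mind looks roughly like this. It is a toy sketch only: `train` is a hypothetical stand-in that returns a labelled function instead of a real model, and no actual training or critique happens here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    text: str
    critique: str = ""  # analysis attached by the previous generation's model


# Hypothetical model type: maps an input text to a critique of that text.
Model = Callable[[str], str]


def train(examples: List[Example], generation: int) -> Model:
    """Stand-in for real training; returns a 'model' tagged with its generation."""
    def model(text: str) -> str:
        return f"gen{generation} critique of: {text[:20]}"
    return model


def critique_and_retrain(corpus: List[str], rounds: int) -> Model:
    """Train on raw text, then repeatedly critique the data and retrain on it."""
    examples = [Example(t) for t in corpus]
    model = train(examples, generation=0)
    for generation in range(1, rounds + 1):
        # The current model annotates every input with its own analysis...
        examples = [Example(e.text, model(e.text)) for e in examples]
        # ...and the next model is trained on text plus critique.
        model = train(examples, generation)
    return model


final_model = critique_and_retrain(["the sky is green"], rounds=3)
print(final_model("the sky is green"))
```

The open question is whether each round's critique really adds information about veracity and consistency, or just compounds the first model's mistakes.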

Notably, something akin to this was done for training Deepseek, but only in a limited fashion.

BiteCode_dev 5 months ago

It is very up to date, however. I asked it about recent stuff on Python packaging, and it gets it while others don't.
