Comment by danielodievich 12 hours ago

Last week we had a customer request land in our support queue about a feature that I partially wrote and wrote a pile of public documentation for. Support engineer ran customer query through Claude (trained on our public and internal docs) and it very, very confidently made a bunch of stuff up in the response. It was quite plausible sounding and it would have been great if it worked that way, but it didn't. While I was explaining why it was wrong in a Slack thread with the support engineer and another engineer who also worked on that feature, he ran Augment (which has the full source code of the feature), and it promptly and also very confidently made up more stuff (but different!). Some choice bleeding-eye emojis were exchanged. I'm going to continue to use my own intelligence, thank you.

kristianp 10 hours ago

How is that comment relevant to this story about OpenAI's response to perceptions that Google has gained in market share?

  • varenc 10 hours ago

    Popular HN threads about anything AI related always attract stories highlighting AI failures. It's such a common pattern I want to analyze it and get numbers. (which might require AI...)

    • Yeask 9 hours ago

      Popular HN threads about anything AI related always attract stories highlighting AI success. It's such a common pattern I want to analyze it and get numbers. (which might require using my brain...)

  • raincole 10 hours ago

    Welcome to Hacker News. You're allowed to post anti-AI, anti-Google or anti-Musk content in any thread. /s

oersted 2 hours ago

Relying on the model’s own “memory” to answer factual queries is almost always a mistake. Fine-tuning is almost always a more complex, more expensive and less effective method to give a model access to a knowledge base.

However, using the model as a multi-hop search robot, leveraging its general background knowledge to guide the research flow and interpret findings, works exceedingly well.

Training with RL to optimize research tool use and reasoning is the way forward, at least until we have proper Stateful LLMs that can effectively manage an internal memory (as in Neural Turing Machines, and such).
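
A rough sketch of that search-driven pattern in Python (llm and search_docs here are hypothetical stand-ins for whatever chat client and document index you already have, not any particular library):

  def multi_hop_answer(question, llm, search_docs, max_hops=3):
      """Let the model drive retrieval instead of answering from its own memory."""
      notes = []
      for _ in range(max_hops):
          # Ask the model what to look up next, given what it has gathered so far.
          query = llm(
              f"Question: {question}\nNotes so far: {notes}\n"
              "Reply with ONE search query, or DONE if the notes already answer it."
          )
          if query.strip() == "DONE":
              break
          # Ground the next hop in retrieved passages rather than model memory.
          notes.append({"query": query, "passages": search_docs(query, k=5)})
      return llm(
          f"Question: {question}\n"
          f"Answer ONLY from these notes; say 'not found' if they don't cover it:\n{notes}"
      )

The knowledge base stays outside the weights, so updating the docs updates the answers; that's the practical advantage over fine-tuning.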

ramraj07 3 hours ago

"trained on our public and internal docs" trained how? Did you mean fine-tuned haiku? Did you actually fine tune correctly? Its not even a recommended architecture.

Or did you just misuse basic terminology about LLMs and are now saying it misbehaved, likely because your org did something very bad with it?

pshirshov 10 hours ago

All depends on the tasks and the prompting engineers.

Even with your intelligence you would need years to deliver something like this: https://github.com/7mind/jopa

The outcome will be better for sure, but you won't do anything like that in a couple of weeks. Even if you have a team of 10. Or 50.

And I'm not an LLM proponent. Just being an empirical realist.

tomp 12 hours ago

I don't know man.

My code runs in 0.11s

Gemini's code runs in 0.5s.

Boss wants an explanation. ¯\_(ツ)_/¯

  • loloquwowndueo 11 hours ago

    As long as the explanation is allowed to be wrong, I’m sure you can whip something up in 0 seconds.

  • brazukadev 10 hours ago

    0.11s is faster than 0.5s

    • tomp 5 hours ago

      Yeah that’s the point. Now instead of just writing good code, I’m also supposed to debug shitty AI code.

    • gmzamz 10 hours ago

      Boss is using AI. 11 is clearly bigger than 5.

scotty79 12 hours ago

Yeah, LLMs are not really good at handling things that can't be done.

At some point you'll be better off just implementing the features they hallucinated. Some people with public APIs have already taken this approach.

  • AdieuToLogic 11 hours ago

    >> Support engineer ran customer query through Claude (trained on our public and internal docs) and it very, very confidently made a bunch of stuff up in the response.

    > Yeah, LLMs are not really good about things that can't be done.

    From the GP's description, this situation was not a case of "things that can't be done", but instead was a statistically generated document producing what should be the expected result:

      It was quite plausible sounding and it would have been 
      great if it worked that way, but it didn't.
    • verdverm 3 hours ago

      The core issue is likely not with the LLM itself. Given good grounding context, clear instructions, and purposeful agents, a DAG of these will not produce such consistently wrong results.

      There are a lot of devils in the details, and there are few details in the story.
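
      A toy version of that kind of DAG (node names and helpers here are hypothetical, not any particular framework): each node is a small, single-purpose step, and the answering step only sees retrieved grounding context plus explicit instructions:

        # Hypothetical two-node DAG: retrieve -> answer, every claim grounded.
        def retrieve_node(question, search_docs):
            return search_docs(question, k=8)   # grounding context for the next node

        def answer_node(question, passages, llm):
            prompt = (
                "Answer using ONLY the passages below and cite a passage id for "
                "every claim; if the passages don't cover it, say so.\n"
                f"Question: {question}\nPassages: {passages}"
            )
            return llm(prompt)

        def run_dag(question, llm, search_docs):
            return answer_node(question, retrieve_node(question, search_docs), llm)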

  • 131hn 12 hours ago

    They are trained on 100% true facts and successful paths.

    We humans grew our analysis/reasoning skills through the 99.9999% failed attempts at everything we did: unsuccessful trials and errors, wasted time and frustrations.

    So we know that behind a truth, there’s a bigger world of fantasy.

    For an LLM, everything is just a fantasy. Everything is as true as its opposite. It will take a lot more than the truth to build intelligence; it will require controlled malice and deception.

    • antinomicus 11 hours ago

      I was with you until the very last line. Can you expand on that?

      • abakker 11 hours ago

        I think he was getting at the fact that the Truth is not good news to everyone.