Comment by yodon a year ago

This looks super valuable!

That said, it's concerning to see the reported probability for getting a 4 on a die roll is 65%.

Hopefully OpenAI isn't that biased at generating die rolls, so is that number actually giving us information about the accuracy of the probability assessments?

teej a year ago

Fair dice rolls are not an objective that cloud LLMs are optimized for. You should assume that LLMs cannot perform this task.

This is a problem when people naively use "give an answer on a scale of 1-10" in their prompts. LLMs are biased towards particular numbers (like humans!) and cannot linearly map an answer to a scale.

It's extremely concerning when teams do this in a context like medicine. Asking an LLM "how severe is this condition" on a numeric scale is fraudulent and dangerous.


low_tech_love a year ago

This week I was in a meeting for a rather important scientific project at the university, and I asked the other participants “can we somehow reliably cluster this data to try to detect groups of similar outcomes?” to which a colleague promptly responded “oh yeah, ChatGPT can do that easily”.

- stanislavb a year ago
  
  I guess he's right: it will be easy and relatively accurate. Relatively/seemingly.
  
  
  low_tech_love a year ago
  
  So that’s it then? We replace every well-understood, objective algorithm with well-hidden, fake, superficial surrogate answers from an AI?
  
Terr_ a year ago

It'll also give you different results based on logically-irrelevant numbers that might appear elsewhere in the collaborative fiction document.


dragonwriter a year ago

> That said, it's concerning to see the reported probability for getting a 4 on a die roll is 65%.

Finding that an LLM is biased toward inventing die rolls that are the median result rounded to an available result by the most common rounding method is...not particularly surprising. If you want a fair RNG, use an RNG designed to be fair, not an LLM where that would be, at best, an emergent accidental property.
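
For illustration, here is a minimal sketch of what "use an RNG designed to be fair" could look like in Python. It assumes the standard-library `secrets` module is acceptable for the use case; the `roll_die` and `face_frequencies` helper names are made up for this example.

```python
import secrets
from collections import Counter


def roll_die(sides: int = 6) -> int:
    """Return a uniformly distributed integer in [1, sides]."""
    # secrets.randbelow(n) draws uniformly from [0, n) using the OS's
    # cryptographically secure random source.
    return secrets.randbelow(sides) + 1


def face_frequencies(n_rolls: int, sides: int = 6) -> dict:
    """Roll the die n_rolls times and return the observed share per face."""
    counts = Counter(roll_die(sides) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, sides + 1)}


if __name__ == "__main__":
    # Each face should land close to 1/6, unlike the 65% reported for "4".
    for face, share in face_frequencies(60_000).items():
        print(face, round(share, 3))
```

With enough rolls, every face should come out near 1/6, which is the property an LLM only has by accident, if at all.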


ngrislain a year ago

Thank you! The number is the sum of the logprobs of the tokens constituting each individual value, so it does represent the likelihood of seeing this value. So yes, OpenAI is super-biased as a random number generator. We sampled other values from OpenAI and got other die roll values, but with much lower probabilities (5 has an 8% chance).
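
As a rough sketch of that calculation (not the library's actual code): each token's logprob is conditional on the tokens before it, so summing the logprobs of a value's tokens and exponentiating gives the probability of the whole value. The logprob numbers below are hypothetical, chosen only to match the 65% and 8% figures quoted above, and `value_probability` is an illustrative name.

```python
import math
from typing import Sequence


def value_probability(token_logprobs: Sequence[float]) -> float:
    """Probability of a value given the logprobs of the tokens that spell it."""
    # Adding logprobs multiplies the underlying conditional probabilities.
    return math.exp(sum(token_logprobs))


# Hypothetical single-token values matching the figures in the comment.
p_four = value_probability([math.log(0.65)])
p_five = value_probability([math.log(0.08)])

# Hypothetical value split across two tokens: 0.5 * 0.9 = 0.45.
p_two_tokens = value_probability([math.log(0.5), math.log(0.9)])

if __name__ == "__main__":
    print(round(p_four, 2), round(p_five, 2), round(p_two_tokens, 2))
```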


ngrislain a year ago

More precisely it represents the likelihood of seeing this value conditional on the tokens before it.

- elcritch a year ago
  
  Even without other tokens before it the LLM is probably showing the probability of dice rolls based on its training data. I’d guess humans tend to prefer “3” or “4” as it’s nearer the avg/median and feels fairer.
  AFAICT, the LLMs aren’t creating new mental mappings of “dice are symmetric and should have equal probability of landing on any side”, and then using that info to infer that they should use an RNG.
  
- radarsat1 a year ago
  
  And I guess it includes possibilities other than numbers, like 'f', which could lead to four or five. There's probably a separate probability for 'fi' and 'fo' too.
  

mmcwilliams a year ago

What about the models they offer would make you think that it wouldn't be biased at generating random die rolls?


low_tech_love a year ago

I think the problem is that for every person who actually understands that ChatGPT should not be used for objective things like a die roll, there are 10 or 20 who would say “well, it looks ok, and it’s fast, convenient, and it passes nicely for an answer”. People are pushing the boundaries and waiting for the backlash, but the backlash never actually comes… so they keep pushing.
Think about this: suppose you’re reading a scientific paper and the author writes “I did a study with 52 participants, and here are the answers”. Would there be any reason to believe that data is real?

- mmcwilliams a year ago
  
  I agree that the fundamental problem is a misunderstanding about what transformer models produce and how, but people not getting bitten until far down the road is a responsibility that service providers need to address, not everyone else.
  I'm not sure I follow your hypothetical. The author making the claim in a public paper can be contacted for the data. It can be verified. Auditing the internals of an LLM, especially a closed one, is not the same.
  

supernewton a year ago

I feel like https://xkcd.com/221/ might be heavily influencing what the typical "random" die roll looks like on the internet ;)


prerok a year ago

Based on this comic, I've seen unit tests use 4 as a replacement for a randomly generated number to ensure non-flakiness (only when needed, of course). But it might explain the LLM's bias?
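
A small sketch of that testing pattern, assuming the code under test accepts an injected `random.Random` instance. `roll_die` and `FixedRoll` are hypothetical names, not taken from any real test suite.

```python
import random


def roll_die(rng: random.Random) -> int:
    """Code under test: roll a six-sided die with the supplied RNG."""
    return rng.randint(1, 6)


class FixedRoll(random.Random):
    """Test double that always 'rolls' a 4, in the spirit of the comic."""

    def randint(self, a: int, b: int) -> int:
        return 4


def test_roll_with_fixed_rng() -> None:
    # The result is fully determined, so the test cannot flake.
    assert roll_die(FixedRoll()) == 4


def test_seeded_rolls_repeat() -> None:
    # Alternative: two generators with the same seed give the same sequence.
    first, second = random.Random(42), random.Random(42)
    assert [roll_die(first) for _ in range(10)] == [roll_die(second) for _ in range(10)]


if __name__ == "__main__":
    test_roll_with_fixed_rng()
    test_seeded_rolls_repeat()
    print("ok")
```

Seeding keeps the rolls realistic while still repeatable; the fixed stub is simpler when the test only needs some valid value.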

ngrislain a year ago

Haha, I didn't know that one! It's consistent with OpenAI's conception of a "random" dice roll :-D. Joking aside, I'm quite convinced many people would not find 1 or 6 to look "random" enough to be chosen as an example dice roll.


dotancohen a year ago

Like most prejudices exhibited by LLMs, the reported probability for getting a 4 on a die roll is due to biases in the training data. Notably, a popular highly-cited comic hard-coded 4 as the return value of a pseudo-RNG based on a dice roll. I suspect that this influenced the LLM's choice.