Comment by lossolo

You’re still mixing up two different things.

Sample space: how many distinct labels sit on the die/in the jar (100).

Event space: did the guess match the ground-truth label? ("correct" vs. "incorrect").

Knowing there are 99 wrong labels tells us how many distinct ways we can be wrong, NOT how likely we are to be wrong. Probability lives in the weights you place on each label, not in the label count itself. The moment you say "uniformly at random" you’ve chosen a particular weighting (each label gets 1⁄100). But nothing in the original claim required that assumption.

Imagine a classifier that, on any query, behaves like this:

emits the single correct status 50 % of the time.

sprays its remaining 50 % probability mass uniformly over the 99 wrong statuses (≈ 0.505% each).

There are still 99 ways to miss, but they jointly receive 0.50 of the probability mass, while the “hit” receives 0.50. When you grade the output, the experiment collapses to:

Outcome | Probability
correct | 0.50
wrong   | 0.50

Mathematically, and for every metric that only cares about right vs. wrong (accuracy, recall, etc.), this is a coin flip.
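If you'd rather check this than take my word for it, here is a minimal Python sketch of that classifier. The names (label_weights, simulate_accuracy) and the convention that label 0 is the correct one are my own assumptions for illustration, not any real benchmark's API.

```python
import random
from fractions import Fraction

NUM_LABELS = 100             # 1 correct label + 99 wrong ones
P_CORRECT = Fraction(1, 2)   # probability mass placed on the correct label


def label_weights(num_labels=NUM_LABELS, p_correct=P_CORRECT):
    """Weights per label; index 0 is assumed to be the correct label."""
    # The remaining mass is spread uniformly over the wrong labels.
    p_each_wrong = (1 - p_correct) / (num_labels - 1)
    return [p_correct] + [p_each_wrong] * (num_labels - 1)


def simulate_accuracy(trials=100_000, seed=0):
    """Draw one label per trial and grade it as match / mismatch."""
    rng = random.Random(seed)
    weights = [float(w) for w in label_weights()]
    guesses = rng.choices(range(NUM_LABELS), weights=weights, k=trials)
    return sum(g == 0 for g in guesses) / trials


weights = label_weights()
print(sum(weights))         # total probability mass: 1
print(float(weights[1]))    # mass on each wrong label, about 0.00505
print(simulate_accuracy())  # empirical accuracy, close to 0.5
```

The exact arithmetic uses Fraction so the weights sum to exactly 1; the simulation only converts to float for sampling.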

Your jar contains 99 black marbles and 1 red marble and you assume each marble is equally likely to be drawn. Under that specific weight assignment, P(red)=0.01 and, yes, accuracy is 1%. But that’s a special case (uniform weights), not a law of nature. Give the red marble extra weight, make it larger, magnetic, whatever, until P(red)=0.50 and suddenly the exact same jar of 100 physical objects yields a 50% success chance.
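The jar version can be sketched the same way, assuming each marble is drawn with probability proportional to a weight. The p_red helper and the idea of a numeric "weight" per marble are my illustration, not something from the original claim.

```python
from fractions import Fraction


def p_red(red_weight, black_weight=1, num_black=99):
    """Chance of drawing the single red marble when every marble is
    drawn with probability proportional to its weight."""
    total = red_weight + black_weight * num_black
    return Fraction(red_weight, total)


print(p_red(1))   # uniform weights: 1/100
print(p_red(99))  # red marble weighted 99x: 1/2
```

Same 100 marbles in both calls; only the weights changed.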

Once the system emits one label, the grader only records "match" or "mismatch". Every multiclass classification benchmark in machine learning does exactly that. So:

99 wrong labels -> many ways to fail

50% probability mass on "right" -> coin-flip odds of success

Nothing about the count of wrong options can force the probability of success down to 1 %. Only your choice of weights can do that.
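To spell that out (a sketch; p and k are symbols I'm introducing here): let p be the probability mass on the correct label and k the number of wrong labels, with the leftover mass spread evenly over them.

```latex
\begin{align*}
P(\text{correct}) &= p, \\
P(\text{wrong})   &= \sum_{i=1}^{k} \frac{1-p}{k} = 1 - p .
\end{align*}
```

k cancels out of P(wrong), so the success odds depend only on p. Uniform guessing is the special case p = 1/(k+1), which gives 1/100 when k = 99.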

"Fifty-fifty" refers to how much probability you allocate to the correct label, not to how many other labels exist. If the correct label soaks up 0.50 of the total probability mass, whether the rest is spread across 1, 9, or 99 alternatives, the task is indistinguishable from a coin flip in terms of success odds.

EDIT: If you still don't understand, just let me know and I will show you the mathematical proof that confirms what I said.