Comment by xenadu02
> In this example wouldn't Theory A be better, because all else equal it is less likely the product of overfitting and required more insight and effort to discover?
No, Theory A might simply be a dead end with no new insights to offer. And alas: the universe does not care about insights, efforts, or simplicity.
All else equal, if Theory B is easier to teach (easier for more people to understand), it might have value for that reason. It might also be valuable to teach multiple ways to understand the same underlying phenomenon.
> In other words, Theory A used a different process that we know has a higher likelihood of novel discovery.
How would we measure "likelihood of novel discovery"?
Now to call myself out here: the best way to answer any of these questions is to probe both theories at their limits to find differences in predictions that we can test. It may be that we currently don't have the right equipment, or haven't yet designed experiments capable of doing that.
Remember that Einstein's GR was validated by the 1919 Eddington measurements of starlight bending near the Sun during a solar eclipse. His initial 1911 prediction of that deflection was wrong (about half the value GR gives), and he corrected it in 1915.
We should remember, though, that this only worked out because the 1912 attempt to make the observations (which would have contradicted Einstein's 1911 prediction) got rained out. Who knows how Einstein's career would have turned out if the 1912 observations had succeeded? Perhaps people would have said he simply overfit his theory to the observations.