Comment by lossolo

The funniest thing about all this is that the biggest difference between LLMs from Anthropic, Google, OpenAI, and Alibaba is not model architecture or training objectives, which are broadly similar, but the dataset. What people don't realize is how much of that data comes from massive undisclosed scrapes + synthetic data + countless hours of expert feedback shaping the models. As methodologies converge, the performance gap between these systems is already narrowing and will continue to diminish over time.