Comment by emp17344
Any chance you’re just learning more about what the model is and is not useful for?
There are some days where it performs staggeringly badly, well below its usual baseline.
But it’s impossible to actually determine whether it’s model variance, polluted context (if I scold it, is it now closer in latent space to a bad worker, and so performs worse?), system prompt and tool changes, fine-tunes and A/B tests, variance in top-p sampling…
There are too many variables and no hard evidence shared by Anthropic.
I dunno about everyone else, but when I learn more about what a model is and is not useful for, my subjective experience improves, not degrades.