Comment by godelski 2 days ago
> Someone has probably studied this
There's even a name for it.

I view Goodhart's law more as a lesson for why we can never achieve a goal by offering specific incentives if we are measuring success by the outcome of the incentives and not by the achievement of the goal.
This is of course inevitable if the goal cannot be directly measured but is composed of many constantly moving variables, such as education or public health.
This doesn't mean we shouldn't bother having such goals; it just means we have to be diligent about pivoting the incentives when it becomes evident that secondary effects are being produced at the expense of the desired effect.
> This is of course inevitable if the goal cannot be directly measured
It's worth noting that no goal can be directly measured[0].

I agree with you: this doesn't mean we shouldn't bother with goals. They are fantastic tools, but they are guides. The better aligned our proxy measurement is with what we actually intend to measure, the less we have to interpret our results: we have to think less and spend less energy. But even poorly defined goals can be helpful, because they get refined as we progress. We've all done this since we were kids, and we still do it today. All long-term goals are updated as we pursue them. It's not like we just state a goal and then hop on the railroad to success.
It's like writing tests for code. Tests don't prove that your code is bug free (you can't write a test for a bug you don't know about: an unknown unknown), but they are still helpful because they provide evidence that the code works and they constrain the domain in which bugs can live. It's also why TDD is naive: tests aren't proof, and you have to keep thinking beyond the tests.
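A minimal sketch of that point (the function and its bug are invented for illustration): the suite passes, yet it only constrains where bugs can live, and the untested empty-list case still crashes.

    # mean() has a bug the tests never exercise: it crashes on an empty list.
    # Passing tests are evidence of correctness, not proof of it.
    def mean(xs):
        return sum(xs) / len(xs)  # ZeroDivisionError when xs == []

    def test_mean():
        assert mean([1, 2, 3]) == 2
        assert mean([10]) == 10

    test_mean()  # passes -- yet mean([]) still raises ZeroDivisionError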
If I hadn't seen it in action countless times, I would believe you. Changelists, line counts, documents made, collaborator counts, teams led, reference counts in peer-reviewed journals... the list goes on.
You are welcome to prove me wrong though. You might even restore some faith in humanity, too!
The Zoological Survey of India would like to know but hasn't figured out a good way to do a full census. If you have any ideas they would love to hear them.
Naja naja has Least Concern conservation status, so there isn't much funding for a full count, but there are concerns, as encroachment both reduces their livable habitat and puts them into more frequent contact with humans and livestock.
The comment was a joke.
Could you elaborate or link something here? I think about this pretty frequently, so would love to read something!
Metric: time to run 100m
Context: track athlete
Does it cease to be a good metric? No. After this you can likely come up with many examples of target metrics which never turn bad.
If it were a good metric, there wouldn't be a few phone books' worth of regulations on what you can do before and during a 100-meter run. From bans on rocket shoes to steroids to robot legs, the 100-meter run is a perfect example of a terrible metric, both intrinsically as a measure of running speed and extrinsically as a measure of fitness.
> Metric: time to run 100m
> Context: track athlete
> Does it cease to be a good metric? No.
What do you mean? People start doping or showing up with creatively designed shoes, and you need to layer on a complicated system to decide whether that's cheating. Some of the methods are harder to detect, so some people cheat anyway. Or you ban steroids and stimulants but allow them by prescription for an unrelated medical condition, and then people start getting prescriptions under false pretexts to get better times. Or worse, someone notices that the competition can't set a good time with a broken leg.
So what is your argument, that because it doesn't apply everywhere it applies nowhere?
You're misunderstanding the root cause. Your example works because the metric is well aligned. I'm sure you can also think of many examples where the metric is not well aligned and maximizing it becomes harmful. How do you think we ended up with clickbait titles? Why was everyone so focused on clicks? Think about engagement metrics. Is that what we really want to measure? Do we have no preference between users being happy and users being angry or sad? Or are those things much harder to measure, if not impossible, so we focus on our proxies instead? So what happens when someone doesn't realize it is a proxy and becomes hyper-fixated on it? What happens if someone does realize it is a proxy but is rewarded via the metric, so they don't really care?
Your example works in the simple case, but a lot of things look trivial when you only approach them from a first-order approximation. You left out all the hard stuff. It's kinda like...
Edit: Looks like some people are bringing up metric limits that I couldn't come up with. Thanks!
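To make the proxy point concrete, here's a toy sketch (the strategy names and every number are invented): ranking by the click proxy and ranking by the actual goal pick different winners.

    # Toy model: choose a headline style. Click rate is the proxy metric;
    # reader satisfaction stands in for the goal we actually care about.
    strategies = {
        # style: (click_rate, satisfaction) -- made-up numbers
        "accurate": (0.05, 0.9),
        "clickbait": (0.20, 0.3),
    }

    best_by_proxy = max(strategies, key=lambda s: strategies[s][0])
    best_by_goal = max(strategies, key=lambda s: strategies[s][1])
    print(best_by_proxy)  # clickbait -- wins on the metric
    print(best_by_goal)   # accurate -- wins on the goal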
Do you have an example that doesn't involve an objective metric? Of course objective metrics won't turn bad. They're more measurements than metrics, really.
Thanks for sharing. I did not know this law existed and had a name. I know nothing about nothing, but it appears that the interpretation of metrics for policies implicitly assumes the "shape" of the domain. E.g., in RL for games we see a bunch of outlier behavior from policies just gaming the signal.
There seem to be two types (a toy sketch of the first follows the list):
- Specification failure: the signal is bad-ish and the behavior is completely broken --> the policy reaches locally optimal points that phenomenologically do not represent what was expected/desired --> a sign that the reward-signal definition can be improved
- Domain constraint failure: the signal is still good and the optimization is "legitimate", but you are prompted with the question "do I need to constrain my domain of solutions?"
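Here's that toy sketch of the specification-failure case (the environment and all numbers are invented; the behavior mirrors the widely cited boat-racing agent that looped to collect respawning targets instead of finishing the race): a greedy policy maximizes the literal reward signal without ever doing the intended task.

    # Intended goal: reach the flag. Reward signal: +1 per checkpoint touched.
    # If checkpoints respawn, greedily maximizing the signal means looping
    # on the checkpoint forever: high reward, task never completed.
    def greedy_episode(checkpoints_respawn, max_steps=10):
        reward, reached_flag = 0, False
        for _ in range(max_steps):
            if checkpoints_respawn:
                reward += 1  # farm the respawning checkpoint
            else:
                reached_flag = True  # nothing to farm; do the intended task
                break
        return reward, reached_flag

    print(greedy_episode(True))   # (10, False): signal maximized, goal missed
    print(greedy_episode(False))  # (0, True): goal achieved, little reward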