Comment by andy99
> mRNABench
Just curious, in other areas of ML, I think it's widely acknowledged that benchmarks have pretty limited real world value, just end up getting saturated, and (my view) are all pretty correlated, regardless of their ostensible speciality and don't really tell you that much.
Do you think mRNABench is different, or where do you see the limitations? Do you imagine this or any benchmark will be useful for anything beyond comparing how different models do on the benchmark?
I watched an interview with one of the co-founders of Anthropic whose point was that although benchmarks saturate, they're still an important signal for model development.
We think the situation is similar here - one of the challenges is aligning the benchmark with the function of the models. Genomic benchmarks for gLMs and RNA foundation models have been very resistant to saturation.
I think in NLP the benchmarks are victims of their own success: models can be overfit to particular benchmarks really fast.
In genomics we're a bit behind. A good paper on this is DART-Eval, where they provide levels of complexity: https://arxiv.org/abs/2412.05430
In RNA, the models work much better than in DNA prediction, but it's key to have benchmarks to measure progress.