dawnofdusk 2 hours ago

Fast-moving field? This is a chemistry paper, not an ML paper. ML people have their own conferences, which run on much shorter timeframes.

eesmith 20 hours ago

How so?

To me it looks like the paper was submitted last year, but the peer reviewers identified issues which required revision before the final acceptance in March.

We can see the paper was updated since the 1 April 2024 version as it includes o1-preview (released September 2024, I believe), and GPT‑3.5 Turbo from August. I think a couple of other tested versions also post-date 1 April.

Thus, one possible criticism might have been (and I stress that I am making this up) that the original paper evaluated only 3 systems and didn't reflect the full diversity of available tools.

In any case, the main point of the paper was not the specific results of AI models available by the end of last year, but the development of a benchmark which can be used to evaluate models in general.

How has that work been made obsolete?

  • bufferoverflow 19 hours ago

    How so? All the models they've tested are obsolete, multiple generations behind high-end versions.

    (Though even these obsolete models did better than the best humans and domain experts).

    • eesmith 19 hours ago

      As I wrote, the main point of the paper was not the specific model evaluation, but the development of a benchmark which can be used to test new models.

      Good benchmark development is hard work. The paper goes into the details of how it was carried out.

      Now that the benchmark is available, you or anyone else could use it to evaluate the current high-end versions, and measure how the performance has changed over time.

      You could also use their paper to help understand how to develop a new benchmark, perhaps to overcome some limitations of this one.

      That benchmark and the contents of that paper will not be obsolete until there is a better benchmark and a better description of how to build benchmarks.

rotis 17 hours ago

Yes, this paper and many others will be forgotten as soon as they leave the front page. Afterwards no one here refers to articles like these. People just talk about anecdotes and personal experiences. Not that I think this is bad.

Jimmc414 21 hours ago

Shows the value of preprint servers like arxiv.org and chemrxiv.org.