Comment by 10xDev

Comment by 10xDev 7 hours ago

If AI can program, why does it matter if it can play Chess using CoT when it can program a Chess Engine instead? This applies to other domains as well.

RivieraKid 5 hours ago

It can write a chess engine because it has read the code of a thousand chess engines. This benchmark measures a different aspect of intelligence.

And as a poker player, I can say that this game is much more challenging for computers than chess. Writing a program that plays poker really well and efficiently is an unsolved problem.

10xDev 4 hours ago

The program doesn't need to be a solver. It can be anything that helps it.
It doesn't even need to be one tool but a series of tools.

NitpickLawyer 5 hours ago

> If AI can program, why does it matter if it can play Chess using CoT when it can program a Chess Engine instead?

Heh, we really did come full circle on this! When ChatGPT launched in late 2022, one of the first things people noticed was that it sucked at math. Basic math like 12 + 35 would trip it up. Then people "discovered" tool use, and added a calculator. And everyone was like "well, that's cheating, of course it can use a calculator, but look, it can't do the simple addition logic"... And now here we are :)

paxys 5 hours ago

IMO there's an expectation for baseline intelligence. I don't expect an "AGI" model to beat Magnus Carlsen out of the box but it should be able to do basic grade school level arithmetic and play chess at a complete beginner level without resorting to external tools.

10xDev 4 hours ago

I'm not going to respond to everything, but the key to my comment was "This applies to other domains as well." People are limiting their imagination to the chess engine example given for chess. The tool or program (or even other neural networks that are available) can be literally anything for any task... Use your imagination.

Maybe at this point we should just get rid of tedious benchmarks like chess altogether. They lead people to think about how to limit AI in order to keep the benchmark relevant, rather than expanding on what is already there.

Davidzheng 7 hours ago

They should be allowed to! In fact, I think a better benchmark would be to invent new games and test the models' ability to allocate compute to minimax/AlphaZero-style search on those new games under compute constraints.
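
To make "allocate compute to minimax under compute constraints" concrete, here is a rough sketch of what that could look like. Everything in it is an assumption for illustration, not part of any actual benchmark: the toy game (take 1 to 3 stones, whoever takes the last stone wins) stands in for an "invented" game, and `Budget`, `negamax`, and `choose_move` are hypothetical names. It uses plain iterative-deepening negamax with a cap on node expansions, not anything AlphaZero-like.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    nodes: int  # remaining node expansions (the assumed "compute constraint")


def legal_moves(stones):
    # Toy rules (assumed): a player may take 1, 2, or 3 stones.
    return [take for take in (1, 2, 3) if take <= stones]


def negamax(stones, depth, budget):
    """Value for the player to move: +1 win, -1 loss, 0 unknown (cut off)."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the mover lost
    if depth == 0 or budget.nodes <= 0:
        return 0  # out of depth or out of budget: treat as unknown
    budget.nodes -= 1
    best = -1
    for take in legal_moves(stones):
        best = max(best, -negamax(stones - take, depth - 1, budget))
        if best == 1:
            break  # a forced win was found, no need to look further
    return best


def choose_move(stones, node_budget):
    """Iterative deepening: keep the answer from the last fully searched depth."""
    budget = Budget(node_budget)
    choice = legal_moves(stones)[0]  # fallback if no depth completes
    for depth in range(1, stones + 1):
        scored = [
            (-negamax(stones - take, depth - 1, budget), take)
            for take in legal_moves(stones)
        ]
        if budget.nodes <= 0:
            break  # this depth may be incomplete, so keep the previous answer
        choice = max(scored)[1]
    return choice


if __name__ == "__main__":
    print(choose_move(7, node_budget=10_000))
```

The interesting part for a benchmark would be the budget: with too few node expansions the search falls back to a shallow or default answer, so a model has to decide how to spend what it is given.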

simianwords 6 hours ago

It's the same reason we are asked to write exams without calculators, even though the real world does have them.

How you work without calculators is a proxy for real world competency.

10xDev 6 hours ago

Funny, you used probably the most useless form of benchmarking used on people as an example of measuring "competency" in the real world.

- doctorpangloss 6 hours ago
  
  A lot of the insights of math come from knowing how to do things efficiently. That’s why the tests are timed. I don’t know, this is pretty basic pedagogy that you are choosing to grief.
  
- simianwords 6 hours ago
  
  Are you in favour of children using calculators in exams?
  
  10xDev 6 hours ago
  
  It is a program. I need it to get task X done and I don't care how, whether it is strictly through CoT or with tools. There is no such thing as cheating in real work and no reason to handicap it. Just test the limits of what it can do with whatever means possible.
  Trying to solve everything with CoT alone without utilising tools seems futile.
  
CooCooCaCha 5 hours ago

CoT is upstream of building a chess engine.

Chess engines don’t grow on trees, they’re built by intelligent systems that can think, namely human brains.

Supposedly we want to build machines that can also think, not just regurgitate things created by human brains. That’s why testing CoT is important.

It’s not actually about chess, it’s about thinking and intelligence.
