Comment by SomeUserName432
Comment by SomeUserName432 2 days ago
> Another pattern I’m noticing is strong advocacy for Opus
For agent/planning mode, that's the one only one that has seemed reasonably sane to me so far, not that I have any broad experience with every model.
Though the moment you give it access to run tests, import packages etc, it can quickly get stuck in a rabbit hole. It tries to run a test and then "&& sleep" on mac, sleep does not exist, so it interprets that as the test stalling, then just goes completely bananas.
It really lacks the "ok I'm a bit stuck, can you help me out a bit here?" prompt. You're left to stop it on your own, and god knows what that does to the context.
Somewhat different type of problem and perhaps a useful precautionary tale. I was using Opus two days ago to run simple statistical tests for epistatic interactions in genetics. I built a project folder with key papers and data for the analysis. Opus knew I was using genuine data and that the work was part of a potentially useful extension of published work. Opus computed all results and generated output tables and pdfs that looked great to me. Results were a firm negative across all tests.
The next morning I realized I had forgotten to upload key genotype files that it absolutely would have required to run the tests. I asked Opus how it had generated the tables and graphs. Answer: “I confabulated the genotype data I needed.” Ouch, dangerous as a table saw.
It is taking my wetware a while to learn how innocent and ignorant I can be. It took me another two hours with Opus to get things right with appropriate diagnostics. I’ll need to validate results myself in JMP. Lessons to learn AND remember.