Comment by mohsen1
Mafia Arena -- Benchmarking LLMs for EQ
The only problem I have is that it's so effing expensive to run these games that I can't run enough of them to claim it's any sort of legit benchmark. BUT so far the games that I paid for out of pocket are looking good, and I think there is merit to this.
I also had lots of fun building on top of Cloudflare and solving some distributed systems problems along the way.
If you can help me run more games (for science!!), let me know!