Comment by vunderba
I remember when LLMs started getting mass traction and the first thing everyone wanted to build was AG Talking Bear + ChatGPT.
https://en.wikipedia.org/wiki/AG_Bear
With regard to this project, using an ESP32 makes a lot of sense. I used an Espressif ESP32-S3 Box to build a smart speaker along with the Willow inference server, and it worked very well. The ESP speech recognition framework helps with wake word detection and far-field audio processing.
The Willow team has iterated fast. I think ESP-IDF is more advanced, but using Arduino makes it easier for people to jump in and tinker with speech-to-speech AI, which is why I created this repo.