Comment by 3D30497420 a day ago
Maybe take inspiration from how Home Assistant handles local speech-to-text and text-to-speech? https://www.home-assistant.io/voice_control/voice_remote_loc...
Pretty sure you'd need to host this on something more robust than an ESP32 though.
Yeah, I was looking at Home Assistant as well, but it doesn't feel real-time, likely because transcription runs as a separate stage before inference.
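A rough way to see why a separate transcription stage hurts responsiveness: if the whole utterance must be transcribed before inference starts, the wait after you stop talking grows with how long you spoke. The sketch below compares that against overlapping transcription with speech. It is a toy latency model, not Home Assistant's actual pipeline; `Timings`, `staged_latency`, `streaming_latency`, and all the numbers are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """Hypothetical per-stage costs in seconds (illustrative, not measured)."""
    speech_duration: float  # how long the user talks
    stt_rtf: float          # speech-to-text real-time factor: compute seconds per audio second
    inference: float        # time for the model to start responding once it has the text
    chunk: float            # audio chunk size used by the streaming variant


def staged_latency(t: Timings) -> float:
    """Wait after end of speech when the full utterance is transcribed first, then inferred."""
    stt = t.speech_duration * t.stt_rtf
    return stt + t.inference


def streaming_latency(t: Timings) -> float:
    """Wait after end of speech when chunks are transcribed while the user is still talking.

    Assumes stt_rtf < 1, so transcription keeps up with incoming audio and
    only the final chunk is still pending when the user stops speaking.
    """
    last_chunk_stt = min(t.chunk, t.speech_duration) * t.stt_rtf
    return last_chunk_stt + t.inference


if __name__ == "__main__":
    t = Timings(speech_duration=6.0, stt_rtf=0.5, inference=0.8, chunk=1.0)
    print(f"staged:    {staged_latency(t):.1f} s")     # 3.8 s
    print(f"streaming: {streaming_latency(t):.1f} s")  # 1.3 s
```

Under these made-up numbers the staged pipeline keeps you waiting about three times longer, and the gap widens with longer utterances, which matches the "doesn't feel real-time" impression.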