Comment by armcat
This is so awesome, well done LemonSlice team! Super interesting on the ASR->LLM->TTS pipeline, and I agree, you can make it super fast (I did something myself as a 2-hour hobby project: https://github.com/acatovic/ova). I've been following full-duplex models as well and so far couldn't get even PersonaPlex to run properly (without choppiness/latency), but have you peeps tried Sesame, e.g. https://app.sesame.com/?
I played around with your avatars and one thing that it lacks is that it's "not patient", it's rushing the user, so maybe something to try and finetune there? Great work overall!
Thank you! Impressive demo with OVA. Still feels very snappy, even fully local. It will be interesting to see how video plays out in that regard. I think we're still at least a year away from the models being good enough and small enough that they can run on consumer hardware. We compared 6 of the major voice providers on TTFB, but didn't try Sesame -- we'll need to give that one a look. https://docs.google.com/presentation/d/18kq2JKAsSahJ6yn5IJ9g...