Comment by pickleballcourt

Comment by pickleballcourt 5 days ago

3 replies

One thing I've learnt from movie production is actually what separates professional from amateur quality is in the audio itself. Have you thought about implementing personaplex from NVDIA or other voice models that can both talk and listen at the same time?

Currently the conversation still feels too STT-LLM-TTS that I think a lot of the voice agents suffer from (Seems like only Sesame and NVDIA so far have nailed the natural conversation flow). Still, crazy good work train your own diffusion models, I remember taking a look at the latest literature on diffusion and was mind blown by the advances in last years or so since u-net architecture days.

EDIT: I see that the primary focus is on video generation not audio.

lcolucci 5 days ago

This is a good point on audio. Our main priority so far has been reducing latency. In service of that, we were deep in the process of integrating Hume's two-way S2S voice model instead of ElevenLabs. But then we realized that ElevenLabs had made their STT-LLM-TTS pipeline way faster in the past month and left it at that. See our measurements here (they're super interesting): https://docs.google.com/presentation/d/18kq2JKAsSahJ6yn5IJ9g...

But, to your point, there are many benefits of two-way S2S voice beyond just speed.

Using our LiveKit integration you can use LemonSlice with any voice provider you like. The current S2S providers LiveKit offers include OpenAI, Gemini, and Grok and I'm sure they'll add Personaplex soon.

  • echelon 4 days ago

    I'm a filmmaker. While what OP said is 100% true, your instincts are right.

    Not only is perfect is the enemy of good enough, you're only looking for PMF signal at this point. If you chase quality right now, you'll miss validation and growth.

    The early "Will Smith eating spaghetti" companies didn't need perfect visuals. They needed excited early adopter customers. Now look where we're at.

    In the fullness of time, all of these are just engineering problems and they'll all be sorted out. Focus on your customer.