Comment by ignoramous a day ago
Moshi is CC-BY. Another similar 7B (speech-text, real-time conversational) model was recently released under Apache v2: https://tincans.ai/slm3 / https://huggingface.co/collections/tincans-ai/gazelle-v02-65...
An important distinction is that tincans is not speech-to-speech. It uses a separate turn/pause detection model and a final text-to-speech processing step.