Show HN: I open-sourced my AI toy company that runs on ESP32 and OpenAI realtime

177 points by akadeb 3 months ago

Hi HN! Last year the project I launched here got a lot of good feedback on creating speech to speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback and made our project fully open-source—all of the client, hardware, firmware code.

This Github repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly.

I couldn't find a resource that helped set up a reliable, secure websocket (WSS) AI speech to speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets Speech-To-Speech right. OpenAI launched an embedded-repo late last year which sets up WebRTC with ESP-IDF. However, it's not beginner friendly and doesn't have a server side component for business logic.

This repo is an attempt at solving the above pains and creating a great speech to speech experience on Arduino with Secure Websockets using Edge Servers (with Deno/Supabase Edge Functions) for fast global connectivity and low latency.

Sean-Der 3 months ago

This is wonderful, really great job on this! For me, physical devices are when it really starts to feel magical. My pre-schooler never engaged with Speech-to-Speech examples I showed her on a screen. However, when I showed her a reindeer toy[1] on my desk that tells jokes, that is when it became real. It is the same joy/wonder I felt playing Myst for the first time.

----

If anyone is trying to build physical devices with Realtime API I would love to help. I work at OpenAI on Realtime API and worked on [0] (was upstreamed) and I really believe in this space. I want to see this all built with Open/Interoperable standards so we don't have vendor lock-in and developers can build the best thing possible :)

[0] https://github.com/openai/openai-realtime-embedded

[1] https://youtu.be/14leJ1fg4Pw?t=804


StefMyb 3 months ago

I would love to chat further with you about this. I am working on building an educational conversational toy. The toy will tell stories and sing, but the conversational aspect is the only thing at this stage that requires AI. The whole idea came from my daughter, who was in Kinder at the time.

- Sean-Der 3 months ago
  
  sean @ pion.ly please email me any time.
  Offer is open for anyone. If you need help with WebRTC/Realtime API/Embedded I am here to help. I have an open meeting link on my website.
  

drakenot 3 months ago

Something that really kills the 'effect' of most of the Voice > AI demos that I see is the cold start / latency.

The OpenAI "Voice Mode" is closer, but when we can have a near-instantaneous and natural back-and-forth voice mode, that will be big in terms of it feeling magical. Today, it is: say something, awkwardly wait N seconds, then listen to the reply and sometimes awkwardly interrupt it.

Even if the models were no smarter than they are today, if we could crack that "conversational" piece and performance piece, it would be a big difference in my opinion.


akadeb 3 months ago

Yeah, the way I am handling this is turn detection, which feels unnatural. I like how LiveKit handles turn detection with a small model[0][1].
[0] https://www.youtube.com/watch?v=EYDrSSEP0h0
[1] https://docs.livekit.io/agents/build/turns/turn-detector/
```
turn_detection: {
  type: "server_vad",          // server-side voice activity detection
  threshold: 0.4,              // activation threshold (0 to 1)
  prefix_padding_ms: 400,      // audio kept from before detected speech
  silence_duration_ms: 1000,   // silence needed to mark the end of speech
},
```
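To make those parameters concrete, here is a minimal sketch of silence-based end-of-turn detection. It is not OpenAI's server VAD; `findTurnEnd`, `VadConfig`, and the per-frame speech probabilities are hypothetical names for illustration, and `prefix_padding_ms` is left out for brevity.

```typescript
// Hypothetical illustration of silence-based turn detection.
// This is NOT OpenAI's implementation, just the general idea.
interface VadConfig {
  threshold: number;         // probability at or above which a frame counts as speech
  silenceDurationMs: number; // how long silence must last to end the turn
}

// Returns the index of the frame at which the turn ends, or -1 if it has not ended.
function findTurnEnd(
  speechProbs: number[],
  frameMs: number,
  cfg: VadConfig,
): number {
  let heardSpeech = false;
  let silentMs = 0;
  for (let i = 0; i < speechProbs.length; i++) {
    if (speechProbs[i] >= cfg.threshold) {
      // Any speech frame resets the silence timer.
      heardSpeech = true;
      silentMs = 0;
    } else if (heardSpeech) {
      // Only count silence once the user has actually started talking.
      silentMs += frameMs;
      if (silentMs >= cfg.silenceDurationMs) return i;
    }
  }
  return -1;
}
```

With 100 ms frames and `silenceDurationMs: 1000`, the turn only ends after ten consecutive quiet frames, which is roughly the pause that makes the exchange feel turn-based.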

Sean-Der 3 months ago

I think it will always feel unnatural as long as 'AI Speech' is turn based. Right now developers use Voice Activity Detection to detect when the user has stopped talking.
What would be REALLY cool is if we had something that would interrupt you during conversation like talking with a real human.

- conductr 3 months ago
  
  I can see how interruptions would prove even more unnatural and annoying pretty quick. There's a lot of nuance in knowing how to interrupt properly and often, people that interrupt only do so quickly, then yield, allow person to finish then resume - very situational and tons of nuance. Otherwise, with current level of sophistication, you'd just have the AI talking over you the entire time, not allowing you to complete your thoughts/questions/commands/etc and people would quickly be more frustrated and just turn it off.
  
  
  mst 3 months ago
  
  I absolutely agree with your analysis wrt current tech - however, I suspect the person you're replying to is talking about "what would be really cool" in terms of it happening in a future where the relevant underpinnings had advanced to the point where it could actually manage the situational/nuance stuff properly.
  I almost certainly wouldn't want to use something that tried to implement it now but it's a lovely dream and the state of the art keeps advancing at quite the speed (i.e. faster than I would have predicted, even when I do my best to take into account that it keeps advancing faster than I would have predicted ;).
  
dgellow 3 months ago

Have you recently tried OpenAI voice mode from ChatGPT Plus? It's basically what you describe

- drakenot 3 months ago
  
  Yes, I mentioned this in the comment.
  I think it is closer, although even it still has a cold start problem. Once you are connected and in-session, it is a better experience.
  There is still some "turn based" conversational aspect to it that can be awkward but it is much better. It also helps that you can "tap and hold" to override, which is a bit of a hack but works well in practice for that mobile use-case.
  
  
  dgellow 3 months ago
  
  Sorry what I meant was: have you tried _recently_? I feel that what you describe is the old implementation. The current version, with a colorful bubble, has very low latency and starts almost instantly. The older version has a UI composed of a white dot on a black background, and is pretty slow. But the more recent one is pure magic IMHO.
  

hoppp 3 months ago

It's great, lovely. But in the long run, do these toys rely on subscription payments?

Both the Supabase API and OpenAI bill per API call.

So the lovely talking toys can die if the company stops being profitable.

I would love to see a version with decent hardware that runs a local model, that could have a long lifespan and work offline.


xp84 3 months ago

> lovely talking toys can die if the company stops being profitable.
This is a good point to me as a parent -- in a world where this becomes a precious toy, it would be a serious risk of emotional pain if the child experienced this scenario like the death of a pet or friend.
> version with decent hardware that runs a local model
I feel like something small and efficient enough to meet that (today) would be dumb as a post. Like Siri-level dumb.
Personally, I'd prefer a toy which was tethered to a home device. Without a cloud (and thus commercial) dependency, the toy wouldn't be 'smart' outside of Wi-fi range, but I'd design it so that it got 'sleepy' when away from Wi-fi, able to be "woken up" and, in that state, to respond to a few phrases with canned, Siri-like answers. Perhaps new content could be made up for it daily and downloaded to local storage while at home, so that it could still "tell me a story" offline etc.
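As a rough sketch of that "sleepy" offline mode: everything here is hypothetical (`ToyState`, `CANNED_REPLIES`, and `offlineReply` are made-up names, not part of any existing toy firmware or repo), and it only shows the fallback logic, not audio or networking.

```typescript
// Hypothetical sketch of an offline fallback for a Wi-Fi-tethered toy.
interface ToyState {
  wifiConnected: boolean;
  cachedStories: string[]; // content downloaded to local storage while at home
}

// A few canned phrases the toy can answer without any network access.
const CANNED_REPLIES: Record<string, string> = {
  "hello": "Hi! I'm a bit sleepy right now.",
  "good night": "Good night! Sleep tight.",
};

function isSleepy(state: ToyState): boolean {
  // Away from Wi-Fi the toy is "sleepy" and only uses local content.
  return !state.wifiConnected;
}

function offlineReply(state: ToyState, phrase: string): string {
  const key = phrase.trim().toLowerCase();
  if (key === "tell me a story") {
    // Serve a pre-downloaded story if one is available.
    return state.cachedStories[0] ??
      "I don't have a story saved yet. Let's try again at home.";
  }
  return CANNED_REPLIES[key] ?? "I'm sleepy. Let's talk more when we're home.";
}
```

The point of the design is that nothing in the offline path depends on a cloud service, so the toy degrades gracefully instead of going silent.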

- scottmcf 3 months ago
  
  > This is a good point to me as a parent -- in a world where this becomes a precious toy, it would be a serious risk of emotional pain if the child experienced this scenario like the death of a pet or friend.
  We've already seen this exact scenario play out with "Moxie" a few months ago:
  https://www.axios.com/2024/12/10/moxie-kids-robot-shuts-down
  
zild3d 3 months ago

Well, for now it's either a small device that uses APIs or Paddington Bear needs a backpack for his GPU.


empath75 3 months ago

When someone figures this out, it's going to be a multi billion dollar company, but the safety concerns for actually putting something like this into the hands of children are unbelievable.


mithr 3 months ago

This. The idea is super cool in theory! But given how these sort of things work today, having a toy that can have an independent conversation with a kid and that, despite the best intentions of the prompt writer, isn't guaranteed to stay within its "sandbox", is terrifying enough to probably not be worth the risk.
IMO this is only exacerbated by how little children (who are presumably the target audience for stuffed animals that talk) often don't follow "normal" patterns of conversation or topics, so it feels like it'd be hard to accurately simulate/test ways in which unexpected & undesirable responses could come out.

- conductr 3 months ago
  
  I'm trying to use my imagination, but what exactly is the fear? Perhaps the AI will explain where babies come from in graphic detail before the parent is ready to have that conversation or something similar? Or, for us in the US, maybe it tells your kid they should wear a bullet proof vest to pre-K instead of bringing a stuffy for naptime?
  Essentially, telling kids the truth before they're ready and without typical parental censorship? Or is there some other fear, like the AI will get compromised by a pedo and he'll talk your kid into who knows what? Or similar for "fill in state actor" using mind control on your kid (which, honestly, I feel like is normalized even for adults; eg. Fox News, etc., again US-centric)
  
  
  mithr 3 months ago
  
  I'll respond to the content, because I think there are some genuine questions amongst the condescension and jumping to conclusions.
  > telling kids the truth before they're ready and without typical parental censorship
  Does AI today reliably respond with "the truth"? There are countless documented incidents of even full-grown, extremely well-educated adults (e.g. lawyers) believing well-phrased hallucinations. Kids, and particularly small kids who haven't yet had much education about critical thinking and what to believe, have no chance. Conversational AI today isn't an uncensored search engine into a set of well-reasoned facts, it's an algorithm constructing a response based on what it's learned people on the internet want to hear, with no real concept of what's right or wrong, or a foundational set of knowledge about the world to contrast with and validate against.
  > what exactly is the fear
  Being fed reliable-sounding misinformation is one. Another is being used for emotional support (which kids do even with non-talking stuffed animals), when the AI has no real concept of how to emotionally support a kid and could just as easily do the opposite. I guess overall, the concern is having a kid spend a large amount of time talking to "someone" who sounds very convincing, has no real sense of morality or truth, and can potentially distort their world view in negative ways.
  And yea, there's also exposing kids to subjects they're in no way equipped to handle yet, or encouraging them to do something that would result in harm to themselves or to others. Kids are very suggestible, and it takes a long while for them to develop a real understanding of the consequences of their actions.
  
  
  conductr 3 months ago
  
  Bravo, this is an answer beyond the outright fearmongering that actually makes sense and I wasn't considering. I still struggle with how it's much different than social media in terms of shaping what kids believe and their perception of reality, but I do get what you're saying - that this could be next level dangerous in terms of them believing what it says without much critical thinking.
  
  
  3np 3 months ago
  
  How about encouraging self-harm, even murder and suicide?
  https://www.npr.org/2024/12/10/nx-s1-5222574/kids-character-...
  https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-a...
  https://www.euronews.com/next/2023/03/31/man-ends-his-life-a...
  
  
  xp84 3 months ago
  
  > Perhaps the AI will explain where babies come from in graphic detail before the parent is ready to have that conversation or something similar?
  I mean, that's not a silly fear. But perhaps you don't have any children? "Typical parental censorship" doesn't mean prudish pearl-clutching.
  I have an autistic child who already struggles to be appropriate with things like personal space and boundaries -- giving him an early "birds and bees talk" could at minimum result in him doing and saying things that could cause severe trauma to his peers. And while he uses less self-control than a typical kid, even "completely normal" kids shouldn't be robbed of their innocence and forced to confront every adult subject until they're mature enough to handle it. There's a reason why content ratings exist.
  Explaining difficult subjects to children, such as the Holocaust, sexual assault, etc. is very difficult to do in a way that doesn't leave them scarred, fearful, or worse, end up warping their own moral development so that they identify with the bad actors.
  
  
  conductr 3 months ago
  
  I have a 6 year old. I don't let him use the internet or tablets or phones, so I get it, question was out of curiosity of other people's thought process. I just lack the imagination to know what other people are actually afraid of as I often find people have what I consider far fetched boogeyman imaginations. Yet, they allow their infants to play on an iPad for hours, etc. which I find no more/less risky especially as they become older and can seek out content they prefer. My ban on it for my kid is more so based on my parenting opinion that boredom is a life skill and beneficial to young minds (probably all ages actually) and constant entertainment/screentime is unhealthy. I don't ban the devices because I'm afraid of the content he may encounter, I just want him to enjoy his childhood before it's inevitably stolen by screens.
  I think my theory is kind of correct, people generally 'trust' a YouTube censor but an AI censor is currently seen as untrusted boogeyman territory.
  
georgemcbay 3 months ago

Reminds me of Conan O'Brien's old WikiBear skits
https://youtu.be/0SfSx9ts46A

hoppp 3 months ago

Babies often have iPads now. I think they should make an offline toy with decent hardware inside. That would be something.


justanotheratom 3 months ago

This is quite cool. Two questions:

- why do you need nextjs frontend for what looks like a headless use case?
- how much would be the OpenAI bill if there is 15 minutes of usage per day?


irq-1 3 months ago

> This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output.
https://openai.com/index/introducing-the-realtime-api/
About the nextjs site, I was thinking maybe its difficult to have supabase hold long connections, or route the response? I'm curious too.
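For a rough upper bound on the bill, the quoted per-minute prices can be plugged into a small estimate. This is only a sketch: the 5/10 minute split between child and toy is an assumption, `estimateCostUsd` is a hypothetical helper, and real usage (a cheaper model, pauses with no audio) can come in far lower.

```typescript
// Per-minute prices as quoted from OpenAI's Realtime API announcement.
const INPUT_USD_PER_MIN = 0.06;
const OUTPUT_USD_PER_MIN = 0.24;

// Hypothetical helper: cost of a session given minutes of audio in and out.
function estimateCostUsd(inputMinutes: number, outputMinutes: number): number {
  return inputMinutes * INPUT_USD_PER_MIN + outputMinutes * OUTPUT_USD_PER_MIN;
}

// Assumed split for a 15-minute day: 5 minutes of the user talking,
// 10 minutes of the toy talking.
const perDay = estimateCostUsd(5, 10);
const perMonth = perDay * 30;
console.log(`per day: $${perDay.toFixed(2)}, per 30 days: $${perMonth.toFixed(2)}`);
```

Under that assumed split the estimate is about $2.70 per day, which is dominated by audio output.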

- akadeb 3 months ago
  
  The long connections are ultimately handled by Deno Edge so the site isn't used there. The NextJS frontend (which also could be an iOS/Android app) helps provide an interface to select character, create AI characters, set ESP32 volume, and view conversation history.
  
akadeb 3 months ago

thank you! The nextjs frontend is to set things like device volume, selecting which character you are interacting with, viewing conversation history etc. I just tried it and for a 15 minute chat, it's roughly 20c. Roughly 570 input tokens

JKCalhoun 3 months ago

And I am wondering, why use an ESP32 if you don't need the WiFi? (And, please, no WiFi in a toy!)

- akadeb 3 months ago
  
  Currently we connect to a Wifi network to reach the Deno edge server. Some popular toys doing it: Yoto, Toniebox
  

supermatt 3 months ago

This looks like so much fun! I have recently gotten into working with electronics, so it seems like a nice little project to undertake.

I noticed that it is dependent on OpenAI's Realtime API, so it got me wondering what open alternatives there are, as I would love a more realtime Alexa-like device in my home that doesn't contact the cloud. I have only played with software, but the existing solutions have never felt realtime to me.

I could only find <https://github.com/fixie-ai/ultravox> that would seem to really work as realtime. It seems to be some model that wires up llama and whisper somehow, rather than treating them as separate steps which is common with other projects.

What other options are available for this kind of real-time behaviour?


Sean-Der 3 months ago

My plan is that Espressif’s WebRTC code[0] will hook up to Pipecat[1], which gets you the freedom to do whatever you want.
The design of OpenAI + WebRTC was to lean on WebRTC as much as possible to make it easier for users.
[0] https://github.com/espressif/esp-webrtc-solution
[1] https://github.com/pipecat-ai/pipecat

- akadeb 3 months ago
  
  Pipecat is awesome! is it similar to what livekit provides?
  I think Realtime API adoption would be higher if it is offered on Arduino rather than ESP-IDF as the latter is not very beginner friendly. That was one of the main reasons I built this repo using edge functions instead of a direct WebRTC connection.
  
- supermatt 3 months ago
  
  Fantastic! This will save a ton of work
  
_neil 3 months ago

Not on-device but for local network I’ve been looking at Speaches[0]. Haven’t tried it yet, but I have been running kokoro-web[1] and the quality and speed are really good.
[0] https://speaches.ai/
[1] https://huggingface.co/spaces/Xenova/kokoro-web

3D30497420 3 months ago

Maybe inspiration from how Home Assistant can do local speech-to-text and vice versa? https://www.home-assistant.io/voice_control/voice_remote_loc...
Pretty sure you'd need to host this on something more robust than an ESP32 though.

- supermatt 3 months ago
  
  Yeah, I was looking at Home Assistant as well, but it doesn't feel real-time, likely due to it having the transcription stage separate from the inference.
  

behnamoh 3 months ago

am I the only one who finds the unnecessarily positive vibes of OpenAI realtime voices unrealistic, too much, and borderline creepy?


mickael-kerjean 3 months ago

Yep and having it in a child toy is way beyond the border of creepy

- akadeb 3 months ago
  
  Currently our device is a toy accessory. And for children we are strictly focusing on `Story mode`, where adventure stories and fairy tales feel more engaging. I think there's value in getting the AI to create epic stories consistently.
  
- 3np 3 months ago
  
  More so from the consent and privacy angle.
  
mst 3 months ago

OpenAI stuff in general seems (to me, at least) to be overly positive and confident in terms of how it replies.
While I make no foolish claims that it's perfect, I've found Claude feels much less arrogant, and was genuinely appreciative when one of its replies started with an (accurate, of course I checked primary sources to verify that) analysis of the first half of my question, and then for the more obscure second half said "I'm not sure if I can answer that without hallucinating, but here's some stuff you could try researching."
Certainly Claude's tone and "attitude" (FSVO) works much better for me than any other LLM I've tried, though mileage will, of course, vary.
(I have zero connection to the company and am still on a free account, I'm just quietly impressed relative to the competition)

scyzoryk_xyz 3 months ago

You’re not the only one, same here.
I believe there will be interest in extracting insights from speech-related fields, performing arts etc. Kind of how there was this transfer of design principles in the 90’s-00’s from traditional typographers, letterform revivals, print techniques.
It’ll be interesting to see an evolution of expectations and culture emerge around AI voices depending on role. Maybe we’ll see these positive voice vibes as silly and naive the same way we see MySpace aesthetics today?

bethekidyouwant 3 months ago

[flagged]



andruby 3 months ago

Really nice! Thank you for including a youtube video. It's a little unfortunate that you do time cuts between your "prompt" and the response. I'm curious if you were waiting 0.5s or 10s to get the response. I think the usability/fun of this stands or falls with that latency.

Maybe it could be combined with fastvoiceagent.cerebrium.ai (discussed 10 months ago https://news.ycombinator.com/item?id=40805010) for lower latency


akadeb 3 months ago

Thanks for the feedback. I have attached the raw unedited video here: https://drive.google.com/file/d/1kEmbVInvUrYFwjddyGL8Rz03c0N... (sorry the video is a bit long ~5min with some intro about my company :-)


vunderba 3 months ago

I remember when LLMs started getting mass traction and the first thing everyone wanted to build was AG Talking Bear + ChatGPT.

https://en.wikipedia.org/wiki/AG_Bear

With regard to this project, using an ESP32 makes a lot of sense, I used an Espressif ESP32-S3 Box to build a smart speaker along with the Willow inference server and it worked very well. The ESP speech recognition framework helps with wake word / far field audio processing.


akadeb 3 months ago

The Willow team has iterated fast. I think ESP-IDF is more advanced, and using Arduino makes it easier for people to jump on and tinker with Speech-to-Speech AI, which is why I created this repo.


ianbicking 3 months ago

What's been your experience with the Realtime API? I've been doing LLM with voice, but haven't really given it a try – the price is so high, and it feels like it's much harder to control. Specifically that you just get one system prompt and then the model takes over entirely. (Though looking at the API, I see you can inject text and do some other things to play around with the session.)


akadeb 3 months ago

I agree, it's still pricy. The cost works out better with `gpt-4o-mini-realtime-preview-2024-12-17`.
Yep, it's constrained to the system prompt, but I pass in conversation history with each new session to keep it relevant. It also supports tool calling, which is clutch.
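One way to carry history into a new session is to fold recent turns into the instructions sent in a `session.update` event. This is a sketch under assumptions, not necessarily how the repo does it: `Turn`, `buildSessionUpdate`, and the `maxTurns` cutoff are hypothetical, and only the `session.update` event type and its `instructions` field come from the Realtime API.

```typescript
// Hypothetical shape for stored conversation turns.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Builds a `session.update` event whose instructions include recent history.
// Folding history into the prompt is an assumption made for illustration.
function buildSessionUpdate(basePrompt: string, history: Turn[], maxTurns = 10) {
  // Keep only the most recent turns so the prompt stays small.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  const transcript = recent.map((t) => `${t.role}: ${t.text}`).join("\n");
  const instructions = transcript
    ? `${basePrompt}\n\nPrevious conversation:\n${transcript}`
    : basePrompt;
  return { type: "session.update", session: { instructions } };
}
```

The returned object would be serialized with `JSON.stringify` and sent over the Realtime connection once the session is open.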
Have you tried Hume AI? They've got a neat suite of APIs that give you more control on each session.

- ianbicking 3 months ago
  
  Hume has been on my radar for a long time, but I've never actually used their products. They keep coming out with new lines and yet I never see anyone talk about them... I'm not sure why? Though it's so hard to figure out their offerings, and some seem to actually be wrappers around other LLMs...
  Do you know what Hume's latency is like? The completely vertically integrated Realtime API is pretty compelling because of that latency, but it's not as clear to me how they would make that all work with their hybrid system.
  

dayvid 3 months ago

Really interesting. Also more powerful if integrated with animatronic movement. Reminds me of Furby. Doesn't even have to be full AI, just augmented with slightly smarter and more flexible capabilities


akadeb 3 months ago

thanks David, let me know if you get a chance to try it out!


tantalor 3 months ago

I'm surprised by the overwhelming positive vibes in the comments here.

Maybe I'm alone? To me, this comes across as extremely creepy, the exact opposite of what we should desire from AI in products aimed at children.


adregan 3 months ago

Totally get the creepy part, but my criticism of devices like this is that they seem to be made by people with limited exposure to the creative power of children.
Children don’t need this; they are so much more creative than an AI (and the adults that trained the AI), and their creativity is fueled by boredom.

- akadeb 3 months ago
  
  > with limited exposure to the creative power of children.
  This is true, I am not a parent. But I have some domain expertise in building a conversational toy... talking to many parents and having been a child myself for several years has helped
  
- ospider 3 months ago
  
  I doubt that. I have two kids, 4yo and 6yo. I told my kids that I can make their toys talk (using AI) a few months ago, and they have been constantly asking me when it will be ready.
  
- mannyv 3 months ago
  
  You'd be surprised how un-creative many children are.
  
- mst 3 months ago
  
  I feel like it would be creepy if the kid was using it without anybody ever checking up on it ... but I think all of my friends with kids would say that the answer to that is "parenting."
  I mean, giving a kid an unlocked iPad and not bothering to do basic supervision can also have really creepy results, so I'm unconvinced that something like your work actually makes anything worse in the negligent parenting situation, and seems like it could be a lot of fun in the competent parenting one.
  If you haven't already done this, I'd note that I can think of a number of parents who would probably rather enjoy a version of story mode that let them collaborate with their child and your code to put together a bedtime story before they turn it off for the night and tuck the kid into bed.
  
- dayvid 3 months ago
  
  I mean when I was a kid I had action figures and played out scenarios. Would be pretty nuts if you could make your own TV shows with AIs assisting the play. Or set up your own battles, etc. Especially if it had more animatronic entry points
  
supermatt 3 months ago

I commented that I like the project, in that it is a project that helps you to create a realtime assistant - I would love to replace Alexa/Siri/whatever with something actually useful.
That said, I totally agree that I wouldn't want this in a kids toy. The whole idea is super creepy in that respect, with so much scope for abuse.

akadeb 3 months ago

For parents we added a `Story mode` option (similar to Yoto toy / Toniebox). The idea is: the AI crafts a story and invites the child to craft the story together in a more engaging way. The story prompt keeps the story focused and in scope.

Sean-Der 3 months ago

I hope these toys could be a joy/comfort for kids that don’t have a parent that cares.
I poured hours into games/programming because it was a happy place away from school etc… These toys could be the same.
This technology is neutral, but I see so much potential for projects that do good.

akadeb 3 months ago

The Elato toy is currently not aimed at children. The current version has adult characters that are entertaining and fun to engage with, like the Chad Brew Barkley character in the videos. I put up more such funny videos on my TikTok: tiktok.com/@elatoai
However, while testing it with a friend who has a 5-year old daughter, I added a `Story mode` feature to create dynamic stories for her which she enjoys.
I think what would be even cooler is if each character in a story had a unique voice (like the voice of an ogre, the voice of an elf etc.), which is currently unsupported in the single websocket connection.

bethekidyouwant 3 months ago

Why is the idea of a child talking to an LLM creepy? Do you think a child is gonna figure out how to jailbreak the “keep it kid-friendly” prompt and start talking about I don’t even know what … kids don’t know about adult things. That’s just not how kids be.

- spencerflem 3 months ago
  
  I genuinely can't fathom how it wouldn't be creepy.
  Bots are for doing tasks. I don't want to socialize with them and find the idea of kids being socialized by bots supremely weird. At least the AI girlfriend people are (probably unwell) adults.
  
- handoflixue 3 months ago
  
  > Why is the idea of a child talking to an LLM creepy?
  The target audience is young kids who are still developing socialization skills. This toy off-boards that development from a human to an AI. We don't really know how that affects a kid.
  This also plausibly trains the kid to think of other people as AIs: subservient tools that exist primarily to respond to them. Not exactly a healthy attitude to take towards one's peers.
  It's presumably also going to get a lot of unsupervised usage, and the occasional AI model updates. What happens when a bad model update has it advising kids that soap is a forbidden candy that tastes delicious?
  (I'm not saying any of these is particularly likely, just trying to share the sort of concerns that would lead someone to feeling creeped out)
  
behnamoh 3 months ago

Exactly my thoughts when I first saw the comments!



airbreather 3 months ago

Great, until built in ads become part of the tech...


akadeb 3 months ago

Hi Mr. teddy bear!
Hey there buddy! Have you tried brushing with Sensodyne now available at your nearest CVS only for $9.99!


stavros 3 months ago

This is great, thank you! I can learn a lot from this.


akadeb 3 months ago

thank you stavros!


mcdow 3 months ago

Dude this is super cool! What made you decide to open source it?

I had a similar idea that I never followed through with (even down to using an ESP).

Basically you could make a Harry Potter talking painting with your device + an e-ink display that displays some 3D modeled character.

For others, here’s a direct link to a demo video:

https://m.youtube.com/watch?v=o1eIAwVll5I


magixx 3 months ago

I also thought about this but wanted to look into an ESP32 CAM to get vision working. For better or worse I didn't pursue the idea as I thought in the end repurposing a cell phone would be better overall.
I do wonder if the cellphone/app argument is why we didn't see that many hardware LLM API wrappers up until now. The rabbit R1 was basically just that.
I've seen more products in this space recently such as Ropet[1], LOOI[2], and others but for now it's going to be costly for companies to sell such a product at a fixed cost as I think a subscription model would be a hard sell [3] for consumers.
[1] https://www.kickstarter.com/projects/1067657324/ropet-your-n...
[2] https://looirobot.com/products/looi-robot?variant=4909200762...
[3] https://tech.yahoo.com/ai/articles/tragic-robot-shutdown-sho...

Sean-Der 3 months ago

I get a `Request has expired` could you upload somewhere else?

- mcdow 3 months ago
  
  My bad! Updated the link.
  

gbertb 3 months ago

great stuff! thanks for sharing


akadeb 3 months ago

thanks for checking it out Bert


wormlord 3 months ago

What could go wrong?


akadeb 3 months ago

Murphy's law


deepcurryshit 3 months ago

[flagged]


ForHackernews 3 months ago

This is a cool demo but I would not let my child play with anything that talks to a cloud AI like this. Furby fever dreams made real.


akadeb 3 months ago

I understand, is it the realtime conversational aspect or just in general you wouldn't want a child to play with a TTS-like service?


hakaneskici 3 months ago

Amazing, thank you for sharing. I'm interested in learning about your experience while building this :)

What kind of interesting challenges have you run into, and how has your work influenced OpenAI's Realtime API?

PS: Your GitHub README is quite well crafted, which is hard to come across nowadays.


akadeb 3 months ago

Thank you! It's been super fun to work on. The challenges were more on the ESP32 side, like getting audio to work smoothly with Opus and the audio timing challenges. This is one of the reasons I open-sourced it.
It seems pointless to think that everyone should cross that C++/Audio barrier to make something cool. Using this cuts a lot of dev time and brings products out to market way quicker. The repo basically helps launch your AI toy brand.

reolbox 3 months ago

This is an AI reply.

- hakaneskici 3 months ago
  
  What made you think that?
  
  
  johnisgood 3 months ago
  
  The README seems like what GPT would spit out, with all the emojis, diagrams, etc.
  Not the first time I ran into it, but I did not bother commenting.
  I can recognize it from far away. Thankfully I am not the only one.
  