Comment by devinprater a day ago
Apple has a video understanding model too. I can't wait to find out what accessibility stuff they'll do with the models. As a blind person, AI has changed my life.
Like many others, I too would very much like to hear about this.
I taught our entry-level calculus course a few years ago and had two blind students in the class. The technology available for supporting them was abysmal then -- the toolchain for typesetting math for screen readers was unreliable (and anyway very slow), for braille was non-existent, and translating figures into braille involved sending material out to a vendor and waiting weeks. I would love to hear how we may better support our students in subjects like math, chemistry, physics, etc, that depend so much on visualization.
I did a maths undergrad degree and the way my blind, mostly deaf friend and I communicated was using a stylized version of TeX markup. I typed on a terminal and he read / wrote on his braille terminal. It worked really well.
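Not our exact private shorthand, but to give a flavor: linear TeX needs no 2-D layout at all, so it reads unambiguously one token at a time on a braille terminal or through speech. (Illustrative example, not his actual convention.)

```latex
% Read strictly left to right, no spatial layout needed:
\frac{d}{dx} \int_0^x f(t) \, dt = f(x)
% Spoken/brailled roughly as:
% "frac, d over d x, integral from 0 to x, f of t, d t, equals f of x"
```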
For a physical view on this see:
https://www.reddit.com/r/openscad/comments/1p6iv5y/christmas...
The creator, https://www.reddit.com/user/Mrblindguardian/ has asked for help a few times in the past (I provided feedback when I could), but hasn't needed to as often of late, presumably due to using one or more LLMs.
+1 and I would be curious to read and learn more about it.
A blind comedian / TV personality in the UK has just done a TV show on this subject - I haven't seen it, but here's a recent article about it: https://www.theguardian.com/tv-and-radio/2025/nov/23/chris-m...
Chris McCausland is great. A fair bit of his material _does_ reference his visual impairment, but it's genuinely witty and sharp, and it never feels like he's leaning on it for laughs/relying on sympathy.
He did a great skit with Lee Mack at the BAFTAs 2022[0], riffing on the autocue the speakers use for announcing awards.
Hilariously, he beat the other teams in the “Say What You See” round (yes, really) of last year’s Big Fat Quiz. No AI involved.
Same! @devinprater, have you written about your experiences? You have an eager audience...
I suppose I should write about them. A good few will be about issues with the mobile apps and websites for AI, like Claude not even letting me know a response is available to read, let alone sending it to the screen reader to be read. It's a mess, but if we blind people want it, we have to push through inaccessibility to get it.
What other accessibility features do you wish existed in video AI models? Real-time vs post-processing?
Mainly realtime processing. I play video games, and would love to play something like Legend of Zelda and just have the AI going, then ask it "read the menu options as I move between them," and it would speak each menu option as the cursor moves to it. Or when navigating a 3D environment, ask it to describe the surroundings, then ask it to tell me how to get to a place or object, then it guide me to it. That could be useful in real-world scenarios too.
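A crude pull-based version of the menu-reading piece can be lashed together today as a screen-grab -> OCR -> TTS loop. Just a sketch: mss, pytesseract, and pyttsx3 are one possible stack (nothing a game officially supports), the capture region is a guess, and plain OCR chokes on stylized game fonts, which is exactly where a real vision model would come in.

```python
# Toy loop: grab a region of the screen, OCR it, and speak the text
# whenever it changes. Stack is an assumption: mss (capture),
# pytesseract (OCR, needs the tesseract binary), pyttsx3 (offline TTS).
import time

import mss
import pytesseract
import pyttsx3
from PIL import Image

engine = pyttsx3.init()
last_spoken = ""

with mss.mss() as sct:
    # Hypothetical menu area; a real tool would let the user calibrate this.
    region = {"top": 200, "left": 100, "width": 600, "height": 400}
    while True:
        shot = sct.grab(region)
        img = Image.frombytes("RGB", shot.size, shot.rgb)
        text = pytesseract.image_to_string(img).strip()
        if text and text != last_spoken:
            engine.say(text)
            engine.runAndWait()
            last_spoken = text
        time.sleep(0.2)  # poll ~5x per second
```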
I have to believe you used the word see twice ironically.
My wife is deaf, and we had one kid in 2023 and twins in 2025. There's been a noticeable improvement in baby cry detection! In 2023, the best we could find was a specialized device that cost over $1,000 and had all sorts of flakiness/issues. Today, the built-in detection on her (Android) phone + watch is better than that device, and a lot more convenient.
Is that something you actually need AI for though? A device with a sound sensor and something that flashes/vibrates a remote device when it detects sound above some threshold would be cheaper, faster detection, more reliable, easier to maintain, and more.
But your solution costs money in addition to the phone they already own for other purposes. And multiple things can make loud noises in your environment besides babies; differentiating between a police siren going by outside and your baby crying is useful, especially if the baby slept through the siren.
The same arguments were made about blind people and the multitude of one-off devices that smartphones replaced: OCR to TTS, color detection, object detection in photos/camera feeds, detecting what denomination US bills are, analyzing what's on screen semantically vs. what was provided as accessible text (if any was at all), etc. Sure, services for the blind would come by and help arrange outfits for people, audiobook narrators and braille translation services existed, and standalone devices to detect money denominations were sold, but a phone can just do all of that now for much cheaper.
All of these accessibility AI/ML features run on-device, so the knee-jerk anti-AI crowd's chief complaints are mostly baseless anyways. And for the blind and the deaf, carrying all the potential extra devices with you everywhere is burdensome. The smartphone is a minimal and common social and physical burden.
> more reliable
I've worked on some audio/video alert systems. Basic threshold detectors produce a lot of false positives. It's common for parents to put white noise machines in the room to help the baby sleep. When you have a noise generating machine in the same room, you need more sophisticated detection.
False positives are the fastest way to frustrate users.
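To make that concrete with simulated numbers (no real audio library here, just numpy): a fixed RMS threshold tuned for a quiet nursery fires on every single frame once a white noise machine is running.

```python
# Simulated 100 ms audio frames; RMS thresholding vs. a white noise machine.
import numpy as np

FRAME = 1_600  # 100 ms at 16 kHz
rng = np.random.default_rng(0)

def rms(frames):
    return np.sqrt(np.mean(frames ** 2, axis=1))

quiet_room    = 0.01 * rng.standard_normal((100, FRAME))  # faint room hiss
noise_machine = 0.20 * rng.standard_normal((100, FRAME))  # constant white noise

THRESHOLD = 0.05  # comfortably above the quiet room's noise floor

print("alarms, quiet room:   ", int((rms(quiet_room) > THRESHOLD).sum()))     # 0
print("alarms, noise machine:", int((rms(noise_machine) > THRESHOLD).sum()))  # 100
# Push THRESHOLD above ~0.20 to silence the machine and you also miss any
# cry that isn't much louder than it -- hence a learned classifier instead.
```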
> Is that something you actually need AI for though?
Need? Probably not. I bet it helps though (false positives, etc.)
> would be cheaper, faster detection, more reliable, easier to maintain, and more.
Cheaper than the phone I already own? Easier to maintain than the phone that I don't need to do maintenance on?
From a fun hacking perspective, a different sensor & device is cool. But I don't think it's any of the things you mentioned for the majority of people.
You are talking about a device of smart phone complexity. You need enough compute power to run a model that can distinguish noises. You need a TCP/IP stack and a wireless radio to communicate the information. At that point you have a smart phone. A simple sound threshold device would have too many false positives/negatives to be useful.
> As a blind person, AI has changed my life.
I know this is a low quality comment, but I'm genuinely happy for you.
I guess that auto-generated audio descriptions for (almost?) any video you want are a very, very nice feature for a blind person.
My two cents, this seems like a case where it’s better to wait for the person’s response instead of guessing.
My two cents, this seems like a comment it should be up to the OP to make instead of virtue signaling.
Guessing that being able to hear a description of what the camera is seeing (basically a special case of a video) in any circumstances is indeed life-changing if you're blind...? Take a picture through the window and ask what's the commotion? Door closed outside that's normally open - take a picture, tell me if there's a sign on it? etc.
Not the GP, but I'm currently reading a web novel with a card game where the author didn't include alt text in the card images. I contacted them about it and they started adding it, but in the meantime AI was a big help. Same for all kinds of other images on the internet when they're significant to understanding the surrounding text. It also makes for a better search experience when Google, DDG, and the like make finding answers difficult. I might use smart glasses for better outdoor orientation, though a good solution might take some time. Phone camera plus AI is also situationally useful.
As a (web app) developer I'm never quite sure what to put in alt. Figured you might have some advice here?
> As a (web app) developer I'm never quite sure what to put in alt.
Are you making these five mistakes when writing alt text? [1]
Images Tutorial [2]
Alternative Text [3]

[1]: https://www.a11yproject.com/posts/are-you-making-these-five-...
[2]: https://www.w3.org/WAI/tutorials/images/
[3]: https://webaim.org/techniques/alttext/
I'm gonna flip this around... have you tried pasting the image (and the relevant paragraph of text) and asking ChatGPT (or another LLM) to generate the alt text for the image and see what it produces?
For example... https://chatgpt.com/share/692f1578-2bcc-8011-ac8f-a57f2ab6a7...
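And if you'd rather do it in bulk than through the chat UI, the same trick is a few lines against any vision-capable API. A sketch using the OpenAI Python client; the model name, image URL, and paragraph below are placeholders.

```python
# Sketch: ask a vision-capable model for alt text, passing the surrounding
# paragraph as context. Model / URL / paragraph are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

paragraph = "..."  # the text that surrounds the image on your page

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write one concise sentence of alt text for this image. "
                     "It appears next to this paragraph: " + paragraph},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/card.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```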
> I'm gonna flip this around... have you tried pasting the image (and the relevant paragraph of text) and asking ChatGPT (or another LLM) to generate the alt text for the image and see what it produces?
There's a great app by an indie developer that uses ML to identify objects in images. Totally scriptable via JavaScript, shell script and AppleScript. macOS only.
Could be 10, 100 or 1,000 images [1].
The question to ask is: what does a sighted person learn from looking at the image? The answer is the alt text. E.g., if the image is a floppy disk, maybe you communicate that this is the save button. If it shows a cat sleeping on the windowsill, the alt text is, yep: "my cat looking cute while sleeping on the windowsill".
Important to add for blind people: "... assuming they've never seen anything and visual metaphors won't work."
The number of times I've seen captions that wouldn't make sense for people who have never been able to see is staggering. I don't think most people realize how visual our typical language usage is.
Image descriptions. TalkBack on Android has it built in and uses Gemini. VoiceOver still uses some older, less accurate, and far less descriptive ML model, but we can share images to Seeing AI or Be My Eyes and such and get a description.
Video descriptions, through PiccyBot, have made watching more visual videos, or videos where things happen that don't make sense without visuals, much easier. Of course, it'd be much better if YouTube incorporated audio description through AI the same way they do captions, but that may happen in a good 2 years or so. I'm not holding my breath. It's hard to get more than the bare minimum of accessibility out of Google as a whole.
Looking up information like restaurant menus. Yes, it can make things up, but worst case, the waiter says they don't have that.
Finally, good news about the AI doing something good for the people. :)
The smiley at the end doesn’t hide how awful your comment is.
So serious... you should relax a bit and work on your humor reception/understanding (smiley intentionally left out this time).
People need to understand that a lot of angst around AI comes from AI enabling people to do things that they formerly needed to go through gatekeepers for. The angst is coming from the gatekeepers.
AI has been a boon for me and my non-tech job. I can pump out bespoke apps all day without having to get bent on $5000/yr/usr engineering software packages. I have a website for my side business that looks and functions professionally and was done with a $20 monthly AI subscription instead of a $2000 contractor.
AI is divine retribution for artists being really annoying on Twitter.
I highly doubt "pumping out bespoke apps all day" is possible yet beyond 100% boilerplate; where it is possible, it's no good for any purpose other than enshittifying the web, and at that point it's not profitable because everyone can do it.
I use AI daily as a senior coder for search and docs, and when used for prototyping you still need to be a senior coder to go from say 60% boilerplate to 100% finished app/site/whatever unless it's incredibly simple.
> I use AI daily as a senior coder for search and docs, and when used for prototyping you still need to be a senior coder to go from say 60% boilerplate to 100% finished app/site/whatever unless it's incredibly simple.
I know you would like to believe that, but with the tools available NOW, that's not necessarily the case. For example, by using the Playwright or Chrome DevTools MCPs, models can see the web app as it's being created, and it's pretty easy to prompt them to fix something they can see.
These models know the current frameworks and coding practices but they do need some guidance; they're not mindreaders.
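For reference, wiring up the Playwright MCP is typically a single entry in the client's MCP config; the package name below is Microsoft's published server, though where the config file lives varies by tool.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```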
I still don't believe that. Again, yes, a boilerplate calculator or recipe app, probably, but anything advanced and real-world, with latency issues, scaling, race conditions, CSS quirks, design weirdness, optimisation - in other words, the things that actually require domain knowledge - I still don't get much help with, even with Claude Code. Pointers, yes, but they completely fumble actual production code in real-world scenarios.
Again, it's the last 5% that takes 95% of the time, and that 5% I haven't seen fixed with Claude or Gemini, because it's essentially quirks, browser errors, race conditions, visual alignment, etc. All stuff that completely goes over any LLM's head atm, from what I've seen.
They can definitely bullshit a 95% working app though, but that's 95% from being done ;)
Often the problem with tech people is that they think software only exists for tech, or to be sold to others by tech.
Nothing I do is in the tech industry. It's all manufacturing and all the software is for in-house processes.
Believe it or not, software is useful to everyone and no longer needs to originate from someone who only knows software.
This is the same as the discussion about using Excel. Excel has its limitations, but it has enabled millions of people to do pretty sophisticated stuff without the help of “professionals”. Most of the stuff us tech people do is also basically some repetitive boilerplate. We just like to make things more complex than they need to be. I am always a little baffled why seemingly every little CRUD site that has at most 100 users needs to be run on Kubernetes with several microservices, CI/CD pipelines, and whatever.
As far as enshittification goes, this was happening long before AI. It probably started with SEO and just kept going from there.
Hi Devin and other folks, I'm looking for software developers who are blind or hard of sight as there is a tool I'm building that I think might be of interest to them (it's free and open source). If you or anyone you know is interested in trying it please get in touch through my email.
I'm only commenting because I absolutely love this thread. It's an insight into something I think most of us are quite (I'm going to say it...) blind to in our normal experiences with daily life, and I find immense value in removing my ignorance about such things.
I wonder if there's anything that can help blind people to navigate the world more easily - I guess in the future AR Glasses won't just be for the sighted but allow people without vision to be helped considerably. It really is both amazing and terrifying the future we're heading towards.
AURA Vision for blind and low vision people has been doing this for years. Be My Eyes has been doing this for years without AI. Meta Ray-Bans can do this. There's nothing new coming soon that hasn't already been available for a while, only refinements.
From a couple years ago...
https://www.microsoft.com/en-us/garage/wall-of-fame/seeing-a...
... and that was 10 years ago. I'm curious what it could do now.
> As a blind person, AI has changed my life.
Something one doesn't see in news headlines. Happy to see this comment.