Comment by shakna
I don't need the text of the page. That's easy, and I already have it.
But information has a hierarchy, usually a visual one, and that hierarchy needs to be reflected. LLMs are famously bad at structure, especially any tree of significant depth, and RAG is not enough - hallucinations become common at depth.
My response to you, right now, is in a semi-structured node graph. I know a reply has happened because of the dangling children. I know who made it, and what they said, from cell attributes in the spans surrounding it.
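To make that concrete, here's a rough TypeScript sketch of what I mean. The names (`GraphNode`, `author`, `text`) are mine for illustration, not the actual accessibility tree: the point is that replies show up as dangling children, and authorship lives in attributes rather than in anything visual.

```typescript
// A minimal sketch (illustrative names, not a real accessibility API) of a
// comment thread as a semi-structured node graph: authorship and text live
// in attributes on span-like nodes, and a reply is detected not by a
// "reply" label but by children dangling under a node.

interface GraphNode {
  tag: string;                   // e.g. "span", "div"
  attrs: Record<string, string>; // cell attributes: author, text, etc.
  children: GraphNode[];         // dangling children imply replies
}

// Infer that a reply happened: the node has children hanging off it.
function hasReply(node: GraphNode): boolean {
  return node.children.length > 0;
}

// Recover who said what from the attributes surrounding the text,
// walking the tree so indentation reflects the hierarchy.
function describe(node: GraphNode, depth = 0): string[] {
  const who = node.attrs["author"] ?? "unknown";
  const said = node.attrs["text"] ?? "";
  const lines = [`${"  ".repeat(depth)}${who}: ${said}`];
  for (const child of node.children) {
    lines.push(...describe(child, depth + 1));
  }
  return lines;
}

// Hypothetical thread: the structure, not the visuals, carries the meaning.
const thread: GraphNode = {
  tag: "div",
  attrs: { author: "op", text: "original post" },
  children: [
    {
      tag: "span",
      attrs: { author: "shakna", text: "a reply, known only by where it dangles" },
      children: [],
    },
  ],
};

console.log(hasReply(thread));            // true: dangling children signal a reply
console.log(describe(thread).join("\n")); // depth of indentation is the hierarchy
```

Flatten that tree into prose, as an LLM summary tends to, and the very thing I rely on is the first thing lost.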
Don't worry - AI is being shoved down accessibility's throat, like everywhere else. JAWS has FSCompanion, NVDA has an OpenAI plugin, and VoiceOver has it built in.
Why do I hate it? Because when it _doesn't work_, you can't tell. You don't know if it is hallucinating data, and you cannot verify the response. If it is the mode of communication, it is all you have, and that makes every failure a catastrophic failure.
Thanks for helping me, and hopefully others, understand the challenges more!