Comment by swatcoder

Comment by swatcoder 2 days ago

1 reply

> LLMs a lot of the ambiguity of HTML disappears as far as a scraper is concerned

The more effective way to think about it is that "the ambiguity" silently gets blended into the data. It might disappear from superficial inspection, but it's not gone.

The LLM is essentially just doing educated guesswork without leaving a consistent or thorough audit trail. This is a fairly novel capability and there are times where this can be sufficient, so I don't mean to understate it.

But it's a different thing than making ambiguity "disappear" when it comes to systems that actually need true accuracy, specificity, and non-ambiguity.

Where it matters, there's no substitute for "very explicit structured data" and never really can be.

llbbdd 2 days ago

Disappear might be an extremely strong word here, but yeah as you said as the delta closes between what a human user and an AI user are able to interpret from the same text, it becomes good enough for some nines of cases. Even if on paper it became mathematically "good enough" for high-risk cases like medical or government data structured data will still have a lot of value. I just think more and more structured data is going to be cleaned up from unstructured data except for those higher precision cases.