Comment by shadowgovt

Comment by shadowgovt 2 days ago


On the other hand, I already have an HTML parser, and your bespoke API would require a custom tool to access.

Multiply that by every site, and that approach does not scale. Parsing HTML scales.

swiftcoder 2 days ago

You already have a JSON and XML parser too, and the website offers standardised APIs in both of those.


shadowgovt 2 days ago

Not standardized enough: I can't guarantee that an API is RESTful, I can't know a priori what the response format is (arbitrary servers on the internet can't be trusted to set Content-Type headers properly), how to crawl it given the response data, etc. We ultimately never solved the problem of universal self-describing APIs, so a general crawling service can't trust that they work.
In contrast, I can always trust that whatever is returned for the browser to consume is in a format a browser can consume, because if it isn't, the site isn't a website. HTML is pretty much the only format guaranteed to work.
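
To make the "parsing HTML scales" point concrete, here is a minimal sketch of the one generic step a crawler can run on any site: pull the links out of whatever HTML comes back. It uses only the Python standard library. The names `LinkExtractor` and `extract_links` are made up for illustration and are not from any tool mentioned in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://example.org/">Other</a>'
    print(extract_links(page, "https://example.com/"))
```

The same function works without knowing anything about the site in advance, which is the property a per-site API can't offer unless every site agrees on one.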


dmitrygr 2 days ago

parsing html -> lazy but ok

using an llm to parse html -> please do not


llbbdd 2 days ago

> Lazy but ok
You're absolutely welcome to waste your own free time on whatever feels right.
> using an llm to parse html -> please do not
Have you used any of these tools with a beginner's mindset in, like, five years?
