Comment by shadowgovt 2 days ago

4 replies

On the other hand, I already have an HTML parser, and your bespoke API would require a custom tool to access.

Multiply that by every site, and that approach does not scale. Parsing HTML scales.
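To make the "parsing HTML scales" point concrete, here is a minimal sketch of a site-agnostic link extractor built only on Python's standard library. The names `LinkExtractor` and `extract_links` are hypothetical and chosen for illustration; this is not code from any crawler mentioned in the thread, just one way the same parser could be pointed at any page without per-site tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from every <a href=...> in a page (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest for crawling.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the links found in html_text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

The same function works on any page a browser can render, which is the scaling argument: no per-site schema or client is needed, only the page's own markup.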

swiftcoder 2 days ago

You already have JSON and XML parsers too, and the website offers standardised APIs in both of those formats.

  • shadowgovt 2 days ago

    Not standardized enough. I can't guarantee that an API is RESTful, I can't know a priori what the response format is (arbitrary servers on the internet can't be trusted to set Content-Type headers properly), or how to crawl it given the response data, etc. We ultimately never solved the problem of universal self-describing APIs, so a general crawling service can't trust that they work.

    In contrast, I can always trust that whatever is returned for the browser to consume is in a format a browser can consume, because if it isn't, the site isn't a website. HTML is pretty much the only format guaranteed to work.

dmitrygr 2 days ago

parsing html -> lazy but ok

using an llm to parse html -> please do not

  • llbbdd 2 days ago

    > Lazy but ok

    You're absolutely welcome to waste your own free time on whatever feels right

    > using an llm to parse html -> please do not

    have you used any of these tools with a beginner's mindset in like, five years?