Comment by PaulHoule

Comment by PaulHoule a day ago


Funny, my experience is that properly written HTML parsers can be easy to specialize quickly for a wide range of web sites, whereas just logging in to an API can be a battle with a Rube Goldberg machine for what… a license to suck through a coffee stirrer? I am still using a parser I wrote for Flickr image galleries 15+ years ago that frequently “just works” on new sites without modification, and when it does take modification the new rules are a handful of LoC.
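To make the "handful of LoC" point concrete, here is a minimal sketch of what a generic gallery parser with small per-site rule overrides might look like. It is not the actual Flickr parser; the rule keys (`tag`, `attr`, `must_contain`), the host `example-gallery.test`, and the function `extract_images` are all hypothetical names chosen for illustration, and only the Python standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical defaults: most galleries put the image URL in <img src="...">.
DEFAULT_RULES = {"tag": "img", "attr": "src", "must_contain": ""}

# Hypothetical per-site overrides: each new site costs only a few lines.
SITE_RULES = {
    "example-gallery.test": {"attr": "data-src", "must_contain": "/photos/"},
}


class ImageExtractor(HTMLParser):
    """Collects absolute image URLs according to a small rule dictionary."""

    def __init__(self, base_url, rules):
        super().__init__()
        self.base_url = base_url
        self.rules = rules
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != self.rules["tag"]:
            return
        value = dict(attrs).get(self.rules["attr"])
        if value and self.rules["must_contain"] in value:
            self.images.append(urljoin(self.base_url, value))


def extract_images(html, base_url, host):
    # Site-specific rules override the defaults; unknown hosts use defaults.
    rules = {**DEFAULT_RULES, **SITE_RULES.get(host, {})}
    parser = ImageExtractor(base_url, rules)
    parser.feed(html)
    return parser.images
```

An unknown host falls through to the defaults, which is roughly the "just works without modification" case; a site that lazy-loads images from another attribute needs only a one-line entry in `SITE_RULES`.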

The most remarkable case I ever saw was trying to parse Wikipedia markup from the data dumps that they quit publishing and struggling to get better than 98% accuracy, and then writing a close-to-perfect HTML-based parser in minutes, starting with the Flickr parser.

Almost always an API is not a gift but rather a take-away.

That said, when I wrote Blackbird, my first web crawler, in 1998, I was already obsessive about politeness and efficiency from a “low observability” perspective as much as being the right thing to do.
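As an illustration of the politeness idea, one common technique is to enforce a minimum interval between requests to the same host. The sketch below is not Blackbird's code; the class `PolitenessGate`, its parameters, and the 5-second default are assumptions made for the example. The clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from urllib.parse import urlsplit


class PolitenessGate:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}  # host -> earliest time the next request may start

    def wait_for(self, url):
        """Blocks until the URL's host may be contacted; returns seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        delay = max(0.0, self.next_allowed.get(host, now) - now)
        if delay:
            self.sleep(delay)
        # The next request to this host must wait a full interval from now.
        self.next_allowed[host] = now + delay + self.min_interval
        return delay
```

A crawler would call `gate.wait_for(url)` immediately before each fetch. Requests to different hosts do not delay each other, so throughput stays reasonable while each individual site sees only slow, evenly spaced traffic.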