Comment by tannhaeuser

Comment by tannhaeuser a day ago

9 replies

You're not wrong. The amount of fields of use where XML is used bogusly which have nothing to do with markup is truly staggering. XML signing and relying on canonical XML serialization for it is just peak and something else.

The sad thing is that XML was meant as a simplification over full SGML for delivery of markup on the web. Specifically, XML is always fully tagged (doesn't make use of tag inference), and does neither have empty ("void") elements nor short forms for attributes such as in <option selected>. Thus XML never needs markup declarations for special per-element or per-attribute parsing rules. This was done to facilitate newer vocabularies next to HTML like SVG and MathML.

But soon enough, folks took the XML specification as an invitation for complexity and a self-serving spec circus: namespaces, XInclude (as a bogus replacement for entity expansion), XQuery, XSLT, XML Schema as a super-verbose replacement for DTDs using XML itself, etc. XHTML 2 was the largest failure and turning point, introducing not just a new vocabulary, but trying to reinvent how browsers work in a design-by-comittee fashion. It could be said that XHTML took W3C down along with it.

For message payloads in large and long-term multi-party projects (governments, finance/payments, healthcare, etc.), I'm however not sure the alternatives (JSON-over-HTTP and the idiotic quasi-religious apeal to misunderstood "REST" semantics) is really helping. XML Schema, while in part overkill and unused (substitution groups), certainly has facilitated separating service interface from service implementation, multiple generations and multiple implementations, test cases bases, and other long-term maintenance goals.

eftpotrm 19 hours ago

I'm not going to argue that XML hasn't been used badly and excessively in a lot of places, it really has, and using every part of it religiously will tie you in knots, fast.

But I can't help noticing that Json is gaining more and more XML-like functionality through things like schemas and JsonPath, as people slowly realise why XML had those functions they're now having to replace. I'm a long way from convinced that all the engineering effort to switch was actually beneficial.

  • dwaite 12 hours ago

    And schema and paths have much the same issues - they are being used as tools in things like network-exchanged messages when the underlying specs and the implementations out there were not designed with that idea in mind.

    You are going to have a bad time if your schema validation tries to resolve schema URL by default.

    You are going to have a bad time if your JSONpath implementation supports the older "eval" mechanisms, or has unbounded memory/processing time growth from top-down traversal of the JSON.

    The issue in the article was purposely avoided in JSON by virtue of JWS not having canonicalization, transforms, or partial signatures. You sign a chunk of binary data, and that binary data might be parsable as JSON.

  • darby_nine 14 hours ago

    > But I can't help noticing that Json is gaining more and more XML-like functionality through things like schemas and JsonPath, as people slowly realise why XML had those functions they're now having to replace.

    I think there's an analogy here to static typing and gradual typing. XML is a massive pain in the ass to implement and JSON is often good enough. Only having to implement the features you plan on using is quite nice.

    • eftpotrm 12 hours ago

      For who though?

      If you're a user of whatever-data-format designing your new application, you could always use the subset you actually cared about. No-one forced you to use all the complex bits in XML.

      If you're a library author - well, yes, you could implement a Json parser at first that was eval(input), then something more complex because that's a security hole, then something else again because that's not too quick, then a new library like JsonPath to get queryability, and... all your work is still less functional than the system you were trying to replace. So yes, you can possibly implement Json libraries in less code than implementing XML libraries. But unless you had a reason to implement a new XML library from scratch anyway, that isn't actually a win.

      • darby_nine 11 hours ago

        > No-one forced you to use all the complex bits in XML.

        Just parsing XML alone is hugely painful, let alone implementing the rest of the stuff like XSLT, namespaces, validation, xpath, etc etc. Plus, once you've done this, you still need a natural way to map this into domain types, or you need to force people into a visitor pattern or some other awkward deserialization technique. JSON just requires a single JSONValue sum type.

        XML has its place, but it'd have to be a pretty extreme case of needing rigor or a tree where I need to be able to peg arbitrary attributes onto the nodes in order to see it as attractive. Most APIs won't benefit from all of XML's features.

        For instance I maintain a podcasting/rss feed library and XML (And more importantly, the way people publish invalid xml) makes me really wish they had gone with a different format in the day that was harder to fuck up.

  • aversis_ 18 hours ago

    Devs just love reinventing the wheel every few years. Maybe we should switch to CSV next just for fun.

darby_nine 14 hours ago

> have nothing to do with markup

Yes, it's a badly named language. It has nothing to do with markup. As always, intentions don't matter at all, and it's the best tool we have for certain types of structures.

  • dwaite 12 hours ago

    It is a properly named language. Just a staggering majority of XML use is for a purpose it was not originally designed for. It is meant to be a good tool for progressive enhancement text markup, such as stylizing hypertext or math equations.

    • darby_nine 9 hours ago

      > It is meant to be a good tool for progressive enhancement text markup, such as stylizing hypertext or math equations.

      Well, at least hypertext. I've seen the disaster that is MathML.