Comment by gwking 16 hours ago

An example related to JSON content is HTML content. I have a Python library that represents all of the standard HTML tags as a family of classes. It is like a lightweight DOM on the server side, and has resulted in a web server that does not use string-based templating at all. It lets me construct trees of HTML completely in Python and then render them out with everything correctly escaped. I can also parse HTML into trees and manipulate them as I please (e.g. for scraping tasks and document transforms). It is all strongly typed using mypy and I adhere to the strictest generic typing I can manage.

Each node has a list of children, and the element type is `str | HtmlNode`. I find this vastly easier to use than the lxml etree API, where nodes have `text` and `tail` attributes to represent interleaved text.
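To make the shape concrete, here is a minimal sketch of that kind of node. It is not the actual library: the `HtmlNode` fields and the `render` method below are assumptions made up for illustration, and only `html.escape` from the standard library is a real API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class HtmlNode:
    """Hypothetical element node: a tag, attributes, and mixed children."""

    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    # Text and child elements live side by side in one ordered list.
    children: list[str | HtmlNode] = field(default_factory=list)

    def render(self) -> str:
        # Attribute values are escaped including quotes.
        attrs = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in self.attrs.items()
        )
        # Strings are escaped as text; nodes render themselves recursively.
        inner = "".join(
            escape(child, quote=False) if isinstance(child, str) else child.render()
            for child in self.children
        )
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


p = HtmlNode("p", children=["Use ", HtmlNode("b", children=["<b>"]), " for bold & more."])
print(p.render())  # <p>Use <b>&lt;b&gt;</b> for bold &amp; more.</p>
```

The single `isinstance` check is all the "work" the mixed list costs here, and mypy narrows the union on both branches. The sketch ignores void elements such as `<br>` and does no tag validation.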

Interestingly, the lxml docs promote their design as follows:

> The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, that tend to get in the way fairly often (as you might know from classic DOM APIs).

https://lxml.de/tutorial.html#elements-contain-text
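For comparison, here is a small sketch of the `.text`/`.tail` model. It uses the standard library's `xml.etree.ElementTree`, which follows the same model as lxml's etree, so that it runs without extra dependencies. The `mixed_children` helper is hypothetical, not part of either library; it just shows how the two encodings map onto each other.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Use <b>bold</b> sparingly.</p>")
b = p[0]

# Text before the first child hangs off the parent's .text;
# text after a child hangs off that child's .tail.
print(repr(p.text))  # 'Use '
print(repr(b.text))  # 'bold'
print(repr(b.tail))  # ' sparingly.'


def mixed_children(el: ET.Element) -> list[str | ET.Element]:
    """Flatten .text/.tail into one interleaved list of str and Element."""
    items: list[str | ET.Element] = []
    if el.text:
        items.append(el.text)
    for child in el:
        items.append(child)
        if child.tail:
            items.append(child.tail)
    return items


print(len(mixed_children(p)))  # 3
```

Moving or deleting `b` in the `.text`/`.tail` model means deciding what happens to `" sparingly."`, since it is stored on `b` rather than on `p`. In the mixed-list form it is simply another item in the parent's list.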

It could be a simple matter of taste! But I suspect the difference between what they describe as "classic DOM" and what I am doing is that they are drawing on experience with C/C++/Java libraries circa 2009, which had much less convenient dynamic type introspection. The "get in the way fairly often" reminds me of how verbose it is to deal with heterogeneous data in C/C++/ObjC. In ObjC, for example, you could have an array mixing NSString with other NSObject subclasses, but you had to do work to type it correctly. If you wanted numbers in there you had to use NSNumber, which is an annoying box type that you never otherwise use. And ObjC was considered very dynamic in its day!

I have long felt that the root of much evil was the overbearing distinction between primitive and object types in C++/Java/Objective-C.

All of this is a long way of saying, I think "how to deal with heterogeneous lists of stuff" is a huge question in language design, library design, and the daily work of programming. Modern languages have by no means converged on a single way to represent varying types of elements. If you want to create trees of stuff, at some level that is "mixing types in a list" no matter how you might try to encode it. Just food for thought!