Comment by tannhaeuser

Comment by tannhaeuser 3 days ago

0 replies

Nice to meet a fellow SGML fan!

> When you say that macros/entities may contain parameters, I think you are referring to recursive entity expansion,

No, I'm referring to SGML data attributes (attributes declared on notations having concrete values defined on entities of the respective notation); cf. [1]. In sgmljs.net SGML, these can be used for SGML templating which is a way of using data entities declared as having the SGML notation (ie. stand-alone SGML files or streams) to replace elements in documents referencing those entities. Unlike general entities, this type of entity expansion is bound to an element name and is informed of the expected content model and other contextual type info at the replacement site, hence is type-safe. Data attributes supplied at the expansion site appear as "system-specific entities" in the processing context of the template entity. See [2] for details and examples.

Understanding and appreciating the construction of templating as a parametric macro expansion mechanism without additional syntax may require intimate knowledge of lesser known SGML features such as LPDs and data entities, and also some HyTime concepts.

> create an entity to generate a Github repo link

Templating can turn text data from a calling document into an entity in the called template sub-processing context so might help with your use case, and with the limitation to have to declare things in DTDs upfront in general.

> it’s not possible to parse a document without having the entire thing from the start, and because of things like tag omission (which is part of its syntax “MINIMIZATION” features), it’s often not possible to parse a document without having _everything up to the end_.

Why do you think so and why should this be required by tag inference specifically? In sgmljs.net SGML, for external general entities (unlike external parameter entities which are expanded at the point of declaration rather than usage), at no point does text data have to be materialised in its entirety. The parser front-end just switches input events from another external source during entity expansion and switches back afterwards, maintaining a stack of open entities.

Regarding namespaces, one of their creators (SGML demi-good James Clark himself) considers those a failure:

> the pain that is caused by XML Namespaces seems massively out of proportion to the benefits that they provide (cf. [3]).

In sgmljs.net SGML, you can handle XML namespace mappings using the special processing instructions defined by ISO/IEC 19757-9:2008. In effect, element and attributes having names "with colons" are remapped to names with canonical namespace parts (SGML names can allow colons as part of names), which seems like the sane way to deal with "namespaces".

I haven't checked your site, but most certainly will! Let's keep in touch; you might also be interested in sgmljs.net SGML and the SGML DTD for modern HTML at [4], to be updated for WHATWG HTML review draft January 2025 when/if it's published.

Edit:

> Would love to hear what you are referring to with “safe” third-party transclusion and also what features are available for removal or rejection of undesired script in user content.

In short, I was mainly referring to DTD techniques (content models, attribute defaults) here.

[1]: https://sgmljs.net/docs/sgmlrefman.html#data-entities

[2]: https://sgmljs.net/docs/templating.html

[3]: https://blog.jclark.com/2010/01/xml-namespaces.html

[4]: https://sgmljs.net/docs/html5.html