Comment by HiPhish 15 hours ago

Serious question: why would you ever want to not close tags? It saves a couple of keystrokes, but we have snippets in our editors, so the amount of typing is the same. Closed tags make it easier for editors like Vim or automated tools to handle the source code; e.g. I can type `dit` in Vim to delete the contents of a tag, something that's only possible because the tag's content is clearly delimited. It also makes parsing HTML easier because there are fewer syntax rules.

I learned HTML quite late, when HTML 5 was already all the rage, and I never understood why the more strict rules of XML for HTML never took off. They seem so much saner than whatever soup of special rules and exceptions we currently have. HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML. Even though I don't have to, I strive to adhere to the stricter rules of closing all tags, closing self-closing tags and only using lower-case tag names.
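A small sketch of why explicit end tags help tools: when every element is closed (XHTML-style), a generic XML parser from Python's standard library can find where each element's content begins and ends without knowing any HTML-specific rules. The helper `contents_of` and the sample markup are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

def contents_of(markup: str, tag: str) -> list[str]:
    """Return the text content of every `tag` element in well-formed markup.

    This only works because every element is explicitly closed, so the XML
    parser knows exactly where each element's content starts and stops.
    """
    root = ET.fromstring(markup)  # raises ET.ParseError on unclosed tags
    return ["".join(el.itertext()) for el in root.iter(tag)]

doc = "<ul><li>first <b>item</b></li><li>second</li></ul>"
print(contents_of(doc, "li"))  # ['first item', 'second']

# The same list with the optional </li> end tags left out is not
# well-formed XML, so a generic XML tool cannot use it at all.
try:
    contents_of("<ul><li>first<li>second</ul>", "li")
except ET.ParseError as err:
    print("not well-formed:", err)
```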

pwdisswordfishy 14 hours ago

> I never understood why the more strict rules of XML for HTML never took off

Internet Explorer failing to support XHTML at all (which also forced everyone to serve XHTML with the HTML media type and avoid incompatible syntaxes like self-closing <script />), Firefox at first failing to support progressive rendering of XHTML, a dearth of tooling to emit well-formed XHTML (remember, those were the days of PHP emitting markup by string concatenation) and the resulting fear of pages entirely failing to render (the so-called Yellow Screen of Death), and a side helping of the WHATWG cartel^W organization declaring XHTML "obsolete". It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.

I think most of those are actually no longer relevant, so I still kind of hope that XHTML could have a resurgence, and that the tag-soup syntax could be finally discarded. It's long overdue.
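To make the "Yellow Screen of Death" failure mode concrete, here is a rough comparison that uses Python's standard library as a stand-in for the two kinds of parser: `xml.etree.ElementTree` rejects the whole document on the first well-formedness error, while `html.parser.HTMLParser` keeps tokenizing and still recovers the text. `HTMLParser` is only a tokenizer, not the HTML5 tree builder that browsers use, and the sample page is invented.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# One unescaped "&" and one missing </p>: typical string-concatenation output.
page = "<html><body><p>Fish & chips<p>Open daily</body></html>"

def strict_render(markup: str) -> str:
    """XML-style processing: any well-formedness error aborts the whole page."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError as err:
        return f"XML Parsing Error: {err}"  # roughly the "Yellow Screen of Death"

class TextCollector(HTMLParser):
    """Lenient tokenizer: keeps going past errors and collects the text it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def lenient_render(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)

print(strict_render(page))   # XML Parsing Error: ...
print(lenient_render(page))  # Fish & chipsOpen daily
```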

  • xg15 11 hours ago

    What I never understood was why, for HTML specifically, syntax errors are such a fundamental unsolvable problem that it's essential that browsers accept bad content.

    Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal, the responsibility for fixing lies with the page author, but also that fixing those errors is not a difficult problem.

    Why is this a problem for HTML - and only HTML?

    • notahacker 9 hours ago

      HTML is a markup language to format text, not a programming or data serialization language, so end users have always preferred seeing imperfectly coded or incompletely loaded web pages rendered imperfectly over receiving a failure message, particularly on 90s dial-up. The same applies to most other markup languages.

      The web owes its success to having low barriers to entry, and it very quickly became a mixture of pages hand-coded by people who weren't programmers, content produced by CMSes that included stuff the content author didn't directly control and weren't necessarily reliable at putting tags in the right place, and third-party widgets activated by pasting in whatever code the third party had given you. And browsers became really good at attempting to render erroneous and ambiguous markup (and, for that matter, were usually out of date or plain bad at rigidly implementing standards).

      There was a movement to serve XHTML as XML via the application/xhtml+xml MIME type, but it never took off because browsers didn't do anything with it except load a user-hostile error page if a closing tag was missed (or refuse to load it at all in the case of IE6 and older browsers). And if you wanted to do clever transformations of your source data, there were ways to achieve that other than formatting the markup sent to the browser as a subset of XML.

    • jasode 9 hours ago

      >Why is this a problem for HTML - and only HTML?

      Your premise is not correct because you're not aware that other data formats also have parsers that accept malformed content. Examples:

      - PDF files: many files with errors can be read by Adobe Acrobat. And PDF libraries for developers often replicate this behavior so they too can open the same invalid PDF files.

      - zip files. 7-Zip and WinRAR can open some malformed zip files that don't follow the official PKZIP specification. E.g. 7-Zip has extra defensive code that looks for a bad 2-byte sequence that shouldn't be there and skips over it.

      - csv files. MS Excel can read some malformed csv files.

      - SMTP email headers: Mozilla Thunderbird, MS Outlook, etc. can parse fields that don't exactly comply with RFC 822 (making some guesses) and then successfully display the email content to the user.

      The common theme to the above, including HTML... the Raw Content is more important than a perfectly standards-compliant file format. That's why parsers across various domains make best efforts to load the file even when it's not 100% free of syntax errors.

      >Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal,

      Parsing invalid CSS is not a fatal error. Example of validating HTML/CSS in a job listings webpage at Monster.com : https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.monster.c...

      It has CSS errors such as:

        Error: CSS: background-color: none is not a background-color value.  From line 276, column 212; to line 276, column 215
        Error: CSS: padding: 8x is not a padding value.
      
      Job hunters in the real world want to see the jobs because the goal is to get a paycheck. Therefore, a web browser that refused to show the webpage just because the author mistakenly wrote CSS "none" instead of "transparent" and "8x" instead of "8px" would be user-hostile software.
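A toy sketch of the recovery behaviour described here: each declaration is checked on its own, and an invalid one is dropped while the rest of the block still applies. The `VALID_VALUES` table is a hypothetical, drastically simplified stand-in for the real CSS grammar, and splitting on `;` ignores the strings, `url()` values and nested blocks that a real CSS tokenizer handles.

```python
import re

# Hypothetical mini-grammar: a few properties and the values they accept.
VALID_VALUES = {
    "background-color": re.compile(r"transparent|red|blue|#(?:[0-9a-fA-F]{3}){1,2}"),
    "padding": re.compile(r"0|\d+(?:\.\d+)?(?:px|em|rem|%)"),
    "color": re.compile(r"red|blue|black|#(?:[0-9a-fA-F]{3}){1,2}"),
}

def apply_declarations(block: str) -> tuple[dict[str, str], list[str]]:
    """Keep valid declarations, drop invalid ones, and report what was dropped."""
    applied: dict[str, str] = {}
    dropped: list[str] = []
    for decl in block.split(";"):
        if not decl.strip():
            continue
        prop, sep, value = decl.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        pattern = VALID_VALUES.get(prop)
        if sep and pattern and pattern.fullmatch(value):
            applied[prop] = value
        else:
            dropped.append(decl.strip())  # ignored, but the rest still applies
    return applied, dropped

applied, dropped = apply_declarations("background-color: none; padding: 8x; color: red;")
print(applied)  # {'color': 'red'}
print(dropped)  # ['background-color: none', 'padding: 8x']
```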
      • magicalhippo 8 hours ago

        > csv files. MS Excel can read some malformed csv files.

        At work we have to parse CSV files which often have mixed encoding (Latin-1 with UTF-8 in random fields on random rows), occasionally have partial lines (remainder of line just missing) and other interesting errors.

        We also have to parse fixed-width flat files where fields occasionally aren't fixed-width after all, with no discernible pattern. Customer can't fix the broken proprietary system that spits this out so we have to deal with it.

        And of course, XML files with encoding mismatch (because that header is just a fixed string that bears no meaning on the rest of the content, right?) or even mixed encoding. That's just par for the course.

        Just some examples of how fun parsing can be.
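One possible best-effort approach to the mixed-encoding and partial-line problems described above, sketched with Python's `csv` module. The assumptions are loud ones: the encoding is guessed per line (not per field), UTF-8 is tried first with Latin-1 as the fallback, short rows are padded with empty strings, and the name `read_messy_csv` is made up. Some Latin-1 byte sequences also happen to be valid UTF-8, so the guess can still be wrong.

```python
import csv

def decode_line(raw: bytes) -> str:
    """Guess the encoding of one line: UTF-8 if it decodes, else Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # Latin-1 maps every byte, so this never fails

def read_messy_csv(raw_lines: list) -> list:
    """Best-effort reader: per-line encoding guess, short rows padded with ''."""
    reader = csv.reader(decode_line(line) for line in raw_lines)
    header = next(reader)
    rows = []
    for fields in reader:
        # A truncated line has too few fields: pad it instead of rejecting the file.
        # (Extra fields beyond the header are silently dropped by zip().)
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows

data = [
    b"name,city,amount",
    "Zoë,Köln,10".encode("utf-8"),
    "José,Málaga,20".encode("latin-1"),  # legacy row in a different encoding
    b"Ann,Paris",                        # partial line: amount is missing
]
for row in read_messy_csv(data):
    print(row)
```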

    • nicoburns 11 hours ago

      It's mostly historical. Browsers accepted invalid HTML for 10 years, there's a lot of content authored with that assumption that's never going to be updated, so now we're stuck with it.

      We could be more strict for new content, but why bother if you have to include the legacy parser anyway. And the HTML5 algorithm brings us most of the benefits (deterministic parsing) of a stricter syntax while still allowing the looseness.

      • londons_explore 10 hours ago

        > never going to be updated, so now we're stuck with it.

        Try going to any 1998 web page in a modern browser... It's generally so broken as to be unusable.

        As well as every page telling me to install flash, most links are dead, most scripts don't run properly (vbscript!?), tls versions now incompatible, etc.

        We shouldn't put much effort into backwards compatibility if it doesn't work in practice. The best bet to open a 1998 web page is to install IE6 in a VM, and everything works wonderfully.

    • bazoom42 11 hours ago

      Syntax errors are not fatal in CSS. CSS has detailed rules for how to handle and recover from syntax errors, usually by skipping the invalid token. This is what allows introducing new syntax in a backwards-compatible manner.

    • alwillis 5 hours ago

      > Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal,

      In CSS, a syntax error isn't fatal. Most of the time, an invalid selector causes that rule and all its properties to be ignored, but :is() and :where() support a forgiving selector list [1], in which only the invalid selectors are dropped.

      For an unrecognized property or value, only the erroneous declaration is ignored; the rest work normally.

      [1]: https://drafts.csswg.org/selectors-4/#typedef-forgiving-sele...
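A toy model of the difference described above: in an ordinary selector list, one selector the browser doesn't understand invalidates the whole rule, whereas the forgiving list inside `:is()`/`:where()` only drops the bad entries. The set of "known" pseudo-classes and the validity check are hypothetical simplifications, not how a browser actually validates selectors.

```python
import re

# Hypothetical browser that only understands these pseudo-classes.
KNOWN_PSEUDO_CLASSES = {"hover", "focus", "first-child"}

def is_valid(selector: str) -> bool:
    """Toy check: a selector is invalid if it uses an unknown pseudo-class."""
    return all(name in KNOWN_PSEUDO_CLASSES
               for name in re.findall(r":([\w-]+)", selector))

def parse_selector_list(selectors: str) -> list[str]:
    """Ordinary selector list: one bad selector invalidates the whole rule."""
    parts = [s.strip() for s in selectors.split(",")]
    return parts if all(is_valid(s) for s in parts) else []

def parse_forgiving_list(selectors: str) -> list[str]:
    """Forgiving list (as inside :is() / :where()): drop only the bad selectors."""
    return [s.strip() for s in selectors.split(",") if is_valid(s.strip())]

rule = "a:hover, a:future-state"
print(parse_selector_list(rule))   # [] (the whole rule is dropped)
print(parse_forgiving_list(rule))  # ['a:hover']
```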

    • dragonwriter 8 hours ago

      > What I never understood was why, for HTML specifically, syntax errors are such a fundamental unsolvable problem that it's essential that browsers accept bad content.

      Because HTML is a content language. At any given time, the main purpose of the major engines using it will be to access a large array of content that is older than the newest revision of the language. Anything in a standard that creates significant incompatibilities, or forces complete rewrites of large bodies of work to incorporate new features, is simply not going to be implemented as specified by the major implementers (it will either not be implemented at all, or will be modified), because it is hostile to what the implementations are used for.

    • barnabee 10 hours ago

      Because HTML is designed to be written by everyone, not just “engineers” and we’d rather be able to read what they have to say even if they get it wrong.

      • halapro 7 hours ago

        It's more that it's exceedingly easy to generate bad X(H)ML strings especially back when you had PHP concatenating strings as you went. Most HTML on the web is live/dynamic so there's no developer to catch syntax errors and "make build" again.
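A quick illustration of the string-concatenation problem, using Python rather than PHP: gluing user text into markup leaves `&` and `<` unescaped, which is fatal for an XML parser, whereas building a tree and letting a serializer write it out escapes the text automatically. The sample comment text is invented.

```python
import xml.etree.ElementTree as ET

comment = "Tom & Jerry <3"  # user-supplied text

# String concatenation, PHP-era style: "&" and "<" go out unescaped.
concatenated = "<p>" + comment + "</p>"

# Building a tree and serializing it: the serializer escapes the text.
p = ET.Element("p")
p.text = comment
serialized = ET.tostring(p, encoding="unicode")
print(serialized)  # <p>Tom &amp; Jerry &lt;3</p>

for label, markup in [("concatenated", concatenated), ("serialized", serialized)]:
    try:
        ET.fromstring(markup)
        print(label, "is well-formed")
    except ET.ParseError:
        print(label, "is NOT well-formed")
```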

  • pwdisswordfishy 9 hours ago

    > It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.

    Well, this is not entirely true: XML namespaces enabled attaching arbitrary data to XHTML elements in a much more elegant, orthogonal way than the half-assed solution HTML5 ended up with (the data-* attribute set), and embedding other XML applications like XForms, SVG and MathML (though I am not sure how widely supported this was at the time; some of this was backported into HTML5 anyway, in a way that later led to CVEs). But this is rather niche.
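A sketch of the namespace approach, assuming a made-up vocabulary: a second XML namespace (the URI `urn:example:shop` is invented) lets an XHTML document carry its own attributes, which an XML parser reports under their full `{namespace}name`, so they cannot clash with HTML's own attributes. The HTML5 equivalent would be something like a `data-sku` attribute.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SHOP = "urn:example:shop"  # hypothetical vocabulary, made up for this sketch

doc = f"""<html xmlns="{XHTML}" xmlns:shop="{SHOP}">
  <body>
    <p shop:sku="A-123" shop:price="9.99">Blue mug</p>
  </body>
</html>"""

root = ET.fromstring(doc)
for p in root.iter(f"{{{XHTML}}}p"):
    # Namespaced attributes come back keyed as "{namespace-uri}local-name".
    print(p.text, p.get(f"{{{SHOP}}}sku"), p.get(f"{{{SHOP}}}price"))
# Blue mug A-123 9.99
```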

  • hinkley 10 hours ago

    I was there, Gandalf. I was there 30 years ago. I was there when the strength of men failed.

    Netscape started this. NCSA was in favor of XML style rules over SGML, but Netscape embraced SGML leniency fully and several tools of that era generated web pages that only rendered properly in Netscape. So people voted with their feet and went to the panderers. If I had a dollar for every time someone told me, “well it works in Netscape” I’d be retired by now.

  • Sankozi 8 hours ago

    Emitting correct XHTML was not that hard. The biggest problem was that browsers supported plugins that could corrupt the whole page. If you created an XHTML webpage, you had to handle bug reports caused by poorly written plugins.

bazoom42 13 hours ago

Why did markdown become popular when we already have html? Because markdown is much easier to write by hand in a simple text editor.

Original SGML was actually closer to markdown. It had various options to shorten and simplify the syntax, making it easy to write and edit by hand, while still having an unambiguous structure.

The verbose and explicit structure of xhtml makes it easier to process by tools, but more tedious for humans.

  • nathankleyn 9 hours ago

    Personally I think Markdown got _really_ popular not because it is easier to write but because it is easier to read.

    It’s kind of a huge deal that I can give a Markdown file of plain text content to somebody non-technical and they aren’t overwhelmed by it in raw form.

    HTML fails that same test.

    • epolanski 9 hours ago

      Or because it was the default in GitHub with an ad hoc renderer.

      • stouset 8 hours ago

        Markdown has been extremely popular since long before GitHub existed.

    • singingbard 7 hours ago

      People had already ditched writing HTML for years before Markdown came out.

      People were just using other markup languages like rST.

      Other attempts had already proven HTML to be a bad language for rough documentation. Someone then just needed to write a spec that was easy to implement and Markdown was that.

  • oneeyedpigeon 11 hours ago

    Is it really that much easier to write `<br>` and know that it isn't a problem, than just write `<br />`?

    • barnabee 10 hours ago

      It’s much easier to have to remember fewer rules and for things to be ok if you get some wrong, yes.

      Especially for casual users of HTML.

      • epgui 9 hours ago

        Bad reasoning.

        “Always close your tags” is a simpler rule (and fewer rules, depending how you count) than “Close your tags, except possibly in situations A, B, C…”.

      • TZubiri 9 hours ago

        But learning about self-closing tags is an additional rule.

      • wizzwizz4 6 hours ago

        <script /> is invalid HTML, and <img></img> is also invalid HTML. There's no way to avoid knowing HTML syntax.

  • Pxtl 10 hours ago

    Imho the real strength of markdown is that it forces people to stick to classes instead of styling. "I want to write in red Comic Sans." "I don't care, you can't."

    And markdown tables are harder to write than HTML tables. However, they are generally easier to read, unless there are multi-line cells.

    • jbaber 9 hours ago

      I usually just write html tables, then convert to markdown via pandoc. It's a crazy world we live in.

dragonwriter 8 hours ago

> I never understood why the more strict rules of XML for HTML never took off.

Because of the vast quantity of legacy HTML content, largely.

> HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML.

WHATWG and its living standard that W3C took various versions of and made changes to and called it HTML 5, 5.1, etc., to pretend that they were still relevant in HTML, before finally giving up on that entirely, was a direct result of the failure of XHTML and the idea of a clear cut between legacy HTML and the future of HTML. It was a direct reaction against the “clear cut” approach based on experience, not an opportunity to repeat its mistakes. (Instead of a clear break, HTML incorporated the “more strict rules of XML” via the XML serialization for HTML; for the applications where that approach offers value, it is available and supported and has an object model 100% compatible with the more common form, and they are maintained together rather than competing.)
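As a loose analogy for "one object model, two serializations" (not how browsers implement it), Python's `xml.etree.ElementTree` can write the same in-memory tree either as XML or in an HTML-style syntax. Its `method="html"` output is a simple legacy serializer, not the WHATWG HTML serializer, and the example omits the XHTML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# One in-memory tree...
p = ET.Element("p")
p.text = "line one"
br = ET.SubElement(p, "br")
br.tail = "line two"

# ...written out in the two syntaxes.
as_xml = ET.tostring(p, encoding="unicode", method="xml")
as_html = ET.tostring(p, encoding="unicode", method="html")
print(as_xml)   # <p>line one<br />line two</p>
print(as_html)  # <p>line one<br>line two</p>
```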

  • mgr86 6 hours ago

    I'd argue XHTML did take off and was very widely adopted for the first 5-10 years of the century.

MarsIronPI 10 hours ago

Because I want my hand-written HTML to look more like markdown-style languages. If I close those tags it adds visual noise and makes the text harder to read.

Besides, at this point technologies like tree-sitter make editor integration a moot point: once tree-sitter knows how to parse it, the editor does too.

kbolino 10 hours ago

A lot of HTML tags never have a body, so it makes no sense to close them. XML has self-closing tag syntax but it wasn't always handled well by browsers.

A p or li tag, at least when used and nested properly, logically ends where either the next one begins or the enclosing block ends. Closing li also creates the opportunity for nonsensical content inside of a list but not in any list item. Of course all of these corner cases are now well specified because people did close their tags sometimes.

  • afavour 10 hours ago

    > A p or li tag, at least when used and nested properly, logically ends where either the next one begins or the enclosing block ends

    While this is true I’ve never liked it.

        <p>blah<p>blah2</p>
    
    Implies a closing </p> in the middle. But

        <p>blah<span>blah2</p>
    
    Does not. Obviously with the knowledge of the difference between what span and p represent I understand why but in terms of pure markup it’s always left a bad taste in my mouth. I’ll always close tags whenever relevant even if it’s not necessary.
    • kbolino 9 hours ago

      This interpretation of the p element implies that it contains a paragraph. But HTML is first and foremost a document format, and one could just as logically conclude that the p element simply starts a new paragraph. Under the latter interpretation, </p> would never exist any more than </hr> or </img>.

      In practice, modern HTML splits the difference with rigorous and well defined but not necessarily intuitive semantics.
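A deliberately tiny sketch of a few of these "rigorous but not necessarily intuitive" rules: a `<p>` is implicitly closed by another `<p>` or by certain block-level start tags, `</p>` closes anything still open inside it, a new `<li>` closes the previous one in the same list, a stray `</p>` produces an empty paragraph, and the trailing slash on an HTML element is ignored. It builds on `html.parser.HTMLParser` (a tokenizer only); the element sets and scope boundaries are simplified assumptions, not the full tree-construction algorithm from the HTML standard.

```python
from html.parser import HTMLParser

# Simplified element sets (the real lists in the HTML standard are longer).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
CLOSES_P = {"address", "blockquote", "div", "h1", "h2", "h3", "li",
            "ol", "p", "pre", "table", "ul"}
P_SCOPE_BOUNDARY = {"button", "table", "td", "th"}
LI_SCOPE_BOUNDARY = {"ol", "ul"}

class Node:
    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.children: list = []  # Node objects and text strings

    def __str__(self) -> str:
        if self.tag in VOID:
            return f"<{self.tag}>"
        inner = "".join(str(child) for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"

class TinyTreeBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.root = Node("body")
        self.stack = [self.root]  # stack of open elements

    def _in_scope(self, tag: str, boundary: set) -> bool:
        # Walk from the innermost open element outward, stopping at a boundary.
        for node in reversed(self.stack):
            if node.tag == tag:
                return True
            if node.tag in boundary:
                return False
        return False

    def _pop_until(self, tag: str) -> None:
        # Pop open elements up to and including the nearest `tag`.
        while len(self.stack) > 1:
            if self.stack.pop().tag == tag:
                return

    def handle_starttag(self, tag, attrs):
        if tag == "li" and self._in_scope("li", LI_SCOPE_BOUNDARY):
            self._pop_until("li")          # <li>a<li>b  ->  <li>a</li><li>b
        if tag in CLOSES_P and self._in_scope("p", P_SCOPE_BOUNDARY):
            self._pop_until("p")           # <p>a<p>b  ->  <p>a</p><p>b
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:                # void elements never get children
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # The trailing slash is ignored on HTML elements: <div/> stays open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "p" and not self._in_scope("p", P_SCOPE_BOUNDARY):
            self.stack[-1].children.append(Node("p"))  # stray </p> -> <p></p>
        elif any(node.tag == tag for node in self.stack[1:]):
            self._pop_until(tag)           # also closes anything opened inside it
        # Any other stray end tag is simply ignored.

    def handle_data(self, data):
        self.stack[-1].children.append(data)

def build(markup: str) -> str:
    """Parse `markup` and re-serialize it with every element explicitly closed."""
    builder = TinyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return "".join(str(child) for child in builder.root.children)

print(build("<p>blah<p>blah2</p>"))     # <p>blah</p><p>blah2</p>
print(build("<p>blah<span>blah2</p>"))  # <p>blah<span>blah2</span></p>
```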

  • Pxtl 10 hours ago

    > XML has self-closing tag syntax but it wasn't always handled well by browsers.

    So we'll add another syntax for browsers to handle.

    https://xkcd.com/927/

coffeefirst 9 hours ago

I would argue the stricter rules did take off, most people always close <p>, it's pretty common to see <img/> over <img>—especially from people who write a lot of React.

But.

The future of HTML will forever contain content that was first hand-typed in Notepad++ in 2001 or created in WordPress in 2008. It's the right move for the browser to stay forgiving, even if you have rules in your personal style guide.

ndiddy 8 hours ago

> I learned HTML quite late, when HTML 5 was already all the rage, and I never understood why the more strict rules of XML for HTML never took off. They seem so much saner than whatever soup of special rules and exceptions we currently have.

XHTML came out at a time when Internet Explorer, the most popular browser, was essentially frozen apart from security fixes because Microsoft knew that if the web took off as a viable application platform it would threaten Windows' dominance. XHTML 1.0 Transitional was essentially HTML 4.01, except that when a page was served as XML and wasn't also well-formed XML, the browser was required to show a "parsing error" (in Firefox, a yellow error page) rather than display the content. This meant that any "working" XHTML site might not display because the page author didn't test in your browser. It also meant that any XHTML site might break at any time because a content writer used a noncompliant browser like IE 6 to write an article, or because the developers missed an edge case that causes invalid syntax.

XHTML 2.0 was a far more radical design. Because IE 6 was frozen, XHTML 2.0 was written with the expectation that no current web browser would implement it, and instead was a ground-up redesign of the web written "the right way" that would eventually entirely replace all existing web browsers. For example, forms were replaced by XForms, frames by XFrames, and all presentational elements like <b> and <i> were dropped in favor of semantic elements like <strong> and <samp> that made it possible for a page to be reasoned about automatically by a program. This required different processing from existing HTML and XHTML documents, but there was no way to differentiate between "old" and "new" documents, meaning no thought was given to adding XHTML 2.0 support to browsers that supported existing web technologies. Even by the mid-2000s, asking everyone to restart the web from scratch was obviously unrealistic compared to incrementally improving it. See here for a good overview of XHTML 2.0's failure from a web browser implementor's perspective: https://dbaron.org/log/20090707-ex-html

jrm4 8 hours ago

This really does feel like a job for autocomplete and/or generative AI tools.

Wowfunhappy 8 hours ago

Imagine if you were authoring and/or editing prose directly in html, as opposed to using some CMS. You're using your writing brain, not your coding brain. You don't want to think about code.

It's still a little annoying to put <p> before each paragraph, but not by that much. By contrast, once you start adding closing tags, you're much closer to computer code.

I'm not sure if that makes sense but it's the way I think about it.

  • SoftTalker 7 hours ago

    It's honestly no worse than Markdown, reST, or any of the other text-based "formats." It's just another format.

    Any time I have to write Markdown I have to open a cheat sheet for reference. With HTML, which I have used for years, I just write it.

onion2k 11 hours ago

In the case of <br/> and <img/> browsers will never use the content inside of the tag, so using a closing tag doesn't make sense. The slash makes it much clearer though, so missing it out is silly.

  • ndiddy 10 hours ago

    "Self-closing tags" are not a thing in HTML5. From the HTML standard:

    > On void elements, [the trailing slash] does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.

    It was mainly added to HTML5 to make it easier to convert XHTML pages to HTML5. IMO using the trailing slash in new pages is a mistake. It makes it appear as though the slash is what closes the element when in reality it does nothing and the element is self-closing because it's part of a hardcoded set of void elements. See here for more information: https://github.com/validator/validator/wiki/Markup-%C2%BB-Vo...
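A small sketch of the two points in the quoted passage, under simplifying assumptions: whether an element stays open depends only on whether it is one of the void elements, and an unquoted attribute value swallows a slash written directly after it. The function names are hypothetical, and the regular expression covers only the unquoted-value case, not the full tokenizer from the HTML standard.

```python
import re

# The void elements listed in the HTML standard: they never have content.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

def element_stays_open(tag_name: str, trailing_slash: bool) -> bool:
    """Does this start tag leave an element open for the following content?

    In text/html only the element type matters; `trailing_slash` is accepted
    just to show that it has no effect (for HTML elements, not SVG/MathML).
    """
    return tag_name.lower() not in VOID_ELEMENTS

# Simplified unquoted-attribute rule: the value runs until whitespace or ">",
# so a "/" written right after it becomes part of the value.
UNQUOTED_ATTR = re.compile(r"""([^\s/>"'=<]+)=([^\s"'=<>`]+)""")

def unquoted_attrs(start_tag: str) -> dict:
    """Extract name=value pairs whose values are unquoted (quoted ones are skipped)."""
    return {name.lower(): value for name, value in UNQUOTED_ATTR.findall(start_tag)}

print(element_stays_open("div", trailing_slash=True))   # True, so <div /> still opens a div
print(element_stays_open("br", trailing_slash=False))   # False, <br> never has content
print(unquoted_attrs("<img src=logo.png/>"))            # {'src': 'logo.png/'}
print(unquoted_attrs("<img src=logo.png />"))           # {'src': 'logo.png'}
```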

    • ndriscoll 10 hours ago

      It's not a mistake if you want to be able to use XML tools on your HTML. It's basically no effort to make HTML also be valid XML, so you might as well get the additional tooling compatibility and simplicity for free. For the same reason, it's courteous toward others.

  • Hendrikto 9 hours ago

    Self-closing tags do nothing in HTML, though. The slash is ignored. And in some cases, adding it obscures how browsers will actually interpret the markup, or introduces subtle differences between HTML and JSX, for example.

  • jakelazaroff 10 hours ago

    How does the slash make it clearer? It's totally inert, so if you try to do the same thing with a non-void tag the results will not be what you expect!

    • onion2k 10 hours ago

      It indicates that the content that follows is not inside of the tag without the reader needing to remember how HTML works. Tags should have either a self-closing slash, or a closing tag.

      The third way of a bare tag is where the confusion comes from.

      • jakelazaroff 10 hours ago

        It doesn't indicate that, though. If you write <div />, for example, the content that follows is inside of the tag. So the reader still needs to remember how HTML works, because the slash does nothing.

        • jraph 9 hours ago

          Contrary to <img /> or <br />, <div /> is necessarily a mistake or intentionally misleading. The unfamiliar reader should not stumble upon <div /> too often. <div /> is a bug. It's a bit like using misleading indentation in C-like programming languages. Yeah, it can happen, and is a source of bugs, but if the page is well written, the regularity of having everything closed, even if it's decorative for the spec, can help the unfamiliar reader who doesn't have all the parsing rules in mind.

          Now, we can discuss whether we should optimize for the unfamiliar reader, and whether the illusion of meaning that the trailing slash creates in HTML5 can be harmful.

          I would note that exactly like trailing slashes, indentation doesn't mean anything for the parser in C-like languages and can be written misleadingly, yet we do systematically use it, even when no unfamiliar reader is expected.

          At this point, writing a slash or not and closing all the tags is a coding style discussion.

          Now, maybe someone writing almost-XHTML (closing all tags, putting trailing slashes, quoting all the attributes) should go all the way and write actual XHTML with the actual XHTML content type and benefit from the strict parser catching potential errors that can backfire and that nobody would have noticed with the HTML 5 parser.

vbezhenar 15 hours ago

> why would you ever want to not close tags?

Because browsers close some tags automatically. And if your closing tag is wrong (e.g. a stray </p>), it'll generate an empty element instead of being ignored, without even emitting a warning in the developer console. So by closing tags you risk introducing very subtle DOM bugs.

If you want to close tags, make sure that your building or testing pipeline ensures strict validation of produced HTML.
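A minimal sketch of such a pipeline check, assuming a house style where every non-void element must be explicitly closed. It is stricter than the HTML standard (which lets you omit `</p>` or `</li>`) and much weaker than a real validator such as the Nu Html Checker; the class and function names are made up, and it relies only on the tokenizer from Python's `html.parser`.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

class StrictTagChecker(HTMLParser):
    """House-style lint: every non-void element needs an explicit end tag."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list = []  # (tag, line) for each element still open
        self.problems: list = []

    def _line(self) -> int:
        return self.getpos()[0]

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self._line()))

    def handle_startendtag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            # In text/html, <div/> or <script /> does NOT close the element.
            self.problems.append(f"line {self._line()}: <{tag}/> is not self-closing in HTML")
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            self.problems.append(f"line {self._line()}: </{tag}> end tag on a void element")
        elif any(open_tag == tag for open_tag, _ in self.open_tags):
            # Close the nearest matching element; anything opened after it was left open.
            while self.open_tags[-1][0] != tag:
                inner, inner_line = self.open_tags.pop()
                self.problems.append(f"line {inner_line}: <{inner}> never closed")
            self.open_tags.pop()
        else:
            # Browsers ignore most stray end tags, but a stray </p> becomes <p></p>.
            self.problems.append(f"line {self._line()}: stray </{tag}>")

def check(markup: str) -> list:
    """Return human-readable problems; an empty list means the markup passed."""
    checker = StrictTagChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"line {line}: <{tag}> never closed" for tag, line in checker.open_tags]
    return checker.problems + unclosed

sample = "<div/>\n<p>Hello<p>World</p>\n</span>"
for problem in check(sample):
    print(problem)
```

In a build, one might run `check()` over each generated page and fail the step whenever the returned list is non-empty.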