Comment by develatio

Comment by develatio 13 hours ago

7 replies

I was not able to understand why these code points are bad. The post states that they are bad, but why? Any examples? Any actual situations and PoC that might help me understand how will that break "my code"?

orangeboats 13 hours ago

Sometimes it's not just "your code". Strings are often interchanged and sent to many other parties.

And some of the codepoints, such as the surrogate codepoints (which MUST come in pairs in properly encoded UTF-16), may not break your code but break poorly-written spaghetti-ridden UTF-16-based hellholes that do not expect unpaired surrogates.

Something like:

1. You send a UTF-8 string containing normal characters and an unpaired surrogate: "Hello /uDEADworld" to FooApp.

2. FooApp converts the UTF-8 string to UTF-16 and saves it in a file. All without validation, so no crashes will actually occur; worst case scenario, the unpaired surrogate is rendered by the frontend as "�".

3. Next time, when it reads the file again, this time it is expecting normal UTF-16, and it crashes because of the unpaired surrogate.

(A more fatal failure mode of (3) is out-of-bounds memory read if the unpaired surrogate happens at the end of string)

  • pixl97 4 hours ago

    I had a github action with a phrase 'filter: \directory\u02filename.txt' or something close to this and the the filename got interpreted as a utf-8 character rather than a string literal causing the application to throw an error about invalid utf 8 in the path. Had to go about setting it up to quote the strings differently, but you get to see a lot of these issues in the wild.

JimDabell 13 hours ago

Suppose, when you were registering your username `develatio`, you decided to put U+202E RIGHT-TO-LEFT OVERRIDE in there as well. Now when somebody is reading this page and their browser gets to your username, it switches the text direction to render it right-to-left.

  • develatio 13 hours ago

    and "that's it"? I mean, it does sound like it might introduce unexpected UI behaviour, but are there any other more serious / dangerous consequences?

    • JimDabell 13 hours ago

      Making any page that mentions you – including admin pages that might be used to disable your account – become unreadable is bad enough.

      Another comment linked to this:

      https://trojansource.codes

    • yencabulator 12 hours ago

      One of my pet peeves is when UIs don't clearly constrain and delineate the extent of user-controlled text. Plenty of phishing attacks have relied on having attacker-controlled input seem authoritative, e.g. getting gmail to repeat back something to the victim.

    • LikesPwsh 8 hours ago

      RTL lets you obfuscate file extensions.

      E.g. Annexe.txt (that you might assume would be safely opened by a text editor) could actually be Ann\u202Etxt.exe, a dangerous executable.