Comment by andriamanitra

Comment by andriamanitra 11 hours ago

0 replies

> For example, there is nothing wrong with an programming language that only allows ASCII in the source code and many downsides to allowing non-ASCII characters outside string constants or comments.

That's a tradeoff you should carefully consider because there are also downsides to disallowing non-ASCII characters. The downsides of allowing non-ASCII mostly stem from assigning semantic significance to upper/lowercase (which is itself a tradeoff you should consider when designing a language). The other issue I can think of is homographs but it seems to be more of a theoretical concern than a problem you'd run into in practice.

When I first learned programming I used my native language (Finnish, which uses 3 non-ASCII letters: åäö) not only for strings and comments but also identifiers. Back then UTF-8 was not yet universally adopted (ISO 8859-1 character set was still relatively common) so I occasionally encountered issues that I had no means to understand at the time. As programming is being taught to younger and younger audiences it's not reasonable to expect kids from (insert your favorite non-English speaking country) to know enough English to use it for naming. Naming and, to an extent, thinking in English requires a vocabulary orders of magnitude larger than knowing the keywords.

By restricting source code to ASCII only you also lose the ability to use domain-specific notation like mathematical symbols/operators and Greek letters. For example in Julia you may use some mathematical operators (eg. ÷ for Euclidean division, ⊻ for exclusive or, ∈/∉/∋ for checking set membership) and I find it really makes code more pleasant to read.