I'm currently working with some local legacy code, so I primarily wanted to scan for incorrectly transcoded accented characters (Central European to UTF-8 mishaps), and I did find some.
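To illustrate the kind of mishap meant here, a rough sketch (not the tool's actual code) of one common case: UTF-8 bytes that were decoded as a Central European code page somewhere along the way. The function name `suspect_mojibake` and the choice of `cp1250` as the wrong codec are assumptions for the example; other code pages (ISO-8859-2, cp852) would need their own pass.

```python
def suspect_mojibake(text, wrong_codec="cp1250"):
    """Return the repaired string if `text` looks like UTF-8 bytes that
    were decoded as `wrong_codec`, otherwise None.

    Hypothetical helper: it only covers this one direction of mis-transcoding.
    """
    try:
        # Undo the wrong decode, then redo it as UTF-8.
        repaired = text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in the code page, or not valid UTF-8 afterwards:
        # this string was probably not garbled this way.
        return None
    # Pure ASCII round-trips unchanged, so it is not suspicious.
    return repaired if repaired != text else None


# Build a garbled sample the same way the accident would have happened.
garbled = "caf\u00e9".encode("utf-8").decode("cp1250")
print(repr(garbled), "->", repr(suspect_mojibake(garbled)))
```

This is a heuristic: a successful round trip is a strong hint rather than proof, so hits are worth reviewing by hand.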
It's also good against data fingerprinting, homoglyph attacks in links (e.g. in comments), pranks (Greek question mark vs. semicolon), or, if it's a strictly international codebase, for checking for anything outside ASCII. Basically, it's for when you don't really trust a codebase and want to establish a baseline.
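The "anything outside ASCII" baseline is simple enough to sketch. This is a minimal illustration, not the tool itself; `find_non_ascii` is a made-up name, and it only uses `unicodedata` from the standard library to label what it finds.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line_no, column, char, unicode_name) for every non-ASCII character."""
    hits = []
    # Split on "\n" only, so that exotic separators such as U+2028
    # stay inside a line and get reported instead of being swallowed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# The second line ends with U+037E, which renders like a semicolon.
sample = "a = 1;\nb = 2\u037e\n"
for hit in find_non_ascii(sample):
    print(hit)
```

Printing the Unicode name is what makes the prank case obvious: the lookalike shows up as GREEK QUESTION MARK rather than as something that looks like a semicolon.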
But I also included other features, like checking line ending consistency, indentation consistency, line lengths, POSIX compliance, and encoding validity. Line lengths were of particular interest to me: I've recently seen some malicious PRs to FOSS projects where the attacker would just push the payload far off to the right, out of sight, expecting most people to have word wrap off and not even notice (pretty funny tbf).
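For the line ending and line length checks, here is a rough sketch of the idea, working on raw bytes so nothing gets normalized on read. The function names and the 120-byte default limit are arbitrary choices for the example, not what the tool actually uses.

```python
import re
from collections import Counter

# Order matters: try CRLF before a lone CR or LF.
LINE_END = re.compile(rb"\r\n|\r|\n")
NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def line_ending_counts(data: bytes) -> Counter:
    """Count CRLF, lone LF and lone CR line endings in raw bytes."""
    return Counter(NAMES[m.group()] for m in LINE_END.finditer(data))


def long_lines(data: bytes, max_len: int = 120):
    """Return (line_no, length_in_bytes) for every line longer than max_len."""
    lines = LINE_END.split(data)
    return [(n, len(line)) for n, line in enumerate(lines, start=1) if len(line) > max_len]


# A "payload pushed off-screen" sample: 200 spaces before the call.
data = b"a = 1\r\nb = 2\n" + b"c = 3" + b" " * 200 + b"evil()\n"
print(line_ending_counts(data))  # mixed endings show up as more than one key
print(long_lines(data))
```

More than one key in the counter means the file mixes line endings, and any hit from `long_lines` is worth a look with word wrap turned on.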