Comment by choraria

Comment by choraria 7 hours ago

6 replies

I thought about this and decided against complicating ways in which this can be restricted. Honestly, this is a super simple challenge to solve. Perhaps I should introduce this as an API parameter to detect variations. That way, not just taylor_swift but t_aylorswift, ta_ylorswift etc. could also be detected and flagged.

As for realtaylorswift, I thought about that too. I don't think — and this is my personal opinion, obviously — most platforms wouldn't want to restrict this because then it really becomes unmanageable. I could obviously be wrong though and these could very easily be introduced to the API also (i.e. detect obvious username patterns) and totally open to adding that as an API parameter too.

chaps 7 hours ago

Friend, with respect, these "simple challenge"s really start to add up very quickly, especially after edge cases.

Highly recommend you read this and similar posts: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

  • choraria 7 hours ago

    Damn! Just read the title and a few lines from the post but will definitely go through it fully and thoroughly. Thanks for sharing.

    I didn't mean to reduce the complexity of the challenge. Was mostly trying to convey that the specific cases being discussed, should be something that I could quickly solution and incorporate in the API.

    You're right about ALL the different kinds of edge cases that exist though and really, I'm trying to have this API be the go-to solution for it. Clearly, it's still not there. But it will be. I'm now more sure than ever.

  • gs17 7 hours ago

    > I can safely assume that this dictionary of bad words contains no people’s names in it.

    This is a big one for this kind of project, and I've never been sure how usernames for people named Kike should be handled.

    • choraria 6 hours ago

      Good point. Currently, I've got "kike" as a Spanish dictionary word and also a public figure. Honestly, the job of this API stops there. It tells the platform that this username needs to be handled differently than "randomusername7346783" which has absolutely no value. Now, what we do with this info is really up to admins/platform owners. They could simply do nothing, flag and monitor, charge a premium or block outright. Totally their call but they can now programmatically decide that.

      • gs17 6 hours ago

        It definitely should be in a list of offensive terms too (and offensive dictionaries by language could be even more useful, telling moderators why it was flagged is valuable).

        • choraria 6 hours ago

          I see. Will re-run through the categories and the datasets from which I've adopted the names and categories. Maybe either I missed something or it might've not existed in the import in the first place. But noted. Also, thanks :)