Comment by bpt3
Yes, a classifier based on similarity metrics would be more useful than whatever is going on behind the scenes here, which seems to be completely based on string matching and a not very creative dictionary of offensive terms.
Yes, a classifier based on similarity metrics would be more useful than whatever is going on behind the scenes here, which seems to be completely based on string matching and a not very creative dictionary of offensive terms.
Interesting! Didn't think about it that way. Currently, it's a super dumb system. There's a list of ~1.7 million records and the API simply looks-up against that. Super lazy approach. Was avoid running an API across OpenAI or other model but didn't think about hosting a classifier/LLM myself. Might consider it in the future.
Full disclosure: I'm not a developer. I understand tech architectures well. Can code (have coded in JS pre-AI too) BUT will figure this out as I go along. Thanks and truly appreciate the input.
Edit note: added million next to 1.7. fml!