Comment by lvl155
I am building something similar to Grammarly as a personal project, but I quickly realized how hard it is to get data in 2024. I'm contemplating whether I should just resort to pirated data, which is just sad.
I’m just going to remind everyone that all these LLMs were trained not just on pirated data, but on outright stolen data, taken in organized and well-resourced assaults on proprietary information, not to mention riding roughshod over any and all licenses.
To be fair, OpenAI used pirated data.