TeMPOraL a day ago

If you walk through the N-gram database with a copy of Harry Potter in hand and observe that, for N=7, you can find any piece of it in the database with above-average frequency, does that mean the N-gram database is violating copyright?

  • gamblor956 18 hours ago

    If the database is sharing those pieces, the answer might be yes.

    Copyright law takes into account the purpose for which the copying is done. Commercial use will almost never be treated as fair use, with limited exceptions.

    • TeMPOraL 18 hours ago

      I'd say no, because you can't reasonably access and order those pieces without already having the work at hand to use as a reference.

gamblor956 18 hours ago

You are.

Copyright is quite literally about the right to control the creation and distribution of copies.

The creation of the unzipped file is not treated as a separate copy, so the recipient would not be violating copyright just by unzipping the file you provided.