Comment by ewild
the original chunk is most likely stored with it in referential format such as an id in the metadata to pull from a DB or something along those lines. I do exactly what he does aswell and i have an Id metadata value that does exactly that pointing to an id in a DB which holds the text chunks and their respective metadata
The original chunk, sure, but what if the original chunk is full of eg pronouns? This is a problem I haven't heard an elegant solution for, although I've seen it done OK.
What I mean is, how can you derive topics from a chunk that refers to them only obliquely?