Comment by eigenvalue

Yes, I'm also working on another version that is document-centric. It's a bit of a different problem. In the case of YouTube video transcripts, we are dealing with raw speech utterances: run-on sentences, filler words, other speech errors, etc. Basically, it's a very far cry from a polished written document. So we first need to really transform the underlying content to get the optimized document, which can differ quite significantly from the raw transcript. Then we use that optimized document to generate the quizzes.

In the case of a document-only workflow, we generally want to stick very closely to what's in the document: extract the text accurately, using OCR if needed (or directly if we don't need OCR), and then reformat it into nice-looking markdown, changing only its appearance and not the actual content. Once we've turned the original document into nice-looking markdown, we can use it to generate the quizzes and perhaps other related outputs (e.g., Anki cards, PowerPoint-type presentation slides, etc.).
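To make the contrast concrete, here's a rough sketch of how the two flows differ in shape. This is only an illustration, not the actual code from either app: the function names (`transcript_pipeline`, `document_pipeline`) and the pluggable `rewrite`, `ocr`, `to_markdown`, and `make_quiz` steps are all hypothetical placeholders for whatever does the real work.

```python
from typing import Callable, Optional

# A "step" takes text and returns text. All steps here are hypothetical
# placeholders supplied by the caller; nothing below is the real app code.
Step = Callable[[str], str]


def transcript_pipeline(raw_transcript: str, rewrite: Step, make_quiz: Step) -> dict:
    """Transcript flow: the content itself is rewritten before quiz generation."""
    # The rewrite step may change the text substantially
    # (removing filler words, fixing run-on sentences, etc.).
    optimized = rewrite(raw_transcript)
    return {"document": optimized, "quiz": make_quiz(optimized)}


def document_pipeline(
    embedded_text: Optional[str],
    ocr: Callable[[], str],
    to_markdown: Step,
    make_quiz: Step,
) -> dict:
    """Document flow: the content is preserved, only the formatting changes."""
    # Use directly extracted text when it exists; fall back to OCR otherwise.
    text = embedded_text if embedded_text else ocr()
    # Formatting only: this step is assumed not to alter the wording.
    markdown = to_markdown(text)
    return {"document": markdown, "quiz": make_quiz(markdown)}
```

The quiz step is the same in both; what differs is everything before it, which is where one flow rewrites the content and the other only reformats it.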

Because of that fundamental difference in approach, I decided to separate it into two different apps. But I'm planning on using much of the same UI and backend structure. The document-centric app also seems like it has a broader base of potential users (like teachers; there are a lot of teachers out there, way more than there are YouTube content creators). I started with the YouTube app because my wife makes YouTube videos about music theory and I wanted to make something that at least she would actually want to use!