Comment by yreg 3 days ago

You could ask it for a couple of verbatim sentences from the transcript that are most related to what you are interested in, then find the timestamp for that text. (There could be UI for this.)
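For the "find the timestamp for that text" step, here is a minimal sketch. It assumes the transcript is a list of segments shaped like {"start": seconds, "text": "..."}; that shape and the name find_timestamp are hypothetical, not taken from any particular API. It normalizes case and punctuation so a quoted sentence can match even when it spans several short segments.

```python
import re
from bisect import bisect_right

def normalize(text):
    # Lowercase and collapse punctuation/whitespace so quotes match loosely.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def find_timestamp(segments, quote):
    """Return the start time of the segment where `quote` begins, or None.

    `segments` is assumed to be a list of {"start": float, "text": str}.
    """
    offsets = []  # (character offset in the joined text, segment start time)
    parts = []
    pos = 0
    for seg in segments:
        norm = normalize(seg["text"])
        if not norm:
            continue
        offsets.append((pos, seg["start"]))
        parts.append(norm)
        pos += len(norm) + 1  # +1 for the joining space
    haystack = " ".join(parts)
    idx = haystack.find(normalize(quote))
    if idx < 0:
        return None  # the quote was not verbatim
    # Map the match offset back to the segment that contains it.
    starts = [offset for offset, _ in offsets]
    return offsets[bisect_right(starts, idx) - 1][1]

segments = [
    {"start": 0.0, "text": "so today we're going to"},
    {"start": 2.5, "text": "talk about embeddings and"},
    {"start": 5.0, "text": "how search works"},
]
print(find_timestamp(segments, "Talk about embeddings, and how search works."))
```

This only works if the quoted sentence really is verbatim; if the model paraphrases, the lookup returns None and some fuzzy matching would be needed instead.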

Another solution would be to skip the LLM prompting part altogether and:

1. break the transcript into short sections

2. create embeddings from them and remember the timestamps for each

3. embed your query (what you are interested in)

4. find the transcript embedding closest to your query embedding

5. return the timestamp stored for that section
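The five steps above can be sketched roughly as follows. To keep it runnable without any dependencies, embed() here is a toy bag-of-words count vector standing in for a real embedding model; you would swap in whatever model you actually use. The segment shape ({"start": seconds, "text": "..."}) and all function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def chunk(segments, size=3):
    # Step 1: group consecutive segments into short sections,
    # keeping the timestamp of the first segment in each group.
    chunks = []
    for i in range(0, len(segments), size):
        group = segments[i:i + size]
        chunks.append((group[0]["start"], " ".join(s["text"] for s in group)))
    return chunks

def build_index(segments, size=3):
    # Step 2: embed each section and remember its timestamp.
    return [(start, embed(text)) for start, text in chunk(segments, size)]

def search(index, query):
    # Steps 3-5: embed the query, pick the closest section, return its timestamp.
    q = embed(query)
    start, _ = max(index, key=lambda item: cosine(q, item[1]))
    return start

segments = [
    {"start": 0.0, "text": "welcome to the show"},
    {"start": 3.0, "text": "today we have a guest"},
    {"start": 60.0, "text": "the battery lasts ten hours"},
    {"start": 64.0, "text": "and charging is quick"},
]
index = build_index(segments, size=2)
print(search(index, "how long does the battery last"))
```

With a real embedding model the query would not need to share exact words with the transcript, which is the main reason to use embeddings over keyword search here.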

ofou 2 days ago

That's a good idea. However, I believe the challenging part lies in first reconstructing the short utterances into coherent, meaningful paragraphs.

Currently, with the API [1], you can retrieve a JSON with timestamps. The main issue, though, is how to parse the text effectively into meaningful sentences, and then put the timestamp at the beginning of each paragraph. WIP.
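As a rough illustration of the paragraph-reconstruction problem, here is a naive sketch. It assumes the segments look like {"start": seconds, "text": "..."} and that the text already carries sentence-ending punctuation; both are assumptions, not a description of what the API in [1] returns. Transcripts without punctuation would need a punctuation-restoration step first, which this does not attempt.

```python
def format_ts(seconds):
    # Render seconds as HH:MM:SS.
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def to_paragraphs(segments, max_sentences=3):
    """Merge short utterances into paragraphs, each prefixed with the
    timestamp of its first utterance.

    A paragraph is closed once `max_sentences` utterances ending in
    '.', '?' or '!' have been collected.
    """
    paragraphs = []
    words, start, sentences = [], None, 0
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        if start is None:
            start = seg["start"]  # timestamp of the paragraph's first utterance
        words.append(text)
        if text.endswith((".", "?", "!")):
            sentences += 1
            if sentences >= max_sentences:
                paragraphs.append(f"[{format_ts(start)}] {' '.join(words)}")
                words, start, sentences = [], None, 0
    if words:
        # Flush whatever is left, even if it does not end a sentence.
        paragraphs.append(f"[{format_ts(start)}] {' '.join(words)}")
    return paragraphs

segments = [
    {"start": 0.0, "text": "Hello and"},
    {"start": 1.5, "text": "welcome back."},
    {"start": 3.0, "text": "Today we talk"},
    {"start": 4.2, "text": "about search."},
    {"start": 65.0, "text": "First, embeddings."},
]
for paragraph in to_paragraphs(segments, max_sentences=2):
    print(paragraph)
```

Splitting on a fixed sentence count is crude; breaking on long pauses between timestamps or on topic shifts would probably give more meaningful paragraphs.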

[1]: https://textube.olivares.cl/watch?v=9iqn1HhFJ6c&format=JSON