Comment by andai

Comment by andai 10 months ago

Very nice. I made a thing in Python which summarizes a YouTube transcript in bullet points. Never thought about asking it questions, that's a great idea!

I just run yt-dlp to fetch the transcript and shove it in the GPT prompt. (I think I also have a few lines to remove the timestamps, although arguably those would be useful to keep.)
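For illustration, here is a minimal sketch of the timestamp-removal step. It assumes the transcript was saved as a WebVTT file, for example with something like `yt-dlp --skip-download --write-auto-subs --sub-langs en --sub-format vtt URL`. That invocation and the function name `vtt_to_text` are assumptions, not the commenter's actual script.

```python
import re

# Cue timing lines, e.g. "00:00:01.000 --> 00:00:04.000 align:start"
CUE_TIMING = re.compile(r"^\d{2}:\d{2}(?::\d{2})?\.\d{3}\s+-->\s+")
# Inline tags such as <00:00:01.500> or <c> that auto-captions embed in the text
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Return the spoken text of a WebVTT file without timestamps or markup.

    Hypothetical helper: a sketch of the cleanup step, not the original code.
    """
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blank lines and the header/metadata lines at the top of the file.
        if not line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        # Skip the "start --> end" timing line of each cue.
        if CUE_TIMING.match(line):
            continue
        line = INLINE_TAG.sub("", line).strip()
        # Auto-generated captions repeat lines across rolling cues; keep each once.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)
```

If you would rather keep the timestamps, as suggested above, you could keep the start time from each cue timing line instead of skipping it.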

My prompt is "{transcript} Please summarize the above in bullet points"

The trick was splitting it up into overlapping chunks so each one fits in the context size. (And then summarizing your summaries, because the combined result ends up too long when you have so many chunks!)
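A rough sketch of that chunk-then-summarize-the-summaries approach, under stated assumptions: chunk sizes are measured in characters rather than tokens, the default sizes are arbitrary, and `ask_llm` is a hypothetical callable that sends a prompt to whatever model you use and returns its reply. Only the prompt template comes from the comment above.

```python
from typing import Callable, List

# Prompt template quoted in the comment above.
PROMPT = "{transcript} Please summarize the above in bullet points"


def chunk_text(text: str, size: int = 8000, overlap: int = 500) -> List[str]:
    """Split text into chunks of up to `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a chunk has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


def summarize_long(text: str, ask_llm: Callable[[str], str],
                   size: int = 8000, overlap: int = 500) -> str:
    """Summarize each chunk, then summarize the joined summaries once."""
    chunks = chunk_text(text, size, overlap)
    partials = [ask_llm(PROMPT.format(transcript=chunk)) for chunk in chunks]
    combined = "\n".join(partials)
    # With zero or one chunk there is nothing to condense further.
    if len(chunks) <= 1:
        return combined
    # Second pass: the "summary of summaries" step.
    return ask_llm(PROMPT.format(transcript=combined))
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk. This sketch does a single second pass; if the joined summaries were still too long for the context window, that pass would itself need chunking.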

These days that's not so important; usually you can shove an entire book in! (Unless you're using a local model. Those still have small context sizes, though they work pretty well for summarization.)

shekhargulati 10 months ago

I also built something similar using yt-dlp and the llm CLI, and wrote a post about it: https://shekhargulati.com/2024/07/30/building-a-youtube-vide.... Script here: https://github.com/shekhargulati/llm-tools/blob/main/yt-summ...

potatoman22 10 months ago

Same! It's been a nifty little tool for helping me decide which videos are worth watching. https://github.com/davidhaas6/digest

HPsquared 10 months ago

If you're going as far as using yt-dlp, why not run the audio through Whisper?

andai 10 months ago

Interesting, I haven't used Whisper, is it cost effective? Seems to be about 36 cents per (hour long) video? How long does processing take?

- kajecounterhack 10 months ago
  
  You can run it locally, and it's really fast. But since YouTube transcription is really good, I don't see why you'd use Whisper and get a worse transcription (unless maybe it's on videos that Google did not transcribe for whatever reason).
  
  gs17 10 months ago
  
  > But since YouTube transcription is really good
  Are you sure you're looking at automatic transcripts? YouTube transcripts are bizarrely low quality if they're not provided by the creators (I've actually used my Google Pixel's live transcription to make better captions occasionally).
  I just checked a video my girlfriend uploaded a week ago and the auto-transcript was still pretty messy. I've used Whisper for the same task and it's significantly better.
  
- HPsquared 10 months ago
  
  36 cents an hour is how much it costs to hire an entire GPU like an A4000. I can assure you Whisper runs much, much faster than 1x!
  
  jokethrowaway 10 months ago
  
  A few derivative projects are faster than 1x, insanely-fast-whisper being the fastest I've tried.
  Whisper large v3 on release day was around 1x on a 4090.
  
davidzweig 10 months ago

The security against downloading audio from YouTube has been upped recently with 'PO tokens'.
Whisper is only a few tenths of a cent per hour transcribed if you run it on your own GPU, though: about 30x real-time on a 3080 or similar, with batching.
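The cost figures in this thread can be related with one line of arithmetic: the cost per hour of audio is the GPU's hourly cost divided by the real-time speedup. A small sketch follows. Combining the 36 cents/hour rental price quoted earlier with the 30x figure here is an assumption made for illustration, since the two numbers refer to different GPUs.

```python
def cost_per_audio_hour(gpu_cost_per_hour: float, realtime_factor: float) -> float:
    """Dollars to transcribe one hour of audio.

    gpu_cost_per_hour: what one hour of GPU time costs, in dollars.
    realtime_factor: hours of audio processed per hour of GPU time.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return gpu_cost_per_hour / realtime_factor


# Rented GPU at $0.36/hour running at 1x: the full $0.36 per hour of audio.
at_1x = cost_per_audio_hour(0.36, 1)
# Same rental price at 30x real-time: about $0.012, i.e. roughly 1.2 cents.
at_30x = cost_per_audio_hour(0.36, 30)
```

On hardware you already own, the hourly cost is mostly electricity, which is how the per-hour figure can drop below a cent.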

- swyx 10 months ago
  
  > The security against downloading audio from YouTube has been upped recently with 'PO tokens'.
  Do you have a source? More generally, is there a community or news source for YouTube "API" news like this?
  
  davidzweig 10 months ago
  
  I haven't been following this closely for the last few weeks, but you can check the issues in this repo, for example: https://github.com/distubejs/ytdl-core
  
- HPsquared 10 months ago
  
  Tbh I've not had trouble with this for personal use.
  