Comment by fzzzy

Comment by fzzzy 2 days ago

I agree, 5 tokens per second is plenty fast for casual use.

overfeed 2 days ago

Also works perfectly fine in fire-and-forget, non-interactive agentic workflows. My dream scenario is that I create a bunch of kanban tickets and assign them to one or more AI personas[1], and wake up to some Pull Requests the next morning. I'd me more concerned about tickets-per-day, and not tk/s as I have no interest in watching the inner-workings of the model.

1. Some more creative than others, with slightly different injected prompts or perhaps even different models entirely.

Reply View 3 replies

numpad0 2 days ago

> I create a bunch of kanban tickets and assign them to one or more AI personas[1],
Yeah that. Why can't we just `find ./tasks/ | grep \.md$ | xargs llm`. Can't we just write up a government proposal style document, have LLM recursively down into sub-sub-projects and back up until the original proposal document can be translated into a completion report. Constantly correcting a humongous LLM with infinite context length that can keep everything in its head doesn't feel like the right approach.

Reply View | 2 replies
- londons_explore 2 days ago
  
  In my experience, this sort of thing nearly works... But never quite works well enough and errors and misunderstandings build at every stage and the output is garbage.
  Maybe with bigger models it'll work well.
  
  Reply View | 1 reply
  
  numpad0 12 hours ago
  
  I had hoped that this recursive breakdown approach could remove the need for bigger and bigger monolithic LLM for ever bigger tasks, by allowing every tasks to be at same granularity, but... I guess I should just try building one myself.
  
  Reply View | 0 replies

refulgentis 2 days ago

Cosign for chat, that's my bar for usable on mobile phone (and correlates well with avg. reading speed)

Reply View 0 replies

SV_BubbleTime a day ago

It was, last year 5tk/s was reasonable. If you wanted to proof read a paragraph or rewrite some bullet points into a PowerPoint slide.

Now, with agentic coding, thinking models, a “chat with my pdf” or whatever artifacts are being called now, no, I don’t think 5/s is enough.

Reply View 0 replies