Comment by PKop 2 days ago

Think of context switching when you are programming. You can only hold a finite number of concepts in your head at one time. If you have distractions, or try to focus on too many things at once, your ability to reason about the immediate problem degrades. Think also of legacy search engines: a short, focused query often maps to your intended goal more precisely than one stuffed with too many terms.

LLMs have always been limited in the number of tokens they can process at one time. That limit is increasing, but chat threads keep growing as you send messages back and forth, because within any session or thread you send the full conversation to the LLM with every message (aside from particular optimizations that compact or prune it). This also increases costs, which are charged per token. Both cost efficiency and performance (precision, accuracy) call for using the context window judiciously.
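
To put rough numbers on that growth, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `count_tokens` is a crude whitespace splitter standing in for a real tokenizer, `reply_tokens=50` is a made-up fixed reply length, and no compaction or pruning is applied.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in: real tokenizers split text differently.
    return len(text.split())


def tokens_sent_per_turn(user_messages, reply_tokens=50):
    """Prompt size (in tokens) for each turn when the full history is resent."""
    history_tokens = 0
    sent = []
    for msg in user_messages:
        history_tokens += count_tokens(msg)
        sent.append(history_tokens)      # the whole history goes out as the prompt
        history_tokens += reply_tokens   # the reply then joins the history
    return sent


# Five user messages of 20 "tokens" each (assumed sizes, for illustration).
messages = ["word " * 20] * 5
per_turn = tokens_sent_per_turn(messages)
total_sent = sum(per_turn)
print(per_turn)    # [20, 90, 160, 230, 300]
print(total_sent)  # 800
```

You only typed 100 tokens in total, but 800 input tokens were sent across the thread, and the last prompt alone is 300. Starting a fresh thread, or trimming what gets resent, resets that growth.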