Comment by data-ottawa 19 hours ago
With GPT-5, did you try setting the reasoning effort to "minimal"?
I tried using it for a very small and quick summarization task that needed low latency and any level above that took several seconds to get a response. Using minimal brought that down significantly.
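For low-latency tasks like that, the effort is set per request. A minimal sketch of what the request payload looks like, assuming the OpenAI Responses API shape (the model name and input text here are placeholders):

```python
# Illustrative request payload with reasoning effort set to "minimal".
# Assumes the Responses API shape; check your SDK version for the exact call.
payload = {
    "model": "gpt-5",
    "input": "Summarize the following text: ...",
    "reasoning": {"effort": "minimal"},  # other levels: "low", "medium", "high"
}

# e.g. client.responses.create(**payload) with the official SDK
print(payload["reasoning"]["effort"])
```

Anything above "minimal" spends tokens on reasoning before the first output token, which is where the extra seconds of latency come from.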
Weirdly, GPT-5's reasoning levels don't map cleanly onto the OpenAI API's reasoning effort levels.
Reasoning was set to minimal and low (and I think I tried medium at some point). I do not believe the timeouts were due to the reasoning taking too long, although I never streamed the results. I think the model just fails often: it stops producing tokens and eventually the request times out.
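Streaming would make that failure mode visible earlier: instead of waiting for the whole request to time out, you can fail as soon as the gap between tokens gets too long. A generic sketch, not tied to any particular SDK (`stream_with_stall_timeout` and the timeout value are illustrative; `chunks` stands in for whatever token iterator the client gives you):

```python
import queue
import threading

def stream_with_stall_timeout(chunks, timeout_s=10.0):
    """Yield items from `chunks`, raising TimeoutError if the gap between
    consecutive items exceeds `timeout_s` (i.e. the stream has stalled)."""
    q = queue.Queue()
    _DONE = object()

    def pump():
        # Drain the upstream iterator on a background thread so the
        # consumer side can enforce a timeout on each q.get().
        for c in chunks:
            q.put(c)
        q.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError(f"no tokens for {timeout_s}s; stream stalled")
        if item is _DONE:
            return
        yield item
```

With this wrapper around the streamed response, a model that silently stops producing tokens raises a `TimeoutError` after `timeout_s` seconds instead of hanging until the request-level timeout fires.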