stefan_ 2 days ago

Well if you were to read the link you might just find out! Today is your chance to be less dumb than the model!

  • bonoboTP 2 days ago

    I checked the link; it never says that the model's predictions get lower quality due to batching, only that they become nondeterministic. I don't understand why people conflate these things. Also, it's unlikely that they use smaller batch sizes when load is lower. They likely just spin GPU servers up and down based on demand, or, more likely, reallocate servers and GPUs between different roles and tasks.