Comment by venusenvy47 15 days ago

The big players batch requests from many users and process them in parallel, which keeps their GPUs and memory as full as possible during inference. They can do this because a fairly steady stream of requests arrives at their data centers at all times. This article describes some of how this is accomplished.

https://www.infracloud.io/blogs/inference-parallelism/
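To make the batching idea concrete, here is a toy sketch of my own, not something taken from the article. It assumes that one decode step costs roughly the same whether it serves one request or several, which is a simplification of memory-bound decoding. The function name `simulate`, the `max_batch` parameter, and the sample requests are all made up for illustration.

```python
from collections import deque


def simulate(requests, max_batch):
    """Toy model of batched token generation.

    requests:  list of (arrival_step, tokens_to_generate) pairs (hypothetical data)
    max_batch: how many requests can share one decode step (assumed fixed)
    Returns (total_steps, tokens_per_step).
    """
    pending = deque(sorted(requests))  # ordered by arrival step
    active = []                        # tokens still owed to each in-flight request
    step = 0
    tokens_done = 0
    while pending or active:
        # Admit requests that have already arrived while free slots remain.
        while pending and pending[0][0] <= step and len(active) < max_batch:
            active.append(pending.popleft()[1])
        # One decode step produces one token for every in-flight request.
        tokens_done += len(active)
        active = [left - 1 for left in active if left > 1]
        step += 1
    return step, tokens_done / step


# Four hypothetical requests arriving over the first three steps.
requests = [(0, 3), (0, 2), (1, 4), (2, 1)]

one_at_a_time = simulate(requests, max_batch=1)
batched = simulate(requests, max_batch=4)
print("one at a time:", one_at_a_time)
print("batched:      ", batched)
```

With these sample requests the batched run finishes the same 10 tokens in half as many steps, because finished requests free their slots and new arrivals take them right away. Real serving stacks are far more involved than this, so treat it only as an illustration of why a steady request stream helps.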