Comment by nodja 14 days ago

Loading here refers to loading from VRAM into the GPU cores' caches. Loading from VRAM is so slow in terms of GPU time that the GPU cores end up idle most of the time, just waiting for more data to come in.
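
To put rough numbers on that, here is a back-of-the-envelope sketch. The figures are assumptions picked for illustration (1 TB/s of VRAM bandwidth, 100 TFLOP/s of compute, 16-bit weights), not the specs of any particular GPU, and `idle_fraction` is a hypothetical helper. It models a single matrix-vector product in which every weight is read from VRAM once and used for one multiply and one add.

```python
def idle_fraction(n, bytes_per_weight=2,
                  bandwidth_bytes_per_s=1e12,  # assumed VRAM bandwidth
                  flops_per_s=1e14):           # assumed peak compute
    """Estimate how much of the time cores wait on VRAM for an n x n matrix-vector product."""
    bytes_moved = n * n * bytes_per_weight  # each weight is read from VRAM once
    flops = 2 * n * n                       # one multiply and one add per weight
    t_load = bytes_moved / bandwidth_bytes_per_s
    t_compute = flops / flops_per_s
    # Assume loads and compute overlap perfectly, so the slower one sets the total time.
    t_total = max(t_load, t_compute)
    return 1 - t_compute / t_total

print(f"{idle_fraction(8192):.0%}")  # prints "99%"
```

Under these assumptions the result does not depend on the matrix size: it is just the ratio of compute throughput to memory bandwidth, which is why reusing each loaded weight for more work helps so much.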

frabcus 10 days ago

Thanks, got it! I think I need a deeper article on this. As the comment below says, you'd then need to load the request-specific state in instead.