Comment by henrikf
That's Leela Zero (plays Go instead of chess). It was good for its time (~2018) but it's quite outdated now. It also uses OpenCL instead of CUDA. I wrote a lot of that code, including the Winograd convolution routines.
Leela Chess Zero (https://github.com/LeelaChessZero/lc0) has a much more optimized CUDA backend targeting modern GPU architectures, and it's written by much more knowledgeable people than me. That would be a much better source to learn from.