Comment by kloop
My team uses it for geospatial data. We rasterize slippy map tiles and then do a raster summary on the GPU.
It's a weird case, but the pixels can be processed independently for most of it, so it works pretty well. Then the rows can be summarized in parallel and rolled up at the end. The copy onto the GPU is our current bottleneck, however.
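For anyone curious what "summarize the rows in parallel, then roll up" looks like, here is a minimal CPU-side sketch in plain Python. It is only an illustration of the two-stage reduction pattern, not the actual GPU code; the `Summary` fields, the function names, and the choice of count/sum/min/max as the statistics are all assumptions made for the example, and it assumes every row is non-empty.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    # Hypothetical per-region statistics; the real summary may differ.
    count: int
    total: float
    lo: float
    hi: float


def summarize_row(row):
    """Summarize one row of pixel values.

    Rows do not depend on each other, so on a GPU each row could be
    handled by its own thread or block.
    """
    return Summary(len(row), float(sum(row)), float(min(row)), float(max(row)))


def merge(a, b):
    """Combine two partial summaries.

    The operation is associative (up to floating-point rounding in the
    sum), so partial results can be merged in any grouping.
    """
    return Summary(
        a.count + b.count,
        a.total + b.total,
        min(a.lo, b.lo),
        max(a.hi, b.hi),
    )


def summarize_raster(raster):
    # Stage 1: independent per-row summaries (the parallelizable part).
    partials = [summarize_row(row) for row in raster]
    # Stage 2: roll the row summaries up into a single result.
    return reduce(merge, partials)


if __name__ == "__main__":
    tile = [[1, 2, 3], [4, 5, 6], [0, 9, 2]]  # made-up pixel values
    result = summarize_raster(tile)
    print(result.count, result.total / result.count, result.lo, result.hi)
```

Because only the small per-row summaries need to come back from the device, the roll-up itself is cheap; the expensive transfer is the raster going onto the GPU, which matches the bottleneck described above.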