Comment by sdenton4

It strikes me that neural network inference workloads are probably pretty resilient to these kinds of problems (as we see the bits per activation steadily decreasing), and where they aren't, you can inject the same faults as augmentations at training time, where they will essentially act as regularization.
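
To make the training-time idea concrete, here is a rough sketch of what such an augmentation could look like. It assumes the problems in question behave like random corruption of individual activation values; the function name `inject_faults`, the `fault_rate` and `noise_scale` parameters, and the additive Gaussian fault model are all hypothetical choices for illustration, not anything from a specific framework.

```python
import random

def inject_faults(activations, fault_rate, noise_scale, rng):
    """Return a copy of `activations` with some entries randomly perturbed.

    Hypothetical fault model: each value is independently corrupted with
    probability `fault_rate` by adding Gaussian noise of std `noise_scale`.
    """
    corrupted = []
    for a in activations:
        if rng.random() < fault_rate:
            # Simulated fault: perturb this activation.
            corrupted.append(a + rng.gauss(0.0, noise_scale))
        else:
            # No fault: pass the value through unchanged.
            corrupted.append(a)
    return corrupted

# Example: corrupt roughly a quarter of a layer's activations during training.
rng = random.Random(0)
clean = [0.5, -1.2, 3.0, 0.0]
noisy = inject_faults(clean, fault_rate=0.25, noise_scale=0.1, rng=rng)
```

You would apply something like this to intermediate activations only during training and leave inference untouched, much as dropout is used. Whether Gaussian noise is a good stand-in for the real faults depends on the hardware, so the fault model is the part to swap out.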