Comment by adzm
These kinds of things are really interesting, and a good example of the importance of performance testing in optimized builds. I encountered something similar and realized the issue was different sizes of structures in debug vs release which ended up causing the CPU cache lines to overlap and invalidate. Though that sounds unlikely here.