Comment by burntsushi
Comment by burntsushi 2 days ago
I always took these as "idealized" algorithms or pseudo-code. In addition to not doing correct unaligned loads (using `memcpy`), the boundary checks are wrong. The code assumes the length of the haystack is a multiple of 8. Additionally, if the needle is empty, you'll get unsigned integer overflow leading to an out-of-bounds access on the needle. That's three different ways to get UB.
This is pretty common in my experience when demonstrating SIMD code. Often only the relevant bits are shown, and not a lot of effort is given to getting the boundary cases correct. It's rarely acknowledged explicitly, but I would assume it's because the boundary conditions are somewhat less interesting than the "meat" of how to use vectors in an interesting way.
When I modified the idealized algorithm to suit my needs a few years ago, I learned just how painful the boundary cases are. I realized I didn't want to ever touch SIMD string searching unless:
- I had a very specific use-case that I really needed SIMD for and was willing to really invest a lot in it; or
- The work had already been done for me in a generic sense and I could reap the fruits of that labour (ripgrep / Rust Regex, thanks for that by the way)