Comment by dougg3
Author here. Thanks! I believe the register reads are just extending the delay, although the new approach does have a side effect of reading from the hardware multiple times. I don't think the multiple reads really matter though.
I went with the multiple reads because that's what Marvell's own kernel fork does. My reasoning was that people have been using their fork, not only on the PXA168, but on the newer PXAxxxx series, so it would be best to retain Marvell's approach. I could have just increased the delay loop, but I didn't have any way of knowing if the delay I chose would be correct on newer PXAxxx models as well, like the chip used in the OLPC. Really wish they had more/better documentation!