Comment by viraptor
It's a joke. The image provided is a perfect base for pose ControlNet https://www.nextdiffusion.ai/tutorials/how-to-use-open-pose-...
It's likely easier to generate a matching photo these days than to solve some of the visual captchas, so it would be pointless to implement.
Not sure, but here in China many apps require you to perform a routine with your face: open your mouth, blink, etc.