Comment by veltas
I'm not them but whenever I've used it it's been for arch specific features like adding a debug breakpoint, synchronization, using system registers, etc.
Never for performance. If I wanted to hand optimise code I'd be more likely to use SIMD intrinsics, play with C until the compiler does the right thing, or write the entire function in a separate asm file for better highlighting and easier handing of state at ABI boundary rather than mid-function like the carry flags mentioned above.
Generally inline assembly is much easier these days as a) the compiler can see into it and make optimizations b) you don’t have to worry about calling conventions