You can multiply by 0x0101010101010101 to copy the lowest byte into all the other bytes (assuming the rest were all zero to begin with). It's slightly annoying because there is no imul r64, r64, imm64, but you could do this:
mov rax, 0x0101010101010101
imul rax, rdx ; at least as fast as mul rdx on all CPUs
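To see why the multiply works, here is a minimal C sketch of the same arithmetic. The helper name broadcast_byte is made up for illustration, and the assert encodes the precondition stated above (all bytes other than the lowest are zero). Each 0x01 byte in the constant contributes one shifted copy of the input byte, and since the copies do not overlap there are no carries between them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer): replicate a byte
   into all 8 bytes of a 64-bit word.
   Precondition: bits 8..63 of x are already zero. */
static uint64_t broadcast_byte(uint64_t x)
{
    assert(x <= 0xFF);  /* upper bytes must be zero for the trick to work */
    /* x * 0x0101010101010101 == x + (x << 8) + (x << 16) + ... + (x << 56) */
    return x * UINT64_C(0x0101010101010101);
}
```

For example, broadcast_byte(0x5A) yields 0x5A5A5A5A5A5A5A5A. An optimizing compiler targeting x86-64 will usually emit something close to the mov/imul pair above, but that depends on the compiler and flags.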
If rdx is not of the required form (in other words, if it has some extra bits set), just add a movzx eax, dl in front, and move the constant into RDX or another register. (Zero-extending into a different register is deliberate: movzx edx, dl can't benefit from mov-elimination on Intel CPUs, because its source and destination are the same register.)
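As a rough illustration of the zero-extend-first variant, the following C sketch masks the input down to its low byte before multiplying. The name broadcast_low_byte is hypothetical; the cast to uint8_t plays the role of movzx eax, dl.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original answer): replicate the low
   byte of x into all 8 bytes, ignoring whatever is in the upper bytes. */
static uint64_t broadcast_low_byte(uint64_t x)
{
    uint64_t low = (uint8_t)x;  /* zero-extend the low byte, like movzx */
    return low * UINT64_C(0x0101010101010101);
}
```

Without the masking step, any set bits above the low byte would also be multiplied and added into the higher bytes of the result, so the output would no longer be eight copies of the same byte.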
If you don't like the code size (mov r64, imm64 is already 10 bytes by itself), just stick that constant in your data segment.