Related: Handling calls to (potentially) far away ahead-of-time compiled functions from JITed code has more about JITing, especially allocating your JIT buffer near the code it wants to call so you can use an efficient call rel32, and what to do if you can't. Also, Call an absolute pointer in x86 machine code is a good canonical Q&A about call or jmp to an absolute address.
TL:DR: To call a function by name, just use call func like a normal person and let the assembler + linker take care of it. Since you say you're using NASM, I guess you're actually generating the machine code with an assembler. It sounded like a more complicated question, but I think you were just trying to ask whether the normal way is safe.
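A minimal NASM sketch of that normal case (the my_caller label is just for illustration; func is assumed to be defined in another object file you link against):

extern func             ; defined elsewhere; the linker resolves it

section .text
my_caller:
    call func           ; E8 rel32: the assembler + linker fill in the displacement
    ret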
The indirect call r/m64 (opcode FF /2) takes a 64-bit register or memory operand in 64-bit mode, so you can do:
func equ 0x123456789ab
; or if func is a regular label
mov rax, func ; mov r64, imm64, or mov r32, imm32 if it fits
call rax      ; FF D0: indirect near call through RAX
Normally you'd put a label address into a register with lea rax, [rel func], but if that RIP-relative LEA is encodeable (i.e. the target is within ±2GiB of RIP), you could just use call rel32 directly instead.
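For comparison, the LEA + indirect-call version would look like this (only worth it if you also need the address in a register for something else):

lea rax, [rel func]    ; RIP-relative LEA, encodeable only if func is within ±2GiB of RIP
call rax               ; FF D0: indirect near call through RAX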
Or, if you know what address your machine code will be stored at, you can use the normal direct call rel32 encoding yourself: the displacement is the target address minus the address of the end of the call instruction (i.e. the start of the next instruction).
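A worked example with made-up addresses: if the 5-byte call rel32 instruction starts at 0x401000, it ends at 0x401005, so reaching a target at 0x500000 needs a displacement of 0x500000 - 0x401005 = 0xFEFFB. Emitting that encoding by hand in NASM (flat-binary output, nasm -f bin) could look like:

org 0x401000                        ; pretend this is where the code will be loaded
target equ 0x500000                 ; made-up absolute target address

call_insn:
    db 0xE8                         ; opcode byte for call rel32
    dd target - (call_insn + 5)     ; rel32 = target minus the end of this 5-byte call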
If you don't want to use an indirect call, then the rel32 encoding is your only option. rel32 is a signed 32-bit displacement, reaching ±2GiB from the end of the call, so make sure your machine code goes into the low 2GiB of virtual address space, where it can reach any target that's also in the low 2GiB.
"if I could be guaranteed that the executable would be for sure mapped to the bottom 4GB of memory"
Yes, this is the default code model for Linux, Windows, and OS X. AMD64 direct call / jump instructions, and RIP-relative addressing, only have rel32 encodings, so all of those systems default to the "small" code model where code and static data go in the low 2GiB. That guarantees the linker can always fill in a rel32 to reach up to 2GiB forward or 2GiB backward. The x86-64 System V ABI does describe Large / Huge code models, but IDK if anyone ever uses them, because of the inefficiency of addressing data and making calls.
Re: efficiency: yes, mov-immediate + call rax is less efficient. I think it's significantly slower if branch prediction misses and can't provide a target prediction from the BTB. However, even call rel32 and jmp rel32 still need the BTB for full performance: see Slow jmp-instruction for experimental results of relative jmp next_insn slowing down when there are too many of them in a giant loop.
With hot branch predictors, the indirect version is only extra code size and an extra uop (the mov). It might consume more prediction resources, but maybe not even that.
See also What branch misprediction does the Branch Target Buffer detect?