A C++ reference to an array is the same as a pointer to the first element, in assembly language.
Even C99 int foo(int arr[static 3]) is still just a pointer in asm. The static syntax guarantees to the compiler that it can safely read all 3 elements even if the C abstract machine doesn't access some elements, so for example it could use a branchless cmov for an if.
The caller doesn't pass a length in a register because it's a compile-time constant and thus not needed at run-time.
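As a rough illustration of both points (same address either way, and the length living only in the type), here is a minimal sketch. The helper names first_via_ref, first_via_ptr, length_of, same_address and deduced_length are made up for this example and aren't from the question's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, for illustration only.

// Reference to an array of exactly 3 ints: arr decays to &arr[0] on return.
const int* first_via_ref(const int (&arr)[3]) { return arr; }

// Plain pointer parameter.
const int* first_via_ptr(const int* arr) { return arr; }

// The length is part of the parameter type, so it is deduced at compile
// time and never has to be passed as a run-time argument.
template <std::size_t N>
constexpr std::size_t length_of(const int (&)[N]) { return N; }

// True if both functions see the same address for the same array.
bool same_address() {
    int arr[] = {1, 2, 3};
    return first_via_ref(arr) == first_via_ptr(arr);
}

// Returns the length deduced from the array type.
std::size_t deduced_length() {
    int arr[] = {1, 2, 3};
    return length_of(arr);
}
```

Both functions receive the same address; the only difference is what the type system knows about what it points to.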
You can pass arrays by value, but only if they're inside a struct or union. In that case, different calling conventions have different rules; see "What kind of C11 data type is an array according to the AMD64 ABI".
You'd almost never want to pass an array by value, so it makes sense that C doesn't have syntax for it, and that C++ never invented any either. Passing by constant reference (i.e. const int *arr) is far more efficient; just a single pointer arg.
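A small sketch of the difference, assuming a made-up wrapper struct (Wrapped) and made-up function names; none of these come from the question:

```cpp
#include <cassert>

// Hypothetical wrapper: the only way to get array-by-value semantics.
struct Wrapped { int a[3]; };

// By value: the callee works on its own copy of the whole array.
void set_first_by_value(Wrapped w) { w.a[0] = 99; }

// By reference to const: only an address is passed, and the callee
// can't modify the caller's elements through it.
int sum_by_const_ref(const int (&arr)[3]) { return arr[0] + arr[1] + arr[2]; }

// The caller's array is unchanged after the by-value call.
int first_after_by_value_call() {
    Wrapped w = {{1, 2, 3}};
    set_first_by_value(w);
    return w.a[0];
}

// Sums the caller's array in place, without copying it.
int sum_of_original() {
    Wrapped w = {{1, 2, 3}};
    return sum_by_const_ref(w.a);
}
```

How the by-value copy is actually passed (in registers, or in memory on the stack) depends on the calling convention and the struct size.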
Removing compiler noise by enabling optimization:
I put your code on the Godbolt compiler explorer, compiled with gcc -O3 -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions to stop it from inlining the function calls. That gets rid of all the noise from -O0 debug-build and frame-pointer boilerplate. (I just searched the man page for inline and disabled inlining options until I got what I wanted.)
Instead of -fno-inline-small-functions and so on, you could use GNU C __attribute__((noinline)) on your function definitions to disable inlining for specific functions, even if they're static.
I also added a call to a function without a definition, so the compiler needs to have arr[] with the right values in memory, and added a store to arr[4] in two of the functions. This lets us test whether the compiler warns about going outside the array bounds.
__attribute__((noinline, noclone))
void foo_p(int*arr) {(void)arr;}
void foo_r(int(&arr)[3]) {arr[4] = 41;}
template<int length>
void foo_t(int(&arr)[length]) {arr[4] = 42;}
void usearg(int*); // stop main from optimizing away arr[] if foo_... inline
int main()
{
    int arr[] = {1, 2, 3};
    foo_p(arr);
    foo_r(arr);
    foo_t(arr);
    usearg(arr);
    return 0;
}
With gcc7.3 -O3 -Wall -Wextra, without function inlining, on Godbolt: since I silenced the unused-args warnings from your code, the only warning we get is from the template, not from foo_r:
<source>: In function 'int main()':
<source>:14:10: warning: array subscript is above array bounds [-Warray-bounds]
foo_t(arr);
~~~~~^~~~~
The asm output is:
void foo_t<3>(int (&) [3]) [clone .isra.0]:
mov DWORD PTR [rdi], 42 # *ISRA.3_4(D),
ret
foo_p(int*):
rep ret
foo_r(int (&) [3]):
mov DWORD PTR [rdi+16], 41 # *arr_2(D),
ret
main:
sub rsp, 24 # reserve space for the array and align the stack for calls
movabs rax, 8589934593 # this is 0x200000001: the first 2 elems
lea rdi, [rsp+4]
mov QWORD PTR [rsp+4], rax # MEM[(int *)&arr], first 2 elements
mov DWORD PTR [rsp+12], 3 # MEM[(int *)&arr + 8B], 3rd element as an imm32
call foo_r(int (&) [3])
lea rdi, [rsp+20]
call void foo_t<3>(int (&) [3]) [clone .isra.0] #
lea rdi, [rsp+4] # tmp97,
call usearg(int*) #
xor eax, eax #
add rsp, 24 #,
ret
The call to foo_p() still got optimized away, probably because it doesn't do anything. (I didn't disable inter-procedural optimization, and even the noinline and noclone attributes didn't stop that.) Adding *arr=0; to the function body results in a call to it from main (passing a pointer in rdi just like the other 2).
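The GCC manual's entry for the noinline attribute notes that calls to functions without side effects can still be optimized away by passes other than inlining, and suggests putting asm(""); in the called function. A minimal sketch of that next to the store approach described above; foo_p_asm, foo_p_store and first_after_calls are hypothetical names, and whether a call actually survives depends on the compiler version and options, so check the asm output rather than taking this on faith:

```cpp
#include <cassert>

// GNU C extensions; gcc and clang both accept the attribute and the empty asm.
__attribute__((noinline))
void foo_p_asm(int* arr) {
    (void)arr;
    asm("");   // empty asm statement, the GCC manual's suggested "special side effect"
}

// Alternative: give the function a real, visible side effect.
__attribute__((noinline))
void foo_p_store(int* arr) { *arr = 0; }

// Calls both variants and reports what ended up in arr[0].
int first_after_calls() {
    int arr[] = {1, 2, 3};
    foo_p_asm(arr);     // leaves the array untouched
    foo_p_store(arr);   // writes arr[0] = 0
    return arr[0];
}
```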
Notice the clone .isra.0 annotation on the demangled function name: gcc made a definition of the function that takes a pointer to arr[4] rather than to the base element. That's why there's a lea rdi, [rsp+20] to set up the arg, and why the store uses [rdi] to deref the pointer with no displacement. __attribute__((noclone)) would stop that.
This inter-procedural optimization is pretty much trivial and saves 1 byte of code size in this case (just the disp8 in the addressing mode in the clone), but can be useful in other cases. The caller needs to know that it's calling a definition for a modified version of the function, like void foo_clone(int *p) { *p = 42; }, which is why gcc needs to encode that in the mangled symbol name.
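To make the transformation concrete, here is a hand-written stand-in for what the clone does. This is only a sketch: the array is given 8 elements (an assumption, so that index 4 is in bounds and the behaviour is well-defined), and store_as_written, store_clone and clone_matches_original are made-up names. The asm in the comments is what I'd expect on x86-64, not verified compiler output.

```cpp
#include <cassert>

// As written: takes the base of the array and stores at index 4,
// i.e. roughly  mov DWORD PTR [rdi+16], 42
void store_as_written(int (&arr)[8]) { arr[4] = 42; }

// Stand-in for the clone: takes a pointer straight to the element,
// i.e. roughly  mov DWORD PTR [rdi], 42
void store_clone(int* p) { *p = 42; }

// Both versions must leave the array in the same state.
bool clone_matches_original() {
    int a[8] = {};
    int b[8] = {};
    store_as_written(a);
    store_clone(&b[4]);   // the caller does the address arithmetic (the lea)
    for (int i = 0; i < 8; ++i) {
        if (a[i] != b[i]) return false;
    }
    return a[4] == 42;
}
```

The offset moves from the callee's addressing mode into the caller's lea, which is why every caller has to agree on which version it is calling.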
If you'd instantiated the template in one file and called it from another file that couldn't see the definition, then without link-time optimization gcc would have to just call the regular name and pass a pointer to the array like the function as written.
IDK why gcc does this for the template but not the reference. It might be related to the fact it warns about the template version, but not the reference version. Or maybe it's related to main deducing the template?
BTW, an IPO that would actually make it run slightly faster would be to let main use mov rdi, rsp instead of lea rdi, [rsp+4]. i.e. take &arr[-1] as the function arg, so the clone would use mov dword ptr [rdi+20], 42.
But that's only helpful for callers like main that have allocated an array 4 bytes above rsp, and I think gcc is only looking for IPOs that make the function itself more efficient, not the calling sequence in one specific caller.