c++ - How do objects work in x86 at the assembly level?

Question

Welcome To Ask or Share your Answers For Others

c++ - How do objects work in x86 at the assembly level?

1 Answer

深蓝 · Answer 1 · 2021-10-16T22:32:19+0000

Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).

A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, addresses of its elements increase in order in which the members were defined. (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition, because in C++ struct means class (with public: as the default instead of private:).

Think of a struct or class as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes and it doesn't take any special instructions to store a double from a floating point register and reload it into an integer register. Or to do an unaligned load and get the last 3 bytes of 1 int and the first byte of the next. A struct is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.

These blocks of bytes can have static (global or static), dynamic (malloc or new), or automatic storage (local variable: temporary on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct.)

A struct or class is the same as any other object. In C and C++ terminology, even an int is an object: http://en.cppreference.com/w/c/language/object. i.e. A contiguous block of bytes that you can memcpy around (except for non-POD types in C++).

The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment even if you do something like struct { char a; int b; }; (for example, the x86-64 System V ABI, used on Linux and other non-Windows systems specifies that int is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down some stuff that the C and C++ standards leave "implementation dependent", so that all compilers for that ABI can make code that can call each other's functions.)

Note that you can use offsetof(struct_name, member) to find out about struct layout (in C11 and C++11). See also alignof in C++11, or _Alignof in C11.

It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (e.g. if you have some char members, put them in groups of at least 4, rather than alternating with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64 or 32-bit on common platforms.)

More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.

Classes (with member functions)

class foo {
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  int inc_b(void);
};

int foo::inc_b(void) { return m_b++; }

compiles to (using http://gcc.godbolt.org/):

foo::inc_b():                  # args: this in RDI
    mov eax, DWORD PTR [rdi+4]      # eax = this->m_b
    lea edx, [rax+1]                # edx = eax+1
    mov DWORD PTR [rdi+4], edx      # this->m_b = edx
    ret

As you can see, the this pointer is passed as an implicit first argument (in rdi, in the SysV AMD64 ABI). m_b is stored at 4 bytes from the start of the struct/class. Note the clever use of lea to implement the post-increment operator, leaving the old value in eax.

No code for inc_a is emitted, since it's defined inside the class declaration. It's treated the same as an inline non-member function. If it was really big and the compiler decided not to inline it, it could emit a stand-alone version of it.

Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).

class foo {
  public:
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  void inc_b(void);
  virtual void inc_v(void);
};

void foo::inc_b(void) { m_b++; }

class bar: public foo {
 public:
  virtual void inc_v(void);  // overrides foo::inc_v even for users that access it through a pointer to class foo
};

void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }

compiles to

  ; This time I made the functions return void, so the asm is simpler
  ; The in-memory layout of the class is now:
  ;   vtable ptr (8B)
  ;   m_a (4B)
  ;   m_b (4B)
foo::inc_v():
    add DWORD PTR [rdi+12], 1   # this_2(D)->m_b,
    ret
bar::inc_v():
    add DWORD PTR [rdi+8], 1    # this_2(D)->D.2657.m_a,
    ret

    # if you uncheck the hide-directives box, you'll see
    .globl  foo::inc_b()
    .set    foo::inc_b(),foo::inc_v()
    # since inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.

    # you can also see the directives that define the data that goes in the vtables

Fun fact: add m32, imm8 is faster than inc m32 on most Intel CPUs (micro-fusion of the load+ALU uops); one of the rare cases where the old Pentium4 advice to avoid inc still applies. gcc always avoids inc, though, even when it would save code size with no downsides :/ INC instruction vs ADD 1: Does it matter?

Virtual function dispatch:

void caller(foo *p){
    p->inc_v();
}

    mov     rax, QWORD PTR [rdi]      # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
    jmp     [QWORD PTR [rax]]         # *_3

(This is an optimized tailcall: jmp replacing call/ret).

The mov loads the vtable address from the object into a register. The jmp is a memory-indirect jump, i.e. loading a new RIP value from memory. The jump-target address is vtable[0], i.e. the first function pointer in the vtable. If there was another virtual function, the mov wouldn't change but the jmp would use jmp [rax + 8].

The order of entries in the vtable presumably matches the order of declaration in the class, so reordering the class declaration in one translation unit would result in virtual functions going to the wrong target. Just like reordering the data members would change the class's ABI.

If the compiler had more information, it could devirtualize the call. e.g. if it could prove that the foo * was always pointing to a bar object, it could inline bar::inc_v().

GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from bar, so it's a good bet that bar* is pointing to a bar object, rather than some derived class.

void caller_bar(bar *p){
    p->inc_v();
}

# gcc5.5 -O3
caller_bar(bar*):
    mov     rax, QWORD PTR [rdi]      # load vtable pointer
    mov     rax, QWORD PTR [rax]      # load target function address
    cmp     rax, OFFSET FLAT:bar::inc_v()  # check it
    jne     .L6       #,
    add     DWORD PTR [rdi+8], 1      # inlined version of bar::inc_v()
    ret
.L6:
    jmp     rax               # otherwise tailcall the derived class's function

Remember, a foo * can actually point to a derived bar object, but a bar * is not allowed to point to a pure foo object.

It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function

Categories

c++ - How do objects work in x86 at the assembly level?