Windows sends the debugger a specific set of events; you can find them in the documentation of WaitForDebugEvent (and of the DEBUG_EVENT structure it fills in).
One of these events is CREATE_THREAD_DEBUG_EVENT (its details arrive in a CREATE_THREAD_DEBUG_INFO structure), which is sent when Windows has created the thread but not yet started it.
In Windows, process and thread creation happens in the kernel, but the final initialization steps happen in user space (unless it's a picoprocess, which we won't address here). The kernel maps ntdll.dll into every process, and a newly created thread's context has its RIP set to point to one of this DLL's functions.
This function performs the remaining initialization and then transfers control to the address given to CreateThread (or a similar API). It is essentially a wrapper around every thread.
It's fair to say that the thread start happens when the first instruction of this initialization function is about to execute (think of it as if Windows had set a breakpoint there).
The thread entry, instead, is just the address given to the thread creation API. It matters because it is the code the caller actually intended to run. In fact, for debugging or RE purposes, you can almost always ignore the thread start event.
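Both addresses can also be read programmatically. Below is a minimal sketch of a debug loop in C, assuming an x64 Windows build. It launches the executable given on the command line with DEBUG_ONLY_THIS_PROCESS. For every CREATE_THREAD_DEBUG_EVENT it prints the thread entry (lpStartAddress from CREATE_THREAD_DEBUG_INFO) and the thread start (the RIP of the not-yet-run thread, read with GetThreadContext), plus RCX. On the Windows 7 system described in this answer, RIP should point to RtlUserThreadStart with the entry in RCX; other Windows versions may differ. The debug_event_name helper is a hypothetical name used only to label the output, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: names for DEBUG_EVENT.dwDebugEventCode values 1..9
   (EXCEPTION_DEBUG_EVENT = 1 ... RIP_EVENT = 9). */
const char *debug_event_name(unsigned long code)
{
    static const char *const names[] = {
        "UNKNOWN",                   "EXCEPTION_DEBUG_EVENT",
        "CREATE_THREAD_DEBUG_EVENT", "CREATE_PROCESS_DEBUG_EVENT",
        "EXIT_THREAD_DEBUG_EVENT",   "EXIT_PROCESS_DEBUG_EVENT",
        "LOAD_DLL_DEBUG_EVENT",      "UNLOAD_DLL_DEBUG_EVENT",
        "OUTPUT_DEBUG_STRING_EVENT", "RIP_EVENT",
    };
    return code < sizeof names / sizeof names[0] ? names[code] : names[0];
}

#ifdef _WIN64
#include <windows.h>

int main(int argc, char **argv)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;

    memset(&si, 0, sizeof si);
    si.cb = sizeof si;
    if (argc < 2 || !CreateProcessA(argv[1], NULL, NULL, NULL, FALSE,
                                    DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    while (WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        printf("%s (tid %lu)\n", debug_event_name(ev.dwDebugEventCode), ev.dwThreadId);

        switch (ev.dwDebugEventCode) {
        case CREATE_THREAD_DEBUG_EVENT: {
            CONTEXT ctx;
            ctx.ContextFlags = CONTEXT_CONTROL | CONTEXT_INTEGER;
            /* Thread entry: the address the creator passed to CreateThread. */
            printf("  entry (lpStartAddress) = %p\n",
                   (void *)ev.u.CreateThread.lpStartAddress);
            /* Thread start: where the not-yet-scheduled thread will begin. */
            if (GetThreadContext(ev.u.CreateThread.hThread, &ctx))
                printf("  start (Rip) = %p, Rcx = %p\n",
                       (void *)ctx.Rip, (void *)ctx.Rcx);
            break;
        }
        case CREATE_PROCESS_DEBUG_EVENT: /* close the image file handle we are given */
            if (ev.u.CreateProcessInfo.hFile)
                CloseHandle(ev.u.CreateProcessInfo.hFile);
            break;
        case LOAD_DLL_DEBUG_EVENT:       /* same for every loaded DLL */
            if (ev.u.LoadDll.hFile)
                CloseHandle(ev.u.LoadDll.hFile);
            break;
        case EXCEPTION_DEBUG_EVENT:
            /* Let the target handle anything that isn't a breakpoint. */
            if (ev.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                status = DBG_EXCEPTION_NOT_HANDLED;
            break;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
    }
    return 0;
}
#endif
```

Pointing it at the NASM program below (once built) should show a CREATE_THREAD_DEBUG_EVENT whose entry is _thread1. Any threads that Windows itself may add to the process will also appear.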
Let's work through an example. Consider this simple 64-bit program (NASM syntax):
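```nasm
BITS 64

EXTERN CreateThread
GLOBAL _start

SECTION .text

_start:
    and rsp, -16            ; align RSP to 16 bytes, as the x64 ABI requires at calls
    push 0                  ; 6th argument: lpThreadId = NULL
    push 0                  ; 5th argument: dwCreationFlags = 0 (run immediately)
    sub rsp, 20h            ; 32 bytes of shadow space for the callee
    xor r9, r9              ; lpParameter = NULL
    lea r8, [REL _thread1]  ; lpStartAddress = _thread1 (the thread entry)
    xor edx, edx            ; dwStackSize = 0 (default size)
    xor ecx, ecx            ; lpThreadAttributes = NULL
    call CreateThread

.loop:                      ; the main thread spins here forever
    TIMES 1000 pause
    jmp .loop

_thread1:                   ; the new thread spins here forever
    TIMES 1000 pause
    jmp _thread1
```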
All it does is create a thread that runs a sled of pause instructions in a loop. The main thread executes a similar, but separate, loop.
The purpose of the loops is to keep each thread's RIP changing without it ever being inside a Windows API. Any instruction that doesn't fault would do; I picked pause, because :)
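Assemble and link the program (for example, nasm -f win64 to assemble, then the MSVC linker with /ENTRY:_start /SUBSYSTEM:CONSOLE and kernel32.lib).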
Open x64dbg, load the program, and enable the Thread Start and Thread Entry events (Options > Preferences > Events).
Press F9 to reach the program entry point, then press F9 again to let it run. The debugger will be notified of the new thread's creation.
Note that execution stopped at the beginning of RtlUserThreadStart. This is always the case on my version of Windows (Windows 7 something), and it makes sense given the introduction at the beginning of this answer.
Also note that the thread entry point is in rcx, meaning it is the first parameter of RtlUserThreadStart.
Now, this was the event that Windows sent to the debugger, so it's natural that execution stopped here.
But Windows has no thread entry event, so what is x64dbg doing here?
You can solve this mystery by looking at the Breakpoints tab.
There you can see that the debugger has set a one-time breakpoint (i.e. one the debugger itself removes automatically) at the thread entry point.
So, while Windows doesn't generate a debug event when a thread first starts executing the user's code, a debugger can easily emulate one by putting a breakpoint there before the thread actually starts.
Note that this means the debugger always reacts to thread start events, since that is when it sets the breakpoint. When the event is disabled in the options, the debugger simply doesn't stop, show the location, and wait for you to do something.
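The same trick can be sketched with the raw debugging API. This is an assumption about how such an emulation could look, not x64dbg's actual code, and it targets an x64 Windows build. On CREATE_THREAD_DEBUG_EVENT the sketch saves the byte at lpStartAddress and overwrites it with an int3 (0xCC). When EXCEPTION_BREAKPOINT is later reported at that address, it restores the byte, rewinds RIP to the breakpoint address and reports a synthetic "thread entry" event. The bp_* bookkeeping functions, MAX_PENDING and patch_byte are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PENDING 64 /* hypothetical limit on simultaneous entry breakpoints */

/* Hypothetical bookkeeping: address patched with int3 and the byte it replaced. */
struct bp { uintptr_t addr; unsigned char saved; int used; };
static struct bp pending[MAX_PENDING];

/* Index of the pending breakpoint at addr, or -1. */
int bp_find(uintptr_t addr)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending[i].used && pending[i].addr == addr)
            return i;
    return -1;
}

/* Record a breakpoint; fails if addr is already pending or the table is full. */
int bp_add(uintptr_t addr, unsigned char saved)
{
    if (bp_find(addr) >= 0)
        return 0;
    for (int i = 0; i < MAX_PENDING; i++)
        if (!pending[i].used) {
            pending[i] = (struct bp){ addr, saved, 1 };
            return 1;
        }
    return 0;
}

/* One-shot semantics: if addr is ours, forget it and hand back the original byte. */
int bp_take(uintptr_t addr, unsigned char *saved)
{
    int i = bp_find(addr);
    if (i < 0)
        return 0;
    *saved = pending[i].saved;
    pending[i].used = 0;
    return 1;
}

#ifdef _WIN64
#include <stdio.h>
#include <windows.h>

/* Hypothetical helper: overwrite one code byte in the target and flush its I-cache. */
static void patch_byte(HANDLE proc, uintptr_t addr, unsigned char value)
{
    DWORD old;
    if (!VirtualProtectEx(proc, (LPVOID)addr, 1, PAGE_EXECUTE_READWRITE, &old))
        return;
    WriteProcessMemory(proc, (LPVOID)addr, &value, 1, NULL);
    VirtualProtectEx(proc, (LPVOID)addr, 1, old, &old);
    FlushInstructionCache(proc, (LPCVOID)addr, 1);
}

int main(int argc, char **argv)
{
    STARTUPINFOA si = { sizeof si };
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    if (argc < 2 || !CreateProcessA(argv[1], NULL, NULL, NULL, FALSE,
                                    DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    while (WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        if (ev.dwDebugEventCode == CREATE_THREAD_DEBUG_EVENT) {
            /* Not scheduled yet: plant a one-shot int3 at the thread entry. */
            uintptr_t entry = (uintptr_t)ev.u.CreateThread.lpStartAddress;
            unsigned char orig;
            if (ReadProcessMemory(pi.hProcess, (LPCVOID)entry, &orig, 1, NULL) &&
                bp_add(entry, orig))
                patch_byte(pi.hProcess, entry, 0xCC);
        } else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            EXCEPTION_RECORD *er = &ev.u.Exception.ExceptionRecord;
            uintptr_t at = (uintptr_t)er->ExceptionAddress;
            unsigned char orig;
            if (er->ExceptionCode != EXCEPTION_BREAKPOINT) {
                status = DBG_EXCEPTION_NOT_HANDLED;
            } else if (bp_take(at, &orig)) {
                /* Our int3: this is the emulated "thread entry" event. */
                HANDLE th = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, ev.dwThreadId);
                CONTEXT ctx;
                ctx.ContextFlags = CONTEXT_CONTROL;
                printf("thread %lu reached its entry point %p\n", ev.dwThreadId, er->ExceptionAddress);
                patch_byte(pi.hProcess, at, orig);   /* remove the int3 */
                if (th && GetThreadContext(th, &ctx)) {
                    ctx.Rip = at;                    /* re-run the restored instruction */
                    SetThreadContext(th, &ctx);
                }
                if (th)
                    CloseHandle(th);
            }
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
    }
    return 0;
}
#endif
```

Because the int3 lives in process-wide memory, whichever thread executes that address first triggers it. Threads that share one entry point are therefore reported only once.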
Pausing and resuming the thread doesn't change the thread entry point, which is fixed at thread creation.
x64dbg has a Threads tab that lets you suspend and resume threads. Playing with it doesn't change the thread entry point, only the RIPs, which still point somewhere inside the two loops (they exist precisely to make this test easy).
If the thread is created suspended (CREATE_SUSPENDED), the thread start event won't fire until the thread is resumed.
But if, before the thread is resumed, a pair of calls to GetThreadContext/SetThreadContext changes its RIP, then RtlUserThreadStart will never be executed (I don't know exactly what this function does, but a thread can do without it) and the thread start event will never fire.
Execution will go straight to the altered RIP.
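Here is a sketch of that hijack as a standalone C program, assuming an x64 Windows build. It creates a thread with CREATE_SUSPENDED, redirects its RIP with GetThreadContext/SetThreadContext, resumes it, and then reports which of the two functions ran. The names thread_entry, hijack_target and classify are made up for this example. The hijacked function is entered directly, with no caller to return to, so it must never return; it simply spins on pause (YieldProcessor), like the NASM test program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: which code did the new thread execute?
   Bit 0 = the thread entry ran, bit 1 = the hijacked RIP ran. */
enum outcome { NOTHING_RAN = 0, ENTRY_RAN = 1, HIJACK_RAN = 2, BOTH_RAN = 3 };

enum outcome classify(long entry_runs, long hijack_runs)
{
    return (enum outcome)((entry_runs ? ENTRY_RAN : 0) | (hijack_runs ? HIJACK_RAN : 0));
}

#ifdef _WIN64
#include <windows.h>

static volatile LONG entry_runs, hijack_runs;

/* The thread entry: what the creator asked CreateThread to run. */
static DWORD WINAPI thread_entry(LPVOID param)
{
    (void)param;
    InterlockedIncrement(&entry_runs);
    return 0;
}

/* The hijacked RIP: entered without going through RtlUserThreadStart and with
   no caller to return to, so it must never return. */
static void hijack_target(void)
{
    InterlockedIncrement(&hijack_runs);
    for (;;)
        YieldProcessor(); /* the pause instruction */
}

int main(void)
{
    static const char *const names[] = { "nothing ran", "only the entry ran",
                                         "only the hijacked RIP ran", "both ran" };
    CONTEXT ctx;
    HANDLE th = CreateThread(NULL, 0, thread_entry, NULL, CREATE_SUSPENDED, NULL);
    if (!th)
        return 1;

    ctx.ContextFlags = CONTEXT_CONTROL;            /* Rip, Rsp, EFlags, segments */
    if (!GetThreadContext(th, &ctx))
        return 1;
    printf("initial RIP: %p\n", (void *)ctx.Rip);  /* expected: RtlUserThreadStart */
    ctx.Rip = (DWORD64)(ULONG_PTR)hijack_target;
    if (!SetThreadContext(th, &ctx))
        return 1;
    ResumeThread(th);

    Sleep(500);                                    /* let the thread run for a while */
    printf("%s\n", names[classify(entry_runs, hijack_runs)]);
    return 0;                                      /* process exit also ends the spinning thread */
}
#endif
```

Running it under x64dbg with the Thread Start event enabled is one way to check the behavior described above on your own Windows version.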
I'm not sure whether this is a legacy shortcoming of Windows' debugging interface. The thread start event could have been generated by setting the TF (trap flag) before the thread is first scheduled, and clearing it as soon as the resulting exception is caught.
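The trap-flag idea can be tried with a small experiment. This is an untested sketch of the suggestion above, not something Windows or x64dbg is known to do, and it assumes an x64 Windows build. On CREATE_THREAD_DEBUG_EVENT it sets TF (bit 8 of EFlags) in the new thread's context. On EXCEPTION_SINGLE_STEP it clears TF again and prints the reported address. A TF single-step is a trap raised after an instruction completes, so the reported address is expected to be the one following the first instruction the thread executes; the printout shows where it actually lands. set_tf, clear_tf and update_eflags are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF 0x100u /* trap flag: bit 8 of RFLAGS */

/* Hypothetical helpers: arm/disarm single-stepping in a saved EFlags value. */
uint32_t set_tf(uint32_t eflags) { return eflags | EFLAGS_TF; }
uint32_t clear_tf(uint32_t eflags) { return eflags & ~EFLAGS_TF; }

#ifdef _WIN64
#include <stdio.h>
#include <windows.h>

/* Hypothetical helper: apply f to a thread's EFlags via Get/SetThreadContext. */
static BOOL update_eflags(HANDLE thread, uint32_t (*f)(uint32_t))
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_CONTROL;
    if (!GetThreadContext(thread, &ctx))
        return FALSE;
    ctx.EFlags = f(ctx.EFlags);
    return SetThreadContext(thread, &ctx);
}

int main(int argc, char **argv)
{
    STARTUPINFOA si = { sizeof si };
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    if (argc < 2 || !CreateProcessA(argv[1], NULL, NULL, NULL, FALSE,
                                    DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    while (WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        if (ev.dwDebugEventCode == CREATE_THREAD_DEBUG_EVENT) {
            /* Untested idea: arm TF before the thread is first scheduled. */
            update_eflags(ev.u.CreateThread.hThread, set_tf);
        } else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            EXCEPTION_RECORD *er = &ev.u.Exception.ExceptionRecord;
            if (er->ExceptionCode == EXCEPTION_SINGLE_STEP) {
                HANDLE th = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, ev.dwThreadId);
                printf("thread %lu single-stepped, now at %p\n", ev.dwThreadId, er->ExceptionAddress);
                if (th) {
                    update_eflags(th, clear_tf); /* disarm right away */
                    CloseHandle(th);
                }
            } else if (er->ExceptionCode != EXCEPTION_BREAKPOINT) {
                status = DBG_EXCEPTION_NOT_HANDLED;
            }
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
    }
    return 0;
}
#endif
```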
When debugging or reversing threads, what I usually do is put a breakpoint at the thread entry point (which is easy to get) or at the hijacked RIP. The latter is also easy to get: threads like this are created suspended, so you know something fishy is going on.
If the program is being nasty and the code at the thread's RIP is not yet in the clear (e.g. it is still obfuscated), use a hardware breakpoint. It doesn't modify the code, so it survives the code being decrypted in place.
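A hardware breakpoint can be armed from a debugger through the thread's debug registers. Below is a minimal sketch, assuming an x64 Windows build. On CREATE_THREAD_DEBUG_EVENT it puts the entry address in Dr0 and enables slot 0 in Dr7 as a 1-byte execute breakpoint (RW0 = LEN0 = 00). It disarms the breakpoint when the resulting EXCEPTION_SINGLE_STEP arrives. Nothing is written to the target's memory, which is why this works on code that isn't decrypted yet. Debug registers are per thread, so they have to be set on each thread of interest; for a hijacked thread you would use the hijacked RIP instead of lpStartAddress. Whether registers set this early survive the thread's initialization on every Windows version is an assumption worth verifying. dr7_enable_exec, dr7_disable and set_hw_bp are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: Dr7 with slot n (0..3) armed as an execute breakpoint. */
uint64_t dr7_enable_exec(uint64_t dr7, int slot)
{
    dr7 &= ~(UINT64_C(0xF) << (16 + 4 * slot)); /* RWn = 00 (execute), LENn = 00 (1 byte) */
    return dr7 | (UINT64_C(1) << (2 * slot));   /* Ln: local enable */
}

/* Hypothetical helper: clear the local-enable bit of slot n. */
uint64_t dr7_disable(uint64_t dr7, int slot)
{
    return dr7 & ~(UINT64_C(1) << (2 * slot));
}

#ifdef _WIN64
#include <stdio.h>
#include <windows.h>

/* Hypothetical helper: arm slot 0 at addr, or disarm it when addr == 0. */
static void set_hw_bp(HANDLE thread, uintptr_t addr)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    if (!GetThreadContext(thread, &ctx))
        return;
    ctx.Dr0 = addr;
    ctx.Dr7 = addr ? dr7_enable_exec(ctx.Dr7, 0) : dr7_disable(ctx.Dr7, 0);
    SetThreadContext(thread, &ctx);
}

int main(int argc, char **argv)
{
    STARTUPINFOA si = { sizeof si };
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    if (argc < 2 || !CreateProcessA(argv[1], NULL, NULL, NULL, FALSE,
                                    DEBUG_ONLY_THIS_PROCESS, NULL, NULL, &si, &pi))
        return 1;

    while (WaitForDebugEvent(&ev, INFINITE)) {
        DWORD status = DBG_CONTINUE;
        if (ev.dwDebugEventCode == CREATE_THREAD_DEBUG_EVENT) {
            /* No bytes are written, so the code at the entry may still be encrypted. */
            set_hw_bp(ev.u.CreateThread.hThread, (uintptr_t)ev.u.CreateThread.lpStartAddress);
        } else if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            EXCEPTION_RECORD *er = &ev.u.Exception.ExceptionRecord;
            if (er->ExceptionCode == EXCEPTION_SINGLE_STEP) {
                /* Execute breakpoints fault before the instruction at Dr0 runs. */
                HANDLE th = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, ev.dwThreadId);
                printf("thread %lu hit the hardware breakpoint at %p\n", ev.dwThreadId, er->ExceptionAddress);
                if (th) {
                    set_hw_bp(th, 0);   /* one-shot: disarm so the thread can proceed */
                    CloseHandle(th);
                }
            } else if (er->ExceptionCode != EXCEPTION_BREAKPOINT) {
                status = DBG_EXCEPTION_NOT_HANDLED;
            }
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;
    }
    return 0;
}
#endif
```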
Note: the same thing happens for process creation too, only with the PE entry point instead of a thread entry point.