Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

assembly - OsDev syscall/sysret and sysenter/sysexit instructions enabling

I am building a 32-bit OS in assembly.
I have set up the IDT and I am handling software interrupts raised with the int instruction.

How can I enable the syscall and sysenter instructions, and how do I handle them and return from them?
Is it true that the syscall instruction isn't supported in 32-bit mode by Intel processors, so I can't use it? Is it true that the sysret instruction isn't safe? Does a tutorial for this exist somewhere?

EDIT: My main question is how to enable the syscall and sysenter instructions! (This is not a duplicate.)


1 Answer


See the OSdev wiki for details on sysenter, including a note about how to avoid a security/safety problem. Also see the Intel / AMD manuals for that. They go into a lot of the detail that OS developers need. See the tag wiki for links.


Overview of the various system-call instructions:

  • int: available since forever (8086)
  • Trapping by executing an invalid instruction: apparently this was the fastest way to enter the kernel on the 80386, but that's not the case anymore.
  • call gate (i.e. a far call). See the OSdev link for details on that and traps.
  • sysenter: (http://wiki.osdev.org/Sysenter) Introduced by Intel before x86-64 existed, adopted by AMD not long after (many years ago). Available on all modern x86 CPUs. Very minimalist design, requires user-space cooperation for the kernel to be able to return, because it doesn't save EIP, ESP, or EFLAGS anywhere.

    Linux supports it in 32 and 64-bit kernels for system calls from 32-bit processes only. IDK if you could design a kernel that used it for 64-bit system calls as well / instead. (I know that wasn't the question, but it's related.)

    Using sysenter requires user-space cooperation to provide the return address and save its own ESP and EFLAGS. In Linux, the kernel exports a page of code which has the user-space side of this dance. User-space is expected to call this code instead of using sysenter directly, but feel free to design your OS however you want. Looking at Linux's code for both sides of this dance will probably be useful, if you don't find an example somewhere else.

  • syscall from 64-bit user-space: available everywhere because Intel implemented it along with the rest of AMD64. Well-designed interface that masks RFLAGS (with a configurable mask) before entering the kernel, so interrupts can already be disabled on entry; that avoids the race window you'd have if you had to disable them manually with cli. Used with swapgs for the kernel to get access to its stack and so on.

    On mainstream x86 OSes (like Linux), syscall is the only way to make 64-bit system calls.

  • syscall from 32-bit user-space: A totally different instruction from long mode syscall, only available on AMD CPUs. The kernel-side interface is different for 32-bit kernels (legacy mode) vs. 64-bit kernels running 32-bit user-space (compat mode).

    The Linux kernel has some useful comments on it:

entry_64_compat.S 32-bit SYSCALL entry (32-bit syscall entry point into a 64-bit kernel)

 /* ...
 *  - Most programmers do not directly target AMD CPUs, and the 32-bit
 *    SYSCALL instruction does not exist on Intel CPUs.  Even on AMD
 *    CPUs, Linux disables the SYSCALL instruction on 32-bit kernels
 *    because the SYSCALL instruction in legacy/native 32-bit mode (as
 *    opposed to compat mode) is sufficiently poorly designed as to be
 *    essentially unusable.

Maybe a toy OS could use it without worrying about whatever problems make it unsuitable for Linux, IDK. But unless you're just plain curious, don't waste your time with it. OTOH, if you're interested in OS & CPU design, finding out what's wrong with the ISA design might be interesting.

BTW, when AMD was designing AMD64, they got some feedback from Linux kernel devs on the amd64 mailing list that improved the design of 64-bit syscall (to configurably mask RFLAGS) because their initial design would have been problematic for Linux. Links to those archived mailing list posts in this answer.
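The configurable RFLAGS masking can be modeled in a few lines of C. This is only a sketch of the architectural behavior of 64-bit syscall as described above (R11 receives the unmasked RFLAGS, then every flag set in the mask MSR is cleared); the struct and function names are hypothetical, and the mask value used in the usage note is just an illustration, not what any particular kernel programs.

```c
#include <stdint.h>

/* Architectural RFLAGS bit positions. */
#define X86_EFLAGS_TF (1u << 8)   /* trap (single-step) flag */
#define X86_EFLAGS_IF (1u << 9)   /* interrupt-enable flag   */
#define X86_EFLAGS_DF (1u << 10)  /* direction flag          */

/* Hypothetical container for the flag-related state right after SYSCALL. */
struct syscall_entry_state {
    uint64_t r11;     /* unmasked copy of user-space RFLAGS, for the return path */
    uint64_t rflags;  /* RFLAGS the kernel entry point starts running with       */
};

/*
 * Model of what 64-bit SYSCALL does with the flags:
 *   R11    = RFLAGS
 *   RFLAGS = RFLAGS AND NOT(mask MSR)
 * 'fmask' stands for the value the kernel wrote to the flag-mask MSR.
 */
static struct syscall_entry_state
model_syscall_rflags(uint64_t user_rflags, uint64_t fmask)
{
    struct syscall_entry_state s;
    s.r11 = user_rflags;
    s.rflags = user_rflags & ~fmask;
    return s;
}
```

For example, with a mask of X86_EFLAGS_IF | X86_EFLAGS_TF | X86_EFLAGS_DF, a user-space RFLAGS of 0x246 (IF set) enters the kernel as 0x046: interrupts are already off at the first kernel instruction, while R11 still holds 0x246 so the original flags can be restored on return.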


Recommendation: Use sysenter for your 32-bit kernel. It should be usable everywhere, including on AMD CPUs for years now. Ancient CPUs that don't support it can use the int 0x80 ABI (or whatever number you picked for your OS), if you want to add a 2nd compatibility ABI.
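As a rough sketch of what "enabling" sysenter involves: there is no enable bit; the kernel checks CPUID for the SEP feature and then writes three MSRs with wrmsr (code segment base selector, kernel stack pointer, kernel entry point). The C below only models the parts that can be checked outside a kernel: the feature test from the CPUID leaf 1 values, and the selectors the CPU derives from the value written to the CS MSR. The function and struct names are hypothetical, and the caller is assumed to have executed CPUID and to do the actual wrmsr in ring 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSRs that configure SYSENTER; a kernel writes them with wrmsr in ring 0. */
#define MSR_IA32_SYSENTER_CS  0x174u  /* base selector for the four segments */
#define MSR_IA32_SYSENTER_ESP 0x175u  /* kernel ESP loaded by SYSENTER       */
#define MSR_IA32_SYSENTER_EIP 0x176u  /* kernel entry point                  */

#define CPUID_1_EDX_SEP (1u << 11)    /* CPUID leaf 1, EDX bit 11: SEP */

/*
 * Takes the EAX and EDX results of CPUID leaf 1. The SEP bit alone is not
 * enough: Intel documents that family 6 parts with model < 3 and
 * stepping < 3 report SEP but do not support SYSENTER/SYSEXIT.
 */
static bool cpu_has_sysenter(uint32_t cpuid1_eax, uint32_t cpuid1_edx)
{
    uint32_t stepping = cpuid1_eax & 0xFu;
    uint32_t model    = (cpuid1_eax >> 4) & 0xFu;
    uint32_t family   = (cpuid1_eax >> 8) & 0xFu;

    if (!(cpuid1_edx & CPUID_1_EDX_SEP))
        return false;
    if (family == 6 && model < 3 && stepping < 3)
        return false;
    return true;
}

/* Hypothetical helper type: selectors implied by one SYSENTER_CS value. */
struct sysenter_selectors {
    uint16_t kernel_cs, kernel_ss;  /* loaded by SYSENTER        */
    uint16_t user_cs, user_ss;      /* loaded by 32-bit SYSEXIT  */
};

/* The GDT must hold these four descriptors consecutively, in this order. */
static struct sysenter_selectors sysenter_selectors_for(uint16_t sysenter_cs)
{
    uint16_t base = (uint16_t)(sysenter_cs & 0xFFFCu);  /* RPL forced to 0 */
    struct sysenter_selectors s;
    s.kernel_cs = base;
    s.kernel_ss = (uint16_t)(base + 8);
    s.user_cs   = (uint16_t)((base + 16) | 3);          /* RPL 3 */
    s.user_ss   = (uint16_t)((base + 24) | 3);
    return s;
}
```

With 0x08 written to the CS MSR, the kernel runs with CS=0x08 and SS=0x10, and sysexit returns to user-space with CS=0x1B and SS=0x23, taking the user EIP from EDX and the user ESP from ECX. That fixed layout is why the GDT entries for kernel code, kernel data, user code and user data have to be adjacent.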

The Linux kernel entry points are well commented, and written fairly readably. While writing the answer to What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?, I had an easy time figuring out what was going on in the entry points into a 64-bit kernel using syscall (native 64-bit system calls), or int 0x80 or sysenter (32-bit system calls, normally from compat mode; int 0x80 is also supported for 64-bit processes, but it still invokes the 32-bit ABI!). There's a bunch of complicated stuff going on in case various kinds of tracing / debugging are enabled, but the other parts are fairly easy to follow. See that answer for a walk-through of some of Linux's system-call handling internals.

In arch/x86/entry, these are the main files of interest:

  • entry_32.S: 32-bit kernel code for entry from user-space. (legacy mode)
  • entry_64_compat.S: 64-bit kernel code for entry from 32-bit user-space (compat mode -> long mode).
  • entry_64.S: 64-bit kernel code for entry from 64-bit user-space (long mode -> long mode).

You should be able to find Linux's VDSO code for the user-space side of the sysenter dance that passes the kernel the values it needs to return to user-space. Related: the question What is better "int 0x80" or "syscall"? and the article The Definitive Guide to Linux System Calls, which give some useful info on the design choices Linux made.


Is it true that the sysret instruction isn't safe?

Intel and AMD both have separate bugs with non-canonical RIP when returning to 64-bit user-space. e.g. on Intel, Linux's entry_64.S describes it this way:

/*
 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
 * in kernel space.  This essentially lets the user take over
 * the kernel, since userspace controls RSP.

That can happen if a ptrace system call (e.g. made by a debugger) changed the saved value of the process's RIP to a non-canonical address. Linux checks whether it can use sysret, and if not uses its iret return path. (The sysret path is fast enough that it's worth doing extra work to check that it's safe).

Note that if a system call blocks / sleeps, the "master copy" of user-space's integer register state is on its kernel stack, where the system call entry point pushed it. (In Linux. Other designs are possible!) This is why it's possible to end up with weird saved state that user-space couldn't have produced by running syscall itself: a non-canonical saved RIP (user-space would have faulted on the jmp to a non-canonical address before ever executing syscall there), or saved RCX != saved RIP. (64-bit syscall sets RCX=RIP and R11=RFLAGS (before masking), so it clobbers RCX and R11 but allows the kernel to restore RIP and RFLAGS.)
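A minimal sketch of that kind of "can I use sysret?" check, in C. It is not Linux's code: the struct and function names are hypothetical, it assumes 48-bit virtual addresses (4-level paging), and a real kernel would need further checks (for example on the saved segment selectors and flags) before taking the fast path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for the user-space state saved on the kernel stack. */
struct saved_user_regs {
    uint64_t rip;     /* where user-space will resume              */
    uint64_t rcx;     /* SYSRET loads RIP from RCX                 */
    uint64_t r11;     /* SYSRET loads RFLAGS from R11              */
    uint64_t rflags;  /* flags user-space is supposed to resume with */
};

/*
 * With 48-bit virtual addresses, an address is canonical when bits 63..47
 * are all copies of bit 47, i.e. those 17 bits are all 0 or all 1.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* bits 63..47 as a 17-bit value */
    return top == 0 || top == 0x1FFFF;
}

/*
 * SYSRET can only reproduce the saved state if RCX and R11 still hold the
 * saved RIP and RFLAGS, and it must never be executed with a non-canonical
 * return address. When this returns false, fall back to iret.
 */
static bool sysret_looks_safe(const struct saved_user_regs *regs)
{
    if (regs->rcx != regs->rip)
        return false;
    if (regs->r11 != regs->rflags)
        return false;
    if (!is_canonical_48(regs->rip))
        return false;
    return true;
}
```

A frame that a debugger has rewritten to resume at 0x0000800000000000 fails the canonical check, so the kernel would return through iret, where the fault (if any) is taken safely instead of inside the sysret instruction.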

I don't know how 32-bit syscall works; sorry, I got off topic here. But I suspect that what you may have read about sysret being unsafe was talking about 64-bit kernels.

IDK if there are any similar bugs in 32-bit-kernel sysret, or 64-bit-kernel sysret-to-compat-mode.

