diff options
Diffstat (limited to 'man7/vdso.7')
-rw-r--r-- | man7/vdso.7 | 612 |
1 files changed, 0 insertions, 612 deletions
diff --git a/man7/vdso.7 b/man7/vdso.7 deleted file mode 100644 index d37360c..0000000 --- a/man7/vdso.7 +++ /dev/null @@ -1,612 +0,0 @@ -'\" t -.\" Written by Mike Frysinger <vapier@gentoo.org> -.\" -.\" %%%LICENSE_START(PUBLIC_DOMAIN) -.\" This page is in the public domain. -.\" %%%LICENSE_END -.\" -.\" Useful background: -.\" http://articles.manugarg.com/systemcallinlinux2_6.html -.\" https://lwn.net/Articles/446528/ -.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken -.\" http://www.trilithium.com/johan/2005/08/linux-gate/ -.\" -.TH vDSO 7 2024-02-18 "Linux man-pages 6.7" -.SH NAME -vdso \- overview of the virtual ELF dynamic shared object -.SH SYNOPSIS -.nf -.B #include <sys/auxv.h> -.P -.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR); -.fi -.SH DESCRIPTION -The "vDSO" (virtual dynamic shared object) is a small shared library that -the kernel automatically maps into the -address space of all user-space applications. -Applications usually do not need to concern themselves with these details -as the vDSO is most commonly called by the C library. -This way you can code in the normal way using standard functions -and the C library will take care -of using any functionality that is available via the vDSO. -.P -Why does the vDSO exist at all? -There are some system calls the kernel provides that -user-space code ends up using frequently, -to the point that such calls can dominate overall performance. -This is due both to the frequency of the call as well as the -context-switch overhead that results -from exiting user space and entering the kernel. -.P -The rest of this documentation is geared toward the curious and/or -C library writers rather than general developers. -If you're trying to call the vDSO in your own application rather than using -the C library, you're most likely doing it wrong. -.SS Example background -Making system calls can be slow. -In x86 32-bit systems, you can trigger a software interrupt -.RI ( "int $0x80" ) -to tell the kernel you wish to make a system call. -However, this instruction is expensive: it goes through -the full interrupt-handling paths -in the processor's microcode as well as in the kernel. -Newer processors have faster (but backward incompatible) instructions to -initiate system calls. -Rather than require the C library to figure out if this functionality is -available at run time, -the C library can use functions provided by the kernel in -the vDSO. -.P -Note that the terminology can be confusing. -On x86 systems, the vDSO function -used to determine the preferred method of making a system call is -named "__kernel_vsyscall", but on x86-64, -the term "vsyscall" also refers to an obsolete way to ask the kernel -what time it is or what CPU the caller is on. -.P -One frequently used system call is -.BR gettimeofday (2). -This system call is called both directly by user-space applications -as well as indirectly by -the C library. -Think timestamps or timing loops or polling\[em]all of these -frequently need to know what time it is right now. -This information is also not secret\[em]any application in any -privilege mode (root or any unprivileged user) will get the same answer. -Thus the kernel arranges for the information required to answer -this question to be placed in memory the process can access. -Now a call to -.BR gettimeofday (2) -changes from a system call to a normal function -call and a few memory accesses. -.SS Finding the vDSO -The base address of the vDSO (if one exists) is passed by the kernel to -each program in the initial auxiliary vector (see -.BR getauxval (3)), -via the -.B AT_SYSINFO_EHDR -tag. -.P -You must not assume the vDSO is mapped at any particular location in the -user's memory map. -The base address will usually be randomized at run time every time a new -process image is created (at -.BR execve (2) -time). -This is done for security reasons, -to prevent "return-to-libc" attacks. -.P -For some architectures, there is also an -.B AT_SYSINFO -tag. -This is used only for locating the vsyscall entry point and is frequently -omitted or set to 0 (meaning it's not available). -This tag is a throwback to the initial vDSO work (see -.I History -below) and its use should be avoided. -.SS File format -Since the vDSO is a fully formed ELF image, you can do symbol lookups on it. -This allows new symbols to be added with newer kernel releases, -and allows the C library to detect available functionality at -run time when running under different kernel versions. -Oftentimes the C library will do detection with the first call and then -cache the result for subsequent calls. -.P -All symbols are also versioned (using the GNU version format). -This allows the kernel to update the function signature without breaking -backward compatibility. -This means changing the arguments that the function accepts as well as the -return value. -Thus, when looking up a symbol in the vDSO, -you must always include the version -to match the ABI you expect. -.P -Typically the vDSO follows the naming convention of prefixing -all symbols with "__vdso_" or "__kernel_" -so as to distinguish them from other standard symbols. -For example, the "gettimeofday" function is named "__vdso_gettimeofday". -.P -You use the standard C calling conventions when calling -any of these functions. -No need to worry about weird register or stack behavior. -.SH NOTES -.SS Source -When you compile the kernel, -it will automatically compile and link the vDSO code for you. -You will frequently find it under the architecture-specific directory: -.P -.in +4n -.EX -find arch/$ARCH/ \-name \[aq]*vdso*.so*\[aq] \-o \-name \[aq]*gate*.so*\[aq] -.EE -.in -.\" -.SS vDSO names -The name of the vDSO varies across architectures. -It will often show up in things like glibc's -.BR ldd (1) -output. -The exact name should not matter to any code, so do not hardcode it. -.if t \{\ -.ft CW -\} -.TS -l l. -user ABI vDSO name -_ -aarch64 linux\-vdso.so.1 -arm linux\-vdso.so.1 -ia64 linux\-gate.so.1 -mips linux\-vdso.so.1 -ppc/32 linux\-vdso32.so.1 -ppc/64 linux\-vdso64.so.1 -riscv linux\-vdso.so.1 -s390 linux\-vdso32.so.1 -s390x linux\-vdso64.so.1 -sh linux\-gate.so.1 -i386 linux\-gate.so.1 -x86-64 linux\-vdso.so.1 -x86/x32 linux\-vdso.so.1 -.TE -.if t \{\ -.in -.ft P -\} -.SS strace(1), seccomp(2), and the vDSO -When tracing system calls with -.BR strace (1), -symbols (system calls) that are exported by the vDSO will -.I not -appear in the trace output. -Those system calls will likewise not be visible to -.BR seccomp (2) -filters. -.SH ARCHITECTURE-SPECIFIC NOTES -The subsections below provide architecture-specific notes -on the vDSO. -.P -Note that the vDSO that is used is based on the ABI of your user-space code -and not the ABI of the kernel. -Thus, for example, -when you run an i386 32-bit ELF binary, -you'll get the same vDSO regardless of whether you run it under -an i386 32-bit kernel or under an x86-64 64-bit kernel. -Therefore, the name of the user-space ABI should be used to determine -which of the sections below is relevant. -.SS ARM functions -.\" See linux/arch/arm/vdso/vdso.lds.S -.\" Commit: 8512287a8165592466cb9cb347ba94892e9c56a5 -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__vdso_gettimeofday LINUX_2.6 (exported since Linux 4.1) -__vdso_clock_gettime LINUX_2.6 (exported since Linux 4.1) -.TE -.if t \{\ -.in -.ft P -\} -.P -.\" See linux/arch/arm/kernel/entry-armv.S -.\" See linux/Documentation/arm/kernel_user_helpers.rst -Additionally, the ARM port has a code page full of utility functions. -Since it's just a raw page of code, there is no ELF information for doing -symbol lookups or versioning. -It does provide support for different versions though. -.P -For information on this code page, -it's best to refer to the kernel documentation -as it's extremely detailed and covers everything you need to know: -.IR Documentation/arm/kernel_user_helpers.rst . -.SS aarch64 functions -.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_rt_sigreturn LINUX_2.6.39 -__kernel_gettimeofday LINUX_2.6.39 -__kernel_clock_gettime LINUX_2.6.39 -__kernel_clock_getres LINUX_2.6.39 -.TE -.if t \{\ -.in -.ft P -\} -.SS bfin (Blackfin) functions (port removed in Linux 4.17) -.\" See linux/arch/blackfin/kernel/fixed_code.S -.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code -As this CPU lacks a memory management unit (MMU), -it doesn't set up a vDSO in the normal sense. -Instead, it maps at boot time a few raw functions into -a fixed location in memory. -User-space applications then call directly into that region. -There is no provision for backward compatibility -beyond sniffing raw opcodes, -but as this is an embedded CPU, it can get away with things\[em]some of the -object formats it runs aren't even ELF based (they're bFLT/FLAT). -.P -For information on this code page, -it's best to refer to the public documentation: -.br -http://docs.blackfin.uclinux.org/doku.php?id=linux\-kernel:fixed\-code -.SS mips functions -.\" See linux/arch/mips/vdso/vdso.ld.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_gettimeofday LINUX_2.6 (exported since Linux 4.4) -__kernel_clock_gettime LINUX_2.6 (exported since Linux 4.4) -.TE -.if t \{\ -.in -.ft P -\} -.SS ia64 (Itanium) functions -.\" See linux/arch/ia64/kernel/gate.lds.S -.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.rst -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_sigtramp LINUX_2.5 -__kernel_syscall_via_break LINUX_2.5 -__kernel_syscall_via_epc LINUX_2.5 -.TE -.if t \{\ -.in -.ft P -\} -.P -The Itanium port is somewhat tricky. -In addition to the vDSO above, it also has "light-weight system calls" -(also known as "fast syscalls" or "fsys"). -You can invoke these via the -.I __kernel_syscall_via_epc -vDSO helper. -The system calls listed here have the same semantics as if you called them -directly via -.BR syscall (2), -so refer to the relevant -documentation for each. -The table below lists the functions available via this mechanism. -.if t \{\ -.ft CW -\} -.TS -l. -function -_ -clock_gettime -getcpu -getpid -getppid -gettimeofday -set_tid_address -.TE -.if t \{\ -.in -.ft P -\} -.SS parisc (hppa) functions -.\" See linux/arch/parisc/kernel/syscall.S -.\" See linux/Documentation/parisc/registers.rst -The parisc port has a code page with utility functions -called a gateway page. -Rather than use the normal ELF auxiliary vector approach, -it passes the address of -the page to the process via the SR2 register. -The permissions on the page are such that merely executing those addresses -automatically executes with kernel privileges and not in user space. -This is done to match the way HP-UX works. -.P -Since it's just a raw page of code, there is no ELF information for doing -symbol lookups or versioning. -Simply call into the appropriate offset via the branch instruction, -for example: -.P -.in +4n -.EX -ble <offset>(%sr2, %r0) -.EE -.in -.if t \{\ -.ft CW -\} -.TS -l l. -offset function -_ -00b0 lws_entry (CAS operations) -00e0 set_thread_pointer (used by glibc) -0100 linux_gateway_entry (syscall) -.TE -.if t \{\ -.in -.ft P -\} -.SS ppc/32 functions -.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S -The table below lists the symbols exported by the vDSO. -The functions marked with a -.I * -are available only when the kernel is -a PowerPC64 (64-bit) kernel. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_clock_getres LINUX_2.6.15 -__kernel_clock_gettime LINUX_2.6.15 -__kernel_clock_gettime64 LINUX_5.11 -__kernel_datapage_offset LINUX_2.6.15 -__kernel_get_syscall_map LINUX_2.6.15 -__kernel_get_tbfreq LINUX_2.6.15 -__kernel_getcpu \fI*\fR LINUX_2.6.15 -__kernel_gettimeofday LINUX_2.6.15 -__kernel_sigtramp_rt32 LINUX_2.6.15 -__kernel_sigtramp32 LINUX_2.6.15 -__kernel_sync_dicache LINUX_2.6.15 -__kernel_sync_dicache_p5 LINUX_2.6.15 -.TE -.if t \{\ -.in -.ft P -\} -.P -Before Linux 5.6, -.\" commit 654abc69ef2e69712e6d4e8a6cb9292b97a4aa39 -the -.B CLOCK_REALTIME_COARSE -and -.B CLOCK_MONOTONIC_COARSE -clocks are -.I not -supported by the -.I __kernel_clock_getres -and -.I __kernel_clock_gettime -interfaces; -the kernel falls back to the real system call. -.SS ppc/64 functions -.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_clock_getres LINUX_2.6.15 -__kernel_clock_gettime LINUX_2.6.15 -__kernel_datapage_offset LINUX_2.6.15 -__kernel_get_syscall_map LINUX_2.6.15 -__kernel_get_tbfreq LINUX_2.6.15 -__kernel_getcpu LINUX_2.6.15 -__kernel_gettimeofday LINUX_2.6.15 -__kernel_sigtramp_rt64 LINUX_2.6.15 -__kernel_sync_dicache LINUX_2.6.15 -__kernel_sync_dicache_p5 LINUX_2.6.15 -.TE -.if t \{\ -.in -.ft P -\} -.P -Before Linux 4.16, -.\" commit 5c929885f1bb4b77f85b1769c49405a0e0f154a1 -the -.B CLOCK_REALTIME_COARSE -and -.B CLOCK_MONOTONIC_COARSE -clocks are -.I not -supported by the -.I __kernel_clock_getres -and -.I __kernel_clock_gettime -interfaces; -the kernel falls back to the real system call. -.SS riscv functions -.\" See linux/arch/riscv/kernel/vdso/vdso.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__vdso_rt_sigreturn LINUX_4.15 -__vdso_gettimeofday LINUX_4.15 -__vdso_clock_gettime LINUX_4.15 -__vdso_clock_getres LINUX_4.15 -__vdso_getcpu LINUX_4.15 -__vdso_flush_icache LINUX_4.15 -.TE -.if t \{\ -.in -.ft P -\} -.SS s390 functions -.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_clock_getres LINUX_2.6.29 -__kernel_clock_gettime LINUX_2.6.29 -__kernel_gettimeofday LINUX_2.6.29 -.TE -.if t \{\ -.in -.ft P -\} -.SS s390x functions -.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_clock_getres LINUX_2.6.29 -__kernel_clock_gettime LINUX_2.6.29 -__kernel_gettimeofday LINUX_2.6.29 -.TE -.if t \{\ -.in -.ft P -\} -.SS sh (SuperH) functions -.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_rt_sigreturn LINUX_2.6 -__kernel_sigreturn LINUX_2.6 -__kernel_vsyscall LINUX_2.6 -.TE -.if t \{\ -.in -.ft P -\} -.SS i386 functions -.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__kernel_sigreturn LINUX_2.5 -__kernel_rt_sigreturn LINUX_2.5 -__kernel_vsyscall LINUX_2.5 -.\" Added in 7a59ed415f5b57469e22e41fc4188d5399e0b194 and updated -.\" in 37c975545ec63320789962bf307f000f08fabd48. -__vdso_clock_gettime LINUX_2.6 (exported since Linux 3.15) -__vdso_gettimeofday LINUX_2.6 (exported since Linux 3.15) -__vdso_time LINUX_2.6 (exported since Linux 3.15) -.TE -.if t \{\ -.in -.ft P -\} -.SS x86-64 functions -.\" See linux/arch/x86/vdso/vdso.lds.S -The table below lists the symbols exported by the vDSO. -All of these symbols are also available without the "__vdso_" prefix, but -you should ignore those and stick to the names below. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__vdso_clock_gettime LINUX_2.6 -__vdso_getcpu LINUX_2.6 -__vdso_gettimeofday LINUX_2.6 -__vdso_time LINUX_2.6 -.TE -.if t \{\ -.in -.ft P -\} -.SS x86/x32 functions -.\" See linux/arch/x86/vdso/vdso32.lds.S -The table below lists the symbols exported by the vDSO. -.if t \{\ -.ft CW -\} -.TS -l l. -symbol version -_ -__vdso_clock_gettime LINUX_2.6 -__vdso_getcpu LINUX_2.6 -__vdso_gettimeofday LINUX_2.6 -__vdso_time LINUX_2.6 -.TE -.if t \{\ -.in -.ft P -\} -.SS History -The vDSO was originally just a single function\[em]the vsyscall. -In older kernels, you might see that name -in a process's memory map rather than "vdso". -Over time, people realized that this mechanism -was a great way to pass more functionality -to user space, so it was reconceived as a vDSO in the current format. -.SH SEE ALSO -.BR syscalls (2), -.BR getauxval (3), -.BR proc (5) -.P -The documents, examples, and source code in the Linux source code tree: -.P -.in +4n -.EX -Documentation/ABI/stable/vdso -Documentation/ia64/fsys.rst -Documentation/vDSO/* (includes examples of using the vDSO) -.P -find arch/ \-iname \[aq]*vdso*\[aq] \-o \-iname \[aq]*gate*\[aq] -.EE -.in |