summaryrefslogtreecommitdiffstats
path: root/man/man7/vdso.7
diff options
context:
space:
mode:
Diffstat (limited to 'man/man7/vdso.7')
-rw-r--r--man/man7/vdso.7612
1 files changed, 612 insertions, 0 deletions
diff --git a/man/man7/vdso.7 b/man/man7/vdso.7
new file mode 100644
index 0000000..2a9ae35
--- /dev/null
+++ b/man/man7/vdso.7
@@ -0,0 +1,612 @@
+'\" t
+.\" Written by Mike Frysinger <vapier@gentoo.org>
+.\"
+.\" %%%LICENSE_START(PUBLIC_DOMAIN)
+.\" This page is in the public domain.
+.\" %%%LICENSE_END
+.\"
+.\" Useful background:
+.\" http://articles.manugarg.com/systemcallinlinux2_6.html
+.\" https://lwn.net/Articles/446528/
+.\" http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
+.\" http://www.trilithium.com/johan/2005/08/linux-gate/
+.\"
+.TH vDSO 7 2024-05-02 "Linux man-pages (unreleased)"
+.SH NAME
+vdso \- overview of the virtual ELF dynamic shared object
+.SH SYNOPSIS
+.nf
+.B #include <sys/auxv.h>
+.P
+.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
+.fi
+.SH DESCRIPTION
+The "vDSO" (virtual dynamic shared object) is a small shared library that
+the kernel automatically maps into the
+address space of all user-space applications.
+Applications usually do not need to concern themselves with these details
+as the vDSO is most commonly called by the C library.
+This way you can code in the normal way using standard functions
+and the C library will take care
+of using any functionality that is available via the vDSO.
+.P
+Why does the vDSO exist at all?
+There are some system calls the kernel provides that
+user-space code ends up using frequently,
+to the point that such calls can dominate overall performance.
+This is due both to the frequency of the call as well as the
+context-switch overhead that results
+from exiting user space and entering the kernel.
+.P
+The rest of this documentation is geared toward the curious and/or
+C library writers rather than general developers.
+If you're trying to call the vDSO in your own application rather than using
+the C library, you're most likely doing it wrong.
+.SS Example background
+Making system calls can be slow.
+In x86 32-bit systems, you can trigger a software interrupt
+.RI ( "int $0x80" )
+to tell the kernel you wish to make a system call.
+However, this instruction is expensive: it goes through
+the full interrupt-handling paths
+in the processor's microcode as well as in the kernel.
+Newer processors have faster (but backward incompatible) instructions to
+initiate system calls.
+Rather than require the C library to figure out if this functionality is
+available at run time,
+the C library can use functions provided by the kernel in
+the vDSO.
+.P
+Note that the terminology can be confusing.
+On x86 systems, the vDSO function
+used to determine the preferred method of making a system call is
+named "__kernel_vsyscall", but on x86-64,
+the term "vsyscall" also refers to an obsolete way to ask the kernel
+what time it is or what CPU the caller is on.
+.P
+One frequently used system call is
+.BR gettimeofday (2).
+This system call is called both directly by user-space applications
+as well as indirectly by
+the C library.
+Think timestamps or timing loops or polling\[em]all of these
+frequently need to know what time it is right now.
+This information is also not secret\[em]any application in any
+privilege mode (root or any unprivileged user) will get the same answer.
+Thus the kernel arranges for the information required to answer
+this question to be placed in memory the process can access.
+Now a call to
+.BR gettimeofday (2)
+changes from a system call to a normal function
+call and a few memory accesses.
+.SS Finding the vDSO
+The base address of the vDSO (if one exists) is passed by the kernel to
+each program in the initial auxiliary vector (see
+.BR getauxval (3)),
+via the
+.B AT_SYSINFO_EHDR
+tag.
+.P
+You must not assume the vDSO is mapped at any particular location in the
+user's memory map.
+The base address will usually be randomized at run time every time a new
+process image is created (at
+.BR execve (2)
+time).
+This is done for security reasons,
+to prevent "return-to-libc" attacks.
+.P
+For some architectures, there is also an
+.B AT_SYSINFO
+tag.
+This is used only for locating the vsyscall entry point and is frequently
+omitted or set to 0 (meaning it's not available).
+This tag is a throwback to the initial vDSO work (see
+.I History
+below) and its use should be avoided.
+.SS File format
+Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
+This allows new symbols to be added with newer kernel releases,
+and allows the C library to detect available functionality at
+run time when running under different kernel versions.
+Oftentimes the C library will do detection with the first call and then
+cache the result for subsequent calls.
+.P
+All symbols are also versioned (using the GNU version format).
+This allows the kernel to update the function signature without breaking
+backward compatibility.
+This means changing the arguments that the function accepts as well as the
+return value.
+Thus, when looking up a symbol in the vDSO,
+you must always include the version
+to match the ABI you expect.
+.P
+Typically the vDSO follows the naming convention of prefixing
+all symbols with "__vdso_" or "__kernel_"
+so as to distinguish them from other standard symbols.
+For example, the "gettimeofday" function is named "__vdso_gettimeofday".
+.P
+You use the standard C calling conventions when calling
+any of these functions.
+No need to worry about weird register or stack behavior.
+.SH NOTES
+.SS Source
+When you compile the kernel,
+it will automatically compile and link the vDSO code for you.
+You will frequently find it under the architecture-specific directory:
+.P
+.in +4n
+.EX
+find arch/$ARCH/ \-name \[aq]*vdso*.so*\[aq] \-o \-name \[aq]*gate*.so*\[aq]
+.EE
+.in
+.\"
+.SS vDSO names
+The name of the vDSO varies across architectures.
+It will often show up in things like glibc's
+.BR ldd (1)
+output.
+The exact name should not matter to any code, so do not hardcode it.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+user ABI vDSO name
+_
+aarch64 linux\-vdso.so.1
+arm linux\-vdso.so.1
+ia64 linux\-gate.so.1
+mips linux\-vdso.so.1
+ppc/32 linux\-vdso32.so.1
+ppc/64 linux\-vdso64.so.1
+riscv linux\-vdso.so.1
+s390 linux\-vdso32.so.1
+s390x linux\-vdso64.so.1
+sh linux\-gate.so.1
+i386 linux\-gate.so.1
+x86-64 linux\-vdso.so.1
+x86/x32 linux\-vdso.so.1
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS strace(1), seccomp(2), and the vDSO
+When tracing system calls with
+.BR strace (1),
+symbols (system calls) that are exported by the vDSO will
+.I not
+appear in the trace output.
+Those system calls will likewise not be visible to
+.BR seccomp (2)
+filters.
+.SH ARCHITECTURE-SPECIFIC NOTES
+The subsections below provide architecture-specific notes
+on the vDSO.
+.P
+Note that the vDSO that is used is based on the ABI of your user-space code
+and not the ABI of the kernel.
+Thus, for example,
+when you run an i386 32-bit ELF binary,
+you'll get the same vDSO regardless of whether you run it under
+an i386 32-bit kernel or under an x86-64 64-bit kernel.
+Therefore, the name of the user-space ABI should be used to determine
+which of the sections below is relevant.
+.SS ARM functions
+.\" See linux/arch/arm/vdso/vdso.lds.S
+.\" Commit: 8512287a8165592466cb9cb347ba94892e9c56a5
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__vdso_gettimeofday LINUX_2.6 (exported since Linux 4.1)
+__vdso_clock_gettime LINUX_2.6 (exported since Linux 4.1)
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.P
+.\" See linux/arch/arm/kernel/entry-armv.S
+.\" See linux/Documentation/arm/kernel_user_helpers.rst
+Additionally, the ARM port has a code page full of utility functions.
+Since it's just a raw page of code, there is no ELF information for doing
+symbol lookups or versioning.
+It does provide support for different versions though.
+.P
+For information on this code page,
+it's best to refer to the kernel documentation
+as it's extremely detailed and covers everything you need to know:
+.IR Documentation/arm/kernel_user_helpers.rst .
+.SS aarch64 functions
+.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_rt_sigreturn LINUX_2.6.39
+__kernel_gettimeofday LINUX_2.6.39
+__kernel_clock_gettime LINUX_2.6.39
+__kernel_clock_getres LINUX_2.6.39
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS bfin (Blackfin) functions (port removed in Linux 4.17)
+.\" See linux/arch/blackfin/kernel/fixed_code.S
+.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
+As this CPU lacks a memory management unit (MMU),
+it doesn't set up a vDSO in the normal sense.
+Instead, it maps at boot time a few raw functions into
+a fixed location in memory.
+User-space applications then call directly into that region.
+There is no provision for backward compatibility
+beyond sniffing raw opcodes,
+but as this is an embedded CPU, it can get away with things\[em]some of the
+object formats it runs aren't even ELF based (they're bFLT/FLAT).
+.P
+For information on this code page,
+it's best to refer to the public documentation:
+.br
+http://docs.blackfin.uclinux.org/doku.php?id=linux\-kernel:fixed\-code
+.SS mips functions
+.\" See linux/arch/mips/vdso/vdso.ld.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_gettimeofday LINUX_2.6 (exported since Linux 4.4)
+__kernel_clock_gettime LINUX_2.6 (exported since Linux 4.4)
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS ia64 (Itanium) functions
+.\" See linux/arch/ia64/kernel/gate.lds.S
+.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.rst
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_sigtramp LINUX_2.5
+__kernel_syscall_via_break LINUX_2.5
+__kernel_syscall_via_epc LINUX_2.5
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.P
+The Itanium port is somewhat tricky.
+In addition to the vDSO above, it also has "light-weight system calls"
+(also known as "fast syscalls" or "fsys").
+You can invoke these via the
+.I __kernel_syscall_via_epc
+vDSO helper.
+The system calls listed here have the same semantics as if you called them
+directly via
+.BR syscall (2),
+so refer to the relevant
+documentation for each.
+The table below lists the functions available via this mechanism.
+.if t \{\
+.ft CW
+\}
+.TS
+l.
+function
+_
+clock_gettime
+getcpu
+getpid
+getppid
+gettimeofday
+set_tid_address
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS parisc (hppa) functions
+.\" See linux/arch/parisc/kernel/syscall.S
+.\" See linux/Documentation/parisc/registers.rst
+The parisc port has a code page with utility functions
+called a gateway page.
+Rather than use the normal ELF auxiliary vector approach,
+it passes the address of
+the page to the process via the SR2 register.
+The permissions on the page are such that merely executing those addresses
+automatically executes with kernel privileges and not in user space.
+This is done to match the way HP-UX works.
+.P
+Since it's just a raw page of code, there is no ELF information for doing
+symbol lookups or versioning.
+Simply call into the appropriate offset via the branch instruction,
+for example:
+.P
+.in +4n
+.EX
+ble <offset>(%sr2, %r0)
+.EE
+.in
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+offset function
+_
+00b0 lws_entry (CAS operations)
+00e0 set_thread_pointer (used by glibc)
+0100 linux_gateway_entry (syscall)
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS ppc/32 functions
+.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
+The table below lists the symbols exported by the vDSO.
+The functions marked with a
+.I *
+are available only when the kernel is
+a PowerPC64 (64-bit) kernel.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_clock_getres LINUX_2.6.15
+__kernel_clock_gettime LINUX_2.6.15
+__kernel_clock_gettime64 LINUX_5.11
+__kernel_datapage_offset LINUX_2.6.15
+__kernel_get_syscall_map LINUX_2.6.15
+__kernel_get_tbfreq LINUX_2.6.15
+__kernel_getcpu \fI*\fR LINUX_2.6.15
+__kernel_gettimeofday LINUX_2.6.15
+__kernel_sigtramp_rt32 LINUX_2.6.15
+__kernel_sigtramp32 LINUX_2.6.15
+__kernel_sync_dicache LINUX_2.6.15
+__kernel_sync_dicache_p5 LINUX_2.6.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.P
+Before Linux 5.6,
+.\" commit 654abc69ef2e69712e6d4e8a6cb9292b97a4aa39
+the
+.B CLOCK_REALTIME_COARSE
+and
+.B CLOCK_MONOTONIC_COARSE
+clocks are
+.I not
+supported by the
+.I __kernel_clock_getres
+and
+.I __kernel_clock_gettime
+interfaces;
+the kernel falls back to the real system call.
+.SS ppc/64 functions
+.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_clock_getres LINUX_2.6.15
+__kernel_clock_gettime LINUX_2.6.15
+__kernel_datapage_offset LINUX_2.6.15
+__kernel_get_syscall_map LINUX_2.6.15
+__kernel_get_tbfreq LINUX_2.6.15
+__kernel_getcpu LINUX_2.6.15
+__kernel_gettimeofday LINUX_2.6.15
+__kernel_sigtramp_rt64 LINUX_2.6.15
+__kernel_sync_dicache LINUX_2.6.15
+__kernel_sync_dicache_p5 LINUX_2.6.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.P
+Before Linux 4.16,
+.\" commit 5c929885f1bb4b77f85b1769c49405a0e0f154a1
+the
+.B CLOCK_REALTIME_COARSE
+and
+.B CLOCK_MONOTONIC_COARSE
+clocks are
+.I not
+supported by the
+.I __kernel_clock_getres
+and
+.I __kernel_clock_gettime
+interfaces;
+the kernel falls back to the real system call.
+.SS riscv functions
+.\" See linux/arch/riscv/kernel/vdso/vdso.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__vdso_rt_sigreturn LINUX_4.15
+__vdso_gettimeofday LINUX_4.15
+__vdso_clock_gettime LINUX_4.15
+__vdso_clock_getres LINUX_4.15
+__vdso_getcpu LINUX_4.15
+__vdso_flush_icache LINUX_4.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS s390 functions
+.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_clock_getres LINUX_2.6.29
+__kernel_clock_gettime LINUX_2.6.29
+__kernel_gettimeofday LINUX_2.6.29
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS s390x functions
+.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_clock_getres LINUX_2.6.29
+__kernel_clock_gettime LINUX_2.6.29
+__kernel_gettimeofday LINUX_2.6.29
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS sh (SuperH) functions
+.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_rt_sigreturn LINUX_2.6
+__kernel_sigreturn LINUX_2.6
+__kernel_vsyscall LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS i386 functions
+.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__kernel_sigreturn LINUX_2.5
+__kernel_rt_sigreturn LINUX_2.5
+__kernel_vsyscall LINUX_2.5
+.\" Added in 7a59ed415f5b57469e22e41fc4188d5399e0b194 and updated
+.\" in 37c975545ec63320789962bf307f000f08fabd48.
+__vdso_clock_gettime LINUX_2.6 (exported since Linux 3.15)
+__vdso_gettimeofday LINUX_2.6 (exported since Linux 3.15)
+__vdso_time LINUX_2.6 (exported since Linux 3.15)
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS x86-64 functions
+.\" See linux/arch/x86/vdso/vdso.lds.S
+The table below lists the symbols exported by the vDSO.
+All of these symbols are also available without the "__vdso_" prefix, but
+you should ignore those and stick to the names below.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__vdso_clock_gettime LINUX_2.6
+__vdso_getcpu LINUX_2.6
+__vdso_gettimeofday LINUX_2.6
+__vdso_time LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS x86/x32 functions
+.\" See linux/arch/x86/vdso/vdso32.lds.S
+The table below lists the symbols exported by the vDSO.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol version
+_
+__vdso_clock_gettime LINUX_2.6
+__vdso_getcpu LINUX_2.6
+__vdso_gettimeofday LINUX_2.6
+__vdso_time LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS History
+The vDSO was originally just a single function\[em]the vsyscall.
+In older kernels, you might see that name
+in a process's memory map rather than "vdso".
+Over time, people realized that this mechanism
+was a great way to pass more functionality
+to user space, so it was reconceived as a vDSO in the current format.
+.SH SEE ALSO
+.BR syscalls (2),
+.BR getauxval (3),
+.BR proc (5)
+.P
+The documents, examples, and source code in the Linux source code tree:
+.P
+.in +4n
+.EX
+Documentation/ABI/stable/vdso
+Documentation/ia64/fsys.rst
+Documentation/vDSO/* (includes examples of using the vDSO)
+.P
+find arch/ \-iname \[aq]*vdso*\[aq] \-o \-iname \[aq]*gate*\[aq]
+.EE
+.in