1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
|
.\" Copyright (c) 2010 Michael Kerrisk <mtk.manpages@gmail.com>
.\" based on a proposal from Stephan Mueller <smueller@atsec.com>
.\"
.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.\" Various pieces of text taken from the kernel source and the commentary
.\" in Linux commit fa28237cfcc5827553044cbd6ee52e33692b0faa
.\" both written by Paul Mackerras <paulus@samba.org>
.\"
.TH subpage_prot 2 2024-05-02 "Linux man-pages (unreleased)"
.SH NAME
subpage_prot \- define a subpage protection for an address range
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.P
.BI "int syscall(SYS_subpage_prot, unsigned long " addr ", unsigned long " len ,
.BI " uint32_t *" map );
.fi
.P
.IR Note :
glibc provides no wrapper for
.BR subpage_prot (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
The PowerPC-specific
.BR subpage_prot ()
system call provides the facility to control the access
permissions on individual 4\ kB subpages on systems configured with
a page size of 64\ kB.
.P
The protection map is applied to the memory pages in the region starting at
.I addr
and continuing for
.I len
bytes.
Both of these arguments must be aligned to a 64-kB boundary.
.P
The protection map is specified in the buffer pointed to by
.IR map .
The map has 2 bits per 4\ kB subpage;
thus each 32-bit word specifies the protections of 16 4\ kB subpages
inside a 64\ kB page
(so, the number of 32-bit words pointed to by
.I map
should equate to the number of 64-kB pages specified by
.IR len ).
Each 2-bit field in the protection map is either 0 to allow any access,
1 to prevent writes, or 2 or 3 to prevent all accesses.
.SH RETURN VALUE
On success,
.BR subpage_prot ()
returns 0.
Otherwise, one of the error codes specified below is returned.
.SH ERRORS
.TP
.B EFAULT
The buffer referred to by
.I map
is not accessible.
.TP
.B EINVAL
The
.I addr
or
.I len
arguments are incorrect.
Both of these arguments must be aligned to a multiple of the system page size,
and they must not refer to a region outside of the
address space of the process or to a region that consists of huge pages.
.TP
.B ENOMEM
Out of memory.
.SH STANDARDS
Linux.
.SH HISTORY
Linux 2.6.25 (PowerPC).
.P
The system call is provided only if the kernel is configured with
.BR CONFIG_PPC_64K_PAGES .
.SH NOTES
Normal page protections (at the 64-kB page level) also apply;
the subpage protection mechanism is an additional constraint,
so putting 0 in a 2-bit field won't allow writes to a page that is otherwise
write-protected.
.SS Rationale
This system call is provided to assist writing emulators that
operate using 64-kB pages on PowerPC systems.
When emulating systems such as x86, which uses a smaller page size,
the emulator can no longer use the memory-management unit (MMU)
and normal system calls for controlling page protections.
(The emulator could emulate the MMU by checking and possibly remapping
the address for each memory access in software, but that is slow.)
The idea is that the emulator supplies an array of protection masks
to apply to a specified range of virtual addresses.
These masks are applied at the level where hardware page-table entries (PTEs)
are inserted into the hardware page table based on the Linux PTEs,
so the Linux PTEs are not affected.
Implicit in this is that the regions of the address space that are
protected are switched to use 4-kB hardware pages rather than 64-kB
hardware pages (on machines with hardware 64-kB page support).
.\" In the initial implementation, it was the case that:
.\" In fact the whole process is switched to use 4 kB hardware pages when the
.\" subpage_prot system call is used, but this could be improved in future
.\" to switch only the affected segments.
.\" But Paul Mackerass says (Oct 2010): I'm pretty sure we now only switch
.\" the affected segment, not the whole process.
.SH SEE ALSO
.BR mprotect (2),
.BR syscall (2)
.P
.I Documentation/admin\-guide/mm/hugetlbpage.rst
in the Linux kernel source tree
|