.\" Man page generated from reStructuredText.
.
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "BTRFS-QUOTA" "8" "Mar 26, 2024" "6.8" "BTRFS"
.SH NAME
btrfs-quota \- control the global quota status of a btrfs filesystem
.SH SYNOPSIS
.sp
\fBbtrfs quota\fP <subcommand> <args>
.SH DESCRIPTION
.sp
The commands under \fBbtrfs quota\fP are used to affect the global status of quotas
of a btrfs filesystem. The quota groups (qgroups) are managed by the subcommand
\fI\%btrfs\-qgroup(8)\fP\&.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
Qgroups differ from traditional user quotas and are designed
to track shared and exclusive data per\-subvolume. Please refer to the section
\fI\%HIERARCHICAL QUOTA GROUP CONCEPTS\fP
for a detailed description.
.UNINDENT
.UNINDENT
.SS PERFORMANCE IMPLICATIONS
.sp
When quotas are activated, they affect all extent processing, which incurs a
performance cost. Activation of qgroups is not recommended unless the user
intends to actually use them.
.SS STABILITY STATUS
.sp
The qgroup implementation has turned out to be quite difficult as it affects
the core of the filesystem operation. Qgroup users have hit various corner cases
over time, such as incorrect accounting or system instability. The situation is
gradually improving as issues are found and fixed.
.SH HIERARCHICAL QUOTA GROUP CONCEPTS
.sp
The concept of quota has a long\-standing tradition in the Unix world. Ever
since computers have allowed multiple users to work simultaneously in one
filesystem, there has been a need to prevent one user from using up the entire
space. Every user should get a fair share of the available resources.
.sp
In case of files, the solution is quite straightforward. Each file has an
\fIowner\fP recorded along with it, and it has a size. Traditional quota just
restricts the total size of all files that are owned by a user. The concept is
quite flexible: if a user hits his quota limit, the administrator can raise it
on the fly.
.sp
On the other hand, the traditional approach offers only a poor solution for
restricting directories.
At installation time, the hard disk can be partitioned so that every directory
(e.g. \fB/usr\fP, \fB/var\fP, ...) that needs a limit gets its own partition. The obvious
problem is that those limits cannot be changed without a reinstallation. The
btrfs subvolume feature builds a bridge. Subvolumes correspond in many ways to
partitions, as every subvolume looks like its own filesystem. With subvolume
quota, it is now possible to restrict each subvolume like a partition, but keep
the flexibility of quota. The space for each subvolume can be expanded or
restricted on the fly.
.sp
As subvolumes are the basis for snapshots, interesting questions arise as to
how to account used space in the presence of snapshots. If you have a file
shared between a subvolume and a snapshot, to whom should the file be accounted? The
creator? Both? What if the file gets modified in the snapshot, should only
these changes be accounted to it? But wait, both the snapshot and the subvolume
belong to the same user home. I just want to limit the total space used by
both! But somebody else might not want to charge the snapshots to the users.
.sp
Btrfs subvolume quota solves these problems by introducing groups of subvolumes
and letting the user put limits on them. It is even possible to have groups of
groups. In the following, we refer to them as \fIqgroups\fP\&.
.sp
Each qgroup primarily tracks two numbers, the amount of total referenced
space and the amount of exclusively referenced space.
.INDENT 0.0
.TP
.B referenced
space is the amount of data that can be reached from any of the
subvolumes contained in the qgroup, while
.TP
.B exclusive
is the amount of data where all references to this data can be reached
from within this qgroup.
.UNINDENT
.SS Subvolume quota groups
.sp
The basic notion of the Subvolume Quota feature is the quota group, or qgroup
for short. Qgroups are notated as \fIlevel/id\fP, e.g. the qgroup 3/2 is a qgroup of
level 3. For level 0, the leading \fI0/\fP can be omitted.
Qgroups of level 0 get created automatically when a subvolume/snapshot gets
created. The ID of the qgroup corresponds to the ID of the subvolume, so 0/5
is the qgroup for the root subvolume.
For the \fBbtrfs qgroup\fP command, the path to the subvolume can also be used
instead of \fI0/ID\fP\&. For all higher levels, the ID can be chosen freely.
.sp
Each qgroup can contain a set of lower level qgroups, thus creating a hierarchy
of qgroups. Figure 1 shows an example qgroup tree.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
                          +\-\-\-+
                          |2/1|
                          +\-\-\-+
                         /     \e
                   +\-\-\-+/       \e+\-\-\-+
                   |1/1|         |1/2|
                   +\-\-\-+         +\-\-\-+
                  /     \e       /     \e
            +\-\-\-+/       \e+\-\-\-+/       \e+\-\-\-+
qgroups     |0/1|         |0/2|         |0/3|
            +\-+\-+         +\-\-\-+         +\-\-\-+
              |          /     \e       /     \e
              |         /       \e     /       \e
              |        /         \e   /         \e
extents       1       2            3            4
Figure 1: Sample qgroup hierarchy
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
At the bottom, some extents are depicted showing which qgroups reference which
extents. It is important to understand the notion of \fIreferenced\fP vs
\fIexclusive\fP\&. In the example, qgroup 0/2 references extents 2 and 3, while 1/2
references extents 2\-4, and 2/1 references all extents.
.sp
On the other hand, extent 1 is exclusive to 0/1, extent 2 is exclusive to 0/2,
while extent 3 is neither exclusive to 0/2 nor to 0/3. But because both
references can be reached from 1/2, extent 3 is exclusive to 1/2. All extents
are exclusive to 2/1.
.sp
So exclusive does not mean there is no other way to reach the extent, but it
does mean that if you delete all subvolumes contained in a qgroup, the extent
will get deleted.
.sp
The exclusive count of a qgroup conveys a useful piece of information: how much
space will be freed if all subvolumes of the qgroup are deleted.
.sp
All data extents are accounted this way. Metadata that belongs to a specific
subvolume (i.e. its filesystem tree) is also accounted. Checksums and extent
allocation information are not accounted.
.sp
In turn, the referenced count of a qgroup can be limited. All writes beyond
this limit will lead to a \(aqQuota Exceeded\(aq error.
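.sp
As a minimal sketch of setting such a limit, the commands below assume a
filesystem mounted at the hypothetical path \fB/mnt/data\fP with quotas already
enabled and a subvolume whose ID is 256. The \fBlimit\fP and \fBshow\fP
subcommands belong to \fI\%btrfs\-qgroup(8)\fP; consult that page for the exact
options available in your version.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# /mnt/data and qgroup 0/256 are placeholders chosen for this example
# limit the referenced count of qgroup 0/256 to 10GiB
btrfs qgroup limit 10G 0/256 /mnt/data
# list all qgroups together with their referenced limits
btrfs qgroup show \-r /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT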
.SS Inheritance
.sp
Things get a bit more complicated when new subvolumes or snapshots are created.
The case of (empty) subvolumes is still quite easy. If a subvolume should be
part of a qgroup, it has to be added to the qgroup at creation time. To add it
at a later time, it would be necessary to at least rescan the full subvolume
for a proper accounting.
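.sp
The two paths described above might look as follows. This is only a sketch: the
mount point \fB/mnt/data\fP, the qgroup 1/100, the subvolume name and the
subvolume ID 256 are made up for illustration, and the \fBqgroup\fP and
\fBsubvolume\fP commands are documented in \fI\%btrfs\-qgroup(8)\fP and
\fI\%btrfs\-subvolume(8)\fP rather than on this page.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# all paths and IDs below are placeholders
# create a higher\-level qgroup and add a new subvolume to it at creation time
btrfs qgroup create 1/100 /mnt/data
btrfs subvolume create \-i 1/100 /mnt/data/projects
# add an already populated subvolume (ID 256) later, then rescan and wait
btrfs qgroup assign 0/256 1/100 /mnt/data
btrfs quota rescan \-w /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT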
.sp
Creation of a snapshot is the hard case. Obviously, the snapshot will
reference exactly the same amount of space as its source, and both source and
destination now have an exclusive count of 0 (the filesystem nodesize to be
precise, as the roots of the trees are not shared). But what about qgroups of
higher levels? If the qgroup contains both the source and the destination,
nothing changes. If the qgroup contains only the source, it might lose some
exclusive.
.sp
But how much? The tempting answer is, subtract all exclusive of the source from
the qgroup, but that is wrong, or at least not enough. There could have been
an extent that is referenced from the source and another subvolume from that
qgroup. This extent would have been exclusive to the qgroup, but not to the
source subvolume. With the creation of the snapshot, the qgroup would also
lose this extent from its exclusive set.
.sp
So how can this problem be solved? In the instant the snapshot gets created, we
already have to know the correct exclusive count. We need to have a second
qgroup that contains the same subvolumes as the first qgroup, except the
subvolume we want to snapshot. The moment we create the snapshot, the
exclusive count from the second qgroup needs to be copied to the first qgroup,
as it represents the correct value. The second qgroup is called a tracking
qgroup. It is only there in case a snapshot is needed.
.SS Use cases
.sp
The use cases below are not meant to be exhaustive. You can find your own way
to integrate qgroups.
.SS Single\-user machine
.sp
\fBReplacement for partitions.\fP
The simplest use case is to use qgroups as a simple replacement for partitions.
Btrfs takes the disk as a whole, and \fB/\fP, \fB/usr\fP, \fB/var\fP, etc. are created as
subvolumes. As each subvolume gets its own qgroup automatically, they can
simply be restricted. No hierarchy is needed for that.
.sp
\fBTrack usage of snapshots.\fP
When a snapshot is taken, a qgroup for it will automatically be created with
the correct values. \fIReferenced\fP will show how much is in it, possibly shared
with other subvolumes. \fIExclusive\fP will be the amount of space that gets freed
when the subvolume is deleted.
.SS Multi\-user machine
.sp
\fBRestricting homes.\fP
When you have several users on a machine, with home directories probably under
\fB/home\fP, you might want to restrict \fB/home\fP as a whole, while restricting every
user to an individual limit as well. This is easily accomplished by creating a
qgroup for \fB/home\fP, e.g. 1/1, and assigning all user subvolumes to it.
Restricting this qgroup will limit \fB/home\fP, while every user subvolume can get
its own (lower) limit.
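.sp
A possible command sequence for this setup is sketched below. It assumes a
filesystem mounted at the hypothetical path \fB/mnt/data\fP and two user home
subvolumes with the IDs 256 and 257; the sizes are arbitrary. The \fBqgroup\fP
subcommands are described in \fI\%btrfs\-qgroup(8)\fP\&.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# /mnt/data, the subvolume IDs and the sizes are placeholders
# qgroup 1/1 groups all home subvolumes
btrfs qgroup create 1/1 /mnt/data
btrfs qgroup assign 0/256 1/1 /mnt/data
btrfs qgroup assign 0/257 1/1 /mnt/data
# overall limit for /home, plus a lower limit for each user
btrfs qgroup limit 100G 1/1 /mnt/data
btrfs qgroup limit 20G 0/256 /mnt/data
btrfs qgroup limit 20G 0/257 /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT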
.sp
\fBAccounting snapshots to the user.\fP
Let\(aqs say the user is allowed to create snapshots via some mechanism. It would
only be fair to account space used by the snapshots to the user. This does not
mean the user doubles his usage as soon as he takes a snapshot. Of course,
files that are present in his home and the snapshot should only be accounted
once. This can be accomplished by creating a qgroup for each user, say
\fI1/UID\fP\&. The user home and all snapshots are assigned to this qgroup.
Limiting it will extend the limit to all snapshots, counting files only once.
To limit \fB/home\fP as a whole, a higher level group 2/1 replacing 1/1 from the
previous example is needed, with all user qgroups assigned to it.
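.sp
One way to build this hierarchy is sketched below. The mount point
\fB/mnt/data\fP, the UID 1000, the subvolume ID 256 and the snapshot paths are
all hypothetical, and the commands come from \fI\%btrfs\-qgroup(8)\fP and
\fI\%btrfs\-subvolume(8)\fP, so check those pages before relying on the options
shown.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# every path, ID and size below is a placeholder
# per\-user qgroup 1/1000 below a qgroup 2/1 that covers all of /home
btrfs qgroup create 1/1000 /mnt/data
btrfs qgroup create 2/1 /mnt/data
btrfs qgroup assign 1/1000 2/1 /mnt/data
# the home subvolume (ID 256) of the user joins the per\-user qgroup
btrfs qgroup assign 0/256 1/1000 /mnt/data
# snapshots are added to the same qgroup when they are created
btrfs subvolume snapshot \-i 1/1000 /mnt/data/home/alice /mnt/data/snap/alice\-1
# the limit now covers the home and its snapshots, counting shared data once
btrfs qgroup limit 20G 1/1000 /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT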
.sp
\fBDo not account snapshots.\fP
On the other hand, when the snapshots get created automatically, the user has
no chance to control them, so the space used by them should not be accounted to
him. This is already the case when creating snapshots in the example from
the previous section.
.sp
\fBSnapshots for backup purposes.\fP
This scenario is a mixture of the previous two. The user can create snapshots,
but some snapshots for backup purposes are being created by the system. The
user\(aqs snapshots should be accounted to the user, not the system. The solution
is similar to the one from section \fIAccounting snapshots to the user\fP, but do
not assign system snapshots to the user\(aqs qgroup.
.SS Simple quotas (squota)
.sp
As detailed in this document, qgroups can handle many complex extent sharing
and unsharing scenarios while maintaining an accurate count of exclusive and
shared usage. However, this flexibility comes at a cost: many of the
computations are global, in the sense that we must count up the number of trees
referring to an extent after its references change. This can slow down
transaction commits and lead to unacceptable latencies, especially in cases
where snapshots scale up.
.sp
To work around this limitation of qgroups, btrfs also supports a second set of
quota semantics: \fIsimple quotas\fP or \fIsquotas\fP\&. Squotas fully share the qgroups API
and hierarchical model, but do not track shared vs. exclusive usage. Instead,
they account each extent to the subvolume that first allocated it. With a bit
of new bookkeeping, this allows all accounting decisions to be local to the
allocation or freeing operation that deals with the extents themselves, and
fully avoids the complex and costly back\-reference resolutions.
.sp
\fBExample\fP
.sp
To illustrate the difference between squotas and qgroups, consider the following
basic example assuming a nodesize of 16KiB.
.INDENT 0.0
.IP 1. 3
create subvolume 256
.IP 2. 3
rack up 1GiB of data and metadata usage in 256
.IP 3. 3
snapshot 256, creating subvolume 257
.IP 4. 3
COW 512MiB of the data and metadata in 257
.IP 5. 3
delete everything in 256
.UNINDENT
.sp
At each step, qgroups would have the following accounting:
.INDENT 0.0
.IP 1. 3
0/256: 16KiB excl 0 shared
.IP 2. 3
0/256: 1GiB excl 0 shared
.IP 3. 3
0/256: 0 excl 1GiB shared; 0/257: 0 excl 1GiB shared
.IP 4. 3
0/256: 512MiB excl 512MiB shared; 0/257: 512MiB excl 512MiB shared
.IP 5. 3
0/256: 16KiB excl 0 shared; 0/257: 1GiB excl 0 shared
.UNINDENT
.sp
Whereas under squotas, the accounting would look like:
.INDENT 0.0
.IP 1. 3
0/256: 16KiB excl 16KiB shared
.IP 2. 3
0/256: 1GiB excl 1GiB shared
.IP 3. 3
0/256: 1GiB excl 1GiB shared; 0/257: 16KiB excl 16KiB shared
.IP 4. 3
0/256: 1GiB excl 1GiB shared; 0/257: 512MiB excl 512MiB shared
.IP 5. 3
0/256: 512MiB excl 512MiB shared; 0/257: 512MiB excl 512MiB shared
.UNINDENT
.sp
Note that since the original snapshotted 512MiB are still referenced by 257,
they cannot be freed from 256, even after 256 is emptied, or even deleted.
.sp
\fBSummary\fP
.sp
If you want some of the power and flexibility of quotas for tracking and limiting
subvolume usage, but want to avoid the performance penalty of accurately
tracking extent ownership life cycles, then squotas can be a useful option.
.sp
Furthermore, squotas are targeted at use cases where the original extent is
immutable, like image snapshotting for container startup, in which case we avoid
these awkward scenarios where a subvolume is empty or deleted but still has
significant extents accounted to it. However, as long as you are aware of the
accounting semantics, they can handle mutable original extents.
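.sp
As a short sketch, simple quotas could be turned on as shown below, assuming a
filesystem mounted at the hypothetical path \fB/mnt/data\fP and a kernel and
btrfs\-progs version with squota support. The \fBqgroup show\fP command is
described in \fI\%btrfs\-qgroup(8)\fP\&.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# /mnt/data is a placeholder mount point
# enable simple quotas instead of full qgroup accounting
btrfs quota enable \-\-simple /mnt/data
# inspect the resulting per\-subvolume accounting
btrfs qgroup show /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT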
.SH SUBCOMMAND
.INDENT 0.0
.TP
.B disable <path>
Disable subvolume quota support for a filesystem.
.TP
.B enable [options] <path>
Enable subvolume quota support for a filesystem. Two modes of accounting
are available. In \fIfull\fP mode, extent ownership by subvolumes is tracked
all the time; in \fIsimple\fP mode, everything is accounted to the first owner.
See the section \fISimple quotas (squota)\fP for more details.
.sp
\fBOptions\fP
.INDENT 7.0
.TP
.B \-s|\-\-simple
use simple quotas (squotas) instead of full qgroup accounting
.UNINDENT
.TP
.B rescan [options] <path>
Trash all qgroup numbers and scan the metadata again with the current config.
.sp
\fBOptions\fP
.INDENT 7.0
.TP
.B \-s|\-\-status
show status of a running rescan operation.
.TP
.B \-w|\-\-wait
start rescan and wait for it to finish (can be already in progress)
.TP
.B \-W|\-\-wait\-norescan
wait for rescan to finish without starting it
.UNINDENT
.UNINDENT
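.sp
As a brief illustration of the subcommands above, a typical session might look
like the following sketch. The mount point \fB/mnt/data\fP is a placeholder,
and the commands generally need to be run with sufficient privileges.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# /mnt/data is a placeholder mount point
# enable full qgroup accounting
btrfs quota enable /mnt/data
# start a rescan and wait for it to finish
btrfs quota rescan \-w /mnt/data
# show the status of a running rescan operation
btrfs quota rescan \-s /mnt/data
# turn quota support off again
btrfs quota disable /mnt/data
.ft P
.fi
.UNINDENT
.UNINDENT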
.SH EXIT STATUS
.sp
\fBbtrfs quota\fP returns a zero exit status if it succeeds. A non\-zero status is
returned in case of failure.
.SH AVAILABILITY
.sp
\fBbtrfs\fP is part of btrfs\-progs. Please refer to the documentation at
\fI\%https://btrfs.readthedocs.io\fP\&.
.SH SEE ALSO
.sp
\fI\%btrfs\-qgroup(8)\fP,
\fI\%btrfs\-subvolume(8)\fP,
\fI\%mkfs.btrfs(8)\fP
.\" Generated by docutils manpage writer.
.