|
commit e4a541624df62ef353e754391cbbb707f54b16f7
Author: Gary Lowell <gary.lowell@inktank.com>
Date: Mon Jan 7 13:33:30 2013 -0800
v0.56.1
commit 9aecacda7fbf07f12b210f87cf3dbb53021b068d
Author: Sage Weil <sage@inktank.com>
Date: Sun Jan 6 08:38:27 2013 -0800
msg/Pipe: prepare Message data for wire under pipe_lock
We cannot trust the Message bufferlists or other structures to be
stable without pipe_lock, as another Pipe may claim and modify the sent
list items while we are writing to the socket.
Related to #3678.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d16ad9263d7b1d3c096f56c56e9631fae8509651)
commit 299dbad490df5e98c04f17fa8e486a718f3c121f
Author: Sage Weil <sage@inktank.com>
Date: Sun Jan 6 08:33:01 2013 -0800
msgr: update Message envelope in encode, not write_message
Fill out the Message header, footer, and calculate CRCs during
encoding, not write_message(). This removes most modifications from
Pipe::write_message().
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 40706afc66f485b2bd40b2b4b1cd5377244f8758)
commit 35d2f58305eab6c9b57a92269598b9729e2d8681
Author: Sage Weil <sage@inktank.com>
Date: Sun Jan 6 08:25:40 2013 -0800
msg/Pipe: encode message inside pipe_lock
This modifies bufferlists in the Message struct, and it is possible
for multiple instances of the Pipe to get references on the Message;
make sure they don't modify those bufferlists concurrently.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4cfc4903c6fb130b6ac9105baf1f66fbda797f14)
commit 9b23f195df43589d062da95a11abc07c79f3109b
Author: Sage Weil <sage@inktank.com>
Date: Sat Jan 5 10:39:08 2013 -0800
msg/Pipe: associate sending msgs to con inside lock
Associate a sending message with the connection inside the pipe_lock.
This way, if a racing thread tries to steal these messages, it will
be sure to reset the con pointer *after* we do, so that the con
pointer is valid in encode_payload() (and later).
This may be part of #3678.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a058f16113efa8f32eb5503d5443aa139754d479)
commit 6229b5a06f449a470d3211ea94c1c5faf7100876
Author: Sage Weil <sage@inktank.com>
Date: Sat Jan 5 09:29:50 2013 -0800
msg/Pipe: fix msg leak in requeue_sent()
The sent list owns a reference to each message.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2a1eb466d3f8e25ec8906b3ca6118a14c4e269d2)
commit 6a00ce0dc24626fdfa210ddec6334bde3c8a20db
Author: Sage Weil <sage@inktank.com>
Date: Mon Jan 7 12:58:39 2013 -0800
osdc/Objecter: fix linger_ops iterator invalidation on pool deletion
The call to check_linger_pool_dne() may unregister the linger request,
invalidating the iterator. To avoid this, increment the iterator at
the top of the loop.
This mirrors the fix in 4bf9078286d58c2cd4e85cb8b31411220a377092 for
regular non-linger ops.
Fixes: #3734
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 62586884afd56f2148205bdadc5a67037a750a9b)
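The iterator pattern described in this commit can be sketched as follows. This is a minimal illustration, not the Objecter code: `LingerOp`, `scan_ops` and `drop_ops_for_pool` are hypothetical names, and a plain `std::map` stands in for the real linger_ops container.

```cpp
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-in for a registered linger request.
struct LingerOp {
  int pool;  // pool the op targets
};

// Visit every registered op. The callback may erase the op it is given.
// The iterator is advanced *before* the callback runs, so erasing the
// current element never invalidates the iterator we continue with.
// Returns the ids that were visited, in order.
std::vector<int> scan_ops(
    std::map<int, LingerOp>& ops,
    const std::function<void(std::map<int, LingerOp>&, int)>& check) {
  std::vector<int> visited;
  auto p = ops.begin();
  while (p != ops.end()) {
    int id = p->first;
    ++p;  // step past the current element at the top of the loop
    visited.push_back(id);
    check(ops, id);  // may call ops.erase(id)
  }
  return visited;
}

// Example check: unregister every op whose pool no longer exists.
std::vector<int> drop_ops_for_pool(std::map<int, LingerOp>& ops,
                                   int dead_pool) {
  return scan_ops(ops, [dead_pool](std::map<int, LingerOp>& m, int id) {
    if (m.at(id).pool == dead_pool) m.erase(id);
  });
}
```

Erasing from a `std::map` only invalidates iterators to the erased element, which is why advancing first is sufficient here.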
commit a10950f91e6ba9c1620d8fd00a84fc59f983fcee
Author: Sage Weil <sage@inktank.com>
Date: Sat Jan 5 20:53:49 2013 -0800
os/FileJournal: include limits.h
Needed for IOV_MAX.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ce49968938ca3636f48fe543111aa219f36914d8)
commit cd194ef3c7082993cae0892a97494f2a917ce2a7
Author: Sage Weil <sage@inktank.com>
Date: Fri Jan 4 17:43:41 2013 -0800
osd: special case CALL op to not have RD bit effects
In commit 20496b8d2b2c3779a771695c6f778abbdb66d92a we treat a CALL as
different from a normal "read", but we did not adjust the behavior
determined by the RD bit in the op. We tried to fix that in
91e941aef9f55425cc12204146f26d79c444cfae, but changing the op code breaks
compatibility, so that was reverted.
Instead, special-case CALL in the helper--the only point in the code that
actually checks for the RD bit. (And fix one lingering user to use that
helper appropriately.)
Fixes: #3731
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 988a52173522e9a410ba975a4e8b7c25c7801123)
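A rough sketch of special-casing one op code inside the single helper that tests the RD bit. The constant values and the name `op_mode_read` are assumptions chosen for illustration; they are not taken from the Ceph headers.

```cpp
#include <cstdint>

// Illustrative values only, not the real Ceph constants.
constexpr std::uint16_t MODE_RD = 0x1000;  // "read" mode bit
constexpr std::uint16_t MODE_WR = 0x2000;  // "write" mode bit
constexpr std::uint16_t OP_READ = MODE_RD | 0x01;
constexpr std::uint16_t OP_WRITE = MODE_WR | 0x01;
// CALL keeps the RD bit in its op code so the wire encoding is unchanged.
constexpr std::uint16_t OP_CALL = MODE_RD | 0x02;

// The one place that answers "does this op behave as a read?".
// CALL is excluded here instead of changing its op code.
constexpr bool op_mode_read(std::uint16_t op) {
  return (op & MODE_RD) != 0 && op != OP_CALL;
}
```

Keeping the exception in one helper means old clients and servers still see the same op code, while every caller that goes through the helper gets the new behavior.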
commit 921e06decebccc913c0e4f61916d00e62e7e1635
Author: Sage Weil <sage@inktank.com>
Date: Fri Jan 4 20:46:48 2013 -0800
Revert "OSD: remove RD flag from CALL ops"
This reverts commit 91e941aef9f55425cc12204146f26d79c444cfae.
We cannot change this op code without breaking compatibility
with old code (client and server). We'll have to special case
this op code instead.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit d3abd0fe0bb402ff403259d4b1a718a56331fc39)
commit 7513e9719a532dc538d838f68e47c83cc51fef82
Author: Samuel Just <sam.just@inktank.com>
Date: Fri Jan 4 12:43:52 2013 -0800
ReplicatedPG: remove old-head optimization from push_to_replica
This optimization allowed the primary to push a clone as a single push in the
case that the head object on the replica is old and happens to be at the same
version as the clone. In general, using head in clone_subsets is tricky since
we might be writing to head during the push. calc_clone_subsets does not
consider head (probably for this reason). Handling the clone from head case
properly would require blocking writes on head in the interim which is probably
a bad trade off anyway.
Because the old-head optimization only comes into play if the replica's state
happens to fall on the last write to head prior to the snap that caused the
clone in question, it's not worth the complexity.
Fixes: #3698
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e89b6ade63cdad315ab754789de24008cfe42b37)
commit c63c66463a567e8095711e7c853ac8feb065c5c5
Author: Sage Weil <sage@inktank.com>
Date: Thu Jan 3 17:15:07 2013 -0800
os/FileStore: fix non-btrfs op_seq commit order
The op_seq file is the starting point for journal replay. For stable btrfs
commit mode, which is using a snapshot as a reference, we should write this
file before we take the snap. We normally ignore current/ contents anyway.
On non-btrfs file systems, however, we should only write this file *after*
we do a full sync, and we should then fsync(2) it before we continue
(and potentially trim anything from the journal).
This fixes a serious bug that could cause data loss and corruption after
a power loss event. For a 'kill -9' or crash, however, there was little
risk, since the writes were still captured by the host's cache.
Fixes: #3721
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 28d59d374b28629a230d36b93e60a8474c902aa5)
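The non-btrfs ordering described above (full sync, then write the file, then fsync(2) it) might look roughly like this with plain POSIX calls. This is a sketch under assumptions: `commit_op_seq`, the file path, and the decimal text format are hypothetical and do not reflect the FileStore implementation.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstdint>
#include <string>

// Persist a sequence number in the order the commit describes for
// non-btrfs file systems:
//   1. flush everything written so far (full sync),
//   2. write the op_seq file,
//   3. fsync(2) the file before the caller trims the journal.
// Returns true on success. Path and on-disk format are assumptions.
bool commit_op_seq(const std::string& path, std::uint64_t seq) {
  ::sync();  // step 1: full sync of previously written data

  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;

  // step 2: write the new sequence number
  std::string buf = std::to_string(seq) + "\n";
  ssize_t n = ::write(fd, buf.data(), buf.size());
  bool ok = (n == static_cast<ssize_t>(buf.size()));

  // step 3: make the file itself durable before returning
  if (ok && ::fsync(fd) != 0) ok = false;
  if (::close(fd) != 0) ok = false;
  return ok;
}
```

Only after this function returns true would it be safe, in this sketch, to trim journal entries up to `seq`.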
commit b8f061dcdb808a6fc5ec01535b37560147b537de
Author: Samuel Just <sam.just@inktank.com>
Date: Thu Jan 3 09:59:45 2013 -0800
OSD: for old osds, dispatch peering messages immediately
Normally, we batch up peering messages until the end of
process_peering_events to allow us to combine many notifies, etc
to the same osd into the same message. However, old osds assume
that the activation message (log or info) will be dispatched
before the first sub_op_modify of the interval. Thus, for those
peers, we need to send the peering messages before we drop the
pg lock, lest we issue a client repop from another thread before
the activation message is sent.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4ae4dce5c5bb547c1ff54d07c8b70d287490cae9)
commit 67968d115daf51762dce65af46b9b843eda592b5
Author: Sage Weil <sage@inktank.com>
Date: Wed Jan 2 22:38:53 2013 -0800
osd: move common active vs booting code into consume_map
Push osdmaps to PGs in separate method from activate_map() (whose name
is becoming less and less accurate).
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a32d6c5dca081dcd8266f4ab51581ed6b2755685)
commit 34266e6bde9f36b1c46144d2341b13605eaa9abe
Author: Sage Weil <sage@inktank.com>
Date: Wed Jan 2 22:20:06 2013 -0800
osd: let pgs process map advances before booting
The OSD deliberately consumes and processes most OSDMaps from the time
it was down before it marks itself up, as this can be slow. The new
threading code does this asynchronously in peering_wq, though, and
does not let it drain before booting the OSD. The OSD can get into
a situation where it marks itself up but is not responsive or useful
because of the backlog, and only makes the situation worse by
generating more osdmaps as a result.
Fix this by calling activate_map() even when booting, and when booting
draining the peering_wq on each call. This is harmless since we are
not yet processing actual ops; we only need to be async when active.
Fixes: #3714
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0bfad8ef2040a0dd4a0dc1d3abf3ab5b2019d179)
commit 4034f6c817d1efce5fb9eb8cc0a9327f9f7d7910
Author: Sage Weil <sage@inktank.com>
Date: Fri Dec 28 13:07:18 2012 -0800
log: broadcast cond signals
We were using a single cond, and only signalling one waiter. That means
that if the flusher and several logging threads are waiting, and we hit
a limit, the logger could signal another logger instead of the flusher,
and we could deadlock.
Similarly, if the flusher empties the queue, it might signal only a single
logger, and that logger could re-signal the flusher, and the other logger
could wait forever.
Instead, break the single cond into two: one for loggers, and one for the
flusher. Always signal the (one) flusher, and always broadcast to all
loggers.
Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 813787af3dbb99e42f481af670c4bb0e254e4432)
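The two-condition scheme from this commit can be sketched with standard C++ primitives. The class `BoundedLog` and the driver `run_demo` are hypothetical and much simpler than the real logging code; they only show the signalling pattern (signal the one flusher, broadcast to all loggers).

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class BoundedLog {
 public:
  explicit BoundedLog(std::size_t max) : max_(max) {}

  // Called by logging threads. Blocks while the queue is at its limit.
  void submit(std::string entry) {
    std::unique_lock<std::mutex> l(lock_);
    cond_loggers_.wait(l, [this] { return queue_.size() < max_; });
    queue_.push_back(std::move(entry));
    cond_flusher_.notify_one();  // there is exactly one flusher
  }

  // Called by the single flusher thread. Returns the drained entries,
  // or an empty vector once stop() was called and nothing is queued.
  std::vector<std::string> flush() {
    std::unique_lock<std::mutex> l(lock_);
    cond_flusher_.wait(l, [this] { return !queue_.empty() || stop_; });
    std::vector<std::string> out;
    out.swap(queue_);
    cond_loggers_.notify_all();  // wake every blocked logger
    return out;
  }

  void stop() {
    std::lock_guard<std::mutex> l(lock_);
    stop_ = true;
    cond_flusher_.notify_one();
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_loggers_;  // loggers wait for space
  std::condition_variable cond_flusher_;  // flusher waits for work
  std::vector<std::string> queue_;
  std::size_t max_;
  bool stop_ = false;
};

// Runs n_loggers threads that each submit per_logger entries against
// one flusher, and returns how many entries the flusher drained.
std::size_t run_demo(std::size_t n_loggers, std::size_t per_logger,
                     std::size_t max_queue) {
  BoundedLog log(max_queue);
  std::size_t flushed = 0;
  std::thread flusher([&] {
    for (;;) {
      std::vector<std::string> batch = log.flush();
      if (batch.empty()) break;  // stopped and fully drained
      flushed += batch.size();
    }
  });
  std::vector<std::thread> loggers;
  for (std::size_t i = 0; i < n_loggers; ++i) {
    loggers.emplace_back([&] {
      for (std::size_t j = 0; j < per_logger; ++j) log.submit("entry");
    });
  }
  for (std::thread& t : loggers) t.join();
  log.stop();
  flusher.join();
  return flushed;
}
```

With a single shared condition and `notify_one`, a wakeup meant for the flusher could land on another logger; separating the two wait sets removes that possibility in this sketch.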
commit 2141454eee3a1727706d48f8efef92f8a2b98278
Author: Sage Weil <sage@inktank.com>
Date: Wed Jan 2 13:58:44 2013 -0800
log: fix locking typo/stupid for dump_recent()
We weren't locking m_flush_mutex properly, which in turn was leading to
racing threads calling dump_recent() and garbling the crash dump output.
Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 43cba617aa0247d714632bddf31b9271ef3a1b50)
commit 936560137516a1fd5e55b52ccab59c408ac2c245
Author: Sage Weil <sage@inktank.com>
Date: Fri Dec 28 16:48:22 2012 -0800
test_filejournal: optionally specify journal filename as an argument
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 483c6f76adf960017614a8641c4dcdbd7902ce33)
commit be0473bbb1feb8705be4fa8f827704694303a930
Author: Sage Weil <sage@inktank.com>
Date: Fri Dec 28 16:48:05 2012 -0800
test_filejournal: test journaling bl with >IOV_MAX segments
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c461e7fc1e34fdddd8ff8833693d067451df906b)
commit de61932793c5791c770855e470e3b5b9ebb53dba
Author: Sage Weil <sage@inktank.com>
Date: Fri Dec 28 16:47:28 2012 -0800
os/FileJournal: limit size of aio submission
Limit size of each aio submission to IOV_MAX-1 (to be safe). Take care to
only mark the last aio with the seq to signal completion.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit dda7b651895ab392db08e98bf621768fd77540f0)
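One way to picture the chunking rule above: split a buffer list of N segments into submissions of at most IOV_MAX-1 segments, and attach the seq only to the last one. `Submission` and `plan_submissions` are hypothetical names, the use of 0 to mean "no seq" is an assumption, and no actual aio calls are made.

```cpp
#include <limits.h>  // IOV_MAX

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifndef IOV_MAX
#define IOV_MAX 1024  // assumption: fallback if the platform header omits it
#endif

// One planned aio submission covering a run of buffer segments.
struct Submission {
  std::size_t first_segment;
  std::size_t num_segments;
  std::uint64_t seq;  // 0 means "do not signal completion" in this sketch
};

// Split total_segments into submissions of at most max_per_submit
// segments each. Only the final submission carries the seq, so
// completion is reported once, after the whole entry is written.
std::vector<Submission> plan_submissions(
    std::size_t total_segments, std::uint64_t seq,
    std::size_t max_per_submit = IOV_MAX - 1) {
  std::vector<Submission> plan;
  if (max_per_submit == 0) return plan;  // nothing sensible to plan
  std::size_t pos = 0;
  while (pos < total_segments) {
    std::size_t n = std::min(max_per_submit, total_segments - pos);
    bool last = (pos + n == total_segments);
    plan.push_back(Submission{pos, n, last ? seq : std::uint64_t{0}});
    pos += n;
  }
  return plan;
}
```

For example, 10 segments with a limit of 4 yield three submissions of 4, 4 and 2 segments, and only the third carries the seq.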
commit ded454c669171d4038b087cfdad52a57da222c1f
Author: Sage Weil <sage@inktank.com>
Date: Fri Dec 28 15:44:51 2012 -0800
os/FileJournal: logger is optional
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 076b418c7f03c5c62f811fdc566e4e2b776389b7)
|