=======================
 Logging and Debugging
=======================

Ceph component debug log levels can be adjusted at runtime, while services are
running. In some circumstances you might want to adjust debug log levels in
``ceph.conf`` or in the central config store. Increased debug logging can be
useful if you are encountering issues when operating your cluster.  By default,
Ceph log files are in ``/var/log/ceph``.

.. tip:: Remember that debug output can slow down your system, and that this
   latency sometimes hides race conditions.

Debug logging is resource intensive. If you encounter a problem in a specific
component of your cluster, begin troubleshooting by enabling logging for only
that component of the cluster. For example, if your OSDs are running without
errors, but your metadata servers are not, enable logging for any specific
metadata server instances that are having problems. Continue by enabling
logging for each subsystem only as needed.

.. important:: Verbose logging sometimes generates over 1 GB of data per hour.
   If the disk that your operating system runs on (your "OS disk") reaches its
   capacity, the node associated with that disk will stop working.
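
As a rough sizing aid, the following sketch estimates how many hours of verbose
logging a filesystem can absorb. It is not part of Ceph: the function names
``hours_until_full`` and ``log_fs_avail_kb`` are hypothetical, and the default
growth rate of 1 GB per hour is simply the figure quoted above.

.. code-block:: bash

    # Hypothetical helper: whole hours until AVAIL_KB kibibytes are consumed
    # at RATE_GB_PER_HOUR (defaults to 1, the figure quoted above).
    hours_until_full() {
        local avail_kb="$1"
        local rate_gb="${2:-1}"
        echo $(( avail_kb / (rate_gb * 1024 * 1024) ))
    }

    # Hypothetical helper: space available (in KiB) on the filesystem that
    # holds DIR. Defaults to /var/log/ceph, the default log directory.
    log_fs_avail_kb() {
        df -Pk "${1:-/var/log/ceph}" | awk 'NR == 2 { print $4 }'
    }

    # Example:
    #   hours_until_full "$(log_fs_avail_kb)" 1

The result is a lower bound only in the sense that it ignores log rotation and
compression. Treat it as an order-of-magnitude check, not a guarantee.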

Whenever you enable debug logging or raise a debug level, make sure that you
have ample capacity for log files, because doing so can dramatically increase
their size.  For details on rotating log files, see `Accelerating Log Rotation`_.
When your system is running well again, remove unnecessary debugging settings
in order to ensure that your cluster runs optimally. Logging debug-output
messages is a slow process and a potential waste of your cluster's resources.

For details on available settings, see `Subsystem, Log and Debug Settings`_.

Runtime
=======

To see the configuration settings at runtime, log in to a host that has a
running daemon and run a command of the following form:

.. prompt:: bash $

   ceph daemon {daemon-name} config show | less

For example:

.. prompt:: bash $

   ceph daemon osd.0 config show | less
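
The full output is long. When only the debug levels are of interest, it can be
filtered. The following is a minimal sketch: ``show_debug_settings`` is a
hypothetical helper name, and the filter assumes that ``config show`` prints
one quoted option name per line.

.. code-block:: bash

    # Hypothetical helper: print only the debug_* options of DAEMON.
    # Run it on the host where DAEMON is running, as with the command above.
    show_debug_settings() {
        ceph daemon "$1" config show | grep '"debug_'
    }

    # Example:
    #   show_debug_settings osd.0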

To activate Ceph's debugging output (that is, the ``dout()`` logging function)
at runtime, inject arguments into the runtime configuration by running a ``ceph
tell`` command of the following form:

..  prompt:: bash $

    ceph tell {daemon-type}.{daemon id or *} config set {name} {value}

Here ``{daemon-type}`` is ``osd``, ``mon``, or ``mds``. Apply the runtime
setting either to a specific daemon (by specifying its ID) or to all daemons of
a particular type (by using the ``*`` operator).  For example, to increase
debug logging for a specific ``ceph-osd`` daemon named ``osd.0``, run the
following command:

..  prompt:: bash $

    ceph tell osd.0 config set debug_osd 0/5
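
When the ``*`` form is used, quote the target so that the shell does not expand
it against file names in the current directory. The following sketch wraps the
command form shown above. ``set_debug`` is a hypothetical helper name, and the
conversion of spaces to underscores is an assumption made so that subsystem
names can be typed as they appear in the subsystem table on this page.

.. code-block:: bash

    # Hypothetical helper: set_debug TARGET SUBSYSTEM LEVEL
    # Builds the option name debug_<subsystem> and passes it to "ceph tell".
    set_debug() {
        local target="$1"
        local subsys="$2"
        local level="$3"
        local option
        # "mds balancer" becomes debug_mds_balancer
        option="debug_$(printf '%s' "$subsys" | tr ' ' '_')"
        ceph tell "$target" config set "$option" "$level"
    }

    # Examples:
    #   set_debug osd.0 osd 0/5        # one daemon
    #   set_debug 'osd.*' osd 0/5      # all OSDs; note the quotes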

The ``ceph tell`` command goes through the monitors. If you cannot reach the
monitors, there is another way to activate Ceph's debugging output: log in to
the host of the daemon in question and use the ``ceph daemon`` command to
change that daemon's configuration directly. For example:

.. prompt:: bash $

   sudo ceph daemon osd.0 config set debug_osd 0/5

For details on available settings, see `Subsystem, Log and Debug Settings`_.
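
Because raised debug levels should not be left in place, it can help to tie the
change to the task being debugged. The following is a sketch only:
``debug_during`` is a hypothetical helper, and the "normal" level to restore
must be supplied by the caller (for example, the default from the subsystem
table on this page).

.. code-block:: bash

    # Hypothetical helper: debug_during TARGET OPTION HIGH NORMAL COMMAND...
    # Raises OPTION to HIGH on TARGET, runs COMMAND, then sets OPTION back
    # to NORMAL. Returns the exit status of COMMAND.
    debug_during() {
        local target="$1"
        local option="$2"
        local high="$3"
        local normal="$4"
        shift 4
        ceph tell "$target" config set "$option" "$high" || return
        "$@"
        local rc=$?
        ceph tell "$target" config set "$option" "$normal"
        return "$rc"
    }

    # Example: log osd.0 at level 20 for one minute, then go back to 1/5.
    #   debug_during osd.0 debug_osd 20 1/5 sleep 60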


Boot Time
=========

To activate Ceph's debugging output (that is, the ``dout()`` logging function)
at boot time, you must add settings to your Ceph configuration file.
Subsystems that are common to all daemons are set under ``[global]`` in the
configuration file. Subsystems for a specific daemon are set under the relevant
daemon section in the configuration file (for example, ``[mon]``, ``[osd]``,
``[mds]``). Here is an example that shows possible debugging settings in a Ceph
configuration file:

.. code-block:: ini

    [global]
        debug_ms = 1/5

    [mon]
        debug_mon = 20
        debug_paxos = 1/5
        debug_auth = 2

    [osd]
        debug_osd = 1/5
        debug_filestore = 1/5
        debug_journal = 1
        debug_monc = 5/20

    [mds]
        debug_mds = 1
        debug_mds_balancer = 1


For details, see `Subsystem, Log and Debug Settings`_.


Accelerating Log Rotation
=========================

If your log filesystem is nearly full, you can accelerate log rotation by
modifying the Ceph log rotation file at ``/etc/logrotate.d/ceph``. To increase
the frequency of log rotation (which will guard against a filesystem reaching
capacity), add a ``size`` directive after the ``weekly`` frequency directive.
To smooth out volume spikes, consider changing ``weekly`` to ``daily`` and
consider changing ``rotate`` to ``30``. The procedure for adding the size
setting is shown immediately below. 

#. Note the default settings of the ``/etc/logrotate.d/ceph`` file::

      rotate 7
      weekly
      compress
      sharedscripts

#. Modify them by adding a ``size`` setting::

      rotate 7
      weekly
      size 500M
      compress
      sharedscripts

#. Start the crontab editor for your user space:

   .. prompt:: bash $

      crontab -e

#. Add an entry to crontab that instructs cron to check the
   ``/etc/logrotate.d/ceph`` file::

      */30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

In this example, the ``/etc/logrotate.d/ceph`` file will be checked every 30
minutes.
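
Before relying on the cron entry, it can be useful to see which log files are
already over the size threshold. The following sketch uses ``find`` for that
purpose. ``large_logs`` is a hypothetical helper, and the ``*.log`` name
pattern is an assumption about how the log files are named. Note that ``find``
rounds file sizes up to whole units, so the result is approximate.

.. code-block:: bash

    # Hypothetical helper: large_logs [DIR] [SIZE]
    # Lists *.log files under DIR (default /var/log/ceph) that are larger
    # than SIZE (default 500M, matching the "size 500M" directive above).
    large_logs() {
        local dir="${1:-/var/log/ceph}"
        local size="${2:-500M}"
        find "$dir" -type f -name '*.log' -size "+$size"
    }

    # Example:
    #   large_logs /var/log/ceph 500M
    #
    # To preview what logrotate would do without changing any files,
    # run it in debug mode:
    #   logrotate -d /etc/logrotate.d/ceph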

Valgrind
========

When you are debugging your cluster's performance, you might find it necessary
to track down memory and threading issues. The Valgrind tool suite can be used
to detect problems in a specific daemon, in a particular type of daemon, or in
the entire cluster. Because Valgrind is computationally expensive, it should be
used only when developing or debugging Ceph, and it will slow down your system
if used at other times. Valgrind messages are logged to ``stderr``. 


Subsystem, Log and Debug Settings
=================================

Debug logging output is typically enabled via subsystems. 

Ceph Subsystems
---------------

For each subsystem, there is a logging level for its output logs (a so-called
"log level") and a logging level for its in-memory logs (a so-called "memory
level"). Different values may be set for these two logging levels in each
subsystem. Ceph's logging levels operate on a scale of ``1`` to ``20``, where
``1`` is terse and ``20`` is verbose [#f1]_.  As a general rule, the in-memory
logs are not sent to the output log unless one or more of the following
conditions is met:

- a fatal signal is raised
- an ``assert`` in the source code is triggered
- a dump is requested. For details, see the `documentation on the admin socket
  <http://docs.ceph.com/en/latest/man/8/ceph/#daemon>`_.

.. warning ::
   .. [#f1] In certain rare cases, there are logging levels that can take a value greater than 20. The resulting logs are extremely verbose.
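
The third condition can be triggered by hand. The sketch below assumes that the
daemon's admin socket accepts a ``log dump`` command that writes the recent
in-memory entries to the daemon's log file. Check the output of
``ceph daemon {daemon-name} help`` on your release before relying on it.
``dump_recent_log`` is a hypothetical helper name.

.. code-block:: bash

    # Hypothetical helper: ask DAEMON to write its in-memory log entries
    # to its log file. Run it on the host where DAEMON is running.
    # Assumption: the admin socket provides a "log dump" command.
    dump_recent_log() {
        ceph daemon "$1" log dump
    }

    # Example:
    #   dump_recent_log osd.0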

Log levels and memory levels can be set either together or separately. If a
subsystem is assigned a single value, then that value determines both the log
level and the memory level. For example, ``debug ms = 5`` will give the ``ms``
subsystem a log level of ``5`` and a memory level of ``5``.  On the other hand,
if a subsystem is assigned two values that are separated by a forward slash
(/), then the first value determines the log level and the second value
determines the memory level. For example, ``debug ms = 1/5`` will give the
``ms`` subsystem a log level of ``1`` and a memory level of ``5``. See the
following:

.. code-block:: ini 

    debug {subsystem} = {log-level}/{memory-level}
    # for example
    debug mds balancer = 1/20
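
The single-value and two-value forms can be illustrated with a few lines of
shell. This sketch only mirrors the rule described above and is not how Ceph
itself parses the setting. ``split_debug_level`` is a hypothetical helper name.

.. code-block:: bash

    # Hypothetical helper: split_debug_level SPEC
    # Prints the log level and memory level that SPEC stands for.
    # With no slash in SPEC, neither pattern matches, so both expansions
    # return SPEC unchanged and the two levels are equal.
    split_debug_level() {
        local spec="$1"
        local log_level="${spec%%/*}"   # text before the first slash
        local mem_level="${spec#*/}"    # text after the first slash
        echo "log=$log_level memory=$mem_level"
    }

    # Examples:
    #   split_debug_level 5      prints: log=5 memory=5
    #   split_debug_level 1/5    prints: log=1 memory=5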

The following table provides a list of Ceph subsystems and their default log and
memory levels. Once you complete your logging efforts, restore the subsystems
to their default level or to a level suitable for normal operations.

+--------------------------+-----------+--------------+
| Subsystem                | Log Level | Memory Level |
+==========================+===========+==============+
| ``default``              |     0     |      5       |
+--------------------------+-----------+--------------+
| ``lockdep``              |     0     |      1       |
+--------------------------+-----------+--------------+
| ``context``              |     0     |      1       |
+--------------------------+-----------+--------------+
| ``crush``                |     1     |      1       |
+--------------------------+-----------+--------------+
| ``mds``                  |     1     |      5       |
+--------------------------+-----------+--------------+
| ``mds balancer``         |     1     |      5       |
+--------------------------+-----------+--------------+
| ``mds log``              |     1     |      5       |
+--------------------------+-----------+--------------+
| ``mds log expire``       |     1     |      5       |
+--------------------------+-----------+--------------+
| ``mds migrator``         |     1     |      5       |
+--------------------------+-----------+--------------+
| ``buffer``               |     0     |      1       |
+--------------------------+-----------+--------------+
| ``timer``                |     0     |      1       |
+--------------------------+-----------+--------------+
| ``filer``                |     0     |      1       |
+--------------------------+-----------+--------------+
| ``striper``              |     0     |      1       |
+--------------------------+-----------+--------------+
| ``objecter``             |     0     |      1       |
+--------------------------+-----------+--------------+
| ``rados``                |     0     |      5       |
+--------------------------+-----------+--------------+
| ``rbd``                  |     0     |      5       |
+--------------------------+-----------+--------------+
| ``rbd mirror``           |     0     |      5       |
+--------------------------+-----------+--------------+
| ``rbd replay``           |     0     |      5       |
+--------------------------+-----------+--------------+
| ``rbd pwl``              |     0     |      5       |
+--------------------------+-----------+--------------+
| ``journaler``            |     0     |      5       |
+--------------------------+-----------+--------------+
| ``objectcacher``         |     0     |      5       |
+--------------------------+-----------+--------------+
| ``immutable obj cache``  |     0     |      5       |
+--------------------------+-----------+--------------+
| ``client``               |     0     |      5       |
+--------------------------+-----------+--------------+
| ``osd``                  |     1     |      5       |
+--------------------------+-----------+--------------+
| ``optracker``            |     0     |      5       |
+--------------------------+-----------+--------------+
| ``objclass``             |     0     |      5       |
+--------------------------+-----------+--------------+
| ``filestore``            |     1     |      3       |
+--------------------------+-----------+--------------+
| ``journal``              |     1     |      3       |
+--------------------------+-----------+--------------+
| ``ms``                   |     0     |      5       |
+--------------------------+-----------+--------------+
| ``mon``                  |     1     |      5       |
+--------------------------+-----------+--------------+
| ``monc``                 |     0     |      10      |
+--------------------------+-----------+--------------+
| ``paxos``                |     1     |      5       |
+--------------------------+-----------+--------------+
| ``tp``                   |     0     |      5       |
+--------------------------+-----------+--------------+
| ``auth``                 |     1     |      5       |
+--------------------------+-----------+--------------+
| ``crypto``               |     1     |      5       |
+--------------------------+-----------+--------------+
| ``finisher``             |     1     |      1       |
+--------------------------+-----------+--------------+
| ``reserver``             |     1     |      1       |
+--------------------------+-----------+--------------+
| ``heartbeatmap``         |     1     |      5       |
+--------------------------+-----------+--------------+
| ``perfcounter``          |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rgw``                  |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rgw sync``             |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rgw datacache``        |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rgw access``           |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rgw dbstore``          |     1     |      5       |
+--------------------------+-----------+--------------+
| ``javaclient``           |     1     |      5       |
+--------------------------+-----------+--------------+
| ``asok``                 |     1     |      5       |
+--------------------------+-----------+--------------+
| ``throttle``             |     1     |      1       |
+--------------------------+-----------+--------------+
| ``refs``                 |     0     |      0       |
+--------------------------+-----------+--------------+
| ``compressor``           |     1     |      5       |
+--------------------------+-----------+--------------+
| ``bluestore``            |     1     |      5       |
+--------------------------+-----------+--------------+
| ``bluefs``               |     1     |      5       |
+--------------------------+-----------+--------------+
| ``bdev``                 |     1     |      3       |
+--------------------------+-----------+--------------+
| ``kstore``               |     1     |      5       |
+--------------------------+-----------+--------------+
| ``rocksdb``              |     4     |      5       |
+--------------------------+-----------+--------------+
| ``leveldb``              |     4     |      5       |
+--------------------------+-----------+--------------+
| ``fuse``                 |     1     |      5       |
+--------------------------+-----------+--------------+
| ``mgr``                  |     2     |      5       |
+--------------------------+-----------+--------------+
| ``mgrc``                 |     1     |      5       |
+--------------------------+-----------+--------------+
| ``dpdk``                 |     1     |      5       |
+--------------------------+-----------+--------------+
| ``eventtrace``           |     1     |      5       |
+--------------------------+-----------+--------------+
| ``prioritycache``        |     1     |      5       |
+--------------------------+-----------+--------------+
| ``test``                 |     0     |      5       |
+--------------------------+-----------+--------------+
| ``cephfs mirror``        |     0     |      5       |
+--------------------------+-----------+--------------+
| ``cephsqlite``           |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore``             |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore onode``       |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore odata``       |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore omap``        |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore tm``          |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore t``           |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore cleaner``     |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore epm``         |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore lba``         |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore fixedkv tree``|     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore cache``       |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore journal``     |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore device``      |     0     |      5       |
+--------------------------+-----------+--------------+
| ``seastore backref``     |     0     |      5       |
+--------------------------+-----------+--------------+
| ``alienstore``           |     0     |      5       |
+--------------------------+-----------+--------------+
| ``mclock``               |     1     |      5       |
+--------------------------+-----------+--------------+
| ``cyanstore``            |     0     |      5       |
+--------------------------+-----------+--------------+
| ``ceph exporter``        |     1     |      5       |
+--------------------------+-----------+--------------+
| ``memstore``             |     1     |      5       |
+--------------------------+-----------+--------------+


Logging Settings
----------------

It is not necessary to specify logging and debugging settings in the Ceph
configuration file, but you may override default settings when needed. Ceph
supports the following settings:

.. confval:: log_file
.. confval:: log_max_new
.. confval:: log_max_recent
.. confval:: log_to_file
.. confval:: log_to_stderr
.. confval:: err_to_stderr
.. confval:: log_to_syslog
.. confval:: err_to_syslog
.. confval:: log_flush_on_exit
.. confval:: clog_to_monitors
.. confval:: clog_to_syslog
.. confval:: mon_cluster_log_to_syslog
.. confval:: mon_cluster_log_file

OSD
---

.. confval:: osd_debug_drop_ping_probability
.. confval:: osd_debug_drop_ping_duration

Filestore
---------

.. confval:: filestore_debug_omap_check

MDS
---

- :confval:`mds_debug_scatterstat`
- :confval:`mds_debug_frag`
- :confval:`mds_debug_auth_pins`
- :confval:`mds_debug_subtrees`

RADOS Gateway
-------------

- :confval:`rgw_log_nonexistent_bucket`
- :confval:`rgw_log_object_name`
- :confval:`rgw_log_object_name_utc`
- :confval:`rgw_enable_ops_log`
- :confval:`rgw_enable_usage_log`
- :confval:`rgw_usage_log_flush_threshold`
- :confval:`rgw_usage_log_tick_interval`