=========================
 Tracing Ceph With LTTng
=========================

Configuring Ceph with LTTng
===========================

Use the -DWITH_LTTNG option (default: ON)::

  ./do_cmake.sh -DWITH_LTTNG=ON

The config option for tracing must be set to ``true`` in ceph.conf.
The following options are currently available::

  bluestore_tracing
  event_tracing (-DWITH_EVENTTRACE)
  osd_function_tracing (-DWITH_OSD_INSTRUMENT_FUNCTIONS)
  osd_objectstore_tracing (actually filestore tracing)
  rbd_tracing
  osd_tracing
  rados_tracing
  rgw_op_tracing
  rgw_rados_tracing
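
For example, to enable OSD tracing, set the option in the daemon's
section of ceph.conf (a minimal sketch; the other options are set the
same way)::

  [osd]
      osd_tracing = true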

Testing Trace
=============

Start the LTTng session daemon::

  lttng-sessiond --daemonize

Run a vstart cluster with tracing enabled (``-o`` appends the given
option to the cluster's ceph.conf)::

  ../src/vstart.sh -d -n -l -e -o "osd_tracing = true"

List available tracepoints::

  lttng list --userspace

You will get something like::

  UST events:
  -------------
  PID: 100859 - Name: /path/to/ceph-osd
      pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      osd:do_osd_op_pre_copy_from (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      osd:do_osd_op_pre_copy_get (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
      ...

Create a tracing session, enable the tracepoints, and start tracing::

  lttng create trace-test
  lttng enable-event --userspace osd:*
  lttng start

Perform some Ceph operations, for example a short write benchmark
(this assumes a pool named ``ec`` exists; substitute any pool)::

  rados bench -p ec 5 write

Stop tracing and view the result::

  lttng stop
  lttng view
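
``lttng view`` uses ``babeltrace`` as its default viewer, so you can
also read the collected trace directly from the session's output
directory (the path below is an example; ``lttng create`` prints the
real one)::

  babeltrace ~/lttng-traces/trace-test-*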

Destroy the tracing session::

  lttng destroy
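
Note that ``lttng destroy`` tears down the tracing session but does not
delete the collected trace files; they remain in the session's output
directory (under ``~/lttng-traces/`` by default).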

=========================
 Tracing Ceph With Blkin
=========================

Ceph can use Blkin, a library created by Marios Kogias and others,
which makes it possible to track a specific request from the moment it
enters the system at the higher layers until it is finally served by RADOS.

In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route through the system,
accompanied by information about the latency of each processing
phase. Thanks to LTTng, this happens with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.

.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: https://zipkin.io/
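
Concretely, Dapper-style tracing records each request as a tree of
spans: a span covers one processing phase, and every span carries the
ID of its parent. A hypothetical span tree for a single client write
might look like this (the phase names are illustrative)::

  trace (one I/O request)
  └── span (Objecter: op)          span_id=A, parent_span_id=0
      └── span (Messenger: send)   span_id=B, parent_span_id=A
          └── span (OSD: handle)   span_id=C, parent_span_id=B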


Configuring Ceph with Blkin
===========================

Use the -DWITH_BLKIN option (which requires -DWITH_LTTNG)::

  ./do_cmake.sh -DWITH_LTTNG=ON -DWITH_BLKIN=ON

The config option for Blkin must be set to ``true`` in ceph.conf.
The following options are currently available::

  rbd_blkin_trace_all
  osd_blkin_trace_all
  osdc_blkin_trace_all

Testing Blkin
=============

It's easy to test Ceph's Blkin tracing. Let's assume you don't have
Ceph already running, and that you compiled Ceph with Blkin support but
didn't install it. Launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so you can see the possible tracepoints::

  OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all = true"
  lttng list --userspace

You'll see something like the following::

  UST events:
  -------------
  PID: 8987 - Name: ./ceph-osd
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

  PID: 8407 - Name: ./ceph-mon
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

  ...

Next, stop Ceph so that the tracepoints can be enabled::

  ../src/stop.sh

Start up an LTTng session and enable the tracepoints::

  lttng create blkin-test
  lttng enable-event --userspace zipkin:timestamp
  lttng enable-event --userspace zipkin:keyval_integer
  lttng enable-event --userspace zipkin:keyval_string
  lttng start
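
As with ``osd:*`` earlier, a wildcard can replace the three
``enable-event`` calls::

  lttng enable-event --userspace zipkin:*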

Then start up Ceph again::

  OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all = true"

You may want to check that Ceph is up::

  ceph status

Now put an object in using rados, check that it made it, get it back, and remove it::

  ceph osd pool create test-blkin
  rados put test-object-1 ../src/vstart.sh --pool=test-blkin
  rados -p test-blkin ls
  ceph osd map test-blkin test-object-1
  rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
  md5sum ../src/vstart.sh vstart-copy.sh
  rados rm test-object-1 --pool=test-blkin

You could also use the example in ``examples/librados/`` or ``rados bench``.

Then stop the LTTng session and see what was collected::

  lttng stop
  lttng view

You'll see something like::

  [15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
  [15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
  [15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }
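
Events that share a ``trace_id`` belong to the same request, and each
event's ``span_id``/``parent_span_id`` pair places it in the span tree;
this is the structure Zipkin uses to reconstruct the causal chain.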


Install Zipkin
===============
One of the points of using Blkin is that you can look at the traces
using Zipkin. Users should run Zipkin as both a trace collector and
a web service. The executable jar runs the collector on port 9410 and
the web interface on port 9411.

Download and run the Zipkin package::

  git clone https://github.com/openzipkin/zipkin && cd zipkin
  wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
  java -jar zipkin.jar

Or launch the Docker image::

  docker run -d -p 9411:9411 openzipkin/zipkin

This maps only the web interface port (9411); if you plan to send
traces over Scribe (port 9410, used in the next section), that port may
need to be published and the Scribe collector enabled, depending on the
image version.

Show Ceph's Blkin Traces in Zipkin-web
======================================
Download the babeltrace-zipkin project. This project takes the traces
generated with Blkin and sends them to a Zipkin collector using Scribe::

  git clone https://github.com/vears91/babeltrace-zipkin
  cd babeltrace-zipkin

Send the LTTng data to Zipkin (the Scribe collector listens on port
9410 by default)::

  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port} -s ${zipkin-collector-ip}

Example::

  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1
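
The timestamped session directory is printed by ``lttng create``; by
default it is created under ``~/lttng-traces/``, as in this example.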

Check the Ceph traces on the web page::

  Browse http://${zipkin-collector-ip}:9411
  Click "Find traces"
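
The service names offered by the UI correspond to the ``service_name``
field in the collected events (for example ``Messenger`` or ``Objecter``).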