summaryrefslogtreecommitdiffstats
path: root/health/guides/sync/sync_freq.md
blob: bb1043704dfe69370646b22b8c7e17409b2bf0f3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
### Understand the alert

This alert is triggered when the number of `sync()` system calls is greater than 6. The `sync()` system call writes any data buffered in memory out to disk, including modified superblocks, modified inodes, and delayed reads and writes. A higher number of `sync()` calls indicates that the system is often trying to flush buffered data to disk, which can cause performance issues.

### Troubleshoot the alert

1. Identify the process causing sync events

   Use `bpftrace` to identify which processes are causing the sync events. Make sure you have `bpftrace` installed on your system; if not, follow the instructions here: [Installing bpftrace](https://github.com/iovisor/bpftrace/blob/master/INSTALL.md)

   Run the `syncsnoop.bt` script from the `bpftrace` tools:

   ```
   sudo bpftrace /path/to/syncsnoop.bt
   ```

   This script will trace sync events and display the process ID (PID), process name, and the stack trace.

2. Analyze the output

   Focus on processes with a high number of sync events, and investigate whether you can optimize these processes or reduce their impact on the system. 

   - Check if these processes are essential to system functionality.
   - Look for potential bugs or misconfigurations that may trigger undue `sync()` calls.
   - Consider modifying the process itself to reduce disk I/O or change how it handles write operations.

3. Monitor your system's I/O performance

   Keep an eye on overall I/O performance using tools like `iostat`, `iotop`, or `vmstat`.

   For example, you can use `iostat` to monitor disk I/O:

   ```
   iostat -xz 1
   ```

   This command displays extended disk I/O statistics with a 1-second sampling interval.

   Check for high `await` values, which indicate the average time taken for I/O requests to be completed. Look for high `%util` values, representing the percentage of time the device was busy servicing requests.

### Useful resources

1. [sync man pages](https://man7.org/linux/man-pages/man2/sync.2.html)
2. [bpftrace GitHub repository](https://github.com/iovisor/bpftrace)
3. [syncsnoop example](https://github.com/iovisor/bpftrace/blob/master/tools/syncsnoop_example.txt)
4. [iostat man pages](https://man7.org/linux/man-pages/man1/iostat.1.html)