diff options
Diffstat (limited to 'src/health/guides/sync')
-rw-r--r-- | src/health/guides/sync/sync_freq.md | 46 |
1 files changed, 46 insertions, 0 deletions
diff --git a/src/health/guides/sync/sync_freq.md b/src/health/guides/sync/sync_freq.md new file mode 100644 index 00000000..bb104370 --- /dev/null +++ b/src/health/guides/sync/sync_freq.md @@ -0,0 +1,46 @@ +### Understand the alert + +This alert is triggered when the number of `sync()` system calls is greater than 6. The `sync()` system call writes any data buffered in memory out to disk, including modified superblocks, modified inodes, and delayed reads and writes. A higher number of `sync()` calls indicates that the system is often trying to flush buffered data to disk, which can cause performance issues. + +### Troubleshoot the alert + +1. Identify the process causing sync events + + Use `bpftrace` to identify which processes are causing the sync events. Make sure you have `bpftrace` installed on your system; if not, follow the instructions here: [Installing bpftrace](https://github.com/iovisor/bpftrace/blob/master/INSTALL.md) + + Run the `syncsnoop.bt` script from the `bpftrace` tools: + + ``` + sudo bpftrace /path/to/syncsnoop.bt + ``` + + This script will trace sync events and display the process ID (PID), process name, and the stack trace. + +2. Analyze the output + + Focus on processes with a high number of sync events, and investigate whether you can optimize these processes or reduce their impact on the system. + + - Check if these processes are essential to system functionality. + - Look for potential bugs or misconfigurations that may trigger undue `sync()` calls. + - Consider modifying the process itself to reduce disk I/O or change how it handles write operations. + +3. Monitor your system's I/O performance + + Keep an eye on overall I/O performance using tools like `iostat`, `iotop`, or `vmstat`. + + For example, you can use `iostat` to monitor disk I/O: + + ``` + iostat -xz 1 + ``` + + This command displays extended disk I/O statistics with a 1-second sampling interval. + + Check for high `await` values, which indicate the average time taken for I/O requests to be completed. Look for high `%util` values, representing the percentage of time the device was busy servicing requests. + +### Useful resources + +1. [sync man pages](https://man7.org/linux/man-pages/man2/sync.2.html) +2. [bpftrace GitHub repository](https://github.com/iovisor/bpftrace) +3. [syncsnoop example](https://github.com/iovisor/bpftrace/blob/master/tools/syncsnoop_example.txt) +4. [iostat man pages](https://man7.org/linux/man-pages/man1/iostat.1.html)
\ No newline at end of file |