diff options
Diffstat (limited to 'health/guides/adaptec_raid')
-rw-r--r-- | health/guides/adaptec_raid/adaptec_raid_ld_status.md | 37 | ||||
-rw-r--r-- | health/guides/adaptec_raid/adaptec_raid_pd_state.md | 66 |
2 files changed, 103 insertions, 0 deletions
diff --git a/health/guides/adaptec_raid/adaptec_raid_ld_status.md b/health/guides/adaptec_raid/adaptec_raid_ld_status.md new file mode 100644 index 000000000..7da1cdd17 --- /dev/null +++ b/health/guides/adaptec_raid/adaptec_raid_ld_status.md @@ -0,0 +1,37 @@ +### Understand the alert + +This alert is related to the Adaptec RAID controller, which manages the logical device statuses on your RAID configuration. When this alert is triggered in a critical state, it means that a logical device state value is in a degraded or failed state, indicating that one or more disks in your RAID configuration have failed. + +### Troubleshoot the alert + +Data is priceless. Before taking any action, ensure to have necessary backups in place. Netdata is not liable for any loss or corruption of any data, database, or software. + +Your Adaptec RAID card will automatically start rebuilding a faulty hard drive when you replace it with a healthy one. Sometimes this operation may take some time or may not start automatically. + +#### 1. Verify that a rebuild is not in process + +Check if the rebuild process is already running: + +``` +root@netdata # arcconf GETSTATUS <Controller_num> +``` + +Replace `<Controller_num>` with the number of your RAID controller. + +#### 2. Check for idle/missing segments of logical devices + +Examine the output of the previous command to find any segments that are idle or missing. + +#### 3. Manually change your ld status + +If the rebuild process hasn't started automatically, change the logical device (ld) status manually. This action will trigger a rebuild on your RAID: + +``` +root@netdata # arcconf SETSTATE <Controller_num> LOGICALDRIVE <LD_num> OPTIMAL ADVANCED nocheck noprompt +``` + +Replace `<Controller_num>` with the number of your RAID controller and `<LD_num>` with the number of the logical device. + +### Useful resources + +1. [Microsemi Adaptec ARCCONF User's Guide](https://download.adaptec.com/pdfs/user_guides/microsemi_cli_smarthba_smartraid_v3_00_23484_ug.pdf) diff --git a/health/guides/adaptec_raid/adaptec_raid_pd_state.md b/health/guides/adaptec_raid/adaptec_raid_pd_state.md new file mode 100644 index 000000000..00c9d5901 --- /dev/null +++ b/health/guides/adaptec_raid/adaptec_raid_pd_state.md @@ -0,0 +1,66 @@ +### Understand the Alert + +A RAID controller is a card or chip located between the operating system and a storage drive (usually a hard drive). This is an alert about the Adaptec raid controller. The Netdata Agent checks the physical device statuses which are managed by your raid controller. + +This alert is triggered in critical state when the physical device is offline. + +### Troubleshoot the Alert + +- Check the Offline Disk + +Use the `arcconf` CLI tool to identify which drive or drives are offline: + +``` +root@netdata # arcconf GETCONFIG 1 AL +``` + +This command will display the configuration of all the managed Adaptec RAID controllers in your system. Check the "DEVICE #" and "DEVICE_DEFINITION" fields for details about the offline devices. + +- Examine RAID Array Health + +Check the array health status to better understand the overall array's stability and functionality: + +``` +root@netdata # arcconf GETSTATUS 1 +``` + +This will provide an overview of your RAID controller's health status, including the operational mode, failure state, and rebuild progress (if applicable). + +- Replace the Offline Disk + +Before replacing an offline disk, ensure that you have a current backup of your data. Follow these steps to replace the drive: + +1. Power off your system. +2. Remove the offline drive. +3. Insert the new drive. +4. Power on your system. + +After the drive replacement, the Adaptec RAID card should automatically start rebuilding the faulty disk drive using the new disk. You can check the progress of the rebuild process with the `arcconf` command: + +``` +root@netdata # arcconf GETSTATUS 1 +``` + +- Monitor Rebuild Progress + +It's essential to monitor the RAID array's rebuild process to ensure it completes successfully. Use the `arcconf` command to verify the rebuild status: + +``` +root@netdata # arcconf GETSTATUS 1 +``` + +This command will display the progress and status of the rebuild process. Keep an eye on it until it's completed. + +- Verify RAID Array Health + +After the rebuild is complete, use the `arcconf` command again to verify the health status of the RAID array: + +``` +root@netdata # arcconf GETSTATUS 1 +``` + +Make sure that the RAID array's status is "Optimal" or "Ready" and that the replaced disk drive is now online. + +### Useful Resources + +1. [Adaptec Command Line Interface User’s Guide](https://download.adaptec.com/pdfs/user_guides/microsemi_cli_smarthba_smartraid_v3_00_23484_ug.pdf) |