
Monitor S.M.A.R.T. stats in Zabbix

Need to track disk SMART stats in Zabbix? I found a fairly simple method that does not rely on external scripts (other than the Zabbix agent).

1) Edit your Zabbix agent config to permit remote commands if you have not already done so. The file is usually /etc/zabbix/zabbix_agentd.conf:

EnableRemoteCommands=1

2) Near the bottom of your agent config there should be several “UserParameter=…” lines. Add a new one:

UserParameter=hdd.smart[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-

In short, this command prints the smartmontools attribute report for your drive ($1), greps it for a single specific line ($2), then removes the first 87 characters, leaving only the raw value (column 88 onward) behind.
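To see what each stage of that pipeline does without touching a real drive, here is a small sketch that feeds one attribute line, copied from the sample smartctl output in this post, through the same grep and cut. The function name raw_from_line is made up for this illustration, and the sketch assumes the column layout shown in this post, with the raw value starting at column 88.

```shell
# Illustrative only: raw_from_line is a hypothetical helper, not part of
# Zabbix or smartmontools. It applies the same grep + cut stages as the
# UserParameter to whatever text arrives on stdin.
raw_from_line() {
    id="$1"
    grep -E -i "^[ ]*(${id})[ ]" | cut -c88-
}

# One attribute line as printed by "smartctl -A" (Power_On_Hours, ID 9).
sample='  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621'

# grep keeps the line whose ID column is 9; cut drops columns 1-87.
printf '%s\n' "$sample" | raw_from_line 9
```

On a real system the input would come from "sudo smartctl -A /dev/sda" instead of the sample variable.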

Make sure that smartctl is in your sudoers file so that any user can run it without a password prompt. I detail that process in a previous post.

That’s it. Hit up smartctl with the “-A” switch on a drive you want to monitor and note the ID# of the fields you want to pull into Zabbix. Reallocated sectors is usually 5, run time is 9, temperature is 194, etc…

$ sudo smartctl -A /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   166   021    Pre-fail  Always       -       6683
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       221
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       151
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       221
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       43
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
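When picking IDs it can help to boil the report down to just the ID, name and raw value. The sketch below does that with awk against a few lines copied from the report above. list_attrs is a hypothetical helper name, and it assumes attribute lines start with a numeric ID and carry the raw value in the tenth whitespace-separated field.

```shell
# Hypothetical helper: print "ID NAME RAW" for every attribute line on stdin.
# Header and banner lines are skipped because their first field is not a number.
list_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 { print $1, $2, $10 }'
}

# A few lines copied from the sample report, including the header row.
sample_report='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       40'

printf '%s\n' "$sample_report" | list_attrs
```

Against a live drive the equivalent would be "sudo smartctl -A /dev/sda | list_attrs".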

To get these numbers into Zabbix you need to create an item on the host you want to monitor. Go to Configuration, then Hosts, then click on the Items link for the host in question. In the upper right hit the “Create Item” button. Everything on the add item page is fairly self-explanatory. Set the description to something relevant. For the key use “hdd.smart[sda,9]”. This grabs the Power_On_Hours attribute (9) for drive sda. Use any drive and attribute you wish. Set the update interval to something very low to start with (30 seconds or so) just to get it pulling data and make sure it works.

Go to the Latest Data section under the Monitoring tab. Switch to the host you’re trying to get the SMART stats from using the drop-down on the upper right. Refresh after a few seconds and you should see it pop up under the -other- section at the bottom. Once you’ve verified that the item is pulling correct data, set the interval higher. For most SMART stats I use 3-5 minutes (180-300s).

If you want to get really complicated you can create all these items under a new template and assign an “Application”. Once that’s done all you need to do is assign the template to a host for Zabbix to start grabbing these stats for you automagically.
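The new key can also be checked from a shell, which is quicker than waiting for Latest Data to refresh. The sketch below assumes zabbix_get is installed (it usually ships alongside the server). smart_key and check_item are hypothetical helper names, and 192.0.2.10 in the usage note is a placeholder agent address.

```shell
# Hypothetical helper: build an item key such as hdd.smart[sda,9]
# from a drive name and a SMART attribute ID.
smart_key() {
    printf 'hdd.smart[%s,%s]\n' "$1" "$2"
}

# Hypothetical helper: ask a running agent for the value, the same way
# the Zabbix server would. $1 = agent address, $2 = drive, $3 = attribute ID.
check_item() {
    zabbix_get -s "$1" -k "$(smart_key "$2" "$3")"
}

smart_key sda 9
```

From the server, "check_item 192.0.2.10 sda 9" should print the raw value. On the monitored host itself, "zabbix_agentd -t 'hdd.smart[sda,9]'" evaluates the key locally. That test runs as whoever invokes it, so sudo may behave differently than it does for the agent's own account. Remember to restart the agent after editing its config.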

 

If you run into a stubborn disk that likes to put random crap after the raw value line in the smartctl output like this:

190 Airflow_Temperature_Cel 0x0022   057   029   045    Old_age   Always   In_the_past 43 (2 160 46 35)
194 Temperature_Celsius     0x0022   043   071   000    Old_age   Always       -       43 (0 23 0 0)

Simply adjust the Zabbix agent config to strip the extra bits. Since the temperature should only ever be two digits, adjust your agent’s config like so:

UserParameter=hdd.smart.temp[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-90

This is nearly identical to before, except now it’s cutting everything after the 90th character as well. Make sure to adjust your item’s key to use this modified user parameter.
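A fixed column range will truncate the value if it is ever wider than expected. One alternative, which is a swap for the cut step rather than the method this post uses, is to pick the tenth whitespace-separated field with awk. The sketch below runs that against the two stubborn lines shown above. raw_field is a hypothetical helper name.

```shell
# Hypothetical helper: print the tenth field (the raw value) of the
# attribute line whose first field equals the requested ID.
raw_field() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# The two lines with extra text after the raw value, copied from above.
stubborn='190 Airflow_Temperature_Cel 0x0022   057   029   045    Old_age   Always   In_the_past 43 (2 160 46 35)
194 Temperature_Celsius     0x0022   043   071   000    Old_age   Always       -       43 (0 23 0 0)'

printf '%s\n' "$stubborn" | raw_field 194
printf '%s\n' "$stubborn" | raw_field 190
```

Both calls print only the temperature, without the text in parentheses. If you move this into a UserParameter that uses [*], keep in mind that Zabbix substitutes $1 and $2 itself, so the awk field references there need a doubled dollar sign ($$1, $$10).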


Allow any user on linux to run smartctl without password

Need to have a script or external application run smartctl without being prompted for a password? Simply add it to the sudoers file. Under Ubuntu/Debian use “visudo” to edit it (DO NOT EDIT IT WITHOUT USING THIS COMMAND!) and add the following line:

ALL ALL=(ALL)NOPASSWD: /usr/sbin/smartctl

This allows any user, from any source (local or remote) to run the smartctl command without being prompted for a password. Note, your script or user will still need to preface the smartctl command with sudo.
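A quick way to confirm the rule works is to run sudo in non-interactive mode, which fails instead of prompting when a password would be needed. In the sketch below check_nopasswd is a hypothetical helper name, and the smartctl path is the one from the sudoers line above.

```shell
# Hypothetical helper: run a command through sudo without ever prompting.
# -n makes sudo exit with an error instead of asking for a password.
# The return status is the command's own when sudo allows it, so test with
# a harmless invocation that normally exits 0.
check_nopasswd() {
    sudo -n "$@" >/dev/null 2>&1
}

# Example (not run here): smartctl -V only prints version information.
# check_nopasswd /usr/sbin/smartctl -V && echo "passwordless sudo works"
```

Running "sudo visudo -c" after editing is also worthwhile, since it checks the sudoers file for syntax errors without changing it.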
