Need to track disk SMART stats in Zabbix? I found a fairly simple method that does not rely on external scripts (other than the Zabbix agent).
1) Open your Zabbix agent config. It’s usually /etc/zabbix/zabbix_agentd.conf. User parameters work whether or not remote commands are enabled, so you do not need to change that setting for this.
2) Near the bottom of your agent config there should be several “UserParameter=…” lines. Add a new one:
UserParameter=hdd.smart[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-
In short, this command prints the smartmontools attribute table for your drive ($1), greps it for the single line whose ID matches ($2), then keeps everything from the 88th character onward (that is, it drops the first 87 characters), leaving only the raw value behind.
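To see that pipeline work without touching a real drive, here is a minimal sketch that feeds one made-up, fixed-width row through the same grep and cut. The helper names sample_row and extract_raw are mine, and the column widths are an assumption meant to mimic typical smartctl -A output, so check them against what your smartctl actually prints:

```shell
# Build one fixed-width attribute row. The widths are an assumption that
# mimics typical "smartctl -A" output; the raw value lands at column 88.
sample_row() {
    printf '%3d %-24s%-9s%-6s%-6s%-7s%-10s%-9s%-12s%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9" "${10}"
}

# Same grep + cut as the UserParameter, reading rows on stdin.
# $1 is the attribute ID to look for.
extract_raw() {
    grep -E -i "^[ ]*($1)[ ]" | cut -c88-
}

# Prints the raw value of attribute 9 from the sample row.
sample_row 9 Power_On_Hours 0x0032 076 076 000 Old_age Always - 17621 | extract_raw 9
```

If your smartctl lays out its columns differently, the 88 in cut is the number to adjust.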
Make sure that smartctl is in your sudoers file for any user to run without a password prompt. I detail that process in a previous post.
That’s it. Hit up smartctl with the “-A” switch on a drive you want to monitor and note the ID# of the fields you want to pull into Zabbix. Reallocated sectors is usually 5, run time is 9, temperature is 194, etc…
$ sudo smartctl -A /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   166   166   021    Pre-fail  Always   -           6683
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           221
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always   -           17621
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           151
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           221
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always   -           40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           43
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0
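If you only want the ID-to-name mapping, a small filter can trim that table down. This is a sketch of my own (list_smart_ids is a made-up name) that assumes every attribute row starts with a numeric ID and has at least ten whitespace-separated fields:

```shell
# Print "ID NAME" for each attribute row read on stdin.
# Header and banner lines are skipped because their first field is not a number.
list_smart_ids() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 { print $1, $2 }'
}

# Typical use (needs smartctl and a real drive):
#   sudo smartctl -A /dev/sda | list_smart_ids
```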
To get these numbers into Zabbix you need to create items for the host you want to monitor. Go to Configuration, then Hosts, then click on the Items link for the host in question. In the upper right hit the “Create Item” button. Everything on the add item page is fairly self-explanatory. Set the description to something relevant. For key use “hdd.smart[sda,9]”. This grabs the Power_On_Hours attribute (9) for drive sda. Use any drive and parameter you wish. Set the update interval to something very low to start with (30 seconds or so) just to get it pulling data and make sure it works.
Go to the Latest Data section under the Monitoring tab. Switch to the host you’re trying to get the SMART stats from using the drop-down on the upper right. Refresh after a few seconds and you should see it pop up under the -other- section at the bottom. Once you’ve verified the item is pulling correct data, set the interval higher. For most SMART stats I use 3-5 minutes (180-300s).
If you want to get really complicated you can create all these items under a new template and assign an “Application”. Once that’s done all you need to do is assign the template to a host for Zabbix to start grabbing these stats for you automagically.
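Before waiting on the web interface, you can also ask the agent directly whether it understands the key. The sketch below is only a convenience wrapper: smart_key and test_smart_item are names I made up, and it assumes zabbix_agentd is on your PATH and picks up the same config file as the running agent. Restart the agent after editing its config so the running daemon sees the new user parameter.

```shell
# Build the item key exactly as it is typed into the item's "Key" field.
smart_key() {
    printf 'hdd.smart[%s,%s]\n' "$1" "$2"
}

# Ask the local agent binary to evaluate the key once and print the result.
# Only defined here, not called, because it needs a configured agent.
test_smart_item() {
    zabbix_agentd -t "$(smart_key "$1" "$2")"
}

# Show the key for Power_On_Hours on sda.
smart_key sda 9
```

Running test_smart_item sda 9 on the monitored box should print the same value Zabbix will later collect, provided the user you run it as has the same sudo rights as the agent.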
If you run into a stubborn disk that likes to put random crap after the raw value line in the smartctl output like this:
190 Airflow_Temperature_Cel 0x0022   057   029   045    Old_age   Always   In_the_past 43 (2 160 46 35)
194 Temperature_Celsius     0x0022   043   071   000    Old_age   Always   -           43 (0 23 0 0)
Simply adjust the Zabbix agent config to strip the extra bits. Since the temperature should only ever be two digits, adjust your agent’s config like so:
UserParameter=hdd.smart.temp[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-90
This is nearly identical to before, except now it’s cutting everything after the 90th character as well. Make sure to adjust your item’s key to use this modified user parameter.
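If you would rather not hard-code the end column, another option is to keep only the first whitespace-separated token of the raw value. This is not the method from this post, just a sketch: smart_raw_first is a made-up name, and it assumes the raw value always starts at the tenth field, which holds for the rows shown here.

```shell
# Print the first token of the raw value for one attribute.
# $1 is the attribute ID; smartctl-style rows are read on stdin.
smart_raw_first() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Typical use (needs smartctl and a real drive):
#   sudo smartctl -A /dev/sda | smart_raw_first 194
```

If you move this into a UserParameter with [*], remember that the agent substitutes $1 through $9 itself, so awk's own field references have to be written with a doubled dollar sign ($$1 and $$10) to survive.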