Notify on Errors in a Log File with Zabbix 1.8

Situation: You want to be notified when a log entry marked ERROR appears in a log file, and you want the corresponding trigger to reset to the OK state once there have been no more errors for 10 minutes. (This post assumes some familiarity with the Zabbix UI.)

Solution

Create a log item that sends error log lines

Create a new item with
  • Type: Zabbix agent (active)
    • It must be "Zabbix agent (active)" because the data isn't pulled by the server at regular intervals but pushed by the agent whenever there is a new (matching) log line.
  • Key: log["/tmp/ada/hive.log","ERROR",,20]
    • You could also use logrt if you have rotating logs (Zabbix would read only the latest file or all of them, as you want)
    • Here we specify that we only want to send lines that match the regular expression ERROR; we could also use a more complex pattern such as "(ERRORS|WARNINGS)". We assume that there are many uninteresting log lines and don't want to send those over the network unnecessarily.
  • Type of information: Log
  • Log time format: Optional, a pattern matching the date and time info at the beginning of each line, e.g. "yyyy-MM-dd hh:mm:ss" (you can also use any other character, e.g. 'x', as a placeholder for timestamp-unrelated data preceding the timestamp)
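To make the key above more tangible, here is a minimal Python sketch of how it is put together and what the regexp parameter achieves. This is not how the agent is implemented; the helper names build_log_key and matching_lines and the sample log lines are made up for illustration. The third and fourth key parameters are, as far as I know, the encoding and maxlines. Also note that Python's regular expressions are not identical to the ones Zabbix uses, although simple patterns like these behave the same.

```python
import re

def build_log_key(path, pattern, encoding="", maxlines=20):
    """Assemble a log[] key string in the same shape as the item key above."""
    return 'log["{}","{}",{},{}]'.format(path, pattern, encoding, maxlines)

def matching_lines(lines, pattern):
    """Keep only the lines in which the regular expression matches somewhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up sample lines, only for illustration.
sample = [
    "2012-06-26 08:54:59 INFO  Job started",
    "2012-06-26 08:55:03 ERROR Query failed",
    "2012-06-26 08:55:10 DEBUG Heartbeat",
]

key = build_log_key("/tmp/ada/hive.log", "ERROR")
errors = matching_lines(sample, "ERROR")
```

Only the second sample line would be sent over the network; the other two stay on the monitored host.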

Troubleshooting:

If your item isn't receiving any data even though there are error entries in the log, enable detailed logging in Zabbix to verify that the agent and server don't have a problem. (If they do, Zabbix will change the status of the item to "Not supported".)

To enable logging, make sure that DebugLevel=5 and LogFile=/var/log/zabbix-agent/zabbix_agentd.log are set (without a leading #) in /etc/zabbix/zabbix_agentd.conf.
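If you want to double-check that the settings are really active and not commented out, a small script can do it. The following is only a sketch that assumes the config file consists of simple Name=value lines with # comments; the function names active_settings and missing_settings are my own.

```python
def active_settings(conf_text):
    """Return the Name=value pairs that are not commented out."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        name, separator, value = line.partition("=")
        if separator:
            settings[name.strip()] = value.strip()
    return settings

def missing_settings(conf_text, required=("DebugLevel", "LogFile")):
    """List the required setting names that are not active in the config."""
    active = active_settings(conf_text)
    return [name for name in required if name not in active]

# Example config text; in practice you would read
# /etc/zabbix/zabbix_agentd.conf instead.
example_conf = """
# DebugLevel=3
DebugLevel=5
#LogFile=/tmp/zabbix_agentd.log
"""
problems = missing_settings(example_conf)
```

Here problems would contain LogFile, because the only LogFile line still has a leading #.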

One possible cause of problems is a mismatch between Hostname in /etc/zabbix/zabbix_agentd.conf and the host name in the Zabbix UI, e.g. if one of them is fully qualified and the other isn't.
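The fully qualified vs. short name case is easy to overlook, so here is a tiny sketch that tells it apart from a completely different name. The function compare_hostnames and its return values are hypothetical, and it assumes that the two names have to be exactly equal to count as a match.

```python
def compare_hostnames(agent_hostname, ui_hostname):
    """Classify how the agent's Hostname relates to the host name in the UI."""
    if agent_hostname == ui_hostname:
        return "match"
    # Compare only the part before the first dot to spot FQDN vs. short name.
    if agent_hostname.split(".")[0] == ui_hostname.split(".")[0]:
        return "fqdn-mismatch"
    return "different"

result = compare_hostnames("myserver", "myserver.example.com")
```

A result of "fqdn-mismatch" means the short names agree but one side is fully qualified, which is exactly the situation described above.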

If everything is OK, you should see log output like this:


14187:20120626:085459.706 refresh_active_checks('10.2.0.83',10051)
 ...
 14187:20120626:085459.869 Got [{
        "response":"success",
        "data":[
                {
                        "key":"log[\"\/tmp\/ada\/hive.log\",\"ERROR\",,20]",
                        "delay":"30",
                        "lastlogsize":"127180",
                        "mtime":"0"}]}]
 14187:20120626:085459.869 In parse_list_of_checks()
 14187:20120626:085459.869 In disable_all_metrics()
 14187:20120626:085459.869 In add_check('log["/tmp/ada/hive.log","ERROR",,20]', 30, 127180, 0)
...

14187:20120626:085530.156 In process_active_checks('10.2.0.83',10051)
14187:20120626:085530.156 In process_log() filename:'/tmp/ada/hive.log' lastlogsize:186157
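The agent log gets big quickly at this debug level, so it may help to pull out just the add_check lines, which show that the agent has received your item from the server. A minimal sketch, assuming the lines look like the one in the excerpt above; the group names delay, lastlogsize and mtime are my guess based on the order of the values in the JSON response.

```python
import re

# Matches lines such as:
#   In add_check('log["/tmp/ada/hive.log","ERROR",,20]', 30, 127180, 0)
ADD_CHECK = re.compile(
    r"In add_check\('(?P<key>.*)', (?P<delay>\d+), "
    r"(?P<lastlogsize>\d+), (?P<mtime>\d+)\)"
)

def registered_checks(log_text):
    """Return one dict per add_check line found in the agent log text."""
    return [match.groupdict() for match in ADD_CHECK.finditer(log_text)]

# The add_check line from the log excerpt above.
excerpt = (
    " 14187:20120626:085459.869 In add_check("
    "'log[\"/tmp/ada/hive.log\",\"ERROR\",,20]', 30, 127180, 0)\n"
)
checks = registered_checks(excerpt)
```

If the returned list doesn't contain your log key, the agent never got the item from the server, and the hostname check above is a good place to start.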

Create a trigger that fires if it receives any data from the item

The item only has data if there are error logs. The trigger therefore needs to fire when it receives any data and reset when there hasn't been any new (error) data for some period, such as 600 sec. We create a trigger using the nodata(period) function, which returns 1 if there has been no new data in the period and 0 otherwise, so comparing it to 0 fires the trigger while errors keep arriving:
  • Expression: {myserver.example.com:log["/tmp/ada/hive.log","ERROR",,20].nodata(600)}=0
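The logic of the expression can be a bit confusing because of the double negation, so here is a small Python model of it. It is only a sketch of the semantics described above, not of how the Zabbix server evaluates triggers; the function names, the use of plain seconds as timestamps, and the handling of the exact boundary are my assumptions.

```python
def nodata(last_data_time, now, period):
    """Return 1 if no data arrived within the last `period` seconds, else 0."""
    if last_data_time is None:
        return 1  # the item has never received anything
    return 1 if now - last_data_time > period else 0

def trigger_state(last_data_time, now, period=600):
    """Model of the expression nodata(600)=0: PROBLEM while data keeps coming."""
    return "PROBLEM" if nodata(last_data_time, now, period) == 0 else "OK"

# An error line arrives at second 1000 (arbitrary timestamps).
five_minutes_later = trigger_state(last_data_time=1000, now=1300)
twenty_minutes_later = trigger_state(last_data_time=1000, now=2200)
```

Five minutes after the last error the trigger is still in the PROBLEM state; twenty minutes after it, with no new errors, it is back to OK.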

Create an action to send an e-mail

This is described well enough in the Zabbix documentation. Basically you'd create an action with the condition Trigger = <the name of the trigger created above>. (And perhaps with "and Trigger value = PROBLEM".) You might also want to set up escalation to get reminder e-mails, perhaps after a growing delay, if the problem persists, i.e. if there is an error every 10 min or more often.

Key points

  • Make sure to use Zabbix agent (active)
  • If you aren't getting any data, enable and check the agent log. Make sure the hostname in the agent config matches the one configured on the server.
  • Data are only sent when there is an error => use nodata(aPeriod) to automatically reset the trigger (if this is what you want)

Tags: monitoring DevOps


Copyright © 2024 Jakub Holý