Graphite Shows Metrics But No Data - Troubleshooting

My Graphite has all the metrics I expect but shows no data for them. Communication between my app and Graphite clearly works otherwise the metrics would not have appeared in the list but why is there no data?

Update: Graphite data gotchas that got me

(These gotchas explain why I did not see any data.)
  1. Graphite shows aggregated, not raw data if the selected query period (24h by default) is greater than the retention period of the highest precision. F.ex. with the schema "1s:30m,1m:1d,5m:2y" you will see data at the 1s precision only if you select period less than or equal to the past 30 minutes. With the default one, you will see the 1-minute aggregates. This applies both to the UI and whisper-fetch.py.
  2. Aggregation drops data unless by default at least 50% of available data slots have values (xFilesFactor=0.5). I.e. if your app sends data at a rate more than twice slower than Graphite expects them, they will never show up in aggregates. F.ex. with the schema "1s:30m,1m:1d,5m:2y"  you must sends data at least 30 times within a minute for them to show in an aggregate.
I suppose that whisper-dump.py would show the raw data.

Lesson learned: Always send data to Graphite in *exactly* same rate as its highest resolution

As described above, if you send data less frequently than twice the highest precision (if 1s => send at least every 2s), aggregation will ignore the data, with the default xFilesFactor=0.5 (a.k.a. min 50% of values reqired factor). On the other hand, if you send data more frequently than the highest precision, only the last data point received in each of the highest precision periods is recorded, others ignored - that's why f.ex. statsD flush period must = Graphite period.






 

Exploring the whisper storage

We can go directly to its whisper storage and see is there f.ex. for the metric ring.request.active.count:
ll -h /opt/graphite/storage/whisper/ring/requests/rate/count.wsp
-rw-r--r-- 1 root root 2.5M May  5 11:01 /opt/graphite/storage/whisper/ring/requests/rate/count.wsp
OK, so the file is there and is not empty. What does it contain?
$ whisper-fetch.py ./count.wsp | tail -n1
1399287780    None
The data is hard to read, let's pretty print it:
$ whisper-fetch.py --pretty ./count.wsp | tail -n1
Mon May  5 11:02:00 2014    None
OK, so the data is as expected but the value is None. Let's verify that:
$ whisper-fetch.py --pretty --json ./count.wsp
{
"start" : 1399193520,
"end" : 1399279920,
"step" : 60,
"values" : [null, null, ...]
}
We can get the same data also via the API: http://graphite-webapp.example.com/render?from=-24h&target=ring.request.active.count&title=Testing&rawData=true

So the values are indeed null. According to the docs: "If no value is recorded at a particular timestamp bucket in a series, the value will be None (null).".

We were able to find out what data and for what period Graphite has by reading directly its whisper files. The next step will be to find out why the values are null.

Update: What did I do wrong? I have not noticed that whisper-fetch shows me data aggregated to the next, 1 min precision, because I have asked for a period longer than the 30 min rentention period of my 1s precision (the default period of 24h). I also did not know that aggregation results in None if there are too few data points.

Logs

I run Graphite via Nginx so /var/log/nginx/error.log and /var/log/gunicorn/graphite.log might help.

Graphite's own logs are f.ex. /opt/graphite/storage/log/carbon-cache/carbon-cache-a/listener.log (and console.log, creates.log  etc.) and /opt/graphite/storage/log/webapp/*.log.

Experimenting

Lets manually push some data to Graphite, namely its data collecting daemon Carbon, to see what happens.

echo "jakub.test 42 $(date +%s)" | nc 0.0.0.0 2003 # Carbon listens at 2003

A minute or so later:
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | head -n1
Sun May  4 12:19:00 2014	None
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | tail -n1
Mon May  5 12:09:00 2014	None
$ whisper-fetch.py --pretty /opt/graphite/storage/whisper/jakub/test.wsp | grep -v None | wc -l
0
What is happening? Carbon has created the metric and filled with "no data" for the past ~ 24 hours (it is, I believe, the default behavior to automatically add data for each period in the past 24h when creating a metric) but the value I sent to it is not there.

ngrep: What data is being sent?

Let's us see what data is actually being sent to carbon by intercepting the traffic on port 2003 with ngrep (thanks to Tim Zetha for the tip):
graphite-server$ sudo ngrep -d any port 2003 & # Intercept packets on any (eth, lo, ..) device, port 2003
interface: any
filter: (ip or ip6) and ( port 2003 )
graphite-server$ echo -e "jakub.test2  45 $(date +%s)" | sudo nc localhost 2003
####
T 127.0.0.1:34696 -> 127.0.0.1:2003 [AP]
  jakub.test2  45 1399362193.
####^Cexit
23 received, 0 dropped
"T" means this is TCP (and not f.ex. U[DP]), the letters in [] are TCP flags as printed by ngrep (Ack, Fin = no more data, Sync = during conn setup, ...). A "#" is reportedly displayed for each non-matching packet (use -e[mpty] to print more of them). Notice that ngrep captures both in- and out-going packets.

------

PS: I have asked for tips at StackExchange.

Tags: monitoring


Copyright © 2024 Jakub Holý
Powered by Cryogen
Theme by KingMob