xymon on varnish cache

Introduction

This is a stats collector that lets xymon monitor Varnish Cache. Varnish acts as a cache in front of backend servers, and can include rules for what gets cached, how cookies are cleared or modified, and so on.

The client monitoring process

The code below is placed in a "varnish.sh" script in the "/usr/lib/xymon/client/ext" directory for a standard package installation of the client. Remember to make the script executable before scheduling it: "chmod 755 /usr/lib/xymon/client/ext/varnish.sh". Note that the script is written so that simply running it at the command line displays the xymon data on the terminal.

Check: /usr/lib/xymon/client/ext/varnish.sh

#!/bin/sh
###########################################################################
#
# Collect varnish stats, but only if varnish is running.
#
###########################################################################
#
# Set up some base variables
#
COLUMN="varnish"
export COLUMN
if [ -z "$XYMSRV" ] ; then
  XYMSRV=""
  XYMON="echo"
fi
if [ -z "$MACHINE" ] ; then
  MACHINE=`/bin/uname -n`
fi
COLOR="green"     # overall color
#
# Collect the data
#
if [ -x /usr/bin/varnishstat ] ; then
  chk=`/bin/ps -ef | /bin/grep varnishd | /bin/grep -vc grep`
  if [ $chk -gt 0 ] ; then
    rawdata=`/usr/bin/varnishstat -1 2>/dev/null`
    # Human-readable table: counter value followed by its description
    payload=`echo "${rawdata}" | grep -E "client_req |cache_|n_lru_nuked|sess_queued|sess_dropped" | awk ' { printf ("%6s ", $2); for (a=4;a<=NF;a++) printf(" %s", $a); printf ("\n") } '`
    # Sum of the warning counters; "+ 0" guarantees a number even if nothing matched
    errval=`echo "${rawdata}" | grep -E "n_lru_nuked|sess_queued|sess_dropped" |  awk ' { cnt += $2 } END { print cnt + 0 }'`
    suppmsg=""
    if [ "${errval}" -gt 0 ] ; then
      COLOR="yellow"
      suppmsg="&yellow check for nuked objects and queued or dropped sessions, cache capacity may be too small?"
    fi
    # Send the status, including the warning text when one was set
    ${XYMON} ${XYMSRV} "status ${MACHINE}.${COLUMN} ${COLOR} `date` - Varnish Cache
${payload}
${suppmsg}
"
    payload=`echo "${rawdata}" | awk '
    /backend_conn / { backend = sprintf ("%sDS:conn:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_unhealthy / { backend = sprintf ("%sDS:unhealthy:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_busy / { backend = sprintf ("%sDS:busy:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_fail / { backend = sprintf ("%sDS:fail:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_reuse / { backend = sprintf ("%sDS:reuse:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_retry / { backend = sprintf ("%sDS:retry:DERIVE:600:0:U %s\n", backend, $2) }
    /backend_recycle / { backend = sprintf ("%sDS:recycle:DERIVE:600:0:U %s\n", backend, $2) }
    /client_req / { requests = sprintf ("%sDS:req:DERIVE:600:0:U %s\n", requests, $2) }
    /backend_req / { requests = sprintf ("%sDS:backend_req:DERIVE:600:0:U %s\n", requests, $2) }
    /cache_hit / { cache = sprintf ("%sDS:hit:DERIVE:600:0:U %s\n", cache, $2) }
    /cache_hitpass / { cache = sprintf ("%sDS:hitpass:DERIVE:600:0:U %s\n", cache, $2) }
    /cache_miss / { cache = sprintf ("%sDS:miss:DERIVE:600:0:U %s\n", cache, $2) }
    END {
    printf ("[varnish_backend_connections.rrd]\n")
    printf ("%s\n", backend)
    printf ("[varnish_requests.rrd]\n")
    printf ("%s\n", requests)
    printf ("[varnish_cache.rrd]\n")
    printf ("%s\n", cache)
    }'`
    ${XYMON} ${XYMSRV} "data ${MACHINE}.trends
${payload}
"
  fi
fi
exit 0
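
The warning logic in the script can be tried without a running Varnish instance. The following is a minimal sketch, assuming the "name value rate description" column layout that the script above parses; the sample counter lines are invented for illustration and are not real measurements.

```shell
#!/bin/sh
# Invented sample in the "name value rate description" layout the script parses.
sample='MAIN.client_req 1200 2.50 Good client requests received
MAIN.cache_hit 900 1.80 Cache hits
MAIN.cache_miss 300 0.60 Cache misses
MAIN.n_lru_nuked 4 0.00 Number of LRU nuked objects
MAIN.sess_queued 1 0.00 Sessions queued for thread
MAIN.sess_dropped 0 0.00 Sessions dropped for thread'

# Same filter and sum as the script: add up column 2 of the three warning counters.
errval=$(echo "$sample" | grep -E "n_lru_nuked|sess_queued|sess_dropped" | awk '{ cnt += $2 } END { print cnt + 0 }')

# Any non-zero total turns the status yellow, as in the script.
color="green"
if [ "$errval" -gt 0 ]; then
  color="yellow"
fi
echo "errval=${errval} color=${color}"
```

With this sample the three warning counters add up to 5, so the status would be reported as yellow.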

Scheduling: /var/run/xymon/clientlaunch-include.cfg

Add the lines below to the client machine's xymon schedule: /var/run/xymon/clientlaunch-include.cfg

Once the check is scheduled, the client process log should be available in /var/log/xymon/varnish.log

[varnish]
        ENVFILE $XYMONCLIENTHOME/etc/xymonclient.cfg
        CMD $XYMONCLIENTHOME/ext/varnish.sh
        LOGFILE $XYMONCLIENTLOGS/varnish.log
        INTERVAL 5m

Add user xymon to the varnish group

The xymon user most probably does not have access to the varnish cache data, but this can be rectified by making xymon a member of the varnish group and then restarting the client.

sudo usermod -a -G varnish xymon
sudo systemctl restart xymon-client
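
Whether the group change took effect can be checked from the output of "id -nG xymon", which prints the user's group names on one line. A small sketch, assuming that space-separated format; has_group is a hypothetical helper, not part of xymon, and the group list in the example call is invented.

```shell
#!/bin/sh
# has_group LIST GROUP: succeed if GROUP is one of the space-separated names in LIST.
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
  esac
  return 1
}

# On the client, pass the real list: has_group "$(id -nG xymon)" varnish
# The list below is an invented example of what id -nG might print.
if has_group "xymon varnish" varnish; then
  echo "xymon is in the varnish group"
else
  echo "xymon is NOT in the varnish group"
fi
```

Matching on whole names avoids a false positive from a similarly named group that merely contains the word "varnish".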

Server side changes

For the varnish data to appear in a graph, some server side changes need to be made. The server configuration must be adjusted to include the graph on the "varnish" check and on the "trends" check, and the graph definition itself must be created.

After adding graph data to xymon for the first time, allow up to 20 minutes before expecting the first data to be graphed. The system needs to recognise the changes to the graph definitions, which it will generally do automatically. It then also needs time to collect the initial start and end points of a graph's data.
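
Part of that delay comes from the DERIVE data source type used in the client script: RRDtool stores the rate of change between two consecutive readings, so a single sample cannot produce a graph point. A worked example of that arithmetic, using made-up counter values and the 5 minute interval from the client schedule:

```shell
#!/bin/sh
# Two made-up readings of a counter such as client_req, taken one interval apart.
prev_value=120000
curr_value=123000
interval_secs=300   # INTERVAL 5m

# DERIVE: rate = (current - previous) / elapsed seconds
rate=$(( (curr_value - prev_value) / interval_secs ))
echo "rate=${rate} requests/sec"
```

Shell arithmetic is integer only, which is sufficient for these values; RRDtool itself keeps fractional rates.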

Update column info

There are 2 variables in /etc/xymon/xymonserver.cfg which are of interest. Both contain comma-separated lists of values within quotes. If changing this file, be sure your new test is included within the quotes and is separated by a comma from the other fields. Do not use spaces in these variables.

TEST2RRD This specifies which graph should appear under a test. If the test name and the graph name match, just specify the test name. Otherwise use <Test-Name>=<Graph-Name>; the default file has some examples of this type of mapping, e.g. the "cpu" column includes a "la" graph. For this test, "varnish=varnish_cache" is a recommended starting point.
GRAPHS This defines which graphs to include on the "trends" column webpage, and the order in which they appear. For this test, a "varnish_requests,varnish_cache,varnish_backend_connections" entry will add the varnish graphs to the trends page.
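
Since both variables are plain comma-separated strings, a new entry is appended after a comma with no spaces. A sketch of that edit as a shell function; append_csv is a hypothetical helper, and the starting value "cpu=la,disk" is only a shortened stand-in for the real default list.

```shell
#!/bin/sh
# append_csv LIST ENTRY: print LIST with ENTRY appended, unless it is already present.
append_csv() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "${1:+$1,}$2" ;;
  esac
}

# Stand-in starting value; the real TEST2RRD default is much longer.
test2rrd=$(append_csv "cpu=la,disk" "varnish=varnish_cache")
echo "TEST2RRD=\"${test2rrd}\""
```

The same function applies to GRAPHS, called once for each of the three varnish graph names.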

Graph definition: /etc/xymon/graphs.d/varnish.cfg

[varnish_requests]
    FNPATTERN ^varnish_requests.rrd
    TITLE Varnish requests
    YAXIS avg requests/sec
    DEF:req@RRDIDX@=@RRDFN@:req:AVERAGE
    DEF:backend_req@RRDIDX@=@RRDFN@:backend_req:AVERAGE
    AREA:req@RRDIDX@#@COLOR@:Client requests
    GPRINT:req@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:req@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:req@RRDIDX@:MAX: \: %5.1lf (max)\n
    AREA:backend_req@RRDIDX@#@COLOR@:Backend requests
    GPRINT:backend_req@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:backend_req@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:backend_req@RRDIDX@:MAX: \: %5.1lf (max)\n

[varnish_cache]
    FNPATTERN ^varnish_cache.rrd
    TITLE Varnish cache
    YAXIS avg requests/sec
    DEF:hit@RRDIDX@=@RRDFN@:hit:AVERAGE
    DEF:hitpass@RRDIDX@=@RRDFN@:hitpass:AVERAGE
    DEF:miss@RRDIDX@=@RRDFN@:miss:AVERAGE
    LINE1:hit@RRDIDX@#1E940F:Cache hit
    GPRINT:hit@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:hit@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:hit@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:hitpass@RRDIDX@#718C0E:Cache hitpass
    GPRINT:hitpass@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:hitpass@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:hitpass@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:miss@RRDIDX@#B31B00:Cache miss
    GPRINT:miss@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:miss@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:miss@RRDIDX@:MAX: \: %5.1lf (max) \n

[varnish_backend_connections]
    FNPATTERN ^varnish_backend_connections.rrd
    TITLE Varnish backend connections
    YAXIS avg connections/sec
    DEF:conn@RRDIDX@=@RRDFN@:conn:AVERAGE
    DEF:unhealthy@RRDIDX@=@RRDFN@:unhealthy:AVERAGE
    DEF:busy@RRDIDX@=@RRDFN@:busy:AVERAGE
    DEF:fail@RRDIDX@=@RRDFN@:fail:AVERAGE
    DEF:reuse@RRDIDX@=@RRDFN@:reuse:AVERAGE
    DEF:retry@RRDIDX@=@RRDFN@:retry:AVERAGE
    DEF:recycle@RRDIDX@=@RRDFN@:recycle:AVERAGE
    LINE1:conn@RRDIDX@#605C59:Conn success \t\t
    GPRINT:conn@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:conn@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:conn@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:unhealthy@RRDIDX@#D2AE84:Conn not attempted \t
    GPRINT:unhealthy@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:unhealthy@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:unhealthy@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:busy@RRDIDX@#C9C5C0:Conn too many \t\t
    GPRINT:busy@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:busy@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:busy@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:fail@RRDIDX@#9F3E81:Conn failures \t\t
    GPRINT:fail@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:fail@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:fail@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:reuse@RRDIDX@#C6BE91:Conn reuses \t\t
    GPRINT:reuse@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:reuse@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:reuse@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:retry@RRDIDX@#FD7F00:Conn retry  \t\t
    GPRINT:retry@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:retry@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:retry@RRDIDX@:MAX: \: %5.1lf (max) \n
    LINE1:recycle@RRDIDX@#6E4E40:Conn recycles \t\t
    GPRINT:recycle@RRDIDX@:LAST: \: %5.1lf (cur)
    GPRINT:recycle@RRDIDX@:AVERAGE: \: %5.1lf (avg)
    GPRINT:recycle@RRDIDX@:MAX: \: %5.1lf (max) \n

If /etc/xymon/graphs.cfg does not already include everything in the /etc/xymon/graphs.d/ directory, add an entry for this file:

include /etc/xymon/graphs.d/varnish.cfg
