Using dstat with scripts and external modules

Here we look at the basic scripting options for dstat as well as an overview of its external modules.


All-powerful

Well, not that. But it can feel a little like that when you fiddle with some more advanced aspects of dstat. Because it can do an awful lot.

In the first article of this series we looked at basic usage of dstat - what it does and how to customize its reporting. Now we'll look at making a log of dstat results, along with some details about using external modules.

Script usage

For the most part dstat isn't really designed with scripts in mind - its focus is more on presenting the relevant information for human eyes. If you want to use dstat in a script you can send its output to a text log or to a CSV file (or both).

If you just want a (slightly cluttered) text log of dstat results you can pass it some options to cut down on the output a little bit.

The "--noheaders" option tells dstat not to insert column headers periodically in its results. The headers are still shown when dstat is run.

The "--noupdate" option tells dstat not to refresh its figures every second between reports, which it otherwise does when the delay is longer than one second.

The "--nocolor" option tells dstat not to use colors in its output. Not strictly necessary when logging dstat's output, but certainly useful if you run the command and don't like the colored text. Using "--nocolor" also applies "--noupdate".

The "-t" option adds a column reporting the current date and time.

Quick and dirty text logging

If you want to just run dstat in the background while it records text data over a period of time you can run something like:

dstat -ts --top-io --top-bio --noheaders --nocolor 5 >> /tmp/dstat.txt &

That would take measurements every 5 seconds and append them to the file "/tmp/dstat.txt". The "&" at the end tells the process to run in the background. To end the dstat task you can either kill its process (using the PID the shell reported when it was backgrounded) or type "fg" to bring dstat to the foreground, and then hit "Ctrl-C" to quit it.

Or you can just log out. In most setups the process will be terminated when your shell session ends.
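
If you would rather not hunt for the PID later, one option is to save it when the logger starts. The following is a minimal sketch, not something dstat provides: the function names and the PIDFILE and LOGFILE locations are assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch: start a background logger and remember its PID so it can be
# stopped later. PIDFILE, LOGFILE and both function names are
# illustrative choices, not dstat features.
PIDFILE="${PIDFILE:-/tmp/dstat-logger.pid}"
LOGFILE="${LOGFILE:-/tmp/dstat.txt}"

start_logger() {
    # Run the given command in the background, appending output to LOGFILE.
    "$@" >> "$LOGFILE" 2>&1 &
    # $! holds the PID of the most recent background job.
    echo $! > "$PIDFILE"
}

stop_logger() {
    # Nothing to do if no PID was recorded.
    [ -f "$PIDFILE" ] || return 0
    pid=$(cat "$PIDFILE")
    kill "$pid" 2>/dev/null || true
    # Reaps the process when it is a child of this shell; harmless otherwise.
    wait "$pid" 2>/dev/null || true
    rm -f "$PIDFILE"
}

# Usage:
#   start_logger dstat -ts --top-io --top-bio --noheaders --nocolor 5
#   stop_logger
```

Keeping the PID in a file means the logger can also be stopped from a different terminal by killing the PID recorded there.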

CSV output

You can also tell dstat to write the data it collects to a CSV file. Among its many possible uses, a CSV file can be loaded into a spreadsheet program so you can analyze the results as a graph.

The "--output filename.csv" option tells dstat to write CSV data to a file (replace "filename.csv" with whatever file name you'd like it to write to). Note that dstat will still output human-readable information normally when it writes to the CSV file.

When the "--output" option is used the data is appended to the end of the specified file. It won't overwrite what's already there.

If you wanted to use the quick-and-dirty logging command above, but only wanted CSV output, you could run that command as:

dstat -ts --top-io --top-bio --noheaders --nocolor --output /tmp/dstat.csv 5 > /dev/null &

The main output is redirected to /dev/null, which means it won't get written anywhere. Instead you'll get a CSV file at /tmp/dstat.csv.
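
For a quick check of a CSV log without opening a spreadsheet, something like the hypothetical helper below can average one numeric column with awk. Treat it as a rough sketch: it assumes the fields before the chosen column contain no embedded commas, and which column holds which statistic depends on the options you gave dstat, so check the header rows in your own file first.

```shell
# csv_column_avg FILE COLUMN
# Average the plain numeric values found in the given (1-based) CSV column.
# Rows whose field is not a plain number (headers, blank lines) are skipped.
# The function name is made up for this example.
csv_column_avg() {
    awk -F, -v col="$2" '
        $col ~ /^-?[0-9]+(\.[0-9]+)?$/ { sum += $col; n++ }
        END { if (n) printf "%.2f\n", sum / n; else print "no data" }
    ' "$1"
}
```

You would call it as "csv_column_avg /tmp/dstat.csv 2", replacing 2 with the column you are interested in.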

A sample cron script

You can use cron task scheduling to get dstat to run every minute and log its reports to a file. Open this file for editing (with sudo):

/etc/cron.d/dstat

And put the following into that file if you want a text log:

#  Run dstat and log the reports

* * * * * root dstat -ts --top-io --top-bio --noheaders --nocolor 5 3 >> /var/log/dstat.txt

If you prefer a log of the data in CSV format, put this into the file instead:

#  Run dstat and log the reports

* * * * * root dstat -ts --top-io --top-bio --noheaders --nocolor --output /var/log/dstat.csv 5 3 > /dev/null

Save the file. That cron.d entry will cause dstat to run every minute, logging its reports to "/var/log/dstat.txt" or "/var/log/dstat.csv", depending on the format you chose. Each run takes three samples (five seconds apart) and reports the time, swap usage, and the processes doing the most I/O and block I/O.

After a minute take a look in /var/log. If the script ran as expected you'll see a dstat log there.

Tweak the command to your liking. If you want a different number of samples per minute change the second numerical value in the command.
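
Because cron starts a fresh dstat every minute, a run that lasts longer than 60 seconds would overlap the next one. A small sanity check can catch that before you edit the cron entry. This is only a sketch: the function name is invented here, and it treats delay times count as a rough upper bound on how long one run takes.

```shell
# fits_in_minute DELAY COUNT
# Succeed when COUNT samples taken DELAY seconds apart should finish
# within 60 seconds (DELAY * COUNT is used as a rough upper bound).
fits_in_minute() {
    [ $(( $1 * $2 )) -le 60 ]
}

# Usage:
#   fits_in_minute 5 3  && echo "ok for a once-a-minute cron job"
#   fits_in_minute 5 20 || echo "too long, runs would overlap"
```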

Rotate the created log

If you plan to leave the dstat logger running (and aren't just going to run it for a day or so to check on a burst of disk activity) you'll want to rotate its log occasionally to keep it from getting too big.

To that end, create a file for editing at "/etc/logrotate.d/dstat" (again using sudo).

Put the following into the file:

/var/log/dstat.txt {
  rotate 5
  weekly
  compress
  missingok
  notifempty
}

Replace the filename if necessary - if you used the CSV script above, for example, you'd change it to "/var/log/dstat.csv".

For more details on what that logrotate config will do you can look through our articles on logrotate. You might change the frequency of rotations from "weekly" to "daily" if you change the script to log more reports, then increase the number of archived logs it keeps accordingly.

Module files

One distinction dstat makes regarding its options is between "internal" and "external" modules. The main thing to know about the difference between those types is that "external" modules are defined by files stored in the directory:

/usr/share/dstat

The external modules can sometimes take some extra effort to enable.

Inside a module

To look at a specific example, the file in that directory that describes the module that reports MySQL I/O for version 5 is:

dstat_mysql5_io.py

Looking inside that file will reveal the Python code defining the module. This is of most use to users who can read Python code, of course.
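
The file name maps onto the option name in a predictable way: "dstat_mysql5_io.py" corresponds to "--mysql5-io". As a sketch of that mapping, the hypothetical function below turns every "dstat_*.py" file in a directory into the matching option name. It assumes all module files follow the naming pattern seen above.

```shell
# list_external_modules [DIR]
# Print an option name for each dstat_*.py file in DIR
# (default: /usr/share/dstat). The function name is made up for this example.
list_external_modules() {
    dir="${1:-/usr/share/dstat}"
    for f in "$dir"/dstat_*.py; do
        [ -e "$f" ] || continue          # no matches: the glob stays literal
        name=$(basename "$f" .py)        # e.g. dstat_mysql5_io
        name=${name#dstat_}              # e.g. mysql5_io
        echo "--$(echo "$name" | tr '_' '-')"
    done
}
```

If your version of dstat supports it, the "--list" option prints a similar inventory without any scripting.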

Python modules

If you try running dstat with the mysql5-io module after a fresh install you'll get an error message:

$ dstat --mysql5-io
Module dstat_mysql5_io failed to load. (No module named MySQLdb)
None of the stats you selected are available.

That means Python needs a module installed before dstat can use the mysql5-io module. When you run into that problem you can often find the package you need by running a search for the module name. Pick the command for your distro:

aptitude search mysqldb
yum search mysqldb
emerge --search mysqldb
pacman -Ss mysqldb

Fortunately that's a pretty common module so you should see a package in the results that has "python" in the name, like "python-mysqldb" (on Red Hat-style distros such as CentOS it may be named "MySQL-python" instead). Install that.

Environment variables

Let's run that dstat command again:

$ dstat --mysql5-io
Module dstat_mysql5_io failed to load. (Cannot interface with MySQL server)
None of the stats you selected are available.

Still more to do! In this case, the problem is that dstat doesn't know a username and password it can use to access MySQL to gather the stats it wants. To get the username and password the module checks for two environment variables:

DSTAT_MYSQL_USER
DSTAT_MYSQL_PWD

Pretty straightforward. The first one is the username, the second is the password. Trouble is, you probably don't want to define the MySQL password in your bashrc or profile config, and defining those variables every time you want to run the check would be a pain.

(You can discover the necessary environment variables from those "os.getenv" calls at the beginning of the module definition file, for the curious.)
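
If you don't feel like reading the whole module file, grep can pull out those calls for you. The helper below is a sketch with an invented name. It only recognizes "os.getenv" calls whose argument is a quoted literal, so a module that reads its environment some other way will slip past it.

```shell
# module_env_vars FILE
# Print the names of environment variables that FILE reads through
# os.getenv('NAME') or os.getenv("NAME"), one per line, without duplicates.
module_env_vars() {
    grep -o "os\.getenv(['\"][A-Za-z_][A-Za-z0-9_]*['\"]" "$1" |
        sed "s/^os\.getenv(['\"]//; s/['\"]\$//" |
        sort -u
}

# Usage:
#   module_env_vars /usr/share/dstat/dstat_mysql5_io.py
```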

Wrapping it up

The solution is to use a shell script "wrapper" that will define those variables before running dstat. At its simplest that would look like:

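#!/bin/bash

# Export the variables so the dstat process can see them
export DSTAT_MYSQL_USER=username
export DSTAT_MYSQL_PWD='password'

# Pass any arguments given to this script on to dstat
dstat --mysql5-io "$@"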

Put that in a file with a descriptive name (like "mysql5-io.sh"), then set its permissions so no one else can peek inside to see the password:

chmod 700 mysql5-io.sh

And now you can run the script when you want to see the MySQL I/O stats:

$ ./mysql5-io.sh
-mysql5-io-
 recv  sent
1.32B 44.6B
5936k  195M
5936k  195M

The "$@" in the script refers to any arguments passed to the script, which means that any arguments you give the script will be passed as options to dstat. So to add swap use information to the output of the script, run:

$ ./mysql5-io.sh -s
-mysql5-io- ----swap---
 recv  sent| used  free
1.32B 44.6B|2680k 1021M
5946k  195M|2680k 1021M
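
If you end up with several wrappers, one variation is to keep the credentials in a single file and load them from each script. The sketch below assumes a hypothetical file such as "$HOME/.dstat-mysql" (readable only by you, for example with "chmod 600") that contains the two variable assignments. The file name and the function name are illustrative, not anything dstat defines.

```shell
# load_credentials FILE
# Source FILE, which is expected to contain lines such as
#   DSTAT_MYSQL_USER='username'
#   DSTAT_MYSQL_PWD='password'
# and export both variables so that child processes (like dstat) see them.
load_credentials() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    . "$1"
    export DSTAT_MYSQL_USER DSTAT_MYSQL_PWD
}

# Usage inside a wrapper script (the path is hypothetical):
#   load_credentials "$HOME/.dstat-mysql" && dstat --mysql5-io "$@"
```

Single quotes around the values in the credentials file keep the shell from interpreting special characters in a password.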

Learning more

Unfortunately the documentation for the modules isn't very extensive. If you see an interesting-looking module but get an error trying to run it you'll probably need to look inside its definition file to figure out how to get it working. That means knowing some Python.

Fortunately Python is easier to pick up than many other scripting languages. It's also used by a lot of system tools in many distributions, so a little Python knowledge can help in other areas of system administration. A good place to start is the Beginner's Guide to Python.

Summary

You should have a good grasp on dstat's potential now. What's left is experimentation and discovery. And probably the dstat man page. That helps too.

-- Jered

Article Comments:

Dag Wieers commented Sat Mar 19 23:05:41 UTC 2011:

You can use Dstat's --output option to write out detailed counter data to a CSV file for later processing.

I would recommend doing that rather than logging the output to a file. It is so much easier to process (or use it to plot graphs).

Jered commented Wed Nov 23 20:43:41 UTC 2011:

I'm smacking my forehead for missing this comment (March was a busy month, honest!). I'll update this article with the recommendation. Thanks!

PG commented Tue Jan 10 22:05:39 UTC 2012:

No matter what I do I cannot seem to get the mysql plugins to work properly. Example: "--mysql-io" displays empty output, and "--mysql5-io" displays:

Module dstat_mysql5_io failed to load. (Cannot interface with MySQL server)
None of the stats you selected are available.

I followed the instructions exactly as shown. No luck. I even temporarily set the profile environment variables with the mysql user and password... still no luck.

Please any help is appreciated. Some relevant info:

Platform = CentOS 5.5
python = 2.4
kernel = 2.6.18-238.19.1.el5
The python mysql module is installed.

PG commented Wed Jan 11 14:29:04 UTC 2012:

I was able to get it working after all for the most part. The mysql5 plugins are now working (I'm running mysql 5.1.58). However the innodb plugins are not working, and the mysql plugins that are only prefixed with mysql-* are not working either. They just produce blank stats. It would be awesome if the innodb plugins worked! Still happy with the mysql5 ones.

PG commented Wed Jan 11 14:34:02 UTC 2012:

P.S. For those who are having trouble: I was trying to get this working on 2 servers. On server 1 I had to run "yum install MySQL-python". On server 2 I had to set the global environment variables as shown in the article above. I did it like this:

vi ~/.bash_profile

DSTAT_MYSQL_USER=root
DSTAT_MYSQL_PWD='*'
export DSTAT_MYSQL_USER
export DSTAT_MYSQL_PWD

I did this temporarily. I don't recommend this for security reasons. Use the script above and if you have weird characters in your mysql password, put single quotes around it.

Jered commented Sat Jan 14 00:10:10 UTC 2012:

Thanks for posting what you learned, PG. Interesting bit about passwords with special characters. I'll take a look at the package requirements too and update the article to mention possible prereqs like the Python-MySQL package.

roderick commented Wed Jul 11 08:25:03 UTC 2012:

Great article, thanks for publishing! Got a question: do you know what the first line of the output contains? Is it an average? If so, of what time period?

 read  writ|  date/time   |run blk new| used  free
3938k 3340k|11-07 10:23:08|  0   0 0.8|  36M 2012M
   0     0 |11-07 10:23:09|  0   0   0|  36M 2012M
   0     0 |11-07 10:23:10|  0   0   0|  36M 2012M
etc...

Jered commented Tue Jul 24 20:05:34 UTC 2012:

It should be an average/aggregate of the first chunk of time determined by the "delay" argument. If you didn't specify a delay (like "5" for 5 seconds), the default is to display information for 1-second blocks.
