The awstats program is a versatile tool for generating web traffic reports. We'll walk through a simple installation to track stats for your site.
Web log analysis
If you run a web site you might get curious about statistics like how many people visit your site each month and what sites or search engines they used to find you. That's where web traffic analysis comes in.
There are many options for analyzing your traffic, but in this article series we'll look at a program called "awstats". Awstats runs through your web logs (which are lying around on the disk anyway) and generates reports based on what it finds. The reports break down the data to show you information like what the more popular parts of your site are, what search terms people used to get there, and which search engines have spidered your site lately.
We'll aim for a simple approach to analyzing logs with awstats in this series. There are some nifty features of awstats we won't be using (like dynamically generating reports via CGI), but the benefit of this approach is that it's light on resource use and easy to set up no matter what web server you use.
Why not just use page tagging?
Page tagging embeds a small piece of code in each page, and the visitor's browser runs it to report the visit. The approach awstats uses is called "log analysis". The analyzer program goes through your web logs line by line and sorts out what files got served and where the requests came from. Because this approach doesn't rely on the visiting web browser executing any specific code properly, the total traffic numbers a log analyzer gathers will be closer to an accurate tally. The downside is that the only identifying information recorded about a visitor is the IP address they used, which isn't always a reliable way to distinguish between users (since several of them could be behind the same proxy server or firewall).
In the end, neither approach is really superior to the other.
Page tagging gives you better information about how often visitors return to your site and what they do there, but doesn't record visitors that can't or won't execute the tag (like older browsers, many mobile phones, users with privacy concerns, and search engine robots).
Log analysis gives you better information about how much traffic your web server handles but is less reliable when it comes to determining site usage patterns.
See where I'm going? Both approaches have complementary strengths and weaknesses. Somewhere between the page tagging statistics and the web log analysis lies the whole picture.
For the most accurate assessment you'll want to have both types of usage reports available and extrapolate from there.
Before installing awstats, pick out the virtual host you want to report on. If you want to track more than one you can go through this guide again for each one, but make sure each virtual host is logging to its own access log. It's possible to use a single log for multiple sites, but it's more complicated and isn't recommended. We're going for simple, remember?
First make sure you have a web server. Hey, might as well be thorough.
With that out of the way, we want to see if the virtual host we're going to be tracking is logging in the right format. While awstats can handle some other log formats, what we want to use is the standard "combined" web log format.
Most web servers, like nginx or lighttpd, use the combined log format by default. No problems there unless you went out of your way to change it.
If you're using apache it might be logging in either a "combined" or a "common" log format. To find out which, take a look in your virtual host config file and look for the "CustomLog" directive:
CustomLog /var/www/access.log combined
That last word is the one to check for the format. If it isn't there, or if it says something like "common", change the config so it's using "combined" for the format instead. For more information on the combined log format, check the logging documentation for your web server.
If you altered the format used for the virtual host's log remember to reload the web server to implement the change.
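If you'd rather not eyeball the config, here's a rough sketch of that check. The function name "customlog_formats" is made up for this example, and it assumes the simple case shown above: a plain log path followed by a format nickname. Piped logs or a quoted format string with spaces in it will confuse it.

```shell
#!/bin/sh
# Hypothetical helper: print the log path and format name for each
# CustomLog directive in an apache config file. Commented-out lines are
# skipped because their first field is not exactly "CustomLog".
customlog_formats() {
    conf=$1
    # Field 1 is the directive name, field 2 the log path, field 3 the format.
    # Apache directive names are case-insensitive, hence tolower().
    awk 'tolower($1) == "customlog" {
        fmt = ($3 == "") ? "(none)" : $3
        print $2, fmt
    }' "$conf"
}
```

Point it at wherever your virtual host is defined (that path varies by distribution). A line ending in "combined" means you're set; "common" or "(none)" means the directive needs changing.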
The awstats scripts use a scripting language called "perl". It's used for a lot of things, so you probably have it installed already.
To check, run the command:
perl -v
If you get a response that gives you a perl version, you're set. If you get a "command not found" error then you need to install perl. You should be able to do that through your distribution's package manager, like yum or aptitude.
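If you want the same check in a setup script, something like this should work in any POSIX shell. The "have_cmd" helper name is an invention for this example, not part of awstats or perl.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd perl; then
    # Show the version banner so you can see which perl will run awstats.
    perl -v
else
    echo "perl not found; install it with your package manager" >&2
fi
```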
Download and extract
We're actually not going to use your Linux distribution's pre-packaged version of awstats (even if it has one). The awstats program gets updated regularly, and new versions include data on the latest web browser and operating system identifiers. It's best if you install the source package for awstats, then manually update it regularly so you get the most accurate reports possible.
You can get the latest version of awstats from the project's download page. You can decide if you want the latest beta for cutting-edge reporting data, or if you want to get the latest stable version instead and play it safe. In this guide we used the latest stable version (6.95 at the time of this writing).
Get the download with the ".tar.gz" extension. It's more unixy, and saves you from checking script permissions after the install.
You can either download the package from the awstats site to your desktop and then upload it to your VPS with scp, or, if you have wget installed, download it directly to the VPS using the link from the project's download page.
Once you have the package on your VPS, unpack it:
tar -xvzf awstats-6.95.tar.gz
You should end up with a directory named after the awstats version, like "awstats-6.95". Now you just need to move that to wherever you actually want awstats installed (I used /usr/local/awstats):
sudo mv awstats-6.95 /usr/local/awstats
If you want to update to another version of awstats later, just go through those steps with the new version. Replace the old "awstats" directory with the new one. Simple.
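Since you'll repeat those steps for every update, here's a minimal sketch that wraps them in a function. The name "install_awstats" is hypothetical, and it assumes the tarball unpacks to a single top-level directory, as the 6.95 package does. One thing it does that the steps above don't: it keeps the previous install as a ".old" copy instead of discarding it, which is just a cautious choice for this example.

```shell
#!/bin/sh
# Hypothetical helper: unpack an awstats tarball and move it into place.
# Usage: install_awstats /path/to/awstats-X.Y.tar.gz /usr/local/awstats
install_awstats() {
    tarball=$1
    dest=$2

    # Unpack into a scratch directory so a failed extract can't touch $dest.
    workdir=$(mktemp -d) || return 1
    if ! tar -xzf "$tarball" -C "$workdir"; then
        rm -rf "$workdir"
        return 1
    fi

    # The archive is assumed to hold one top-level directory (awstats-X.Y).
    srcdir=$(find "$workdir" -mindepth 1 -maxdepth 1 -type d | head -n 1)
    if [ -z "$srcdir" ]; then
        rm -rf "$workdir"
        return 1
    fi

    # Keep the previous install as "$dest.old" (an addition of this sketch).
    if [ -d "$dest" ]; then
        rm -rf "$dest.old"
        mv "$dest" "$dest.old"
    fi

    # Move the new version into place, cleaning up the scratch space either way.
    if ! mv "$srcdir" "$dest"; then
        rm -rf "$workdir"
        return 1
    fi
    rm -rf "$workdir"
}
```

Writing to /usr/local usually needs root, so in practice you'd put this in a script and run the script with sudo.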
Choose the reports directory
Next you'll need to create an output directory for your reports. This can be pretty much anywhere you like, since the reports are just static html pages. They just need to be accessible from your web browser if you want to view them.
For this example, let's say we're going to be tracking traffic for "www.example.com", and we put that site's files in a directory in the "demo" user's home directory. We're feeling unoriginal, so we'll make a "webstats" directory for awstats' reports:
mkdir -p /home/demo/public_html/example.com/public/webstats
The awstats icons
The html pages that awstats creates when it makes its reports want to use some images to make them a bit less bland. Let's make sure a copy of those images will be available to the reports we generate:
cp -R /usr/local/awstats/wwwroot/icon /home/demo/public_html/example.com/public/webstats/awstatsicons
When you update awstats you may want to re-copy this directory as well, to catch any additions (like icons for new browsers).
Create the data directory
We'll need to give awstats a directory where it can store the data it uses to generate its reports. The default is "/var/lib/awstats". That's a pretty good location.
sudo mkdir -p /var/lib/awstats
Copy the config template
Now to set up a config file telling awstats how to process the logs for your domain. First, create a directory to hold the config:
sudo mkdir -p /etc/awstats
Next we'll copy a template config file into that directory that we can modify for our domain.
The template config file, "awstats.model.conf", is located in the awstats installation's wwwroot/cgi-bin directory.
Name the new config file in the style of "awstats.[main domain name].conf". If you were creating a config for "www.example.com" and had installed awstats to "/usr/local/awstats", your copy command would look like:
sudo cp /usr/local/awstats/wwwroot/cgi-bin/awstats.model.conf /etc/awstats/awstats.www.example.com.conf
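If you plan to track several domains, the copy step can be sketched as a small function. The function name and the AWSTATS_HOME and AWSTATS_CONF_DIR variables are made up for this example; they default to the locations used in this guide. It refuses to overwrite an existing config so you can't wipe out your customizations by accident.

```shell
#!/bin/sh
# Hypothetical helper: copy the awstats template config for one domain.
# Usage: make_awstats_config www.example.com
# AWSTATS_HOME and AWSTATS_CONF_DIR are illustrative variables that default
# to the paths used in this guide.
make_awstats_config() {
    domain=$1
    home=${AWSTATS_HOME:-/usr/local/awstats}
    confdir=${AWSTATS_CONF_DIR:-/etc/awstats}
    template="$home/wwwroot/cgi-bin/awstats.model.conf"
    target="$confdir/awstats.$domain.conf"

    # Refuse to clobber a config that may already have been customized.
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it alone" >&2
        return 1
    fi

    # Create the config directory if needed, copy the template, and print
    # the path of the new file so the caller knows what to edit.
    mkdir -p "$confdir" && cp "$template" "$target" && echo "$target"
}
```

As with the plain cp command, writing to /etc/awstats needs root privileges.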
Customize the configuration
Time to dig around in the config file we created. Using your favorite text editor (nano or vi, usually), edit:
/etc/awstats/awstats.www.example.com.conf
Change "www.example.com" above to the name of the domain you'll be tracking. If the file doesn't exist you may have made a typo when you copied it. The file should be chock full of stuff right now.
Fortunately, part of keeping things simple is only needing to change a few config settings, and those are toward the beginning of the file. Let's look at the settings you'll need to pay particular attention to and their default values.
The LogFile value is a pretty important one: it tells awstats where to find the log it's supposed to be analyzing. For our example, we'd change that value to the full path of example.com's access log, the same path given in the virtual host's "CustomLog" directive.
The LogFormat directive is probably not one you'll need to change, but I mention it in case you've got your domain logging in a custom format, or if you absolutely want to use another standard format like the common log format. The commented text that precedes this directive explains how you can tell awstats what your log format looks like.
There are also some predefined log formats. The default, "1", represents the combined log format. If you are using common log format you would use "4" here instead.
The SiteDomain value is where you tell awstats what your site's main domain name is. If we usually direct visitors to "www.example.com", we'd change this setting to:
SiteDomain="www.example.com"
HostAliases="localhost 127.0.0.1 REGEX[myserver\.com$]"
The HostAliases setting tells awstats all the different domains people might use to visit your site. It's there so it can separate external referring sites from internal links.
The default has some funky "REGEX" stuff in there. That describes a "regular expression", which is a flexible but daunting way to describe a search filter. The one above matches any host name ending in "myserver.com", for instance. We're not going to keep that. Regular expressions aren't simple (but they are useful, if you know how to use them).
All you really need for this setting is a list of domains. It's good to include "localhost" and "127.0.0.1" in there, and to throw in your main domain name and any alternates you have for the site separated by spaces:
HostAliases="www.example.com example.com www.olddomain.com olddomain.com localhost 127.0.0.1"
The DNSLookup setting is actually not one you'll want to change. At its current setting it means that awstats won't do DNS lookups on visitors' IP addresses. What awstats might glean from that information is what country the visitor is in. This can be nice to know and chart, but not nice enough for the amount of time and effort it can take awstats to make all those DNS queries.
If you really want country data for visitors, check the awstats plugin site for a couple of alternatives to DNS lookup. They have an impact on awstats performance as well, but not as much as straight DNS lookups.
Remember that data directory you made? The DirData setting is where you tell awstats what it was. If you didn't use "/var/lib/awstats", be sure to change this value to point to your data directory.
The DirIcons setting is the path the generated reports use when they reference images. That path is relative to the location of the reports. In this case, it should point to the "awstatsicons" directory we made by copying the default images directory. If you want to rename that directory you'll need to change it here so the generated reports can find the images.
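If you'd rather script these edits than make them by hand, here's one way to sketch it with sed. The "set_directive" function is hypothetical (awstats doesn't ship anything like it), and it only handles the quoted string settings discussed above. The values in the commented usage lines are placeholders: the LogFile path is borrowed from the CustomLog example earlier, and the DirIcons value assumes the "awstatsicons" directory name we used when copying the icons.

```shell
#!/bin/sh
# Hypothetical helper: set KEY="VALUE" on the uncommented line that starts
# with KEY= in an awstats config file. Values containing "|", "&" or
# backslashes would confuse sed and are out of scope for this sketch.
set_directive() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp"; then
        # Write back through the existing file to keep its owner and mode.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example usage (placeholder values; adjust for your own site):
#   conf=/etc/awstats/awstats.www.example.com.conf
#   set_directive "$conf" LogFile "/var/www/access.log"
#   set_directive "$conf" SiteDomain "www.example.com"
#   set_directive "$conf" HostAliases "www.example.com example.com localhost 127.0.0.1"
#   set_directive "$conf" DirData "/var/lib/awstats"
#   set_directive "$conf" DirIcons "awstatsicons"
```

Commented lines in the config start with "#", so they never match the pattern and are left alone. It's still worth opening the file afterward to confirm each setting landed where you expected.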
And now you should have a working awstats installation. You just haven't tested it yet. We'll do that in the next article in this series.
To recap, the basic steps we went through were:
• Install awstats
• Create the reports and data directories
• Copy the icons directory into the reports directory
• Copy the config template to the config directory
• Customize the config file for the domain
Just follow those steps for any additional domains you want to track and you'll be set. You can share the data and report directories across multiple virtual hosts, but you'll have to use different logs and config files for each site.
In the next article we'll look at actually building our reports, viewing them, and scheduling awstats to generate reports on a regular basis.
-- Jered