Saturday, May 30, 2009

Replacing AWStats

Don't get me wrong; I've been a big fan of AWStats for years. It's great for giving me a high-level overview of how my sites are doing, traffic-wise. The day-by-day breakdown is nice, and if I don't mind looking at the big picture for my site a month at a time (which is usually good), then it's perfect. But once I started working on a setup more complex than a few low-traffic blogs, it started to show its limitations. Let me tell you what I mean.

In our current setup at work (or at least the one that we've been moving toward), we have three web servers. Each server is behind a load balancer, and serves exactly the same sites. When I started monitoring, we had 200+ sites active. We wanted to have stats for each domain on each server, plus the stats for each domain across all servers, plus the stats for all domains on each individual server.

AWStats requires one separate config file per domain, per server, plus another separate config file per domain for all servers, plus a separate config per server for the server-wide stats. With 200+ domains and three servers, that means we're looking at 800+ separate config files. I wrote a script to automatically generate all of these for me, but they're still a pain to keep track of.
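To make that count concrete, here is a minimal Python sketch of the enumeration. It is not the actual generator script; the naming scheme, the placeholder domains, and the server names `web1` to `web3` are all made up for illustration.

```python
from itertools import product

def config_names(domains, servers):
    """Enumerate one (hypothetical) config name per report we want."""
    names = []
    # One config per domain, per server.
    names += [f"{d}.{s}" for d, s in product(domains, servers)]
    # One config per domain, combined across all servers.
    names += [f"{d}.all" for d in domains]
    # One config per server, combined across all domains.
    names += [f"all.{s}" for s in servers]
    return names

# Placeholder inputs: 200 domains on three load-balanced servers.
domains = [f"site{i}.example.com" for i in range(200)]
servers = ["web1", "web2", "web3"]

print(len(config_names(domains, servers)))  # 200*3 + 200 + 3 = 803
```

Every new domain adds four more entries to that list, which is why the total grows so quickly.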

Before I parse out the log files, I need to combine multiple log files into massive composite log files (domain-wide and server-wide), in chronological order so that AWStats doesn't choke. As you can imagine, this requires a considerable amount of resources.
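For illustration, a chronological merge can be sketched in Python with `heapq.merge`. This is only a sketch under two assumptions that go beyond what is described above: the logs use the Apache common/combined format (with a bracketed timestamp such as `[30/May/2009:13:05:01 -0700]`), and each individual log is already in chronological order. The function names and sample lines are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Matches the bracketed timestamp in a common/combined format log line.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def log_time(line):
    """Return the timezone-aware timestamp of one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    # %b expects English month abbreviations in the default C locale.
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*logs):
    """Lazily merge already-sorted iterables of log lines by timestamp."""
    return heapq.merge(*logs, key=log_time)

# Made-up sample lines standing in for two servers' logs.
web1 = [
    '10.0.0.1 - - [30/May/2009:13:05:01 -0700] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [30/May/2009:13:05:03 -0700] "GET /a HTTP/1.1" 200 128',
]
web2 = [
    '10.0.0.3 - - [30/May/2009:13:05:02 -0700] "GET /b HTTP/1.1" 404 64',
]

for line in merge_logs(web1, web2):
    print(line)
```

Because `heapq.merge` is lazy, it also works on open file objects and holds only one pending line per input in memory. The composite log still has to be written out somewhere, so the disk and I/O cost remains.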

AWStats also maintains a flat data file per config file per month, all in the same directory. Assuming we keep just a year's worth of data, we're looking at 9600+ separate files in that directory (800+ config files times 12 months). At this point, one wonders why we can't just store everything in a database.

Now that you've seen the kind of management nightmare that I have to deal with, with just the features that AWStats does have, I think you can see where my frustrations begin. But I can't be content just being unhappy with existing features; I want new features too. And anybody who's ever looked at the main AWStats script knows how much of a beast it is to figure out just what's going on, much less change anything.

I ended up adding some glue of my own to make things a little easier. Remember, not only do I have hundreds of config files and thousands of data files, I also have to pull them up in the first place. As it turns out, the script that I wrote to automate building config files also builds an HTML file with links to all of the AWStats pages in it. I even added a quick ping script to periodically hit each domain and place a colored dot next to the domain name to indicate its active status (green is good, red is bad, blue and yellow are proprietary indicators that a site responded, but not in a normal manner). I also have other indicators set up to tell me things like whether a site has an SSL cert, today's hits so far, and yesterday's hits. There are green up arrows if today's hits are higher than yesterday's, red down arrows if they are lower, and yellow right arrows if the traffic is the same. You get the idea.
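The arrow logic is simple enough to sketch in a few lines of Python. This is a hypothetical illustration, not the code from the actual script; the function name and the returned labels are made up.

```python
def trend_indicator(today_hits, yesterday_hits):
    """Return a (color, arrow) pair comparing today's hits with yesterday's."""
    if today_hits > yesterday_hits:
        return ("green", "up")
    if today_hits < yesterday_hits:
        return ("red", "down")
    # Equal traffic on both days.
    return ("yellow", "right")

print(trend_indicator(1200, 950))
print(trend_indicator(300, 950))
print(trend_indicator(950, 950))
```

Note that today's count is only the hits so far, so early in the day nearly every site would show a red down arrow under this comparison.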

Maybe I'm just selfish, but I want a stats program that can handle all of this gracefully, and for not a lot of money. Free (as in freedom, and beer) is ideal. But since I ultimately decided to write my own, it certainly wasn't free as in time. But it was fun, and I learned a lot about AWStats while I was at it.

Incidentally, it turns out AWStats does support clusters kind of like our load balancer setup, but I didn't find that out until I spent a lot of time in the various parts of it. And I haven't taken the time to figure out how they do it; I already had a solution in mind anyway.

I hope this post gives you an idea of why I started thinking about replacing the great AWStats, a program which I still love and respect. If you're interested in looking at my code so far, I have packed it up for public consumption. Information about its operation and shortcomings is in the README.

