Splitting Apache access logs and feeding the result to Geolizer 2007-12-31
I feel like sharing a script of mine: one I wrote to split log files that hold data for several subdomains into one file per subdomain, and to generate a separate report for each subdomain from those files. I use the script on a DomainFactory host, so if you have an account at DomainFactory this might come in handy. The script is written in Bash, is intended to run as a cron job, and does the following:
- Watch the logs directory for unprocessed files
- When a new (finished) log file appears, split it per subdomain
- Feed the split log files to Geolizer incrementally. Geolizer is a tuned version of Webalizer, available as a patch.
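The splitting step could look roughly like the following. This is only a minimal sketch, not the script itself: it assumes that every log line starts with the virtual host name as its first whitespace-separated field (check your own log format first), and `split_log` is a name I made up for illustration. The host field is stripped so that what remains is an ordinary access log line.

```bash
#!/usr/bin/env bash
# Sketch: split one access log into one file per subdomain.
# Assumption: the virtual host name is the first whitespace-separated
# field of every line. split_log is a hypothetical helper name.

split_log() {
    local logfile=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v outdir="$outdir" '
        NF == 0 { next }                      # skip empty lines
        {
            host = $1
            # Keep only characters that are safe in a file name.
            gsub(/[^A-Za-z0-9._-]/, "_", host)
            # Drop the host field so the rest is a plain log line.
            line = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", line)
            # Append, so each host file collects all of its lines.
            print line >> (outdir "/" host ".log")
        }
    ' "$logfile"
}
```

Each resulting file could then be handed to the report generator in incremental mode, for example `webalizer -p -n "$host" -o "reports/$host" "split/$host.log"`, assuming the Geolizer-patched binary is still called `webalizer` and the output directory already exists. Since the sketch appends to the per-host files, start from an empty output directory for every log you process.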
That’s mainly it, but it took quite some time to make it run smoothly. DomainFactory copies each day’s log file in two steps: the first 6 hours on day k and the remaining 18 hours on day k + 1, both a few minutes before 06:00. So scheduling the cron job for about 06:30 should be optimal. To avoid working with a log file that only holds the first 6 hours of the day, the script checks the date of the file’s last access.
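To illustrate the completeness check, here is a hedged sketch. It is not the check from the script: it compares the file's last modification date rather than its last access date (access times change whenever something reads the file), it assumes GNU `date`, and it assumes the caller already knows which day the log covers. `log_is_complete` is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: has the log for a given day already received its second part?
# Assumptions: GNU date (for `date -r FILE`), and the day the log covers
# is passed in as YYYY-MM-DD. log_is_complete is a hypothetical helper.

log_is_complete() {
    local logfile=$1 logday=$2
    local touched
    # Date of the file's last modification, in local time.
    touched=$(date -r "$logfile" +%Y-%m-%d) || return 2
    # The second copy happens on the following day, so the file is
    # complete once it was last written on a later date than it covers.
    # ISO dates sort lexicographically, so a string comparison suffices.
    [[ $touched > $logday ]]
}
```

A cron entry along the lines of `30 6 * * * $HOME/bin/split-logs.sh` (the path is just a placeholder) would run the whole thing shortly after both copies are done, and a call such as `log_is_complete "$file" 2007-12-31 || continue` would skip files that only hold the first 6 hours.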
The script is Public Domain, but you still owe me bug reports if you use it.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-No Derivative Works 3.0 Germany License.

