Splitting Apache access logs and feeding the result to Geolizer
I'd like to share a script I wrote. It splits log files that hold data for several sub domains into one file per sub domain and generates a separate report for each of them. I use the script on a DomainFactory host, so if you have an account at DomainFactory this might come in handy. The script is written in Bash, is intended to be run as a cron job and does the following:
- Watch the logs directory for unprocessed files
- When a new (finished) log file appears, split it per sub domain
- Feed the split log files to Geolizer incrementally. Geolizer is a tuned version of Webalizer, available as a patch.
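To give an idea of what the last two steps involve, here is a minimal sketch. It is not the actual script. It assumes that every log line starts with the virtual host name followed by a space (as in Apache's "vhost_combined" format), and that the Geolizer-patched binary is still called webalizer and accepts Webalizer's usual options -p (incremental mode), -n (host name) and -o (output directory). The function names and directory layout are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: the first space-separated field of each log
# line is the virtual host name; the real log format may differ.

# split_log LOGFILE OUTDIR
# Appends every line of LOGFILE to OUTDIR/<host>.log, minus the host field.
split_log() {
    local logfile=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        {
            host = $1
            # Keep the file name harmless, whatever the host field contains.
            gsub(/[^A-Za-z0-9._-]/, "_", host)
            # Drop the host field so a plain combined-format line remains.
            sub(/^[^ ]+ +/, "")
            out = dir "/" host ".log"
            print >> out
            # Close after each line to stay below the open-file limit.
            close(out)
        }
    ' "$logfile"
}

# feed_reports SPLITDIR REPORTROOT
# Runs the report generator once per split file, in incremental mode.
# ASSUMPTION: the patched binary is named "webalizer".
feed_reports() {
    local splitdir=$1 reportroot=$2 f host
    for f in "$splitdir"/*.log; do
        [ -e "$f" ] || continue          # no split files at all
        host=$(basename "$f" .log)
        mkdir -p "$reportroot/$host"
        webalizer -p -n "$host" -o "$reportroot/$host" "$f"
    done
}
```

Calling split_log on one finished log file and then feed_reports on the resulting directory would cover one day. Closing the output file after every line is slow on large logs, but it keeps the sketch short and safe with many sub domains.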
That's basically it, but it took quite some time to make it run smoothly. DomainFactory copies each day's log file in two steps: the first 6 hours on day k, and the last 18 hours on day k + 1, both a few minutes before 06:00. So setting up the cron job for about 06:30 should be optimal. To avoid working with a log file that only holds the first 6 hours of the day, the script checks the date of the file's last access. The script is Public Domain, but you still owe me bug reports if you use it :-D
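For the curious, the completeness check described above could look something like the following. Again this is only a sketch under assumptions: it expects the caller to know which day a log file belongs to (passed as YYYYMMDD), it relies on GNU stat and GNU date, and the function name and the script path in the crontab comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTION: GNU coreutils (stat -c, date -d) are available.
#
# Hypothetical crontab entry to run the script daily at 06:30:
#   30 6 * * * $HOME/bin/split-logs.sh

# is_complete FILE LOG_DAY
# Succeeds if FILE was last accessed on a calendar day after LOG_DAY
# (YYYYMMDD), i.e. the second copy step on day k + 1 has happened.
is_complete() {
    local file=$1 log_day=$2 stamp touched_day
    stamp=$(stat -c %X "$file") || return 1     # last access, epoch seconds
    touched_day=$(date -d "@$stamp" +%Y%m%d)    # same moment as local date
    [ "$touched_day" -gt "$log_day" ]
}
```

A file for day k that was last touched on day k itself still holds only the first 6 hours, so the check fails and the file is left alone until the next run.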