Hello, please sign in or register
You are here: Home

Apache Logs into Database

I've been asked to remove direct logging upon each page request into the database. In favour or tweaking apache's CustomLog format to create a file which can load into our database for generating reports. Whichever your take is on this here is a howto.

Add to your php code apache_note()

Add variables to your Apache logs. Replace "1234" with the current users ID for example.

<?php
/**
 * Create a file like this and append it to every script, 
 * Replace the value below of "1234" to your members USERID
 */

apache_note( 'userid', 1234 );
apache_note( 'sessionid', session_id() );
?>

Define the LogFormat in Apache configuration

The LogFormat line which is written in apache configuration file interprets the arguments in <?php apache_note(parameter,value); ?> as %{parameter}n

Our Apache httpd.conf includes the following lines.

SetEnvIf Request_URI "\.(gif|png|jpg|css|ico|js|GIF|JS|JPG)$" dontlog
SetEnvIf Request_URI "health_check\.php" dontlog

LogFormat "%h,%{%Y-%m-%d %H:%M:%S}t,\"%r\",%>s,%b,%{userid}n,%D,%{sessid}n,\"%{SCRIPT_NAME}e\",\"%{User-agent}i\"" common

CustomLog "|/usr/local/sbin/cronolog -e /data/weblogs/sites/access_log_%Y%m%d" common env=!dontlog

The first SetEnvIf specifies a boolean value to ignore images and requests to health_check.php which is used by our load balancer. "common env=!dontlog" on the last line will omit logging the aforementioned.

In this example i've seperated each attribute with a comma, efficaciously creating a CSV file. 

 LogFormat  Description
%h Client IP Address
%{%Y-%m-%d %H:%M:%S}t Timestamp
%r Request
%>s Last Status Code
%b Bytes sent
%{userid}n PHP: userid
%D MicroSeconds to server request (10 ^ -6 seconds)
%{sessid}n PHP: session_id
%{SCRIPT_NAME}e script name
%{User-agent}i clients browser

Create Database Table

CREATE TABLE `tb_apache_log` (
  `var_ip` varchar(15) NOT NULL default '',
  `tms_stamp` timestamp NOT NULL default '0000-00-00 00:00:00',
  `var_request` varchar(255) NOT NULL default '',
  `enu_http_status` enum('200','206','301','302','304','400','401','403','404','405','501') NOT NULL default '200',
  `int_bytes` int(10) unsigned NOT NULL default '0',
  `int_user_id` smallint(5) unsigned NOT NULL default '0',
  `tts_microseconds` int(10) unsigned NOT NULL default '0',
  `var_session_id` varchar(32) NOT NULL default '',
  `var_script` varchar(60) NOT NULL default '',
  `var_user_agent` varchar(100) NOT NULL default '',
  `int_id` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  USING BTREE (`int_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Apache log files'

LOAD DATA with the logfiles on-demand

There are two ways to do this in order to maintain uniqueness. Import the truncate Or compare the current with previous files and import the difference. The first is simple and prefereable if the files accessed are on the same server. The second leaves all the files intact but compares the difference and creates additional files with the difference.

Import and Truncate

Prefered if the local machine is loading the data too.

#rename access file, Apache will automatically recreate it.
mv access_log.csv tb_apache_log.csv
# Load file
mysqlimport db_stats tb_apache_log.csv
# rm -rf tb_apache_log.csv

Compare and Import the difference

This is an alternative which compares the difference between a file which has already been loaded. And keeps the original logs in the right format. This is prefered if we dont have write access to the original files. 
#! /bin/bash

# Environment variables
DIRLOGS='/data/weblogs/'
FILELOG='/data/weblogs/file_log'

SQLHOST="localhost"
SQLUSER="root"
SQLPASS="root_pass"

TABLELOG="db_stats.tb_apache_log"

#########################################
# STEP 1 -      COPY WEBLOG FILES
#
#       -r      recursive
#       -u      update
#       -b      backup
#       -t      copy modification time
#       --suffix Add the siffix to the backup file the default with the -b parameter is ~ //  --suffix=:`date +\%H\%M`
###

sudo -u syncro rsync -rubt -v syncro@web1.server.com:/data/weblogs/sites/access* ${DIRLOGS}web1/
sudo -u syncro rsync -rubt -v syncro@web2.server.com:/data/weblogs/sites/access* ${DIRLOGS}web2/
# As many as you want... ######################################### # STEP 2 - Tidy Files and directory # loop through the files which are not in the process list # echo files that are backups echo `find $DIRLOGS -type f -regex .*/access.*~` for FILE in `find $DIRLOGS -type f -regex .*/access.*~` do # Get the name of the original file, i.e. it wont have the suffix ~ lngth=${#FILE} #Determine the length of the string lngth=$[lngth - 1] #Subtract one from the length of the string FILEORIG=${FILE:0:$lngth} #substring 0 to new position # Get the updated files linecount LASTPOS=`wc -l $FILEORIG` LASTPOS=${LASTPOS% *} # Get the old files line count STARTPOS=`wc -l $FILE` STARTPOS=${STARTPOS% *} # Whats the difference let ROWDIFF=$LASTPOS-$STARTPOS # Create the difference file, using the current time to make a new file new_file=${FILEORIG}:`date +\%H%M` echo "tail -$ROWDIFF $FILEORIG > $new_file" tail -$ROWDIFF $FILEORIG > $new_file # delete the old (backup) file rm $FILE -f done ######################################### # STEP 3 - Walk through the files in the apache logs directory and Insert into apache_log table. # files_omit=(`less $FILELOG`) function search_array() { index=0 while [ "$index" -lt "${#files_omit[@]}" ]; do if [ "${files_omit[$index]}" = "$1" ]; then echo $index return fi let "index++" done echo "" } for FILE in `find $DIRLOGS -type f -regex .*/access.*` do # Is the file yet to be loaded? index=$(search_array $FILE) if [ -z "$index" ]; then # Load file into MySQL echo -e "Uploading \"$FILE\"" mysql -u$SQLUSER -p$SQLPASS -h$SQLHOST -e "LOAD DATA LOW_PRIORITY INFILE '$FILE' INTO TABLE $TABLELOG FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'" echo -e "Uploaded \"$FILE\"" # Append the filename to the files_to_omit echo " $FILE" >> $FILELOG fi done

References

http://httpd.apache.org/docs/2.0/mod/mod_log_config.html#formats

http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html

 

Comments

Title*
Comment

Prove you are not a robot

To prove you are not a robot, please type in the six character code you see in the picture below
Security confirmation codeI can't see this!
Contact
Name*
Email never shown*
Home Page

Author

Andrew Dodson
Since:Feb 2007

Comment | flag

Categories

Bookmark and Share