Related Items  

Linux and Windows Support  

make-it-great-again


Parsing Apache access log files using PHP


This is a bit dated, but I still come back to it: a small script (using regular expressions) that parses Apache access log files. Each log entry breaks down into these fields:

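IP address, followed by "- -" (the identity and user fields, usually empty)
Server date/time, in square brackets
Request, in double quotes: method, path and protocol, e.g. "GET /path/to/page HTTP/1.1"
Success (status) code
Bytes sent to the client
Referer
Client software (user agent)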
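These are the fields Apache writes in its standard "combined" log format (LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined). If your logs use that format, one option is to grab every field in a single pass with an anchored regular expression and named groups. Because it walks the line left to right, nothing inside the URL can be mistaken for the status code. This is a sketch, assuming one record per line in combined (or plain common) format; parse_combined_line() and the array keys are hypothetical names chosen for illustration, and anything after the user-agent field (such as extra vhost=... fields) is ignored.

```php
<?php
// Sketch: parse one line of an Apache "combined" (or "common") format access log.
// parse_combined_line() is a hypothetical helper, not a PHP built-in.
function parse_combined_line(string $line): ?array
{
    // ip, ident, user, [time], "request", status, bytes, then optional "referer" "agent"
    $pattern = '/^(?<ip>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] '
             . '"(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\d+|-)'
             . '(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?/';

    if (!preg_match($pattern, $line, $m)) {
        return null; // not a combined/common format line
    }

    // the request is normally "METHOD /path PROTOCOL"
    $parts = explode(' ', $m['request']);

    return [
        'ip'       => $m['ip'],
        'time'     => $m['time'],
        'method'   => $parts[0],
        'path'     => $parts[1] ?? '',
        'protocol' => $parts[2] ?? '',
        'status'   => (int) $m['status'],
        'bytes'    => $m['bytes'] === '-' ? 0 : (int) $m['bytes'], // "-" means no body sent
        'referer'  => $m['referer'] ?? '', // absent in common format
        'agent'    => $m['agent'] ?? '',
    ];
}

// Usage: the "400" in the query string is not taken as the status code.
$line = '66.249.65.6 - - [02/Jul/2014:06:25:36 +0000] "GET /browse/jobs/human-resources/all/all?contract=permanent&salary_range%5Bmax%5D=400 HTTP/1.0" 301 26 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" vhost=mpagesiglt.prod.acquia-sites.com';
$rec = parse_combined_line($line);
echo $rec['status'], "\n"; // 301
```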

Here's the code that does all the legwork:

<?php
// Read the log one record per line. Joining the whole file and splitting it on
// anything that looks like an IP address breaks on version strings such as
// "Chrome/58.0.3029.110" in the user agent.
$lines = file('/path/to/copy/access_log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines === false) {
  die("Cannot read the access log");
}

$each_rec = 0;
foreach ($lines as $line) {
  // the client IP address is the first field; the rest of the line follows it
  if (!preg_match('/^(\S+) (.*)$/', $line, $match)) {
    continue;
  }
  $ip = $match[1];
  $all = $match[2];

  // server date/time: stop at the first "]" (a greedy .+ can run into the user agent)
  if (!preg_match('/\[([^\]]+)\]/', $all, $match)) {
    continue;
  }
  $access_time = $match[1];
  $all = str_replace($match[0], "", $all);

  // "GET /path/to/page HTTP/1.1": remove the whole quoted request so that digits
  // in the URL (e.g. "=400") are not picked up as the status code below
  if (!preg_match('/"[A-Z]{3,7} ([^"]*)"/', $all, $match)) {
    continue;
  }
  $link = explode(" ", $match[1]);
  $all = str_replace($match[0], "", $all);

  // status code, bytes sent ("-" when no body), then quoted referer and client software
  if (!preg_match('/\b([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"/', $all, $match)) {
    continue;
  }
  $success_code = $match[1];
  $bytes = $match[2];
  $ref = $match[3];
  $browser = $match[4];

  // escape on output: the path, referer and user agent are supplied by the client
  $fields = array('IP' => $ip, 'Access Time' => $access_time, 'Page' => $link[0],
                  'Type' => $link[1] ?? '', 'Success Code' => $success_code,
                  'Bytes Transferred' => $bytes, 'Referer' => $ref, 'Browser' => $browser);
  foreach ($fields as $label => $value) {
    print("<br>$label: " . htmlspecialchars($value));
  }
  print("<hr>");

  // count records (also used as the index for $new_format below)
  $each_rec++;
}
?>
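The script reads the whole log into memory with file(), which is fine for a copy of a modest log but gets expensive with large ones. A minimal sketch of the alternative, assuming PHP 7 or later: a generator that opens the file with fopen(), hands back one line at a time via fgets(), and skips blank lines. read_log_lines() is a hypothetical helper name; each line it yields can go through the same parsing as in the loop above.

```php
<?php
// Sketch: stream an access log line by line instead of loading it all with file().
// read_log_lines() is a hypothetical helper name, not a PHP built-in.
function read_log_lines(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        // raised when iteration starts, since this is a generator
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            $line = rtrim($line, "\r\n");   // drop the line ending (LF or CRLF)
            if ($line !== '') {
                yield $line;                 // one log record at a time
            }
        }
    } finally {
        fclose($fh);                         // runs even if the caller stops early
    }
}

// Usage: a small temporary log with a blank line in the middle.
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($tmp, "1.2.3.4 - - [02/Jul/2014:06:25:36 +0000] \"GET / HTTP/1.1\" 200 512 \"-\" \"curl/7.35.0\"\n"
    . "\n"
    . "5.6.7.8 - - [02/Jul/2014:06:25:37 +0000] \"GET /missing HTTP/1.1\" 404 - \"-\" \"curl/7.35.0\"\n");

$count = 0;
foreach (read_log_lines($tmp) as $line) {
    $count++;   // parse $line here
}
unlink($tmp);
echo $count, "\n"; // 2
```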

Once the info is parsed into fields, it can be written out in a friendlier format for database import: comma-delimited, pipe-delimited, tab-delimited, etc. For a tab-delimited file, add this line inside the loop, just before $each_rec++:

$new_format[$each_rec] = "$ip\t$access_time\t$link[0]\t$link[1]\t$success_code\t$bytes\t$ref\t$browser";
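Referers and user agents often contain commas and spaces, so for comma- or pipe-delimited output it is safer to let PHP handle the quoting than to join the fields by hand. A minimal sketch using the built-in fputcsv(); the $record values are sample data, and the php://memory stream only keeps the example self-contained (in practice you would fopen() the import file instead). The separator, enclosure and escape arguments are passed explicitly so the output doesn't depend on defaults.

```php
<?php
// Sketch: write one parsed record as comma- and pipe-delimited lines with fputcsv(),
// which encloses any field containing the separator, quotes, spaces or line breaks.
// The record values below are sample data.
$record = ['66.249.65.6', '02/Jul/2014:06:25:36 +0000', '/index.php', 'HTTP/1.0',
           '301', '26', '-', 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'];

$out = fopen('php://memory', 'w+');          // stands in for the real import file
fputcsv($out, $record, ',', '"', '\\');      // comma-delimited line
fputcsv($out, $record, '|', '"', '\\');      // pipe-delimited line
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```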

Then, after the loop, write the records to a new file that is ready for importing into MySQL:


// write one tab-delimited record per line
if ($fhandle = fopen("/path/to/import_file.txt", "w")) {
  foreach ($new_format as $data) {
    fputs($fhandle, "$data\n");
  }
  fclose($fhandle);
}

 

Comments (3)
getting wrong value for status code
3 Wednesday, 10 September 2014 04:38
SR
66.249.65.6 - - [02/Jul/2014:06:25:36 +0000] "GET /browse/jobs/human-resources/all/all?contract=permanent&salary_range%5Bmax%5D=400 HTTP/1.0" 301 26 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" vhost=mpagesiglt.prod.acquia-sites.com host=wwwlt.michaelpage.com.sg hosting_site=mpagesiglt pid=15300 request_time=2319996



I have the log above, but I am getting a different value for the status code. Can you suggest another regex for this log? Also, after repeated parsing it stops parsing correctly. Can you suggest an alternative?
re: testing component
2 Tuesday, 06 March 2012 09:09
Ant
Thanks - it works well.
testing component
1 Tuesday, 06 March 2012 09:07
dany
just testing the comment component :-)

Copyright © 1999 - 2017 Virtual Helpme | t | Original Template: Allrounder