Linux and Windows IT Support  

Parsing Apache access log files using PHP

This is a bit dated, but I still come back to it: a small script that uses regular expressions to parse Apache access log files. Each log entry breaks down into the following fields:

IP ADDRESS - -
Server Date / Time [SPACE]
"GET /path/to/page
HTTP/Type Request"
Success Code
Bytes Sent To Client
Referer
Client Software

Here's the code that does all the legwork:

<?php
// read the log, one array element per line, without the trailing newlines
$ac_arr = file('/path/to/copy/access_log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($ac_arr === false) {
  die("Could not read the access log");
}

// one anchored pattern per line (combined log format):
// ip ident user [time] "request" status bytes "referer" "client"
$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"/';

// now split into records
$each_rec = 0;
foreach ($ac_arr as $line) {
  if (!preg_match($pattern, $line, $match)) {
    continue; // not in the expected format
  }
  $ip = $match[1];
  $access_time = $match[2];
  // the request is "METHOD /path HTTP/x.y": drop the method, keep page and type
  $link = array_pad(array_slice(explode(" ", $match[3]), 1), 2, "");
  $success_code = $match[4];
  $bytes = $match[5];
  $ref = $match[6];
  $browser = $match[7];

  // escape every field: log contents come from the client and are untrusted
  $fields = array("IP" => $ip, "Access Time" => $access_time, "Page" => $link[0], "Type" => $link[1], "Success Code" => $success_code, "Bytes Transferred" => $bytes, "Referer" => $ref, "Browser" => $browser);
  foreach ($fields as $label => $value) {
    print("<br>$label: " . htmlspecialchars($value));
  }
  print("<hr>");

  // advance to next record
  $each_rec++;
}
?>
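
The script above loads the whole log into memory with file(), which can become a problem for very large logs. As a minimal sketch, assuming PHP 5.5 or later (for generators), the lines could instead be read one at a time. The function name read_log_lines is made up for this illustration and is not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): yields one log line
// at a time so the whole file never has to sit in memory.
function read_log_lines($path) {
  $fh = fopen($path, 'r');
  if ($fh === false) {
    return; // nothing to yield if the file cannot be opened
  }
  try {
    while (($line = fgets($fh)) !== false) {
      $line = rtrim($line, "\r\n"); // strip the line ending
      if ($line !== '') {
        yield $line;                // skip blank lines
      }
    }
  } finally {
    fclose($fh);
  }
}
```

It would be used as foreach (read_log_lines('/path/to/copy/access_log') as $line) { ... }, with the per-line parsing inside the loop.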

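Once each record is parsed into fields, it can be written out in a format that is easier to import into a database: comma-delimited, pipe-delimited, tab-delimited, and so on. For tab-delimited output, add this line inside the loop, after the fields have been extracted: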

$new_format[$each_rec] = "$ip\t$access_time\t$link[0]\t$link[1]\t$success_code\t$bytes\t$ref\t$browser";
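
One thing to watch when importing: Apache writes timestamps such as 02/Jul/2014:06:25:36 +0000, which MySQL will not accept as a DATETIME value as-is. Here is a rough sketch of a conversion, assuming the default Apache time format. The function name to_mysql_datetime is hypothetical, and normalizing to UTC is a choice made for this example, not something the original script does.

```php
<?php
// Hypothetical helper: converts an Apache timestamp such as
// "02/Jul/2014:06:25:36 +0000" to "YYYY-MM-DD HH:MM:SS" in UTC.
// Returns null when the input does not match the expected format.
function to_mysql_datetime($access_time) {
  $dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $access_time);
  if ($dt === false) {
    return null;
  }
  $dt->setTimezone(new DateTimeZone('UTC')); // assumption: store times in UTC
  return $dt->format('Y-m-d H:i:s');
}
```

Calling to_mysql_datetime($access_time) before building the tab-delimited line would put the converted value in the time column; a null result means the timestamp was not in the assumed format.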

Now for creating a new file that is ready for importing into MySQL:


$fhandle = fopen("/path/to/import_file.txt", "w");
if ($fhandle) {
  foreach ($new_format as $data) {
    fputs($fhandle, "$data\n");
  }
  fclose($fhandle);
}
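
The loop above writes each line exactly as it was joined. For the comma-delimited or pipe-delimited variants, a field that itself contains the delimiter (browser strings often contain commas) would break the import. A possible alternative is PHP's built-in fputcsv(), which adds quoting where needed. The sketch below assumes the records are kept as arrays of fields rather than pre-joined strings; the function name write_records is hypothetical.

```php
<?php
// Hypothetical helper: writes rows (arrays of field values) to $path using
// the given single-character delimiter; returns the number of rows written.
function write_records($path, array $rows, $delimiter = ",") {
  $fh = fopen($path, 'w');
  if ($fh === false) {
    return 0;
  }
  $written = 0;
  foreach ($rows as $row) {
    // fputcsv encloses a field in quotes when it needs it,
    // for example when the field contains the delimiter
    if (fputcsv($fh, $row, $delimiter, '"', '\\') !== false) {
      $written++;
    }
  }
  fclose($fh);
  return $written;
}
```

Inside the parsing loop this would mean collecting $rows[] = array($ip, $access_time, $link[0], $link[1], $success_code, $bytes, $ref, $browser); and then calling write_records("/path/to/import_file.txt", $rows, "|"); once the loop has finished.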

 

Comments (3)
getting wrong value for status code
3 Wednesday, 10 September 2014 04:38
SR
66.249.65.6 - - [02/Jul/2014:06:25:36 +0000] "GET /browse/jobs/human-resources/all/all?contract=permanent&salary_range%5Bmax%5D=400 HTTP/1.0" 301 26 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" vhost=mpagesiglt.prod.acquia-sites.com host=wwwlt.michaelpage.com.sg hosting_site=mpagesiglt pid=15300 request_time=2319996



I have the log line above, but I am getting a different value for the status code. Can you suggest another regex for this log format? Also, after repeated parsing it stops parsing correctly. Can you suggest an alternative?
re: testing component
2 Tuesday, 06 March 2012 09:09
Ant
Thanks - it works well.
testing component
1 Tuesday, 06 March 2012 09:07
dany
just testing the comment component :-)

Copyright © 1999 - 2017 Virtual Helpme | IT Services Catered to Your Business | Original Template: Allrounder