Parsing Apache access log files using PHP
This is a bit dated, but I still come back to it: a small regex-based script that parses Apache access log files. Each record breaks down into the following fields:
IP ADDRESS - -
Server Date / Time [SPACE]
"GET /path/to/page
HTTP/Type Request"
Success Code
Bytes Sent To Client
Referer
Client Software
Here's the code that does all the legwork:
<?php
// Read one record per line. Splitting the whole file on anything that looks
// like an IP address breaks on strings such as "Chrome/120.0.0.0".
$records = file('/path/to/copy/access_log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($records === false) {
    die("Could not read the access log");
}
$each_rec = 0;
foreach ($records as $all) {
    // client address: the first field on the line
    if (!preg_match("/^(\S+)/", $all, $match)) {
        continue;
    }
    $ip = $match[1];
    $all = substr($all, strlen($match[0]));
    // parse other fields
    // access time: contents of the first [...] pair (non-greedy, so a later "]" is ignored)
    if (!preg_match("/\[([^\]]+)\]/", $all, $match)) {
        continue;
    }
    $access_time = $match[1];
    $all = str_replace($match[0], "", $all);
    // request: "METHOD /path HTTP/x.y"
    if (!preg_match("/\"[A-Z]{3,7} ([^\"]+)\"/", $all, $match)) {
        continue; // malformed or empty request line
    }
    $http = $match[1];
    $link = explode(" ", $http) + array(1 => ""); // $link[1] stays defined if the protocol is missing
    // remove the whole quoted request so digits in the path are not mistaken for the status code
    $all = str_replace($match[0], "", $all);
    // status code and bytes sit next to each other; bytes is "-" when nothing was sent
    if (!preg_match("/\s([0-9]{3})\s+([0-9]+|-)/", $all, $match)) {
        continue;
    }
    $success_code = $match[1];
    $bytes = ($match[2] === "-") ? 0 : (int) $match[2];
    // referer and client software: the two remaining quoted strings (either may be "-")
    preg_match_all("/\"([^\"]*)\"/", $all, $quoted);
    $ref = isset($quoted[1][0]) ? $quoted[1][0] : "";
    $browser = isset($quoted[1][1]) ? $quoted[1][1] : "";
    print("<br>IP: $ip<br>Access Time: $access_time<br>Page: $link[0]<br>Type: $link[1]<br>Success Code: $success_code<br>Bytes Transferred: $bytes<br>Referer: $ref <br>Browser: $browser<hr>");
    // count the record
    $each_rec++;
}
?>
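The same field breakdown can also be sketched as one anchored pattern applied to each line, which leaves no ambiguity about which three-digit number is the status code. This is only a sketch: it assumes the log uses Apache's "combined" format, the function name parse_access_line is made up for illustration, and the sample line is the example from the Apache documentation, not real data.

```php
<?php
// Hypothetical helper (not part of the script above): parses one line of
// Apache "combined" format and returns the fields, or null if it does not match.
function parse_access_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+) (\S+)(?: (\S+))?" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'ip'           => $m[1],
        'access_time'  => $m[2],
        'method'       => $m[3],
        'page'         => $m[4],
        'type'         => $m[5],                              // empty if no protocol was sent
        'success_code' => (int) $m[6],
        'bytes'        => $m[7] === '-' ? 0 : (int) $m[7],    // "-" means no body was sent
        'referer'      => $m[8],
        'browser'      => $m[9],
    ];
}

// Sample line for illustration only.
$sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
$fields = parse_access_line($sample);
print_r($fields);
```

Lines that do not match come back as null, so they can be counted or logged instead of silently producing shifted fields.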
Once each record is parsed into fields, it can be rewritten in a format that is friendlier for database import: comma-delimited, pipe-delimited, tab-delimited, and so on. For a tab-delimited row, add this line inside the loop, just before $each_rec++:
$new_format[$each_rec] = "$ip\t$access_time\t$link[0]\t$link[1]\t$success_code\t$bytes\t$ref\t$browser";
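Building the row by hand works as long as no field contains the delimiter. As a hedged alternative, PHP's built-in fputcsv() accepts any single-character delimiter and quotes fields that need it. The helper name write_delimited and the values below are made up for illustration.

```php
<?php
// Hypothetical helper: writes each row to an open stream with the given delimiter.
function write_delimited($stream, array $rows, string $delimiter = "\t"): void
{
    foreach ($rows as $fields) {
        // enclosure and escape are passed explicitly rather than relying on defaults
        fputcsv($stream, $fields, $delimiter, '"', '\\');
    }
}

// Demo against an in-memory stream with made-up values.
$stream = fopen('php://memory', 'w+');
write_delimited($stream, [
    ['127.0.0.1', '200', '2326'],
    ['127.0.0.1', '404', 'a|b'],   // this field contains the delimiter
], '|');
rewind($stream);
$output = stream_get_contents($stream);
fclose($stream);
echo $output;
```

In the second row the field a|b is written as "a|b", so an importer that understands quoted fields still sees three columns.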
Now for creating a new file that is ready for importing into MySQL:
$fhandle = fopen("/path/to/import_file.txt", "w");
if ($fhandle) {
foreach($new_format as $data) {
fputs($fhandle, "$data\n");
}
fclose($fhandle);
}
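One detail to watch when importing: the access time as logged (for example 10/Oct/2000:13:55:36 -0700) is not in the YYYY-MM-DD hh:mm:ss form that a MySQL DATETIME column expects. A minimal sketch of a conversion, assuming you want the value stored in UTC; the function name to_mysql_datetime is hypothetical.

```php
<?php
// Hypothetical helper: converts an Apache access-log timestamp to a UTC
// "Y-m-d H:i:s" string, or returns null if the input does not parse.
function to_mysql_datetime(string $access_time): ?string
{
    // d/M/Y:H:i:s matches "10/Oct/2000:13:55:36"; O matches an offset like "-0700"
    $dt = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $access_time);
    if ($dt === false) {
        return null;
    }
    return $dt->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d H:i:s');
}

echo to_mysql_datetime('10/Oct/2000:13:55:36 -0700'), "\n"; // 2000-10-10 20:55:36
```

The converted value can replace $access_time when the import row is built.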
http://malton.duckdns.org/log.php
http://malton.duckdns.org/apache.php
http://malton.duckdns.org/access.php
I have the following logs, but I am getting a different value for the status code. Can you suggest another regex for the logs mentioned above? Also, after repeated parsing it stops parsing correctly. Can you suggest an alternative?