
Transform a pcap into an Apache log file v0.1

Haka can extract data from a pcap file: thanks to its HTTP dissector, it can parse all the useful data. One interesting use of this is to recreate Apache logs from a pcap file. Imagine you have to process data from a web server. There are plenty of tools to analyze an Apache log file, but what if you only have pcap files?

With Haka, it's just a matter of a few lines of code to extract the information from a pcap into a standard Apache log file, in what is called the "combined log format". The format is quite self-explanatory:

Apache combined log format:
(SRC IP) - - [(DATE)] "(REQUEST)" (RESPONSE STATUS) (SIZE) "(REFERER)" "(USERAGENT)"
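
To make the layout concrete, here is a minimal sketch in plain Lua that builds one such line from a table of fields. The helper name `combined_log_line` and the field names are hypothetical and are not part of Haka; missing fields are written as "-", which is the usual Apache convention.

```lua
-- Hypothetical helper (not part of Haka): formats one combined-log line
-- from a plain Lua table. Absent optional fields are written as "-".
local function combined_log_line(e)
    return string.format('%s - - [%s] "%s %s %s" %s %s "%s" "%s"',
        e.srcip,
        e.date,
        e.method, e.uri, e.version,
        tostring(e.status),
        tostring(e.size or "-"),
        e.referer or "-",
        e.useragent or "-")
end

-- Example entry with made-up values; no referer is given
local line = combined_log_line{
    srcip = "192.0.2.1",
    date = "01/Jan/2000:00:00:00 +0000",
    method = "GET", uri = "/index.html", version = "HTTP/1.1",
    status = 200, size = 512,
    useragent = "curl/7.0",
}
print(line)
-- prints: 192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.0"
```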

The Haka script sets a hook on the response, because a log line mixes information from the HTTP request and the HTTP response. It extracts all the relevant information, collects the pieces in a table, concatenates them, and finally prints the resulting line on the output.

--------------------------
-- Loading dissectors
--------------------------

require('protocol/ipv4')
require('protocol/tcp')
require('protocol/http')

--------------------------
-- Setting next dissector
--------------------------
haka.rule{
    hooks = { 'tcp-connection-new' },
    eval = function(self, pkt)
        local tcp = pkt.tcp
        if tcp.dstport == 80 then
            pkt.next_dissector = "http"
        end
    end
}

--------------------------
-- Printing http info
--------------------------
haka.rule{
    hooks = { 'http-response' },
    eval = function (self, http)
        --Apache combined log format:
        -- (SRC IP) - - [(DATE)] "(REQUEST)" (RESPONSE STATUS) (SIZE) "(REFERER)" "(USERAGENT)"
        local tbl_log = {}
        local ref
        table.insert(tbl_log, tostring(http.connection.srcip))
        table.insert(tbl_log, " - - [01/Jan/2000:00:00:00 +0000] ")
        table.insert(tbl_log, "\"")
        table.insert(tbl_log, http.request.method)
        table.insert(tbl_log, " ")
        table.insert(tbl_log, http.request.uri)
        table.insert(tbl_log, " ")
        table.insert(tbl_log, http.request.version)
        table.insert(tbl_log, "\" ")
        table.insert(tbl_log, http.response.status)
        table.insert(tbl_log, " ")
        table.insert(tbl_log, http.response.data:available())
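        -- Referer: quoted, preceded by a space, "-" when the header is absent
        if http.request.headers["referer"] == nil then
            ref = " \"-\""
        else
            ref = " \"" .. http.request.headers["referer"] .. "\""
        end
        table.insert(tbl_log, ref)
        -- User-Agent: quoted, preceded by a space, "-" when the header is absent
        table.insert(tbl_log, " \"")
        table.insert(tbl_log, http.request.headers["User-Agent"] or "-")
        table.insert(tbl_log, "\"")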
        print(table.concat(tbl_log))
    end
}

This script produces a typical Apache log file that you can feed to any analysis tool you like. Currently, you can't get the date of the request, so we use a fixed one (with the next release, you will be able to access the timestamp of any packet from the response hook and set the date accordingly).
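
Once a timestamp is available, the fixed date could be replaced by a formatted one. The following is only a sketch in plain Lua: the helper name `apache_date` is hypothetical, and it assumes the timestamp is a Unix time in seconds, which may not match what Haka will actually expose. It always reports UTC, and the month abbreviation produced by `%b` depends on the current locale (it is English in the default "C" locale).

```lua
-- Hypothetical helper (not part of Haka): formats a Unix timestamp,
-- assumed to be in seconds since the epoch, as the date field of the
-- combined log format, expressed in UTC.
local function apache_date(ts)
    -- the leading "!" makes os.date use UTC instead of local time
    return os.date("!%d/%b/%Y:%H:%M:%S +0000", ts)
end

print(apache_date(946684800))
-- prints: 01/Jan/2000:00:00:00 +0000
```

The returned string could then be placed between the square brackets in place of the fixed date used in the script.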

If you want to test this tool, you can use a pre-processed pcap file from the DARPA dataset, which can be retrieved from the MIT website. Or, for a more reasonable size, you can use a filtered version, which you can download from the Resources section of the Haka website.