Since I can't sleep anyway, having been woken up by a loud burglar alarm nearby, I might as well post something new here. Recently I needed to display hits on a web site grouped by HTTP status code, so to turn a quick few lines of code into something a bit more interesting, I decided to cross-link each status code in the summary to the relevant section of the HTTP 1.1 specification.
The spec looks like a DocBook-generated HTML set of pages (just a guess), with sections and subsections each having a unique URL built following a common rule, so once you have a dict mapping status codes to section numbers, building the links is easy:
HTTP_STATUS_CODES = {
    100: ('Continue', '10.1.1'), 101: ('Switching Protocols', '10.1.2'),
    200: ('OK', '10.2.1'), 201: ('Created', '10.2.2'),
    202: ('Accepted', '10.2.3'), 203: ('Non-Authoritative Information', '10.2.4'),
    204: ('No Content', '10.2.5'), 205: ('Reset Content', '10.2.6'),
    206: ('Partial Content', '10.2.7'),
    300: ('Multiple Choices', '10.3.1'), 301: ('Moved Permanently', '10.3.2'),
    302: ('Found', '10.3.3'), 303: ('See Other', '10.3.4'),
    304: ('Not Modified', '10.3.5'), 305: ('Use Proxy', '10.3.6'),
    306: ('(Unused)', '10.3.7'), 307: ('Temporary Redirect', '10.3.8'),
    400: ('Bad Request', '10.4.1'), 401: ('Unauthorized', '10.4.2'),
    402: ('Payment Required', '10.4.3'), 403: ('Forbidden', '10.4.4'),
    404: ('Not Found', '10.4.5'), 405: ('Method Not Allowed', '10.4.6'),
    406: ('Not Acceptable', '10.4.7'), 407: ('Proxy Authentication Required', '10.4.8'),
    408: ('Request Timeout', '10.4.9'), 409: ('Conflict', '10.4.10'),
    410: ('Gone', '10.4.11'), 411: ('Length Required', '10.4.12'),
    412: ('Precondition Failed', '10.4.13'), 413: ('Request Entity Too Large', '10.4.14'),
    414: ('Request-URI Too Long', '10.4.15'), 415: ('Unsupported Media Type', '10.4.16'),
    416: ('Requested Range Not Satisfiable', '10.4.17'), 417: ('Expectation Failed', '10.4.18'),
    500: ('Internal Server Error', '10.5.1'), 501: ('Not Implemented', '10.5.2'),
    502: ('Bad Gateway', '10.5.3'), 503: ('Service Unavailable', '10.5.4'),
    504: ('Gateway Timeout', '10.5.5'), 505: ('HTTP Version Not Supported', '10.5.6'),
}

def getHTTPStatusUrl(status_code):
    """Return an HTML link to the RFC 2616 section describing status_code,
    or None if the code is unknown."""
    if status_code not in HTTP_STATUS_CODES:
        return None
    description, section = HTTP_STATUS_CODES[status_code]
    return ('<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec%s">'
            '%s - %s</a>' % (section, status_code, description))
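For instance, feeding it per-status hit counts (as they might come out of a GROUP BY query on the log table) gives you the linked summary rows. The counts below are made up, just to show the idea:

# Hypothetical per-status hit counts, e.g. from
# SELECT status, COUNT(*) FROM hits GROUP BY status
hits_by_status = {200: 15320, 304: 4210, 404: 87, 500: 3}

for code in sorted(hits_by_status):
    # Fall back to the bare number for codes the dict doesn't know about.
    link = getHTTPStatusUrl(code) or str(code)
    print('<tr><td>%s</td><td>%d</td></tr>' % (link, hits_by_status[code]))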
To get and parse the server log files and store the records in MySQL, I'm using a small app I wrote in Python (what else?) with a colleague at work; it started as a niche project and may become a company standard. The app uses pluggable components both for retrieving logs (currently http:// and file://) and for parsing them (our apps' custom logs, plus the Apache combined format). It uses HTTP block transfers to fetch only the new log records (a sketch of that idea follows below), and a simple algorithm to detect log rotations and corruption. Records may be manipulated before being handed off to the database, so as to insert meaningful values (e.g. to flag critical errors).

The app has a simple command line interface. It uses no threading, as that would have complicated development too much (we use the shell to list the registered logs and run one instance per log), and it has no scheduler, since cron does that job very well. At work it is parsing the logs of a critical app on a desktop PC, handling something like 2 million records in a few hours, and it does it so well that we are looking into moving it to a bigger machine (maybe an IBM Regatta) and DB2 (our company standard), to parse most of our applications' logs and hand off summaries to our network management app. It still needs some work, but we would like to release the code soon. If you're interested in trying it, drop me a note.
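For illustration, one common way to fetch only the new records of a remote log is an HTTP Range request: ask the server for the bytes past the offset you have already processed, and treat a file shorter than that offset as a rotation. This is just a minimal sketch of the idea using Python 3's urllib.request, not the app's actual code; the function name and offset handling are mine:

import urllib.request
from urllib.error import HTTPError

def fetch_new_data(log_url, last_offset):
    """Return (new_bytes, new_offset) for the remote log at log_url."""
    request = urllib.request.Request(log_url)
    # Ask only for the bytes we have not processed yet.
    request.add_header('Range', 'bytes=%d-' % last_offset)
    try:
        response = urllib.request.urlopen(request)
    except HTTPError as err:
        if err.code == 416 and last_offset > 0:
            # Range not satisfiable: the file is shorter than our stored
            # offset, which usually means the log was rotated. Start over.
            # (A real implementation must also distinguish "no new bytes
            # yet" from rotation, e.g. by checking the current length.)
            return fetch_new_data(log_url, 0)
        raise
    data = response.read()
    if response.getcode() == 200:
        # The server ignored the Range header and sent the whole file;
        # keep only the part we have not seen.
        data = data[last_offset:]
    return data, last_offset + len(data)

The caller would persist the returned offset between runs (the app keeps similar state per registered log), so each cron invocation only downloads and parses what was appended since the previous one.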