Ludoo
static bits and pieces

My brother noticed a strange referrer for this page in his logs today (split over a few lines for convenience):

http://adtools.corp.google.com/cgi-bin/annotatedquery/annotatedquery.py?q=SMSMAIL
&host=www.google.com&hl=it&gl=IT&ip=&customer_id=&decodeallads=1&ie=UTF-8
&oe=utf8&btnG=Go Annotate&sa=D&deb=a&safe=active&btnG=Go Annotate

Maybe it's pretty common, and it's just that living at the periphery of the Empire we're not too interesting for Google and never see this kind of referrer in our logs. But I'm curious anyway, and I like the fact it's obviously a Python script.

The few bits of information that can be gathered from the URL are:
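
The query parameters can be pulled out of the referrer with a few lines of Python. This is only a sketch, written for Python 3 and using nothing but the standard library; the variable names are mine.

```python
from urllib.parse import urlsplit, parse_qsl

# The referrer from the log, joined back into a single string.
referrer = (
    'http://adtools.corp.google.com/cgi-bin/annotatedquery/annotatedquery.py'
    '?q=SMSMAIL&host=www.google.com&hl=it&gl=IT&ip=&customer_id='
    '&decodeallads=1&ie=UTF-8&oe=utf8&btnG=Go Annotate&sa=D&deb=a'
    '&safe=active&btnG=Go Annotate'
)

parts = urlsplit(referrer)
# keep_blank_values=True so that empty parameters such as ip= are not dropped.
params = parse_qsl(parts.query, keep_blank_values=True)

print(parts.hostname, parts.path)
for name, value in params:
    print('%-14s %r' % (name, value))
```

Keeping the blank values matters here, since the empty ip and customer_id parameters are part of what the URL tells us.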

My guess is a tool to aid in fine-tuning ad placement for Google's paying customers, where they can perform a query on a set of keywords, see on which of the result pages their ad would be placed, and annotate the placement for Google's staff.

Does anybody know better?

ludo ~ Sep 30, 2004 17:12:00 ~ upd Sep 30, 2004 18:17:10 ~ category Python

As I cannot sleep since I got woken up by a loud burglar alarm nearby, I might as well post something new here. Recently I had the need to display hits on a web site grouped by HTTP status code, so to turn a quick few lines of code into something a bit more interesting, I decided to display each status code in the summary cross-linked to the relevant part of the HTTP 1.1 specification.

The spec looks like a docbook-generated HTML set of pages (just a guess) with sections and subsections each having a unique URL built following a common rule, so once you have a dict of status codes:section numbers, it's very easy to build the links:

HTTP_STATUS_CODES = {
    100: ('Continue', '10.1.1'),
    101: ('Switching Protocols', '10.1.2'),
    200: ('OK', '10.2.1'),
    201: ('Created', '10.2.2'),
    202: ('Accepted', '10.2.3'),
    203: ('Non-Authoritative Information', '10.2.4'),
    204: ('No Content', '10.2.5'),
    205: ('Reset Content', '10.2.6'),
    206: ('Partial Content', '10.2.7'),
    300: ('Multiple Choices', '10.3.1'),
    301: ('Moved Permanently', '10.3.2'),
    302: ('Found', '10.3.3'),
    303: ('See Other', '10.3.4'),
    304: ('Not Modified', '10.3.5'),
    305: ('Use Proxy', '10.3.6'),
    306: ('(Unused)', '10.3.7'),
    307: ('Temporary Redirect', '10.3.8'),
    400: ('Bad Request', '10.4.1'),
    401: ('Unauthorized', '10.4.2'),
    402: ('Payment Required', '10.4.3'),
    403: ('Forbidden', '10.4.4'),
    404: ('Not Found', '10.4.5'),
    405: ('Method Not Allowed', '10.4.6'),
    406: ('Not Acceptable', '10.4.7'),
    407: ('Proxy Authentication Required', '10.4.8'),
    408: ('Request Timeout', '10.4.9'),
    409: ('Conflict', '10.4.10'),
    410: ('Gone', '10.4.11'),
    411: ('Length Required', '10.4.12'),
    412: ('Precondition Failed', '10.4.13'),
    413: ('Request Entity Too Large', '10.4.14'),
    414: ('Request-URI Too Long', '10.4.15'),
    415: ('Unsupported Media Type', '10.4.16'),
    416: ('Requested Range Not Satisfiable', '10.4.17'),
    417: ('Expectation Failed', '10.4.18'),
    500: ('Internal Server Error', '10.5.1'),
    501: ('Not Implemented', '10.5.2'),
    502: ('Bad Gateway', '10.5.3'),
    503: ('Service Unavailable', '10.5.4'),
    504: ('Gateway Timeout', '10.5.5'),
    505: ('HTTP Version Not Supported', '10.5.6')}


def getHTTPStatusUrl(status_code):
    if status_code not in HTTP_STATUS_CODES:
        return None
    description, section = HTTP_STATUS_CODES[status_code]
    return '''<a
        href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec%s"
        >%s - %s</a>''' % (section, status_code, description)
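
As a usage sketch, this is roughly how the per-status summary could be put together. It is written for Python 3, repeats only a three-entry excerpt of the table so that it runs on its own, and uses a made-up list of status codes in place of real log data; status_link and summarize are illustrative names, not part of the original code.

```python
from collections import Counter

# Excerpt of the full table above, repeated so the sketch is self-contained.
HTTP_STATUS_CODES = {
    200: ('OK', '10.2.1'),
    404: ('Not Found', '10.4.5'),
    500: ('Internal Server Error', '10.5.1'),
}

SPEC_URL = 'http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec%s'


def status_link(status_code):
    """Return an HTML link to the spec section, or None for unknown codes."""
    if status_code not in HTTP_STATUS_CODES:
        return None
    description, section = HTTP_STATUS_CODES[status_code]
    return '<a href="%s">%s - %s</a>' % (
        SPEC_URL % section, status_code, description)


def summarize(statuses):
    """Group hits by status code and return (label, hits) rows, busiest first."""
    rows = []
    for code, hits in Counter(statuses).most_common():
        # Unknown codes fall back to the bare number instead of a link.
        rows.append((status_link(code) or str(code), hits))
    return rows


# Hypothetical sample: one status code per logged hit.
sample_hits = [200, 200, 404, 200, 500, 404, 999]
for label, hits in summarize(sample_hits):
    print(hits, label)
```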

To get and parse the server log files and store them in MySQL, I'm using a small app I wrote in Python (what else?) with a colleague at work, which started as a niche project and may become a company standard. The app uses components for both log retrieval (currently http:// and file://) and parsing (our apps' custom logs, and Apache combined), uses HTTP block transfers to get only new log records, and a simple algorithm to detect log rotations and corruptions. The log records may be manipulated before handing them off to the db so as to insert meaningful values (e.g. to flag critical errors). The app has a simple command line interface, uses no threading as it would have complicated development too much (we use the shell to get registered logs and run one instance for each log), and has no scheduler, as cron does the job very well. At work, it is parsing the logs of a critical app on a desktop PC, handling something like 2 million records in a few hours, and it does it so well we are looking into moving it to a bigger machine (maybe an IBM Regatta) and DB2 (our company standard), to parse most of our applications' logs and hand off summaries to our net management app. It still needs some work, but we would like to release its code soon. If you're interested in trying it, drop me a note.
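
The app itself is not published here, so the following is not its code. It is a minimal Python 3 sketch of one way to read only new records from a local log file and to notice a rotation, under the assumption that a file shorter than the last saved offset has been rotated or truncated. The function name and the sample log lines are hypothetical. Over HTTP, the same saved offset could be sent in a Range request header.

```python
import os
import tempfile


def read_new_records(path, offset):
    """Return (new_lines, new_offset) for a log file read from a saved offset.

    If the file is now shorter than the saved offset, assume it was rotated
    or truncated and start again from the beginning.
    """
    size = os.path.getsize(path)
    if size < offset:
        offset = 0  # assumed rotation: the old offset is meaningless
    with open(path, 'rb') as log:
        log.seek(offset)
        chunk = log.read()
    # Only hand back complete lines; a partial last line waits for the next run.
    end = chunk.rfind(b'\n') + 1
    lines = chunk[:end].decode('utf-8', 'replace').splitlines()
    return lines, offset + end


# Demo on a throwaway file (hypothetical log lines).
fd, demo_path = tempfile.mkstemp(suffix='.log')
os.close(fd)
with open(demo_path, 'wb') as log:
    log.write(b'GET / 200\nGET /missing 404\n')
lines, offset = read_new_records(demo_path, 0)
print(lines, offset)

with open(demo_path, 'wb') as log:  # simulate a rotation: the file starts over
    log.write(b'GET /new 200\n')
lines, offset = read_new_records(demo_path, offset)
print(lines, offset)
os.remove(demo_path)
```

A size check alone misses a rotation where the new file has already grown past the old offset; remembering a checksum of the first bytes of the file would be one way to cover that case.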

ludo ~ Sep 15, 2004 02:19:00 ~ upd Sep 15, 2004 02:56:10 ~ category Python

I'm on the train to Switzerland for a job interview (no I'm not unemployed, I have a good well-paid -- by Italian standards -- pretty comfortable job with a huge solid company, but I'd like to go work abroad again), reading stuff I dumped this morning at home before leaving on my newly resurrected iPAQ*. Apart from Danah Boyd's Social Software, Sci-Fi and Mental Illness which was mentioned on Joel's latest entry, the only other interesting thing to read was Russ's sudden discovery of Perl.

I was a bit surprised to read that a Java developer with some exposure to Python (mainly due to his obsession with Series 60 phones, which sooner or later will get Python as their default scripting language) can suddenly discover Perl and even like it. I was more surprised when I read that a good part of Russ's sudden love for Perl is due to "Python's horrible documentation.[..] Python's docs were just half-ass and bewildering to me". Horrible? How's that?

I've used my share of languages these past 10 years (and Perl was the first I loved), and I've never found anything as easy to learn and use as Python. The docs are a bit concise but well written, and they cover the base language features and all the standard library modules with a well-organized layout. And if the standard docs are not enough, you can always go search comp.lang.python, read PEPs on specific topics, read AMK's "What's new in Python x.x" for the current and older versions of Python, keep the Python Quick Reference open, or buy the Python Cookbook (which may give you the right to ask Alex about some obscure Python feature for the 1000th time on clp or iclp and get an extra 20 lines in his replies). And as for examples, you don't see many of them in the docs because Python has an interactive interpreter, and very good built-in introspection capabilities. So usually when you have to deal with a new module you skim the docs then fire up the interpreter, import the module, and start poking around to see how to use it. And while you're at the interpreter console, remember that dir(something) and help(something) are your friends. Maybe the Perl and Perl modules docs are so good (are they, really? I can't remember) because you would be utterly lost in noise without them, as Russ seems to notice in dealing with special variables, something that still gives me nightmares from time to time.
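
A short sketch of that kind of poking around, written for Python 3 and using the standard mailbox module purely as an example (any module would do):

```python
import inspect
import mailbox

# List the public names the module exposes.
public_names = [name for name in dir(mailbox) if not name.startswith('_')]
print(public_names)

# Read the first line of a class's docstring without leaving the interpreter.
print(inspect.getdoc(mailbox.mbox).splitlines()[0])

# help(mailbox.mbox) would page through the full documentation interactively.
```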

So I don't really get Russ's sudden love for Perl, nor am I overexcited by learning that he got invited to Foo Camp, as I'm usually not much into the "look how important/well known/an alpha geek I am" stuff (probably due to my not being important, or well known, and definitely not an alpha anything, apart from my dogs).

Rereading this entry after having posted it, and before I forget about Perl for the next (hopefully many) years, I have to say that if despite everything you're really into Perl you should have a look at Damian Conway's excellent Object Oriented Perl, one of the best programming books I have read.

BTW, I live 50 km away but I always forget how green Switzerland is. Oh, and I should remember to book a place in a non-smoking car next time, so as to lower my chances of sitting next to an old lady smoking Gauloises, exploding in bursts of loud coughing every odd minute, and traveling with super-extra-heavy suitcases she cannot move half an inch, let alone raise over the seats.

update This morning I found a reply by Russ in my inbox. I'm not an expert in the nuances of written English, but he did not sound happy with my post. I guess the adrenaline rush of going to a long interview in three different languages in a foreign country (not to say being a bit envious of work conditions overseas) made me come out harsher than I wanted to. Or maybe I just like pissing people off when I have nothing better to do, and they trumpet false opinions to the whole world (something I... uhm... never did). Anyway, it turns out Russ could not find the Python Library Reference, to learn how to use the mailbox module to parse a Unix mailbox. I guess sometimes it just pays to look twice, especially if not looking involves Perl.

As usual on my site, comments are disabled. I'm too lazy to add them to my homegrown blogging thing, and I have no time for dealing with comment spam. If you feel like you have something worthwhile to say, reply to Russ's post on his blog, email me, or just get busy on something else for a few minutes and then try to remember what you wanted to say; maybe this is not as interesting a topic as you thought.

update October 18, 2004 With all the excitement on podcasts these past weeks, which involved starting a new site and doing my own podcasts for Qix.it, I forgot to mention an email from Pete Prodoehl about the comments I make on US versus European work conditions and salaries in the footnote below. Pete writes:

Ludo, I certainly don't have the disposable income that many US bloggers seem to have. I know what you mean though. I constantly seem to come across posts where people say they handed down their old 20gig iPod because they got a new 40gig iPod.
Honestly, it all sounds crazy to me... Then again, I live in the US Midwest, maybe things are different here as well. ;)
Thanks Pete, it's nice to learn that not all Americans live in Eden (I should know that myself, having lived there for a year, but it was a long time ago and things might have changed).

* I don't have the money to buy a new PDA (how do you people in the US manage to buy so many gadgets and crap anyway? don't you have to pay rent/bills/taxes/etc? aren't you hit by the recession or is it something only we Italians have to live with?), so I resurrected my iPAQ 3850 which had been lying around thinking itself a brick for ages, got a new battery off ebay, wiped out that sorry excuse for an operating system which is PocketPC, installed OPIE/Opie Reader/JpluckX and soon had a working PDA again. I did not remember how much I missed having one, a smartphone is not the same thing. Now if only I had the money to buy a Zaurus SL-C860...

ludo ~ Sep 07, 2004 09:33:00 ~ upd Oct 18, 2004 15:02:20 ~ category Python

Well, not exactly... SMIME is a very nice spec, it's widely supported, and there are SMIME libraries for most development environments. The problem is that the RFC822 SMTP-vs-local line-endings nastiness combines with MIME canonicalization and the idiosyncrasies of libraries and mail UAs to make SMIME messages very fragile. In the rest of this entry, I briefly describe a couple of SMIME pitfalls I spent quite a few hours debugging recently.

If you want to experiment with SMIME signing, you can download the M2Crypto-based SMIME signing class which is the companion to the verifying class of my previous entry on SMIME.

line endings

Using OpenSSL (via M2Crypto) to cleartext sign a SMIME message, you get back a valid multipart/signed message that has one problem: the cleartext part of the message has CR+LF line endings, while the rest of the message (pkcs7 signature, SMIME headers) has LF line endings (cf this thread on mailing.openssl.dev). OpenSSL performs MIME canonicalization (i.e. it converts line endings to CR+LF -- SMIME_crlf_copy() in crypto/pkcs7/pk7_smime.c/PKCS7_sign()) on the message before signing it, as per the SMIME spec. The problem is that a message with a mix of CR+LF and bare LF is almost never what you need. If you send the message by SMTP directly after signing it, it should have CR+LF line endings as per RFC822. If you hand off the message to a program like qmail-inject, your message should respect local conventions, i.e. on Unix it should have bare LF line endings. This problem becomes apparent when you open a signed message in Outlook or Outlook Express, since the message appears to have been tampered with and cannot be verified. If you sign with the "binary" flag, you get no CR+LF line endings but the resulting message cannot be verified unless you perform the MIME canonicalization yourself before signing, getting the same output as in the above situation. A sample of the resulting message, with line endings prefixed to the actual lines (and long lines snipped):

'\n'    MIME-Version: 1.0
'\n'    Content-Type: multipart/signed; protocol="application/x-pkcs7-sig
'\n'
'\n'    This is an S/MIME signed message
'\n'
'\n'    ------526F05E052FA5F1DF695C4ABA3E3EF81
'\r\n'  prova prova prova
'\r\n'   prova 123
'\r\n'
'\r\n'  prova
'\n'
'\n'    ------526F05E052FA5F1DF695C4ABA3E3EF81
'\n'    Content-Type: application/x-pkcs7-signature; name="smime.p7s"
'\n'    Content-Transfer-Encoding: base64
'\n'    Content-Disposition: attachment; filename="smime.p7s"
'\n'
'\n'    MIIGDAYJKoZIhvcNAQcCoIIF/TCCBfkCAQExCzAJBgUrDgMCGgUAMAsGCSqGSIb3
'\n'    DQEHAaCCA9gwggPUMIIDPaADAgECAgECMA0GCSqGSIb3DQEBBAUAMIGbMQswCQYD
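
One way to get rid of the mix is to rewrite every line ending in the finished message to the convention the next hop expects. The helper below is a minimal Python 3 sketch of that idea, not code from the signing class; normalize_eol is a hypothetical name, and it assumes that converting the whole message at once is acceptable for your transport.

```python
def normalize_eol(message, eol):
    """Rewrite every line ending in message (CR+LF, bare CR, bare LF) to eol."""
    # First collapse everything to bare LF, then expand if something else is wanted.
    unified = message.replace('\r\n', '\n').replace('\r', '\n')
    if eol != '\n':
        unified = unified.replace('\n', eol)
    return unified


# A tiny stand-in for a signed message with mixed line endings.
mixed = 'MIME-Version: 1.0\n\nprova prova prova\r\n prova 123\r\n\n'

for_smtp = normalize_eol(mixed, '\r\n')   # straight to SMTP: CR+LF everywhere
for_local = normalize_eol(mixed, '\n')    # handed to a local program on Unix
print(repr(for_smtp))
print(repr(for_local))
```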

outlook interoperability

If you sign SMIME messages with OpenSSL on Unix, you may discover that your messages are valid in Mozilla, but they appear to have been tampered with in Outlook, Outlook Express, and programs using MS libraries to validate them. A search on Google Groups turns up quite a few threads on this topic, none of which unfortunately offer any practical help, apart from generic suggestions regarding line-ending conversions, which do not work. After quite a few hours spent debugging this problem, I could only come up with a practical solution with no theoretical explanation: append a single linefeed (a bare LF) to the end of the payload before signing it. Mozilla and OpenSSL keep verifying the resulting SMIME messages, and Outlook stops complaining that the message has been tampered with. I still have to verify the implications of this workaround when you sign a message that includes a SMIME message as a MIME message/rfc822 attachment, since I've noticed that signing such a message often breaks the attachment validity.
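
The workaround only concerns the order of operations, so it can be sketched without any crypto library. In the Python 3 snippet below, sign stands for whatever callable performs the real signing, and fake_sign is a stand-in used only to make the example runnable; both names are hypothetical.

```python
import hashlib


def sign_with_trailing_lf(payload, sign):
    """Append a single bare LF to the payload, then hand it to the signer."""
    return sign(payload + b'\n')


def fake_sign(data):
    # Stand-in for demonstration only: a real signer would call the SMIME library.
    return hashlib.sha256(data).hexdigest()


print(sign_with_trailing_lf(b'prova prova prova', fake_sign))
```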

ludo ~ Jun 24, 2004 13:55:00 ~ upd Jun 24, 2004 15:16:39 ~ category Python

In my spare time, I'm working on a project where I have to sign and verify SMIME mail using M2Crypto, which works quite well but lacks a bit in documentation, especially on SMIME functions. The Programming S/MIME in Python with M2Crypto howto is enough to point you in the right direction, and the source has a few SMIME examples. What is missing is a recipe to verify signed SMIME messages if you don't have the signer's certificate, which is what usually happens when you have to verify Internet email.

OpenSSL's smime command is able to do that, so there should be a way to accomplish the same thing from Python using M2Crypto. After a bit of fiddling around and looking at OpenSSL's source, I have found a way which seems to work (update: content check done against the output of SMIME.smime_load_pkcs7_bio instead of using email.message_from_string, return a list of certificates on successful verification, show content diff if verification fails):

#!/usr/bin/python
"""
Simple class to verify SMIME signed email messages
without having to know the signer's certificate.
The signer's certificate(s) is extracted from
the signed message, and returned on successful
verification.
A unified diff of the cleartext content against
the one resulting from verification is returned
as exception value if the content has been tampered
with.
Use at your own risk, send comments and fixes.
May 30, 2004
Ludovico Magnocavallo <ludo@asiatica.org>
"""
import os, base64
from M2Crypto import BIO, SMIME, m2, X509
from difflib import unified_diff
class VerifierError(Exception): pass
class Verifier(object):
    """
    accepts an email payload and verifies it with SMIME
    """
    def __init__(self, certstore):
        """
        certstore - path to the file used to store
                    CA certificates
                    eg /etc/apache/ssl.crt/ca-bundle.crt
        >>> v = Verifier('/etc/dummy.crt')
        >>> v.verify('pippo')
        Traceback (most recent call last):
          File "/usr/lib/python2.3/doctest.py", line 442, in _run_examples_inner
            compileflags, 1) in globs
          File "<string>", line 1, in ?
          File "verifier.py", line 46, in verify
            self._setup()
          File "verifier.py", line 36, in _setup
            raise VerifierError, "cannot access %s" % self._certstore
        VerifierError: cannot access /etc/dummy.crt
        >>>
        """
        self._certstore = certstore
        self._smime = None
    def _setup(self):
        """
        sets up the SMIME.SMIME instance
        and loads the CA certificates store
        """
        smime = SMIME.SMIME()
        st = X509.X509_Store()
        if not os.access(self._certstore, os.R_OK):
            raise VerifierError, "cannot access %s" % self._certstore
        st.load_info(self._certstore)
        smime.set_x509_store(st)
        self._smime = smime
    def verify(self, text):
        """
        verifies a signed SMIME email
        returns a list of certificates used to sign
        the SMIME message on success
        text - string containing the SMIME signed message
        >>> v = Verifier('/etc/apache/ssl.crt/ca-bundle.crt')
        >>> v.verify('pippo')
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
          File "signer.py", line 23, in __init__
            raise VerifierError, e
        VerifierError: cannot extract payloads from message
        >>>
        >>> certs = v.verify(test_email)
        >>> isinstance(certs, list) and len(certs) > 0
        True
        >>>
        """
        if self._smime is None:
            self._setup()
        buf = BIO.MemoryBuffer(text)
        try:
            p7, data_bio = SMIME.smime_load_pkcs7_bio(buf)
        except SystemError:
            # uncaught exception in M2Crypto
            raise VerifierError, "cannot extract payloads from message"
        if data_bio is not None:
            data = data_bio.read()
            data_bio = BIO.MemoryBuffer(data)
        sk3 = p7.get0_signers(X509.X509_Stack())
        if len(sk3) == 0:
            raise VerifierError, "no certificates found in message"
        signer_certs = []
        for cert in sk3:
            # encode each signer certificate in turn, not just the first one
            signer_certs.append(
                "-----BEGIN CERTIFICATE-----\n%s-----END CERTIFICATE-----" \
                    % base64.encodestring(cert.as_der()))
        self._smime.set_x509_stack(sk3)
        try:
            if data_bio is not None:
                v = self._smime.verify(p7, data_bio)
            else:
                v = self._smime.verify(p7)
        except SMIME.SMIME_Error, e:
            raise VerifierError, "message verification failed: %s" % e
        if data_bio is not None and data != v:
            raise VerifierError, \
                "message verification failed: payload vs SMIME.verify output diff\n%s" % \
                    '\n'.join(list(unified_diff(data.split('\n'), v.split('\n'), n = 1)))
        return signer_certs
test_email = """put your test SMIME signed email here"""
def _test():
    import doctest
    return doctest.testmod()
if __name__ == "__main__":
    _test()

ludo ~ May 31, 2004 13:04:00 ~ upd Jun 02, 2004 02:51:46 ~ category Python

I haven't posted anything in a long while since I've been pretty busy working on a complex project in my spare time (let's call it project X), and I've not had Internet access outside office hours for the past couple of months. I hope to start writing again frequently soon, since I'm learning interesting things for project X which involves SMIME and SMTP mail (using OpenSSL, M2Crypto and the email package).

This brief note, done mainly so that I can use it as a reference in the future, regards customizing logging.LogRecord. One of the requirements of project X is that all operations for a single transaction are logged with a unique timestamp, representing the start of the transaction. Since I'm lazy, and I don't trust myself too much when writing code (especially at late hours), I did not want to have to carry around this value everywhere, and remember to pass it to every logging call.

So last night I had a look into the logging internals, to see how to extend logging.LogRecord to carry an additional argument representing my unique timestamp. Apart from being very useful, the logging package is very well architected, so it turns out subclassing LogRecord is not too difficult. Let's see one way of doing it, which is not necessarily the best one so please send me any improvements/suggestions.

LogRecord instances are created every time something is logged. They contain all the information pertinent to the event being logged. [...]

LogRecord has no methods; it's just a repository for information about the logging event. The only reason it's a class rather than a dictionary is to facilitate extension.

LogRecord objects are created by makeRecord, a factory method of the logging.Logger class. To use a custom subclass of LogRecord, you have to subclass logging.Logger and override makeRecord, then set your subclassed Logger class as logging's default, which is not too hard.

When you call logging.getLogger(name) to get a new logger instance, the getLogger function calls Logger.manager.getLogger(name). Manager.getLogger (Logger.manager is Manager(Logger.root)) does a bunch of stuff, then if no loggers with the same name have been already created, it returns a new instance of _loggerClass for the given name. logging._loggerClass is set by default as _loggerClass = Logger. So to set your Logger subclass as logging's default, just call:

logging.setLoggerClass(CustomLogger)

Now that we know what to subclass and how to set logging to use it, we still have to decide how to pass the extra argument to LogRecord. To pass it to our CustomLogger class as an additional __init__ argument we would have to subclass logging.Manager and override getLogger, which seems a bit too much work. Another option would be to directly set the extra parameter in every newly obtained CustomLogger instance, but then you would have to remember to set it every time, since logging raises a KeyError exception if you try to use the missing parameter in a format string (you could also set it to a default value in CustomLogger.__init__, but then you'd get a default value in your logs which IMHO is even worse). The way I chose to do it is to set the parameter in logging.root so that my extra argument is available to all loggers, since they share a reference to the root logger. To make the root logger emit CustomLogRecord instances, I redefined logging.RootLogger.makeRecord to match CustomLogger.makeRecord.

This solution obviously won't work if you need to set different extra arguments for different loggers, and my custom classes manage only a single extra argument, which is exactly what I need for this project but may not be enough for other cases.

#!/usr/bin/python
import logging
from time import strftime, gmtime
class CustomLogRecord(logging.LogRecord):
    """
    LogRecord subclass that stores a unique -- transaction -- time
    as an additional attribute
    """
    def __init__(self, name, level, pathname, lineno, msg, args, exc_info, trans_time):
        logging.LogRecord.__init__(self, name, level, pathname, lineno, msg, args, exc_info)
        self.trans_time = trans_time
def makeRecord(self, name, level, fn, lno, msg, args, exc_info):
    return CustomLogRecord(name, level, fn, lno, msg, args, exc_info, self.root.trans_time)
class CustomLogger(logging.Logger):
    """
    Logger subclass that uses CustomLogRecord as its LogRecord class
    """
    def __init__(self, name, level=logging.NOTSET):
        logging.Logger.__init__(self, name, level)
    makeRecord = makeRecord
# setup
logging.setLoggerClass(CustomLogger)
logging.root.trans_time = strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
logging.RootLogger.makeRecord = makeRecord
if __name__ == '__main__':
    # get logger
    logger = logging.getLogger('modified logger')
    logger.propagate = False
    hdlr = logging.StreamHandler()
    hdlr.setFormatter(logging.Formatter(
        '%(trans_time)s [%(asctime)s] %(levelname)s %(process)d %(message)s'))
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)
    logging.root.addHandler(hdlr)
    logger.debug('test')
    from time import sleep
    sleep(1)
    logger.debug('a second test, with (hopefully) the same date as the previous log record')
    logging.warn('root logging')

The output of the above script should be something like:

Tue, 18 May 2004 12:53:04 +0000 [2004-05-18 14:53:04,898] DEBUG 31268 test
Tue, 18 May 2004 12:53:04 +0000 [2004-05-18 14:53:05,898] DEBUG 31268 a second test,
with (hopefully) the same date as the previous log record
Tue, 18 May 2004 12:53:04 +0000 [2004-05-18 14:53:05,898] WARNING 31268 root logging
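
The script above targets the logging package as it was in 2004; the makeRecord signature has since gained extra arguments, so it will not run unchanged on current Python. On Python 3.2 and later the standard library offers logging.setLogRecordFactory, which reaches the same goal without subclassing Logger. The following is a sketch of that different technique, not a port of the code above; TRANS_TIME and record_factory are names chosen for the example, and the output is captured in a StringIO only to keep the example self-contained.

```python
import io
import logging
from time import gmtime, strftime

# One timestamp for the whole transaction, fixed when the factory is installed.
TRANS_TIME = strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())

_default_factory = logging.getLogRecordFactory()


def record_factory(*args, **kwargs):
    """Build a normal LogRecord, then attach the transaction time to it."""
    record = _default_factory(*args, **kwargs)
    record.trans_time = TRANS_TIME
    return record


# Every logger, the root logger included, now creates records through the factory.
logging.setLogRecordFactory(record_factory)

stream = io.StringIO()  # a real script would log to stderr or to a file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '%(trans_time)s [%(asctime)s] %(levelname)s %(process)d %(message)s'))

logger = logging.getLogger('modified logger')
logger.propagate = False
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('test')
logger.debug('a second test, with the same transaction time')
print(stream.getvalue(), end='')
```

As with the original approach, a single module-level value means one transaction time per process; different values for different loggers would need another mechanism.
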
ludo ~ May 18, 2004 08:05:00 ~ upd May 18, 2004 15:03:01 ~ category Python

I don't usually post entries on news items I read on other sites, as I don't like repeater blogs too much. I will make an exception tonight, since the article I'm writing about summarizes well a few important things I like about Python.

The article is titled Rapid Development Using Python and appeared today on Linux Journal. I learned about it from a Google News Alert agent, which is usually not very interesting (so this entry is a double exception).

A few key points from the article, which is a good read if you like Python, and of course an even better one if you have never used it.

The interactive interpreter

We anticipated making changes frequently on a remote device, with a customer representative viewing the interface and providing instant feedback.
Python allowed us to achieve this environment primarily because it is easy to use interactively. Prototyping through an interactive interpreter is an effective mechanism for exploring different approaches to solving a problem.

Introspection

We wanted to dispatch Web requests to code in a very direct fashion. An architecture based on a central controller that used introspection to determine where to route requests seemed a clean and simple choice.
[...]
an inspectable run-time is nearly equivalent to running an application under a debugger while making changes.

String Manipulation and File I/O

Iterating over the content of a file takes two lines of Python. In comparison, Java requires five or five instantiations followed by two or three lines to read. Once read into memory, content must be tokenized before iteration. While iterating, casting is required.
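
For readers who have never used Python, this is roughly what those two lines look like. It is a Python 3 sketch, not code from the article; the temporary file and its content are made up so that the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as handle:
    handle.write('first line\nsecond line\n')

seen = []
# The two lines in question: open the file, then loop over it.
with open(path) as lines:
    for line in lines:
        seen.append(line.rstrip('\n'))

os.remove(path)
print(seen)
```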

Development experience

Because it supports import-on-demand and module reloading through introspection, Python allowed us to change logic within external modules and have the changes be accessible immediately by the central controller.
Instead of wasting time fighting with the language, we spent time fulfilling customer requirements. Our rapid development environment blazed to life fueled by Python.
We found that looking at problems from a Pythonic standpoint often led to simple and elegant solutions that addressed both functionality and portability.

I completely agree with all of the points above. My experience with Python, though short (I came to Python pretty late) and not very broad (I haven't been able to use it at work as much as I would have liked to) is very similar to the one outlined in this article.

ludo ~ Dec 02, 2003 23:12:00 ~ upd Dec 02, 2003 23:35:21 ~ category Python

The Python Library is a continual source of amazement: I just discovered the very useful unicodedata module, which pairs with the u'\N{LETTER NAME}' escape sequence.

The \N{} escape sequence works like this:

>>> u'\N{LATIN SMALL LETTER M WITH DOT BELOW}'
u'\u1e43'

The unicodedata module, among other things, allows you to look up the Unicode character associated with a name, which allows you to build mapping tables using character names:

>>> import unicodedata
>>> unicodedata.lookup('LATIN SMALL LETTER M WITH DOT BELOW')
u'\u1e43'

The reverse of lookup() is name():

>>> unicodedata.name(unicodedata.lookup('LATIN SMALL LETTER M WITH DOT BELOW'))
'LATIN SMALL LETTER M WITH DOT BELOW'
>>>
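
As an illustration of such a mapping table, here is a Python 3 sketch that maps a few letters with diacritics to their plain base letters, working only from character names. The list of names and the rule of cutting the name at " WITH " are choices made for this example.

```python
import unicodedata

# Character names chosen for the example.
NAMES = [
    'LATIN SMALL LETTER M WITH DOT BELOW',
    'LATIN SMALL LETTER S WITH DOT BELOW',
    'LATIN SMALL LETTER A WITH MACRON',
]

# Map each accented character to its base letter, e.g.
# 'LATIN SMALL LETTER M WITH DOT BELOW' -> 'LATIN SMALL LETTER M' -> 'm'.
table = {}
for name in NAMES:
    char = unicodedata.lookup(name)
    base_name = name.split(' WITH ')[0]
    table[ord(char)] = unicodedata.lookup(base_name)

word = 'sa\N{LATIN SMALL LETTER M WITH DOT BELOW}s\N{LATIN SMALL LETTER A WITH MACRON}ra'
print(word.translate(table))
```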

If you want to check unicode names, a very useful site is the Letter Database at the Institute of the Estonian Language. An example is the search for LATIN SMALL LETTER S WITH DOT BELOW, which yields this page.

ludo ~ Oct 26, 2003 15:34:00 ~ upd Oct 26, 2003 15:48:32 ~ category Python