The Unicodedata Module

ludo, October 26, 2003 at 15:34:00 CET

The Python Library is a continual source of amazement: I just discovered the very useful unicodedata module, which pairs nicely with the u'\N{LETTER NAME}' escape sequence.

The \N{} escape sequence works like this:

>>> u'\N{LATIN SMALL LETTER M WITH DOT BELOW}'
u'\u1e43'

Among other things, the unicodedata module lets you look up the Unicode character associated with a name, so you can build mapping tables keyed on character names:

>>> import unicodedata
>>> unicodedata.lookup('LATIN SMALL LETTER M WITH DOT BELOW')
u'\u1e43'
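
To make the mapping-table idea concrete, here is a minimal sketch of a transliteration table built from character names. The table contents and the names NAME_TO_ASCII, TRANSLIT and transliterate() are made up for illustration, and the code assumes a reasonably recent Python interpreter; only unicodedata.lookup() comes from the module itself.

```python
import unicodedata

# Hypothetical table: readable character names mapped to plain ASCII letters.
NAME_TO_ASCII = {
    'LATIN SMALL LETTER M WITH DOT BELOW': 'm',
    'LATIN SMALL LETTER S WITH DOT BELOW': 's',
    'LATIN SMALL LETTER T WITH DOT BELOW': 't',
}

# Resolve each name to its character once, up front.
# unicodedata.lookup() raises KeyError for an unknown name,
# so a typo in the table fails here rather than silently later.
TRANSLIT = dict(
    (unicodedata.lookup(name), ascii_char)
    for name, ascii_char in NAME_TO_ASCII.items()
)

def transliterate(text):
    """Replace every character found in TRANSLIT and leave the rest alone."""
    return u''.join(TRANSLIT.get(ch, ch) for ch in text)
```

Keeping the names in the source instead of raw code points makes the table easier to review, at the cost of a few lookups when the module is loaded.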

The reverse of lookup() is name():

>>> unicodedata.name(unicodedata.lookup('LATIN SMALL LETTER M WITH DOT BELOW'))
'LATIN SMALL LETTER M WITH DOT BELOW'
>>>
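
Both functions complain when there is nothing to return: lookup() raises KeyError for an unknown name, and name() raises ValueError for a character that has no name (control characters, for instance) unless you pass a default as the second argument. A small sketch of wrappers that swallow those errors follows; safe_name() and safe_lookup() are hypothetical helpers, not part of the module, and the code assumes a reasonably recent Python interpreter.

```python
import unicodedata

def safe_name(ch, default=u'<unnamed>'):
    """Return the Unicode name of ch, or default if ch has no name."""
    # The optional second argument suppresses the ValueError.
    return unicodedata.name(ch, default)

def safe_lookup(name):
    """Return the character called name, or None if no such name exists."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None
```

With these, safe_name(u'\n') returns the default and safe_lookup('NO SUCH CHARACTER NAME') returns None instead of raising.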

If you want to check Unicode names, a very useful site is the Letter Database at the Institute of the Estonian Language. For example, searching for LATIN SMALL LETTER S WITH DOT BELOW yields this page.
