September 25, 2003

Radios, Quality, the Internet

I have always been an avid listener of Internet radio stations, since my favourite music is very specialized (early or "soul" reggae, dub, and some jazz and R&B) and very unlikely to be broadcast over the air, especially in Italy, where I live.

Last night I had a further example of the power of Internet radio. Perugia, one of the Italian soccer teams I try to follow, was playing an away match against Dundee FC. I was in my home office, which has no TV, so I looked up the Dundee FC site on Google, followed a link, and connected to their local radio station's live broadcast of the match (which I only half-managed to follow, due to my unfamiliarity with the Scottish accent).

What struck me is that the above procedure involved almost no conscious thinking on my part. My interest in the match was followed a few seconds later by live commentary streaming out of my speakers.

So, coincidences being the stuff life is mostly made of, this morning I was not surprised to find in my aggregator a new entry on Tim Bray's blog on Radio, pointing to Doc Searls' The Continuing Death of Radio as Usual.

Doc Searls makes a few interesting points, lamenting the low quality of radio receivers (AM in cars, FM at home) and the slow death of over-the-air broadcasting, and pointing to IP broadcasting as the future of radio.

Not that it matters to anyone, but I agree with everything he writes, except his statement that "There's almost no way to get a good AM radio anymore, even if you want one."

If you don't need to integrate a radio receiver with fancy stuff like a home theatre, or to distribute its audio signal throughout a house, there are plenty of excellent radios out there. They are also pretty cheap, have better audio quality than most of the expensive stereo equipment available on the market, are superbly good-looking, and will keep their value well. The picture above should give you a clear idea of what I mean.

It shows a Grundig console stereo set from the 60s, including AM and FM radio, an equalizer, a superb antenna, four speakers, and a turntable, all integrated into a handmade wooden piece of furniture. We bought a similar one, from the late 50s, a few months ago for less than 300 euro, and its tuning and sound quality are excellent. Of course, the sound quality is no surprise, since radios from this period use tube amplifiers.

September 24, 2003

HTTP proxies

Alan Kennedy has just announced a list of Python-based HTTP proxies, a few of which may come in handy when developing web applications.

And if you're wondering what I'm doing sitting in front of my computer at this hour, well, this is one of those nights when I just can't get to sleep.

September 19, 2003

Combining Docbook-generated chunked HTML

The book Creating Applications With Mozilla is freely available at mozdev, but unfortunately it only comes as a set of HTML pages (or at least that's all I was able to find).

Having some time to waste, I set out to combine all the HTML pages into one single file, trying to improve my understanding of the wonderful elementtree and elementtidy packages along the way.

The resulting script parses the files in the (hopefully) correct order, combines their HTML body elements into a single file, and fixes the internal references to point to the correct places in the new file.
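
In outline, the combining step looks something like the following sketch (simplified and untested; the combine() function and the link-rewriting scheme here are illustrations of the approach, not the actual script):

from elementtidy import TidyHTMLTreeBuilder

XHTML = "{http://www.w3.org/1999/xhtml}"

def combine(filenames, outfile):
    # parse the first chunk and use its body as the container
    master = TidyHTMLTreeBuilder.parse(filenames[0])
    master_body = master.find(XHTML + "body")
    # graft the body children of every other chunk onto it
    for name in filenames[1:]:
        body = TidyHTMLTreeBuilder.parse(name).find(XHTML + "body")
        for child in body:
            master_body.append(child)
    # turn inter-file references like "ch02.html#id1" into in-file
    # anchors like "#ch02.html-id1" (the matching anchors would need
    # the same renaming applied to their id/name attributes)
    for a in master.getiterator(XHTML + "a"):
        href = a.get("href")
        if href and ".html" in href and not href.startswith("http"):
            a.set("href", "#" + href.replace("#", "-"))
    master.write(outfile)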

The script takes about 19 seconds to run on my crappy Celeron 600 machine, and the resulting file is 1.4 MB. Given that the book seems to be written in DocBook and produced with the chunked HTML DocBook XSL stylesheet, this script may serve as a starting point for reverse-engineering DocBook-produced HTML, if you ever need to do that.

September 18, 2003

LaTeX for Sanskritists

A few useful links on utf-skt, an Omega TeX package that allows you to input and process Sanskrit inside LaTeX.

The utf-skt package resides at the site of the Institut für Indologie und Südasienwissenschaften (Philosophische Fakultät, Fachbereich Kunst-, Orient- und Altertumswissenschaften) of the Martin-Luther-Universität Halle-Wittenberg, on a set of pages maintained by J. Hanneder.

A more straightforward set of instructions for installing and using the utf-skt package can be found on the page Sanskrit Unicode Text Processing by Christian Coseru of the Australian National University. This page also covers the installation and usage of the corresponding Emacs packages.

My installation was pretty simple (I run Slackware 9.0 with Dropline Gnome):

get root privileges, then extract the package

ludo:~# tar zxvf utf-skt.tgz

find the location of your texmf-local tree

ludo:~# kpsewhich -expand-var '$TEXMFLOCAL'
/usr/share/texmf-local

if the directory does not exist yet, create it

ludo:~# mkdir /usr/share/texmf-local

copy the contents of the extracted archive's texmf directory under your texmf-local tree

ludo:~# cp -R utf-skt/texmf/* /usr/share/texmf-local/

update the TeX files index

ludo:~# /usr/share/texmf/bin/texhash

The installation is now complete; let's revert to a normal system user to test the package, using the example documents contained in the utf-skt/user/examples directory extracted from the archive.

Before starting, it is useful to note that utf-skt is an Omega package (Omega is a TeX extension "designed for the typesetting of all the world's languages"), so the commands used to generate .dvi files from the .tex sources differ from those of standard LaTeX: instead of latex we need to use lambda, and instead of dvips, odvips.

Let's create a PDF file from the Transliterated.tex example file (I only list the commands, without their output; lambda runs twice so that cross-references get resolved, as usual with LaTeX):

ludo:~/download/skt/utf-skt/user/examples$ lambda Transliterated.tex
ludo:~/download/skt/utf-skt/user/examples$ lambda Transliterated.tex
ludo:~/download/skt/utf-skt/user/examples$ odvips Transliterated.dvi
ludo:~/download/skt/utf-skt/user/examples$ ps2pdf Transliterated.ps

Supposing you got no errors (due to problems with your TeX or utf-skt installation), you should now have a Transliterated.pdf file. If you open the file in Acrobat Reader, though, you will notice that the fonts are all blurred. The PDF will print OK, but reading it on-screen is a pain. This happens because odvips embeds bitmap versions of the fonts used in the document instead of their outlines.
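
To confirm the diagnosis, if you have xpdf installed, its pdffonts utility lists the fonts embedded in a PDF file; bitmap fonts show up there as Type 3:

ludo:~/download/skt/utf-skt/user/examples$ pdffonts Transliterated.pdf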

After a bit of digging around on Google, and some experimenting with different solutions to this problem, I found a thread on the debian-users list that worked (keep in mind that I'm a novice TeX user, so other solutions may be better for different setups, etc.):

In your home directory, make (or modify it if it already exists) a file
named .dvipsrc which must contain the lines:
    p+ psfonts.cmz
    p+ psfonts.amz
Make sure that the LaTeX preamble of your LyX file (or the part before
\begin{document} if you are using straight LaTeX files) contains:
    \usepackage[T1]{fontenc}
    \usepackage{ae,aecompl}
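
For reference, a complete minimal document using that preamble fix looks like this (plain LaTeX, nothing utf-skt specific; compile it with the same lambda/odvips/ps2pdf sequence shown above):

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{ae,aecompl}
\begin{document}
A test paragraph, to check that the PDF now uses outline fonts instead of bitmaps.
\end{document}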

Re-running odvips and ps2pdf, the resulting PDF looks as in the image above. The document source shown in the image is opened in Gnome 2.4's gedit.

September 17, 2003

HTML Dynamic TestRunner for PHPUnit

Yesterday I set out to write unit tests for a few PHP classes I'm refactoring. Given my usual tendency to set aside serious tasks and concentrate on eye candy, after a few tests I decided to write my own test runner (something I've done at least twice before, without ever storing away the code for later use).

My test runner builds on a few assumptions:

  • the tests live in a web accessible directory
  • every PHP file in the directory with a name starting with test is a test file
  • every test file contains one test case class extending PHPUnit_TestCase
  • every test class has the same name as the file where it resides
  • test cases are all the methods of the class whose names start with test
  • you run tests and see results with a web browser

The main test runner features are:

  • conditional execution of one or more test cases based on an HTML form
  • run only/exclude a test with a single click of the mouse
  • display aggregate test results by suite, showing counts for successes/failures and a warning indicator if any warnings have been raised
  • optionally expand test results to show results/failures for single test cases
  • remember each suite's visibility status (collapsed/expanded) for subsequent runs
  • collect PHP warnings emitted during each test case run and display them under the appropriate test suite

I made a few screenshots, showing a collapsed view of all available suites, the same for a single suite with a warning indicator, and one suite expanded, showing test case results and warnings.

The DynamicTestRunner code only needs to include the PEAR version of PHPUnit. Using a templating engine would have made the code more understandable, but at the cost of introducing a dependency.
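
The suite discovery behind those assumptions boils down to a directory scan. Here is a rough sketch of the idea (simplified, with variable names of my own choosing, and leaving out the form handling and the fancy HTML output of the real runner):

<?php
require_once 'PHPUnit.php';

// every test*.php file in this directory is expected to declare
// one class with the same base name (see the assumptions above)
$suites = array();
$dir = opendir(dirname(__FILE__));
while (($entry = readdir($dir)) !== false) {
    if (preg_match('/^(test\w+)\.php$/', $entry, $matches)) {
        require_once $entry;
        $suites[$matches[1]] = new PHPUnit_TestSuite($matches[1]);
    }
}
closedir($dir);

// run everything and dump plain-text results
foreach ($suites as $name => $suite) {
    $result = PHPUnit::run($suite);
    echo '<h2>' . $name . '</h2><pre>' . $result->toString() . '</pre>';
}
?>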

If you want to preserve the option of being able to run each test file on its own, you can test whether it has been called by the test runner with a simple if statement at the end of each test file (substituting, of course, testEntities with the name of the class declared in the file):

// run the suite only when this file is executed directly,
// not when it has been included by the test runner
if (!isset($_SERVER['SCRIPT_FILENAME']) ||
    $_SERVER['SCRIPT_FILENAME'] == __FILE__) {
    $suite = new PHPUnit_TestSuite('testEntities');
    $result = PHPUnit::run($suite);
    echo $result->toString();
}

September 16, 2003

Lintel in the Enterprise

Tonight, while trying to scan some very old (100 years or so) negatives on my cheap ScanJet 3400c, I managed to read the very good article The Mad Hatter meets the MCSE, mentioned in a Slashdot post.

As I wrote in a past entry, I saw Sun Rays and Mad Hatter at Sun's Milan offices a few weeks ago, and I was deeply impressed. The article examines in detail the impact of introducing a Lintel architecture in the enterprise.

What I liked most from tonight's quick read of the article (gonna read it again and summarize it for the big shots tomorrow at the office) are a few well-made points:

  • integrating Linux in the enterprise is a potential, and very expensive, failure if it has to adapt to a Windows-based infrastructure
  • a Unix-based architecture is unlikely to evolve in Windows- or mainframe-centric environments, due to the cultural differences between the two worlds
  • Sun's Unix Business Architecture (UBA) has a tremendous potential and could change the way enterprises design and manage their IT infrastructures

Definitely recommended reading.

September 09, 2003

Search engines are stupid

Hmmm... maybe people just do stupid searches on them. I was looking at the Google searches people come to my site from, and noticed a good one: x x x stuff (without the spaces, of course). I promptly opened it, and... wow, I'm in 7th position, all thanks to a masked email address in my post on the J editor. Due to insistent requests from my gf, I changed the email address to show z characters instead of x.

September 06, 2003

Removing unused CSS classes

One of the things I like best about Python is the interactive console. I often use it for quick manipulations of text files, and every time I wonder how I managed before learning Python, when I wrote Perl or PHP scripts for similar tasks (yes, I know Perl has useful command-line options for this kind of thing, but with the Python console you can poke around the data you're manipulating interactively, get help on commands, and so on).

So today I set about the task of removing unneeded CSS classes from a huge HTML file that I did not produce myself.

Getting the data from a file is a one-liner:

/-(ludo@pippozzo)-(27/pts)-(15:44:06:Sat Sep 06)--
-($:~)-- python
Python 2.3 (#1, Aug  5 2003, 15:11:52)
[GCC 3.2.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> all = file('xxxxxxxxxx.html', 'r').read()

Then we import the re module and obtain a list of all the CSS classes used in the file, removing duplicates:

>>> import re
>>> r = re.compile('class="([^"]+)"')
>>> styles = [s for s in r.findall(all) if s not in locals()['_[1]'].__self__]

The duplicate-removal comprehension looks like a clever Perlish hack, but I'm in the console and it's nice to be able to do everything in one line. I copied it from the Python Cookbook entry The Secret Name of List Comprehensions by Chris Perkins, who warns that "It should come as no surprise to anyone that this is a totally undocumented, seat-of-your-pants exploitation of an implementation detail of the Python interpreter. There is absolutely no guarantee that this will continue to work in any future Python release."

Now that we have the list of classes in use, we can remove unneeded ones and save the processed content on the original file:

>>> r = re.compile(r"""(?ms)^(\S*\.(\S+)\s+{[^}]+}\n)""")
>>>
>>> for style in r.findall(all):
...     if not style[1] in styles:
...             all = all.replace(style[0], '')
...
>>> file('xxxxxxxxxx.html', 'w').write(all)

Storing the styles in a dictionary is more efficient (not that it matters in this quick console run), and eliminates the need for the duplicate-removal hack. Here is a version using a dictionary:

>>> import re
>>> all = file('xxxxxxxxxx.html', 'r').read()
>>> r = re.compile('class="([^"]+)"')
>>> styles = {}
>>> for s in r.findall(all):
...     styles[s] = None
...
>>> r = re.compile(r"""(?ms)^(\S*\.(\S+)\s+{[^}]+}\n)""")
>>> for style in r.findall(all):
...     if not style[1] in styles:
...             all = all.replace(style[0], '')
...
>>> file('xxxxxxxxxx.html', 'w').write(all)

Update: the lines above worked for the particular file I was editing; a general solution would probably need a few changes:

  • In the second regexp, used to identify style class declarations, I assume the class identifier is anchored at the beginning of a line, which is not always the case
  • I look for style class declarations in the whole file, which is pretty pointless (but saves typing a few lines of code in the console, and works for the specific HTML file in question); style class declarations obviously live inside a <style></style> block, so it would be better to limit the scope of the second findall() to the contents of the style block(s), speeding up the search/replace and reducing the chance of spurious matches (for example, matching blocks of C/Java/PHP source in the body of the document)
  • maybe use a scanner object instead of the second findall(), and replace using the match position, as Fredrik Lundh explains in Using Regular Expressions for Lexical Analysis (a rough sketch follows)
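
For what it's worth, here is a rough sketch of that last idea, using the pattern object's (undocumented) scanner() method to walk the matches and rebuild the text in a single pass, instead of calling replace() repeatedly. Untested against the original file, so take it as the shape of a solution rather than a drop-in replacement:

import re

r = re.compile(r"(?ms)^(\S*\.(\S+)\s+{[^}]+}\n)")
scanner = r.scanner(all)
pieces, pos = [], 0
while True:
    m = scanner.search()
    if m is None:
        break
    if m.group(2) not in styles:
        pieces.append(all[pos:m.start()])  # keep the text before the match
        pos = m.end()                      # and skip the unused declaration
pieces.append(all[pos:])
all = ''.join(pieces)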