Combining Docbook-generated chunked HTML

ludo, September 19, 2003 at 16:27:00 CEST

The book Creating Applications With Mozilla is freely available at mozdev, but unfortunately it only comes as a set of HTML pages (or at least that's what I was able to find).

Having some time to waste, I set out to combine all the HTML pages into a single file, trying to improve my understanding of the wonderful elementtree and elementtidy packages along the way.

The resulting script parses the files in the (hopefully) correct order, combines their HTML body elements into a single file, and fixes the internal references to point to the correct places in the new file.
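The three steps described above can be sketched roughly as follows. This is a minimal illustration, not the original script: it uses the standard library's xml.etree.ElementTree (the descendant of elementtree) and assumes the pages are already well-formed XHTML without a namespace, which is the cleanup elementtidy would otherwise handle. The function name `combine_chunks` and the idea of wrapping each page in a div whose id is the page's file name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def combine_chunks(chunks):
    """Merge chunked pages into a single <body> element.

    `chunks` is a list of (filename, xhtml_source) pairs, already in
    reading order. Hypothetical helper, assuming well-formed XHTML
    with no XML namespace on the elements.
    """
    combined = ET.Element("body")
    names = {filename for filename, _ in chunks}

    for filename, source in chunks:
        body = ET.fromstring(source).find("body")
        # Wrap each page in a div named after its file, so that links
        # to the bare file name still have something to point at.
        section = ET.SubElement(combined, "div", {"id": filename})
        section.text = body.text
        section.extend(list(body))

    # Rewrite links between chunks into links inside the new file.
    for link in combined.iter("a"):
        href = link.get("href")
        if not href:
            continue  # a named anchor, not a link
        target, _, fragment = href.partition("#")
        if target in names:
            # "ch01.html#s2" -> "#s2", "ch01.html" -> "#ch01.html"
            link.set("href", "#" + (fragment or target))

    return combined
```

Calling `ET.tostring(combine_chunks(pages), encoding="unicode")` on such a list would give the merged body as a string. Links to external sites and links that were already page-local (`#something`) are left untouched.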

The script takes about 19 seconds to run on my crappy Celeron 600 machine, and the resulting file is 1.4 MB. Given that the book seems to be written in Docbook and produced with the chunked HTML Docbook XSL stylesheet, this script may serve as a starting point for reverse-engineering Docbook-produced HTML, if you ever need to do it.

Comments closed.
