Viewing CHM files, and converting CHM to HTML or PDF files in (Ubuntu) Linux
As most people know, CHM (Microsoft Compiled HTML Help) is a proprietary format that Linux does not support out of the box. Thankfully, there are several options for viewing CHM files, and even for converting them to other formats such as HTML and PDF.
Viewing CHM Files
The simplest way to deal with CHM files is to install a CHM viewer, such as gnochm or xchm. On Ubuntu/Debian either one can be installed from a terminal:
[root@akwal]# sudo apt-get install gnochm
or
[root@akwal]# sudo apt-get install xchm
or via the Synaptic Package Manager, by searching for “gnochm” or “xchm”. Once a viewer is installed, CHM files should open in it automatically.
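To start a viewer from the terminal without having to remember which one is installed, a small shell function can pick whichever is available. This is a minimal sketch: `pick_chm_viewer` and `open_chm` are hypothetical names, xchm is tried first only as an arbitrary preference, and it assumes the viewer accepts the CHM file name as its first argument (xchm does; this is assumed for gnochm).

```sh
# pick_chm_viewer: print the first CHM viewer found on the PATH.
# Hypothetical helper; xchm is preferred over gnochm arbitrarily.
pick_chm_viewer() {
    for viewer in xchm gnochm; do
        if command -v "$viewer" >/dev/null 2>&1; then
            echo "$viewer"
            return 0
        fi
    done
    return 1    # neither viewer is installed
}

# open_chm FILE: open FILE with whichever viewer pick_chm_viewer found.
open_chm() {
    viewer=$(pick_chm_viewer) || {
        echo "No CHM viewer found; install xchm or gnochm first." >&2
        return 1
    }
    # Run the viewer in the background so the terminal stays usable
    "$viewer" "$1" &
}
```

Usage: `open_chm my_chm_book.chm`.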
Converting CHM Files
My preferred method for dealing with CHM files is to convert them to a more universal format, such as HTML or even PDF. There are a couple of ways to do this, and several different tools for the job.
Conversion Method 1: CHM -> HTML (-> PDF)
Firstly, it is possible to simply decompile the CHM file into its component HTML files, which can be opened in any web browser. These HTML files can then optionally be turned into a PDF document. To do this, two main packages (plus their dependencies) need to be installed: chmlib, whose command-line tools Ubuntu packages as libchm-bin, and htmldoc:
[root@akwal]# sudo apt-get install libchm-bin htmldoc
The first step uses chmlib's extract_chmLib tool to decompile the CHM file and save its contents to a specified directory, for example:
[root@akwal]# extract_chmLib my_chm_book.chm htmloutputdir
This pulls the CHM file apart and stores the extracted HTML files in the “htmloutputdir” directory (within a sub-directory called “final”). If desired, htmldoc can then convert the HTML files into a single PDF document. Running
[root@akwal]# htmldoc &
from the terminal opens the htmldoc GUI, where the HTML files can be selected as input, the output formatting adjusted, and the PDF document generated. The htmldoc website has extensive documentation covering this process, but for converting to PDF, I find the next method much easier!
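The GUI is not the only option: htmldoc can also run non-interactively, e.g. `htmldoc --webpage -t pdf -f book.pdf page1.html page2.html`, where `--webpage` treats each file as a separate page (rather than expecting `<h1>` chapter headings, as `--book` does), `-t` sets the output type and `-f` the output file. The catch is page order: a plain `*.html` glob sorts alphabetically, not in reading order. Below is a minimal sketch that takes the order from the CHM's table-of-contents file instead (a `.hhc` file, which extract_chmLib extracts along with the pages; its name varies from book to book). Assumptions: the function names are hypothetical, the `.hhc` uses the usual `<param name="Local" value="...">` entries, and those paths are relative to the root of the extracted directory. Pulling the entries out with grep and sed is a rough heuristic, not a real HTML parser.

```sh
# hhc_order FILE: print the pages listed in a CHM table of contents (.hhc)
# in ToC order, with #anchors stripped and duplicates removed.
# Heuristic: relies on grep -o (GNU and BSD grep) and the usual param layout.
hhc_order() {
    grep -oiE 'name="Local"[[:space:]]+value="[^"]*"' "$1" |
        sed -e 's/.*[Vv][Aa][Ll][Uu][Ee]="//' -e 's/"$//' -e 's/#.*//' |
        awk 'NF && !seen[$0]++'
}

# toc_to_pdf DIR HHC PDF: pass the extracted pages to htmldoc in ToC order.
# HHC is relative to DIR; a relative PDF path is relative to the current dir.
toc_to_pdf() {
    dir=$1 hhc=$2 pdf=$3
    case $pdf in /*) ;; *) pdf=$PWD/$pdf ;; esac
    (
        cd "$dir" || exit 1
        hhc_order "$hhc" | {
            set --
            while IFS= read -r page; do
                # Skip ToC entries that point at files that were not extracted
                if [ -f "$page" ]; then set -- "$@" "$page"; fi
            done
            if [ $# -eq 0 ]; then
                echo "toc_to_pdf: no pages found via $hhc" >&2
                exit 1
            fi
            htmldoc --webpage -t pdf -f "$pdf" "$@"
        }
    )
}

# chm_to_pdf_htmldoc BOOK.chm: extract into BOOK_html/, find the .hhc,
# and write BOOK.pdf to the current directory.
chm_to_pdf_htmldoc() {
    book=$1
    out=$(basename "$book" .chm)_html
    extract_chmLib "$book" "$out" || return 1
    toc=$(find "$out" -type f -iname '*.hhc' | head -n 1)
    if [ -z "$toc" ]; then
        echo "chm_to_pdf_htmldoc: no .hhc table of contents in $out" >&2
        return 1
    fi
    toc_to_pdf "$out" "${toc#"$out"/}" "$(basename "$book" .chm).pdf"
}
```

Usage: `chm_to_pdf_htmldoc my_chm_book.chm` leaves the extracted files in `my_chm_book_html/` and writes `my_chm_book.pdf`. Because `basename` strips `.chm` case-sensitively, a file ending in `.CHM` would keep its extension in the output names.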
Conversion Method 2: CHM -> PDF
A Python tool called chm2pdf hides the intermediate steps above from the user. It sits as a layer on top of chmlib and htmldoc and automates most of the conversion work, so it also requires chmlib, htmldoc and, additionally, pychm (the Python binding for chmlib) in order to run. chm2pdf can be downloaded from its website and installed manually or, on Ubuntu, installed from the repositories. To install it along with the required additional packages, run the following in a terminal:
[root@akwal]# sudo apt-get install libchm-bin htmldoc python-chm chm2pdf
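After installing, a quick way to confirm that the command-line tools used in this post are actually on the PATH is a small check like the one below. It is a minimal sketch: `missing_tools` is a hypothetical helper, not part of chm2pdf, and it only checks that each command can be found, not which package or version provides it.

```sh
# missing_tools CMD...: print every command that cannot be found on the PATH
# and return non-zero if any are missing. Hypothetical helper.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `missing_tools extract_chmLib htmldoc chm2pdf || echo "Install the missing packages first."`. The Python binding is a module rather than a command, so it has to be checked separately, e.g. with `python -c 'from chm import chm'`, assuming pychm's usual `chm` package name and the same Python interpreter that chm2pdf runs under.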
chm2pdf is a command-line tool, and in most cases its default options work for me (producing an A4 PDF document with a ToC, images, links, etc.), so converting a file is as simple as running:
[root@akwal]# chm2pdf --webpage my_chm_book.chm
from the directory where the CHM file is located. This converts the file to HTML (the intermediate files are stored in the /tmp directory) and then generates a PDF document with the same name, with the ToC, index, links, images, etc. all intact. Running either:
[root@akwal]# man chm2pdf
or
[root@akwal]# chm2pdf --help | less
will display the manual page or the help text respectively, both of which contain a wealth of information about the command options available for fine-tuning the CHM-to-PDF conversion.
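The `chm2pdf --webpage` command above converts a single file. To convert a whole directory of CHM files, a small loop around that same command is enough. A minimal sketch: `convert_all_chm` is a hypothetical name, and `CHM2PDF` is this sketch's own override variable (setting `CHM2PDF=echo` gives a dry run), not something chm2pdf reads. It changes into the target directory first, mirroring the single-file example, which is run from the directory containing the CHM file.

```sh
# convert_all_chm [DIR]: run "chm2pdf --webpage" on every .chm file in DIR
# (default: the current directory). Returns non-zero if any conversion fails.
convert_all_chm() {
    (
        # Work from the CHM files' directory, as in the single-file example
        cd "${1:-.}" || exit 1
        cmd=${CHM2PDF:-chm2pdf}
        failed=0
        for book in *.chm; do
            # If nothing matched, the pattern stays as the literal "*.chm"
            [ -e "$book" ] || continue
            echo "Converting $book" >&2
            "$cmd" --webpage "$book" || failed=$((failed + 1))
        done
        # Succeed only if every conversion succeeded
        [ "$failed" -eq 0 ]
    )
}
```

Usage: `convert_all_chm ~/ebooks`, or `CHM2PDF=echo convert_all_chm ~/ebooks` to preview the commands first. The `*.chm` pattern is case-sensitive, so files ending in `.CHM` are skipped.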