This tutorial is written with DjVu v3.x in mind. Some of the features described here may not be available with earlier versions of the DjVu viewers and compressors. Because many applications of DjVu are high-volume, produced with the "batch" compressor on server machines, much of the discussion on creating DjVu documents concerns the command-line programs provided as part of the DjVu compression tools, as well as the tools distributed with the open source DjVu Reference Library (libdjvu++ 2.0 and 3.0). The GUI-based desktop tools available from LizardTech (DjVu Solo) or from third parties are only discussed briefly.
Many products and technologies are available today for storing and distributing digital images. However, most of these technologies are slow and inconvenient for distributing high-resolution color photos, and woefully impractical for delivering high-resolution scanned documents in B&W and color, because the files they produce are prohibitively large and the viewing software is memory-hungry.
With its ability to attain very high compression ratios while preserving crisp and legible text, DjVu has no rival when it comes to distributing scanned documents, particularly color documents. For this kind of application, there is quite simply no practical alternative to DjVu.
DjVu is also an excellent format for distributing high-resolution photos, because its wavelet-based continuous tone image compression technology produces small files with very fast progressive display, seamless zooming and panning, and requires minimal memory in the client.
To handle single-page or multi-page scanned documents in color or black and white, as well as high-resolution photos and pictures, DjVu wraps three compression formats into one. In this tutorial, these three formats will be referred to as DjVuText, DjVuPhoto, and DjVuLayered.
A DjVu document may contain one or several pages. Plug-ins and viewers provide basic mechanisms for navigating between the pages of a multipage document. Multipage documents come in two flavors: BUNDLED, where the whole document is packed in a single file, and INDIRECT, where individual pages are stored in separate files in a directory. The relative advantages and disadvantages of these two formats are explained in a later section of this tutorial.
IMPORTANT NOTE: multipage documents produced by version 2.x and version 3.x of the DjVu software are not compatible. The 3.x plug-ins will properly display 2.x multipage documents. Users of pre-3.0 versions of the plug-in must upgrade to view 3.x multipage documents. It is also worth noting that DjVuShop 2.0 can only read 2.0-style multipage documents.
Each DjVu image or page may contain additional chunks of data that include such information as the resolution of the image, hyperlink definitions, highlighted area definitions, default zoom factors, frame color, and display mode. A DjVu image can also include a compressed text chunk which contains the text on the page in a computer-readable format (produced by OCR software, for example). When present, this text chunk is used by viewers and plug-ins to search and highlight words and phrases in DjVu documents.
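To make the idea of "chunks" concrete, here is a minimal sketch that lists the chunk identifiers found in a single-page DjVu byte string. It rests on assumptions that this tutorial does not spell out: that a DjVu file starts with the four bytes AT&T followed by an IFF-style FORM container, that each chunk has a 4-character identifier and a big-endian 32-bit length, and that chunks are padded to an even length. The function name list_chunks, the helper make_chunk, and the synthetic sample are hypothetical and exist only for illustration; the chunk names INFO and TXTz are used as plausible examples of a page-information chunk and a compressed text chunk.

```python
import struct


def list_chunks(data: bytes):
    """Return (form_type, [(chunk_id, payload_length), ...]) for a DjVu byte string.

    Assumes an 'AT&T' prefix followed by an IFF-style FORM container.
    Nested FORMs (as in multipage files) are reported but not descended into.
    """
    if data[:4] != b"AT&T":
        raise ValueError("missing AT&T prefix")
    if data[4:8] != b"FORM":
        raise ValueError("expected a FORM container")
    (form_len,) = struct.unpack(">I", data[8:12])
    form_type = data[12:16].decode("ascii")
    end = min(12 + form_len, len(data))  # form_len counts bytes after the length field
    pos = 16
    chunks = []
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii")
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        chunks.append((chunk_id, length))
        pos += 8 + length + (length & 1)  # skip header, payload, and pad byte if any
    return form_type, chunks


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk for the synthetic sample below (illustration only)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic sample, not a real DjVu page: the payloads are placeholder bytes.
body = b"DJVU" + make_chunk(b"INFO", bytes(10)) + make_chunk(b"TXTz", b"abc")
sample = b"AT&T" + b"FORM" + struct.pack(">I", len(body)) + body

form_type, chunks = list_chunks(sample)
print(form_type, chunks)
```

On a real file, the same function could be called on the bytes read from disk; the presence of a text chunk in the resulting list would indicate that the page is searchable.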
The short answer to the question "when to use DjVu" is: Use DjVu for:
A short series of tables that compare the performance of DjVu with other formats is available here. The following paragraphs discuss the relative merits of DjVu versus other popular formats.
DjVu vs TIFF-G4 and PDF
The punch line: scanned black and white documents in DjVu (using DjVuText) are between 3 and 10 times smaller than in TIFF-G4. The most widely used format for archiving scanned bitonal (black and white) documents is called TIFF-G4. TIFF-G4 uses the "CCITT Group IV" lossless compression algorithm (the same algorithm used in FAX machines), and encapsulates it in a TIFF file format.
TIFF-G4 is not supported natively by web browsers, so a plug-in must be used to view documents in that format. Because TIFF-G4 was primarily designed for storage, and not for web distribution, it does not support web-friendly features, such as hyperlinks (at least not in a vendor-independent way). The G4 compression technology is now quite old and outdated.
When scanned documents are converted to PDF, the pages are merely compressed using CCITT Group IV and encapsulated in a PDF file structure (instead of TIFF), with no further compression applied. Therefore PDF file sizes for bitonal scanned documents are at least as large as TIFF-G4.
Bitonal documents can be compressed to DjVu in lossy mode or lossless mode. In lossy mode, DjVu documents are between 3 and 10 times smaller than with G4. Cleanly scanned multipage documents mostly containing text will compress close to 10 times better than with TIFF-G4, because DjVu can take advantage of repeating character shapes in the documents. For other documents (single-page documents, noisy documents, low-resolution documents, documents with lots of drawings or halftone pictures) the advantage over G4 will typically be between 3 and 5 times, and occasionally as low as 2 times. DjVu also has a lossless bitonal mode which produces files typically about half the size of TIFF-G4 files. However, the "lossy" mode is visually lossless, and should be used for most applications.
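The ratios quoted above can be turned into a rough size estimate. The sketch below is simple arithmetic on those ratios, not a model of the compressor; the function name estimate_djvu_kb, the category labels, and the 60 KB TIFF-G4 page are hypothetical values chosen for illustration.

```python
# Size reduction factors relative to TIFF-G4, as quoted in the text above.
# Each entry is (lowest factor, highest factor).
RATIOS = {
    "clean_multipage_text": (10, 10),  # "close to 10 times" in lossy mode
    "other_documents": (3, 5),         # noisy, single-page, low-resolution, etc.
    "lossless": (2, 2),                # "about half the size"
}


def estimate_djvu_kb(g4_kb: float, kind: str):
    """Return (smallest, largest) estimated DjVu size in KB for a given G4 size."""
    low, high = RATIOS[kind]
    return g4_kb / high, g4_kb / low


g4_page_kb = 60  # hypothetical TIFF-G4 page size
for kind in RATIOS:
    smallest, largest = estimate_djvu_kb(g4_page_kb, kind)
    print(f"{kind}: {smallest:.1f} to {largest:.1f} KB")
```

These figures are only as good as the quoted ratios; actual sizes depend on the scan quality and content of each document.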
DjVu vs JPEG
As stated above, DjVuPhoto produces files that are about half the size of JPEG files of similar quality. A live comparison of JPEG and DjVu is available here.
For lack of a better standard until now, scanned color documents are sometimes compressed with JPEG. Such images at 300 dpi are typically 300KB to 2MB (and often more), and take a lot of memory and a very long time to decode and display on a standard PC configuration. For these reasons, JPEG is rarely used for images on the web larger than about a million pixels (or resolutions larger than about 100 dpi).
By separating the layers, DjVu can keep the text at 300 dpi, while downsampling the backgrounds to 100 dpi. File sizes are typically 30 to 200KB, or 5 to 10 times smaller than JPEG.
DjVu vs PDF and PostScript
As we said above, for scanned documents, particularly color documents, there is simply no practical alternative to DjVu. While PDF can be used for scanned documents, the file sizes are large for bitonal documents, and totally impractical for color documents.
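A quick calculation shows why downsampling the background matters. The sketch below counts pixels for a page at 300 dpi and at 100 dpi; the 8.5 x 11 inch page size and the 3 bytes per pixel (uncompressed 24-bit color) are assumptions made for illustration and do not come from the text above.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page of the given size scanned at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed page size: 8.5 x 11 inches.
full_res = pixel_count(8.5, 11, 300)    # resolution kept for the text layer
background = pixel_count(8.5, 11, 100)  # resolution used for the background layer

print(full_res)                # 8415000 pixels
print(background)              # 935000 pixels
print(full_res // background)  # 9: the background has nine times fewer pixels
print(full_res * 3)            # 25245000 bytes if decoded as 24-bit color
```

Reducing the resolution by a factor of 3 in each direction cuts the number of background pixels by a factor of 9, before any compression is applied. The 300 dpi pixel count also illustrates why decoding a full-resolution color page into memory is expensive.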
The case of digitally produced documents is slightly different. DjVu can replace Adobe's PDF and PostScript advantageously in certain cases. The advantages of DjVu over PDF are:
The main advantages of using DjVu over PostScript (or gzipped PostScript) for distributing digital documents are:
DjVu versus GIF
For small images and logos with a limited number of colors, the free DjVu compressor "cpaldjvu" distributed with the libdjvu++ 3.0 open source library will produce files that are about half the size of the equivalent GIF. Although this mode can be used advantageously to embed small images into web pages, the absolute size reduction (in terms of KB) will be small. So this may not be worth the trouble unless the original GIF is over 30KB.
cpaldjvu can also be used to convert digitally produced documents to DjVu (more on this in the compression section).