This section gives some basic information about document types and appropriate resolutions for scanning and display. Readers familiar with these concepts may skip the section.
The punch line: bitonal, grayscale, or color documents should be scanned at 300DPI or higher. If the scanning software does not support DjVu natively, individual images should preferably be saved in an uncompressed file format such as TIFF, BMP or PBM/PGM/PPM.
The three basic "modes" of DjVu (DjVuText, DjVuPhoto, and DjVuLayered) correspond to three different types of images:
The resolution of computer screens generally varies between 70 and 100 pixels per inch, depending on the screen size and the display settings. The notion of "dots per inch" (DPI) for digital images is therefore somewhat ambiguous. In the following, we use DPI to mean "pixels per inch of paper". For example, a one inch by one inch document scanned at 300DPI will produce a 300 by 300 pixel image. Displayed on a computer screen at full resolution, this 300x300 pixel image may range in size from about 3x3 inches to a little over 4x4 inches, depending on the size of the monitor and the display settings. Scaled down by a factor of three to 100x100 pixels, the image will occupy approximately between 1x1 and 1.4x1.4 inches on the screen (which is close to the original size), but many details will be invisible since blocks of 3x3 pixels in the original scanned image will be crammed into a single screen pixel. Until display technology improves dramatically, our only option is to use viewer software that allows us to zoom in when we need to look at details and zoom out when we want to display the image at a scale that best approximates the physical size of the original (and that fits on the screen).
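The arithmetic in the paragraph above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are ours, and the two screen densities are simply the endpoints of the 70 to 100 pixels per inch range quoted in the text.

```python
def scanned_pixels(inches, scan_dpi):
    """Pixels produced along one dimension when scanning paper at scan_dpi."""
    return round(inches * scan_dpi)


def on_screen_inches(pixels, screen_ppi, reduction=1):
    """Physical size on screen after reducing the image by an integer factor."""
    return (pixels / reduction) / screen_ppi


pixels = scanned_pixels(1.0, 300)   # one inch of paper scanned at 300 DPI
for ppi in (70, 100):               # screen densities quoted in the text
    full = on_screen_inches(pixels, ppi)
    reduced = on_screen_inches(pixels, ppi, reduction=3)
    print(f"{ppi} ppi screen: {full:.1f} in at full resolution, "
          f"{reduced:.1f} in when reduced by 3")
```

The same two functions apply to any page size: multiply by the scanning DPI to get pixels, then divide by the screen density to get the displayed size.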
Bitonal Documents
Most bitonal documents must be scanned at 200 DPI or higher to be readable (that is what most fax machines do), but should be scanned at 300 DPI to ensure good quality on screen and on paper, and to ensure that small characters will be easily readable. 400 DPI will give an extra safety margin, and 600 DPI will ensure that a printout of the scanned document will be virtually indistinguishable from the original. Professional publishing applications may require 1200 or 2400DPI.
Since typical computer screens have a resolution of about 75 to 100 DPI, one might think that scanning a document at 100 DPI would suffice for screen-based applications. That is not the case: bitonal documents at that resolution have jagged characters that are almost unreadable. Two solutions are possible: (1) scanning at 100 DPI in grayscale (not bitonal) to improve the image quality, or (2) scanning in bitonal at 300 DPI or more and relying on the viewer software's zooming capability to scale down the images and display them appropriately on a computer screen. The second solution is much preferable because it produces higher quality images, smaller files, and good quality printouts.
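One way to picture why the second solution displays well: when a 300 DPI bitonal image is reduced by a factor of three for the screen, each 3x3 block of black and white pixels can be averaged into a single gray level, so character edges come out smoothed rather than jagged. The Python sketch below illustrates that block averaging on a tiny patch. It is an assumption about how a viewer might scale images down, not a description of the actual DjVu viewer code, and the convention that 1 means black is ours.

```python
def reduce_bitonal(image, factor=3):
    """Average each factor x factor block of 0/1 pixels into one gray value.

    0 is white and 1 is black (a convention chosen for this sketch). The
    result holds the ink coverage, between 0.0 and 1.0, of each reduced pixel.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    reduced = []
    for r in range(rows):
        reduced_row = []
        for c in range(cols):
            ink = sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            reduced_row.append(ink / (factor * factor))
        reduced.append(reduced_row)
    return reduced


# A 6x6 bitonal patch containing the diagonal edge of a stroke.
patch = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
for row in reduce_bitonal(patch):
    print([round(value, 2) for value in row])
```

The blocks that straddle the edge come out as intermediate grays, which is the information a 100 DPI bitonal scan throws away.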
A typical 300 DPI page scanned in bitonal and compressed with DjVu will occupy 5 to 30KB. Compressing multiple pages at once generally produces smaller files because the DjVu compressors (such as the command-line tools bitonaltodjvu or documenttodjvu) are able to extract information that is common to multiple pages and store it once, rather than store a copy in every page. Large engineering drawings may produce files on the order of 50KB to 200KB. The maximum pixel size of a DjVu image is 32,000 by 32,000 pixels.
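For large engineering drawings it can be worth checking the 32,000 pixel limit before scanning. The following Python sketch does that check; the function names and the 34x44 inch sheet used in the example are illustrative assumptions, and only the limit itself comes from the text above.

```python
MAX_DJVU_PIXELS = 32_000  # per-dimension limit quoted in the text


def fits_in_djvu(width_in, height_in, dpi):
    """True if a sheet scanned at `dpi` stays within the pixel limit."""
    return max(width_in, height_in) * dpi <= MAX_DJVU_PIXELS


def max_sheet_inches(dpi):
    """Longest side, in inches, that can be scanned at `dpi`."""
    return MAX_DJVU_PIXELS / dpi


# Hypothetical 34x44 inch drawing at several scanning resolutions.
for dpi in (300, 600, 1200):
    verdict = "fits" if fits_in_djvu(34, 44, dpi) else "too large"
    print(f"{dpi} DPI: {verdict} (longest side allowed: {max_sheet_inches(dpi):.1f} in)")
```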
Photos and Pictures
Photos and pictures are often referred to as continuous-tone images because they contain smooth gradations of color and grayscales, unlike the sharp transition between white and black found in typical printed documents.
A 600x400 pixel photo fills up about one quarter of the screen of a modern PC, and produces a hardcopy of barely acceptable quality when printed at 100 DPI on a 6x4 inch sheet. While this may be sufficient for casual family pictures and internet distribution, it is insufficient for showing fine details of an object, or for giving an accurate rendering of a painting. Such images typically occupy 20 to 40KB when compressed into a DjVuPhoto file.
1600x1200 pixel photos, such as the ones produced by high-end consumer digital cameras (2.1 megapixels), will fill up the entire screen of high-end PCs (and overflow the screen of most), but they can be printed at 300 DPI on a 6x4 sheet or enlarged to 12x8 at 150 DPI with decent quality. Many details are present in such pictures, but not as many as in an enlarged traditional photo. Such images typically occupy 100-300KB when compressed into a good quality DjVuPhoto file. Compressing down to 50KB will produce files of acceptable quality, but minor artifacts may be present.
Higher resolution digital images are often scanned from negatives or slides using a slide scanner. Resolutions of 3200x2400 pixels (300 DPI when printed on a 12x8 sheet) can be achieved. Many details are visible at that resolution. DjVu's zooming/panning and on-the-fly decompression techniques are most appropriate for these images.
Still higher resolutions can be obtained with high-end digital studio cameras (up to 12000x10000), with professional scanners, or with geospatial imaging systems, but the high price of these imaging devices limits their widespread use. While such images can be compressed with DjVuPhoto, they may be better handled with LizardTech's MrSID format, which has no size limitations.
Color Documents
Color documents such as magazines, catalogs, or ancient documents generally contain text and pictures or textured backgrounds. DjVu works best when this type of document is scanned at 300DPI and higher. DjVu will extract the text and line drawings and compress them at the document resolution.
While the foreground/background separation process can function at a wide range of resolutions, it works best at 300DPI and higher. Low resolution documents may be upsampled prior to compression, but the result will not be as good as with 300DPI originals.
An 8.5x11 inch color page (magazine, catalog, manuscript) scanned at 300DPI on a color scanner at 24 bits per pixel (sometimes called "millions of colors" in scanning software) will occupy about 25MB uncompressed (in TIFF, BMP, or PPM formats). Compressing it with JPEG will produce a file between 300KB and 5MB depending on the quality setting. Compressing to DjVu with the DjVuLayered mode will produce files between 30KB and 100KB, of which 5 to 40KB will be used for the foreground layer, and the rest for the backgrounds and pictures.
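The 25MB figure follows directly from the page size, the resolution, and the pixel depth. The short Python sketch below reproduces it and derives rough compression ratios from the file sizes quoted above; the function name is ours, and treating 1KB as 1000 bytes is an assumption made for simplicity.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Raw image size in bytes for a sheet scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


raw = uncompressed_bytes(8.5, 11, 300)   # 2550 x 3300 pixels, 3 bytes each
print(f"uncompressed: {raw / 1e6:.1f} MB")

# File sizes quoted in the text, taken as 1 KB = 1000 bytes.
for label, size_kb in (
    ("JPEG, smallest", 300),
    ("DjVuLayered, largest", 100),
    ("DjVuLayered, smallest", 30),
):
    print(f"{label} ({size_kb} KB): about {raw / (size_kb * 1000):.0f}:1")
```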
Such high compression ratios come with a price: the foreground/background separation process may sometimes cause visual artifacts on certain types of document. There are applications for which perfect image quality is paramount, and disk space or download times are less important. Examples of such applications include archives of historical manuscripts that are not accessed very often, and document repositories for the publishing industry. In such situations, it is preferable to compress color documents in high-quality DjVuPhoto mode or with MrSID, rather than in DjVuLayered mode. The files will be significantly larger, but the quality setting can be tuned up as high as desired to quasi-lossless levels.
Digital Documents
Digital documents are document files that are produced electronically (e.g. with a word processor), rather than by scanning paper. The DjVu compressor for UNIX can convert documents from PostScript and PDF to DjVu (by calling a free software tool named "ghostscript"). However, as of April 2000, no tools exist to convert directly from formats such as Microsoft Word or PowerPoint to DjVu. Such tools will be made available in the near future by LizardTech and its partners. In the meantime, it is possible to export (or "print" virtually) such documents to a PS or PDF file and convert the result to DjVu.
In this section, we give an overview of the software packages produced and distributed by LizardTech (and AT&T). We do not describe the tools produced by DjVu partners and third party vendors.
The following compression packages are available from LizardTech as of October 2000.
a single bitonal page
scanned magazine page in color
screen dumps, palettized images
What if my image is in a format that DjVu does not support?
While DjVu is often used for individual pictures and single page documents, many content providers use it to distribute multi-page documents.
DjVu 3.0 supports three different models to store DjVu content, plus two obsolete models supported by the 2.0 plug-ins and viewers:
BUNDLED vs INDIRECT
As indicated above, there are two types of multi-page DjVu documents: BUNDLED and INDIRECT. In the BUNDLED format the whole document is packed in a single file. In the INDIRECT format, each page is in a separate file (generally all residing in a single directory). INDIRECT documents are accessed through a document file (or index file) that contains pointers to each individual file composing the document. The advantage of the BUNDLED format is that it is easy to manipulate (copy, rename, mail...). It is the best option for DjVu documents that are accessed on a local hard drive, or through a fast network connection. The disadvantage for web-based applications is that the pages of a BUNDLED document are downloaded sequentially, so a page cannot be viewed until all the previous pages have been downloaded. The INDIRECT format solves this problem. Because the pages of a document in the INDIRECT format are stored in separate files, they can be accessed on demand in any order, without requiring a so-called "byte server".
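The practical difference for web delivery can be estimated with a simplified model, sketched below in Python. The model is an assumption of ours: it ignores the small index file of an INDIRECT document and any data shared between pages, and the 50-page document with 20KB pages is hypothetical.

```python
def bytes_before_page(page_sizes, page_index, layout):
    """Bytes to fetch before page `page_index` (0-based) can be shown.

    BUNDLED: pages arrive sequentially, so all earlier pages come first.
    INDIRECT: only the requested page file is fetched (the index file and
    any shared data are ignored in this simplified model).
    """
    if layout == "BUNDLED":
        return sum(page_sizes[: page_index + 1])
    if layout == "INDIRECT":
        return page_sizes[page_index]
    raise ValueError(f"unknown layout: {layout}")


sizes = [20_000] * 50   # hypothetical 50-page document, 20 KB per page
target = 39             # the reader jumps straight to page 40
for layout in ("BUNDLED", "INDIRECT"):
    needed = bytes_before_page(sizes, target, layout)
    print(f"{layout}: {needed / 1000:.0f} KB before page {target + 1} is viewable")
```

Under these assumptions the cost of jumping to a page grows with the page number for BUNDLED documents and stays flat for INDIRECT ones.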
From the user's point of view, there is no real difference between the INDIRECT and BUNDLED models (except for the faster random page access of the INDIRECT model). The plug-in allows users to print, save, and search the whole document in a single operation with both models. The "save document" feature of the plug-in also allows converting from one format to the other.
For Internet-based applications, we recommend the INDIRECT format. For Intranet or Local Area Network based applications, either the BUNDLED or INDIRECT formats can be used.
Remember that the BUNDLED and INDIRECT formats are supported by the plug-in and viewers 3.0 and above. Tell your users to upgrade. (Using the image/x.djvu MIME type for your content will tell 1.0 and 2.0 plug-in users to upgrade.)
Assembling DjVu Images into Multipage Documents
The batch compressors distributed as part of the DjVuText and DjVuLayered packages can directly produce multipage DjVu files when fed with multiple input files. The files produced are smaller than if the pages are compressed separately because the compressor can extract and share redundant information across multiple pages.
Individually compressed DjVu pages can be assembled into multipage documents using the free package DjVuMulti.
To assemble a bunch of DjVu images into a single BUNDLED document, type:
To assemble a bunch of DjVu images into an INDIRECT document, type:
where documentdir must be an existing directory where all the individual page files will be copied.
To disassemble a BUNDLED document into an INDIRECT one, simply say:
To convert a multipage document from one of the old 2.0 multipage formats, type:
The programs djvujoin and djvubundle supersede the 2.0 programs djvuindex and djvumerge.
bitonal multipage TIFF-G4 document
One or several multipage TIFF-G4 files can be converted into a single DjVu document with:
This will create a BUNDLED DjVu document.
multiple scanned pages in color
PostScript documents
The UNIX versions of bitonaltodjvu and documenttodjvu can accept PostScript or PDF files as input and convert them to DjVu. They call the free software tool "gs" (or Ghostscript) to transform the document into page images, and then simply compress those images. Here is an example:
Another possibility is to convert the PostScript file to image files separately using "gs", and then call one of the regular DjVu compressors, such as documenttodjvu, or the free tool cpaldjvu distributed with libdjvu++ 3.0. Here is a simple tcsh script that turns a PostScript or PDF page into DjVu:
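The same two-step idea (rasterize with "gs", then compress the page image) can also be sketched in Python. This is an illustration under stated assumptions rather than a copy of any distributed script: it assumes a single-page input, that "gs" and "cpaldjvu" are on the PATH, and that cpaldjvu accepts a "-dpi" option followed by the input PPM and output DjVu file names, as the DjVuLibre version does. The file names are placeholders.

```python
import subprocess


def build_commands(source, page_image="page.ppm", output="page.djvu", dpi=300):
    """Return the two command lines: rasterize with gs, then compress.

    Assumes a single-page PostScript or PDF input and a cpaldjvu that
    accepts `-dpi N input.ppm output.djvu` (an assumption of this sketch).
    """
    rasterize = [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=ppmraw",               # render to a raw 24-bit PPM image
        f"-r{dpi}",                      # rendering resolution in DPI
        f"-sOutputFile={page_image}",
        source,
    ]
    compress = ["cpaldjvu", "-dpi", str(dpi), page_image, output]
    return rasterize, compress


def convert(source, **options):
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands(source, **options):
        subprocess.run(command, check=True)


# Show the commands without running them; call convert("document.ps") to execute.
for command in build_commands("document.ps"):
    print(" ".join(command))
```

Keeping command construction separate from execution makes it easy to inspect or adapt the command lines for a different compressor before anything is run.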