DjVu: A Tutorial

Creating DjVu Documents: a Quick Introduction

Working with DjVu Files

(under construction)

Images, Resolution, and Scanning

This section gives some basic information about document types and the appropriate resolutions for scanning and display. Readers familiar with these concepts may skip the section.

The punch line: bitonal, grayscale, or color documents should be scanned at 300DPI or higher. If the scanning software does not support DjVu natively, individual images should preferably be saved in an uncompressed file format such as TIFF, BMP or PBM/PGM/PPM.

The three basic "modes" of DjVu (DjVuText, DjVuPhoto, and DjVuLayered) correspond to three different types of images:

  1. black and white (bitonal) documents such as business documents, manuals, CAD drawings, scanned microfilms, etc. which will typically be available as TIFF-G4, BMP, or PBM files.
  2. continuous-tone images such as photos, scanned graphic art, etc.
  3. color documents such as magazines, catalogs, historical documents, etc.
These three types require different resolutions to look good on a screen or on paper, and can accept different compression ratios.

The resolution of computer screens generally varies between 70 and 100 pixels per inch depending on the screen size and the display settings. Therefore the notion of "dots per inch" (DPI) for digital images is somewhat ambiguous. In the following, we use DPI to mean "pixels per inch of paper". For example, a one inch by one inch document scanned at 300DPI will produce a 300 by 300 pixel image. This 300x300 pixel image displayed on a computer screen at full resolution may range in size between 3x3 inches and about 4.3x4.3 inches, depending on the size of the monitor and the display settings. Scaled down by a factor of three to 100x100 pixels, the image will occupy approximately between 1x1 and 1.4x1.4 inches on the screen (which is close to the original size), but many details will be invisible since blocks of 3x3 pixels in the original scanned image will be crammed into a single screen pixel. Until display technology improves dramatically, our only option is to use viewer software that allows us to zoom in when we need to look at details and zoom out when we want to display the image at a scale that best approximates the physical size of the original (and that fits on the screen).
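The arithmetic behind these sizes can be sketched as follows. This is only an illustration: the function name is hypothetical, and the 70 and 100 pixels-per-inch values are simply the endpoints of the screen resolution range quoted above.

```python
def on_screen_inches(pixels, screen_ppi, scale_down=1):
    """Physical width (in inches) of an image `pixels` wide shown on a
    screen with `screen_ppi` pixels per inch, after reducing the image
    by the integer factor `scale_down`."""
    return (pixels / scale_down) / screen_ppi


scanned_pixels = 300  # one inch of paper scanned at 300 DPI

for ppi in (70, 100):
    full = on_screen_inches(scanned_pixels, ppi)
    reduced = on_screen_inches(scanned_pixels, ppi, scale_down=3)
    print(f"{ppi} ppi screen: {full:.2f} in at full resolution, "
          f"{reduced:.2f} in when reduced by 3")
```

Reducing by three brings the image back to roughly its paper size, at the cost of merging each 3x3 block of scanned pixels into one screen pixel.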

Bitonal Documents

Most bitonal documents must be scanned at 200 DPI or more to be readable (that's what most fax machines do), but should be scanned at 300 DPI to ensure good quality on screen and on paper, and to ensure that small characters will be easily readable. 400 DPI will give an extra safety margin, and 600 DPI will ensure that a printout of the scanned document will be virtually indistinguishable from the original. Professional publishing applications may require 1200 or 2400DPI.

Since typical computer screens have a resolution of about 75 to 100 DPI, one might think that scanning a document at 100 DPI would suffice for screen-based applications. That is not the case: bitonal documents at that resolution have jagged characters that are almost unreadable. Two solutions are possible: (1) scanning at 100 DPI in grayscale (not bitonal) to improve the image quality, or (2) scanning in bitonal at 300 DPI or more and relying on the viewer software's zooming capability to scale down the images and display them appropriately on a computer screen. The second solution is much preferable because it produces higher quality images, smaller files, and good quality printouts.

A typical 300 DPI page scanned in bitonal and compressed with DjVu will occupy 5 to 30KB. Compressing multiple pages at once generally produces smaller files because the DjVu compressors (such as the command-line tools bitonaltodjvu or documenttodjvu) are able to extract information that is common to multiple pages and store it once, rather than store a copy in every page. Large engineering drawings may produce files on the order of 50KB to 200KB. The maximum pixel size of a DjVu image is 32,000 by 32,000 pixels.

Photos and Pictures

Photos and pictures are often referred to as continuous-tone images because they contain smooth gradations of color and grayscales, unlike the sharp transition between white and black found in typical printed documents.

A 600x400 pixel photo fills up about one quarter of the screen of a modern PC, and produces a hardcopy of barely acceptable quality when printed at 100 DPI on a 6x4 inch sheet. While this may be sufficient for casual family pictures and internet distribution, it is insufficient for showing fine details of an object, or for giving an accurate rendering of a painting. Such images typically occupy 20 to 40KB when compressed into a DjVuPhoto file.

1600x1200 pixel photos, such as the ones produced by high-end consumer digital cameras (2.1 megapixels) will fill up the entire screen of high-end PCs (and overflow the screen of most), but they can be printed at 300 DPI on a 6x4 sheet or enlarged to 12x8 at 150 DPI with decent quality. Many details are present in such pictures, but not as many as in an enlarged traditional photo. Such images typically occupy 100-300KB when compressed into a good quality DjVuPhoto file. Compressing down to 50KB will produce files of acceptable quality, but minor artifacts may be present.

Higher resolution digital images are often scanned from negatives or slides using a slide scanner. Resolutions of 3200x2400 pixels (300 DPI when printed on a 12x8 sheet) can be achieved. Many details are visible at that resolution. DjVu's zooming/panning and on-the-fly decompression techniques are most appropriate for these images.
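The sheet sizes quoted for photos are rounded. As a rough check, the exact print size is the pixel dimensions divided by the print resolution. The helper below is a hypothetical sketch, not part of any DjVu tool.

```python
def print_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches of an image printed at `dpi`."""
    return (width_px / dpi, height_px / dpi)


# Image sizes and print resolutions mentioned in this section.
examples = [
    (600, 400, 100),
    (1600, 1200, 300),
    (1600, 1200, 150),
    (3200, 2400, 300),
]

for width_px, height_px, dpi in examples:
    w, h = print_size_inches(width_px, height_px, dpi)
    print(f"{width_px}x{height_px} at {dpi} DPI: {w:.1f} x {h:.1f} inches")
```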

Still higher resolutions can be obtained with high-end digital studio cameras (up to 12000x10000), with professional scanners, or with geospatial imaging systems, but the high price of these imaging devices limits their widespread use. While such images can be compressed with DjVuPhoto, they may be better handled with LizardTech's MrSID format, which has no size limitations.

Color Documents

Color documents such as magazines, catalogs, or ancient documents generally contain text and pictures or textured backgrounds. DjVu works best when this type of document is scanned at 300DPI and higher. DjVu will extract the text and line drawings and compress them at the document resolution.

While the foreground/background separation process can function at a wide range of resolutions, it works best at 300DPI and higher. Low resolution documents may be upsampled prior to compression, but the result will not be as good as with 300DPI originals.

An 8.5x11 inch color page (magazine, catalog, manuscript) scanned at 300DPI on a color scanner at 24 bits per pixel (sometimes called "millions of colors" in scanning software), will occupy about 25MB uncompressed (in TIFF, BMP, or PPM formats). Compressing it with JPEG will produce a file between 300KB and 5MB depending on the quality setting. Compressing to DjVu with the DjVuLayered mode will produce files between 30KB and 100KB, of which 5 to 40KB will be used for the foreground layer, and the rest for the backgrounds and pictures.
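The 25MB figure and the resulting compression ratios can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function name is hypothetical, file headers are ignored, and 1KB is taken as 1000 bytes.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Raw size in bytes of a page scanned at `dpi`, ignoring file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


raw = uncompressed_bytes(8.5, 11, 300)  # 2550 x 3300 pixels, 3 bytes each
print(f"uncompressed: {raw} bytes ({raw / 1e6:.1f} MB)")

# Compressed sizes quoted in the text, in KB (assumed to mean 1000 bytes).
for label, size_kb in (("JPEG, low quality", 300),
                       ("JPEG, high quality", 5000),
                       ("DjVuLayered, small", 30),
                       ("DjVuLayered, large", 100)):
    ratio = raw / (size_kb * 1000)
    print(f"{label}: about {ratio:.0f}:1")
```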

Such high compression ratios come with a price: the foreground/background separation process may sometimes cause visual artifacts on certain types of document. There are applications for which perfect image quality is paramount, and disk space or download times are less important. Examples of such applications include archives of historical manuscripts that are not accessed very often, and document repositories for the publishing industry. In such situations, it is preferable to compress color documents in high-quality DjVuPhoto mode or with MrSID, rather than in DjVuLayered mode. The files will be significantly larger, but the quality setting can be tuned up as high as desired to quasi-lossless levels.

Digital Documents

Digital documents are document files that are produced electronically (e.g. with a word processor), rather than by scanning paper. The DjVu compressor for UNIX can convert documents from PostScript and PDF to DjVu (by calling a free software tool named "Ghostscript"). However, as of April 2000, no tool exists to convert directly from formats such as Microsoft Word or PowerPoint to DjVu. Such tools will be made available in the near future by LizardTech and its partners. In the meantime, it is possible to export (or "print" virtually) such documents to a PS or PDF file and convert the result to DjVu.

Overview of the DjVu Compression Software

In this section, we give an overview of the software packages produced and distributed by LizardTech (and AT&T). We do not describe the tools produced by DjVu partners and third party vendors.

The following compression packages are available from LizardTech as of October 2000.

  • DjVu command line tools: contains a comprehensive set of command-line tools for compressing, decompressing, and manipulating DjVu images and documents. The compressors can convert from many formats into DjVu. They include special commands for the DjVuLayered, DjVuPhoto, and DjVuText modes. Tools for manipulating and assembling multipage documents are also provided.
  • DjVu Bitonal command line tools: a set of command-line tools intended for users who only deal with bitonal documents. The compressor is optimized for converting TIFF-G4 files to DjVu (DjVu Bitonal is included as part of the full DjVu command line suite).
  • libdjvu++ 3.0: soon-to-be-released source code and command line tools for basic compression, decompression, and manipulations of DjVu documents.
  • DjVu Solo: With its simple graphical user interface, DjVu Solo allows Windows and Linux users to scan, compress, and view DjVu 3.0 documents. It also lets users add hyperlinks and other annotations to DjVu documents. While limited, DjVu Solo is simple to use and free for non-commercial use.
  • libdjvu++ 2.0: The AT&T open source DjVu reference library. It contains: a basic DjVuPhoto compressor called "c44" to convert PPM and PGM images to DjVuPhoto; a decompressor to convert DjVu 2.0 images into PBM/PGM/PPM; several utilities; and the source code of a full DjVu 2.0 decompressor, as well as a piece of the encoder (pretty much everything except the text/background segmenter and the smart and efficient DjVuText compressor).
  • DjVu Shop 2.0: With its simple graphical user interface, DjVuShop 2.0 allows Windows users to scan, compress, and view DjVu 2.0 documents. It also lets users add hyperlinks and other annotations to DjVu documents. While limited, DjVuShop is simple to use and free for non-commercial use. DjVuShop 2.0 produces and displays DjVu 2.0 files only, not DjVu 3.0 files.
  • DjEdit: A GUI-based program for Unix that plays the same role as DjVuShop, though it does not include a compression feature. DjEdit can be used to assemble individual pages into multipage documents, and to add hyperlinks and other annotations.
  • DjVuMulti: command-line tools and C-language APIs for assembling and disassembling multipage DjVu documents (DjVuMulti is included as part of DjVu command line).

Compressing Individual Images

(under construction)

Introduction

a single bitonal page

photos

scanned magazine page in color

scanned manuscripts

screen dumps, palettized images

What if my image is in a format that DjVu does not support

Creating Multipage DjVu Documents

While DjVu is often used for individual pictures and single page documents, many content providers use it to distribute multi-page documents.

DjVu 3.0 supports three different models to store DjVu content, plus two obsolete models supported by the 2.0 plug-ins and viewers:

  1. Independent DjVu files, with one image per file (supported by all versions)
  2. The BUNDLED multi-page format: a multi-page DjVu document where all the pages are bundled in a single file (supported by the 3.0 plug-ins and viewers, not supported by the 2.0 plug-in nor by DjVuShop 2.0).
  3. The INDIRECT multi-page format: where each page and each shared shape dictionary is stored in a separate file, and where the document file merely contains pointers to the individual pages (supported by the 3.0 plug-ins and viewers, not supported by the 2.0 plug-in nor by DjVuShop 2.0).
  4. The INDEXED multi-page format: an obsolete multipage format similar to INDIRECT (supported by the 2.0 and 3.0 plug-ins and viewers and by DjVuShop).
  5. The OLD_BUNDLED format: an obsolete multipage format similar to BUNDLED (supported by the 2.0 and 3.0 plug-ins and viewers and by DjVuShop).
Using one of the multi-page DjVu formats (INDIRECT or BUNDLED) has several advantages over using independent page files:
  1. the navigation can be handled by the plug-in through the navigation bar or the keyboard shortcuts. Navigation through individual (non-multi-page) DjVu files must be built with HTML, JavaScript, or Java.
  2. the plug-in will let the user save or print the whole document in one operation.
  3. the plug-in will prefetch and predecode the pages that follow the currently displayed page, thereby reducing the time it takes to flip a page.
  4. the overall size of multi-page documents is generally smaller than individually compressed pages because when compressing multiple pages, the compressor can extract information shared across pages (such as the shapes of characters that appear frequently in the document) and store it once, instead of having to embed a copy in every page.
  5. the plug-in is not "restarted" when flipping pages, which minimizes screen redisplays and accelerates rendering.
While the plug-in provides a default navigation mechanism for multi-page documents through the toolbar and keyboard shortcuts, it is also possible to use HTML or JavaScript to tell the DjVu plug-in to display a particular page of a DjVu document by adding CGI-style optional arguments on the URL, or by adding optional arguments to EMBEDded documents. A full description of this feature is given in subsequent sections. A reference document is available in the plug-in help pages.

BUNDLED vs INDIRECT

As indicated above, there are two types of multi-page DjVu documents: BUNDLED and INDIRECT. In the BUNDLED format the whole document is packed in a single file. In the INDIRECT format, each page is in a separate file (generally all residing in a single directory). INDIRECT documents are accessed through a document file (or index file) that contains pointers to each individual file composing the document. The advantage of the BUNDLED format is that it is easy to manipulate (copy, rename, mail...). It is the best option for DjVu documents that are accessed on a local hard drive, or through a fast network connection. The disadvantage for web-based applications is that the pages of a BUNDLED document are downloaded sequentially, so a page cannot be viewed until all the previous pages have been downloaded. The INDIRECT format solves this problem. Because the pages of a document in the INDIRECT format are stored in separate files, they can be accessed on demand in any order, without requiring a so-called "byte server".
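The practical difference for web delivery can be illustrated with a deliberately simplified model of how much data must arrive before a given page can be shown. Everything here is an assumption made for illustration: the function name, the page sizes, and the index file size are hypothetical, and the model ignores shared shape dictionaries, file headers, and prefetching.

```python
def bytes_before_viewable(page_sizes, page_index, layout, index_size=0):
    """Bytes that must be downloaded before page `page_index` (zero-based)
    can be displayed, under a simplified model of each layout."""
    if layout == "BUNDLED":
        # Pages arrive in order: every earlier page must come first.
        return sum(page_sizes[: page_index + 1])
    if layout == "INDIRECT":
        # Only the index file and the requested page file are fetched.
        return index_size + page_sizes[page_index]
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical 50-page document with 20,000-byte pages and a 1,000-byte index.
pages = [20_000] * 50
target = 39  # the 40th page

print(bytes_before_viewable(pages, target, "BUNDLED"))
print(bytes_before_viewable(pages, target, "INDIRECT", index_size=1_000))
```

Under this model the cost of jumping to a late page grows with its position in a BUNDLED document, but stays roughly constant in an INDIRECT one.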

From the user's point of view, there is no real difference between the INDIRECT and BUNDLED models (except for the faster random page access of the INDIRECT mode). The plug-in allows users to print, save, and search the whole document in a single operation with both models. The "save document" feature of the plug-in can also convert from one format to the other.

For Internet-based applications, we recommend the INDIRECT format. For Intranet or Local Area Network based applications, either the BUNDLED or INDIRECT formats can be used.

Remember that the BUNDLED and INDIRECT formats are supported only by versions 3.0 and above of the plug-in and viewers. Tell your users to upgrade. (Using the image/x.djvu MIME type for your content will tell users of the 1.0 and 2.0 plug-ins to upgrade.)

Assembling DjVu Images into Multipage Documents

The batch compressors distributed as part of the DjVuText and DjVuLayered packages can directly produce multipage DjVu files when fed with multiple input files. The files produced are smaller than if the pages are compressed separately because the compressor can extract and share redundant information across multiple pages.

Individually compressed DjVu pages can be assembled into multipage documents using the free package DjVuMulti. To assemble a bunch of DjVu images into a single BUNDLED document simply type:

djvubundle page1.djvu page2.djvu ... pageN.djvu document.djvu
To assemble a bunch of DjVu images into an INDIRECT document, type:
djvujoin page1.djvu page2.djvu ... pageN.djvu documentdir/index.djvu
where documentdir must be an existing directory into which all the individual page files will be copied.

To disassemble a BUNDLED document into an INDIRECT one, simply say:

djvujoin document.djvu documentdir/indexfile.djvu
To convert a multipage document from one of the old 2.0 multipage formats, type:
djvureindex olddocument newdocument
The programs djvujoin and djvubundle supersede the 2.0 programs djvuindex and djvumerge.
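When the list of pages is generated by a script, the command lines above can be built programmatically. The sketch below only constructs and prints the argument lists; it does not run the tools. The helper names are hypothetical, and the argument order (input pages first, output last) is taken from the examples in this section.

```python
import shlex


def bundle_command(pages, output):
    """Argument list for assembling `pages` into one BUNDLED document."""
    return ["djvubundle", *pages, output]


def join_command(pages, index_file):
    """Argument list for assembling `pages` into an INDIRECT document.
    The directory containing `index_file` must already exist."""
    return ["djvujoin", *pages, index_file]


pages = [f"page{n}.djvu" for n in range(1, 4)]

print(shlex.join(bundle_command(pages, "document.djvu")))
print(shlex.join(join_command(pages, "documentdir/index.djvu")))
```

The resulting lists can be passed to `subprocess.run` on a machine where the DjVuMulti tools are installed.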

Compressing Multiple Images into a Multipage DjVu Document

bitonal multipage TIFF-G4 document

One or several multipage TIFF-G4 files can be converted into a single DjVu document with:
bitonaltodjvu g4file1.tif g4file2.tif document.djvu
This will create a BUNDLED DjVu document.

multiple scanned pages in color


documenttodjvu page1.tif page2.jpg page3.bmp document.djvu

PostScript documents

The UNIX versions of bitonaltodjvu and documenttodjvu can accept PostScript or PDF files as input and convert them to DjVu. They call the free software tool "gs" (or GhostScript) to transform the document into page images, and then simply compress those images. Here is an example:
documenttodjvu document.ps document.djvu
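Another possibility is to convert the PostScript file to image files separately using "gs", and then call one of the regular DjVu compressors, such as documenttodjvu, or the free tool cpaldjvu distributed with libdjvu++ 3.0. Here is a simple tcsh script that turns a PostScript or PDF page into DjVu:
#!/bin/tcsh
# Temporary PPM file named after the input file (extension removed) and the process ID
set tmp=/tmp/$1:r.$$.ppm
# Render the input at 300 DPI into the temporary PPM file
gs -q -r300 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=ppmraw -sOutputFile=$tmp -- $1
# Compress the rendered image to DjVu, then remove the temporary file
cpaldjvu -dpi 300 $tmp $1:r.djvu
rm -f $tmp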

Hyperlinks and Annotations

(under construction)





Copyright 1998-2000 AT&T All rights reserved.
DjVu and the LizardTech logo are trademarks of LizardTech Inc.