Introduction
-About this Tutorial
-What is DjVu
-The Many Faces of DjVu
-When to use DjVu, and what performance to expect?
-The Main Features of DjVu
Creating DjVu Documents: a Quick Introduction
-Working with DjVu Files
-Images, Resolution, and Scanning
-Overview of the DjVu Compression Software
-Compressing Individual Images
-Creating Multipage DjVu Documents
-Compressing Multiple Images into a Multipage DjVu Document
-Hyperlinks and Annotations
Publishing DjVu Documents on the Web
-Introduction: Getting Started
-Simple CGI-style arguments
-Displaying a DjVu Document in a Frame
-Embedding DjVu Documents into HTML Pages
-Elements of Style
-Linking to the DjVu Web Site
-A complete Example using Frames
-A complete Example using Embedded Objects
Hosting DjVu Documents: Advanced Topics
-Triggering the Automatic Plug-in Download
-Automatic Installation of the Plug-in: how does it work?
-DjVu Display Attributes
-Attributes in the URL
-Scripting
-How Caching Works
Configuring your Web Server for DjVu
-Multipurpose Internet Mail Extensions (MIME)
-Configuring Your Web Server to Support DjVu
The DjVu File Structure
DjVu: A Tutorial

Introduction

About this Tutorial

This tutorial is primarily intended for Web designers, Webmasters, and individual users who want to create DjVu content and publish or host it on their Web site. The document contains basic information about the DjVu technology and software, and detailed information on how to author web sites that include DjVu material. Specific topics discussed include: using HTML, JavaScript, embedded objects, and Frames to present DjVu content; how to trigger the plug-in autoinstallation procedure; how to configure web servers to serve DjVu content; and how to deal with the differences and peculiarities of browsers across platforms. Although this document is not intended as a tutorial on how to use the compressor tools to create DjVu images, it provides some basic information on that as well.

This tutorial is written with DjVu v3.x in mind. Some of the features described here may not be available with earlier versions of the DjVu viewers and compressors. Because many applications of DjVu are high-volume, produced with the "batch" compressor on server machines, much of the discussion on creating DjVu documents concerns the command-line programs provided as part of the DjVu compression tools, as well as the tools distributed with the open source DjVu Reference Library (libdjvu++ 2.0 and 3.0). The GUI-based desktop tools available from LizardTech (DjVu Solo) or from third parties are discussed only briefly.

What is DjVu

Many products and technologies are available today for storing and distributing digital images. However, most of these technologies are slow and inconvenient for distributing high-resolution color photos, and woefully impractical for delivering high-resolution scanned documents in B&W and color, because the files they produce are prohibitively large and the viewing software is memory-hungry.

With its ability to attain very high compression ratios while preserving crisp and legible text, DjVu has no rival when it comes to distributing scanned documents, particularly color documents. For this kind of application, there is quite simply no practical alternative to DjVu.

DjVu is also an excellent format for distributing high-resolution photos, because its wavelet-based continuous tone image compression technology produces small files with very fast progressive display, seamless zooming and panning, and requires minimal memory in the client.

The Many Faces of DjVu

To handle single-page or multi-page scanned documents in color or black and white, as well as high-resolution photos and pictures, DjVu wraps three compression formats into one. In this tutorial, these three formats will be referred to as DjVuText, DjVuPhoto, and DjVuLayered.

  • DjVuText (also known as JB2): a compression format for black and white (bitonal) images, or for images containing objects of mostly uniform colors (such as color text and line drawings). It achieves high compression ratios by taking advantage of the similarities between the shapes of the objects (characters or graphics) that appear multiple times in a document. Multi-page bitonal documents at 300 dots per inch (about 12 dots per mm) with mostly printed text are typically compressed to 5 to 20KB per page. Images of larger format or resolution, and images with handwriting, drawings, pictures, halftones, or high noise levels may produce larger files. DjVuText produces files that are typically 3 to 10 times smaller than TIFF or PDF, both of which use the CCITT Group IV compression standard internally. DjVuText also supports a mode in which each object can have a color associated with it. This component color mode is appropriate for documents that have been produced electronically, or images that would traditionally be compressed with GIF.
  • DjVuPhoto (also known as IW44): a progressive compression format for color or greyscale photos, paintings, and other continuous-tone images. It is based on the mathematical theory of wavelets. File sizes are typically about 1/2 that of JPEG for a similar quality. The main advantages of DjVuPhoto over JPEG are:
    • Small files: about 1/2 the size of JPEG for a similar quality. In addition, the compression ratio can be pushed up much higher than JPEG without JPEG's catastrophic artifacts.
    • Progressivity: images appear very quickly on the user's display, and get refined as more bits arrive and are decoded.
    • Zooming and Panning with on-the-fly decompression: allows very large images to be displayed on PCs with limited memory. The image is kept in a partially decompressed, compact form in the client machine. Only the portion of it actually shown on the screen is decompressed and rendered on the fly. 4000x4000 pixel images can be displayed on machines with 32MB of RAM without disk swapping.
    While this technique allows DjVu to handle large images, the maximum image size is nevertheless limited by the amount of RAM on the client machine. Another limitation of DjVuPhoto is that it supports only one internal color model (called YCrCb). These limitations may be a problem in certain applications in the geospatial imaging, medical imaging, and printing industries. Other techniques, such as LizardTech's MrSID format, are more appropriate in these cases.
  • DjVuLayered (or simply DjVu): applies to documents scanned in color or grayscale that contain mixtures of text, drawings, pictures, and background textures. Examples of such documents include historical documents, manuscripts, magazines, catalogs, comics, etc. DjVu achieves very high compression ratios by separating those images into multiple layers that are compressed separately using the most appropriate method at the most appropriate resolution. DjVuLayered generally segments the document into 2 or 3 layers. The background layer contains the pictures and background textures and is coded with DjVuPhoto. The mask layer contains the text and line drawings and is coded with DjVuText. The color of the text and line drawings can be coded in two different ways: either using the "component color mode" of DjVuText, or as a separate foreground layer coded with DjVuPhoto. The mask layer is kept at full resolution to preserve crisp, legible characters, while the background and optional foreground layers are generally coded at a lower resolution than the mask, because the images and backgrounds do not require as much resolution to look good on screen and on paper. With these techniques, very high compression ratios can be achieved while preserving the sharpness and readability of the text. File sizes for a color magazine or catalog page at 300 dots per inch (about 12 dots per mm) are typically 40 to 100KB per page. Sizes for manuscripts or ancient documents typically vary between 30KB and 200KB. This is typically 5 to 10 times smaller than what JPEG would produce at the same resolution. DjVuLayered also uses the previously described on-the-fly decompression method. DjVuLayered can be used for scanned documents as well as for documents produced electronically in the first place.
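
To put the figures quoted in the list above in perspective, the following sketch computes the uncompressed size of a scanned page and the compression ratios implied by the quoted file sizes. The US Letter page size (8.5 x 11 inches), the 24-bit color depth, the convention 1KB = 1000 bytes, and the function names are assumptions made for illustration; none of them comes from the DjVu software.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_px, height_px, bits_per_pixel):
    """Size in bytes of an uncompressed raster image."""
    return width_px * height_px * bits_per_pixel // 8


# Assumed page: US Letter (8.5 x 11 inches) scanned at 300 dpi.
width, height = page_pixels(8.5, 11, 300)
color_raw = raw_size_bytes(width, height, 24)    # 24-bit color scan
bitonal_raw = raw_size_bytes(width, height, 1)   # 1-bit black and white scan

print(f"page: {width} x {height} pixels")
print(f"uncompressed color:   {color_raw / 1e6:.1f} MB")
print(f"uncompressed bitonal: {bitonal_raw / 1e6:.2f} MB")

# Compression ratios implied by the 40 to 100KB quoted for a color page.
for djvu_kb in (40, 100):
    print(f"{djvu_kb}KB DjVu page -> about {color_raw / (djvu_kb * 1000):.0f}:1")

# A fully decoded 4000x4000 color image, for comparison with 32MB of RAM.
photo_raw = raw_size_bytes(4000, 4000, 24)
print(f"4000x4000 photo, fully decoded: {photo_raw / 1e6:.0f} MB")
```

Under these assumptions the fully decoded 4000x4000 image would not fit in 32MB of RAM, which suggests why keeping the image in a compact form and decoding only the visible portion matters.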

A DjVu document may contain one or several pages. Plug-ins and viewers provide basic mechanisms for navigating between the pages of a multipage document. Multipage documents come in two flavors: BUNDLED, where the whole document is packed in a single file, and INDIRECT, where individual pages are stored in separate files in a directory. The relative advantages and disadvantages of these two formats are explained in a later section of this tutorial.

IMPORTANT NOTE: multipage documents produced by version 2.x and version 3.x of the DjVu software are not compatible. The 3.x plug-ins will properly display 2.x multipage documents. Users of pre-3.0 versions of the plug-in must upgrade to view 3.x multipage documents. It is also worth noting that DjVuShop 2.0 can only read 2.0-style multipage documents.

Each DjVu image or page may contain additional chunks of data that include such information as the resolution of the image, hyperlink definitions, highlighted area definitions, default zoom factors, frame color, and display mode. A DjVu image can also include a compressed text chunk which contains the text on the page in a computer-readable format (produced by OCR software, for example). When present, this text chunk is used by viewers and plug-ins to search and highlight words and phrases in DjVu documents.
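
As an illustration of what "chunks of data" means in practice, here is a minimal sketch that lists the chunks found in a DjVu byte stream. The layout assumed here (a four-byte AT&T prefix followed by an IFF-style FORM container, in which each chunk carries a four-character identifier and a big-endian length and is padded to an even offset) and the chunk names used in the synthetic sample (INFO, TXTa, Sjbz) reflect the commonly documented DjVu 3 layout. They are not stated in this section of the tutorial and should be checked against the format specification. The function names are hypothetical.

```python
import struct


def list_chunks(data, offset=0, end=None, depth=0):
    """Return (depth, chunk_id, size) tuples for the IFF chunks in data."""
    if end is None:
        end = len(data)
    chunks = []
    while offset + 8 <= end:
        chunk_id = data[offset:offset + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack(">I", data[offset + 4:offset + 8])
        body = offset + 8
        if chunk_id == "FORM":
            # A FORM is a container: 4-byte type, then nested chunks.
            form_type = data[body:body + 4].decode("ascii", errors="replace")
            chunks.append((depth, "FORM:" + form_type, size))
            chunks.extend(
                list_chunks(data, body + 4, min(body + size, end), depth + 1)
            )
        else:
            chunks.append((depth, chunk_id, size))
        # Chunks start on even offsets: skip one pad byte after odd sizes.
        offset = body + size + (size & 1)
    return chunks


def djvu_chunks(data):
    """List the chunks of a DjVu byte stream (assumed AT&T + IFF layout)."""
    if data[:4] != b"AT&T":
        raise ValueError("missing AT&T prefix; not a DjVu stream")
    return list_chunks(data, offset=4)


def make_chunk(chunk_id, payload):
    """Build one chunk for the synthetic sample below (illustration only)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic single-page sample; the payloads are placeholders, not real data.
form_body = (
    b"DJVU"
    + make_chunk(b"INFO", bytes(10))
    + make_chunk(b"TXTa", b"abc")
    + make_chunk(b"Sjbz", b"1234")
)
sample = b"AT&T" + b"FORM" + struct.pack(">I", len(form_body)) + form_body

for depth, chunk_id, size in djvu_chunks(sample):
    print("  " * depth + f"{chunk_id} ({size} bytes)")
```

To inspect a real file, the same function could be applied to the bytes read from disk, for example `djvu_chunks(open(path, "rb").read())`, where `path` is a file of your own.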

When to use DjVu, and what performance to expect?

The short answer to the question "when to use DjVu" is to use it for:

  • any image with more than 1 million pixels (high-resolution photo or picture, scanned document);
  • any image with text on it (scanned document in color or black and white, digital document);
  • any multipage content, such as a multipage document or a photo album.

A short series of tables that compare the performance of DjVu with other formats is available here. The following paragraphs discuss the relative merits of DjVu versus other popular formats.

DjVu vs TIFF-G4 and PDF

The punch line: scanned black and white documents in DjVu (using DjVuText) are between 3 and 10 times smaller than in TIFF-G4. The most widely used format for archiving scanned bitonal (black and white) documents is called TIFF-G4. TIFF-G4 uses the "CCITT Group IV" lossless compression algorithm (the same algorithm used in FAX machines), and encapsulates it in a TIFF file format.

TIFF-G4 is not supported natively by web browsers, so a plug-in must be used to view documents in that format. Because TIFF-G4 was primarily designed for storage, and not for web distribution, it does not support web-friendly features, such as hyperlinks (at least not in a vendor-independent way). The G4 compression technology is now quite old and outdated.

When scanned documents are converted to PDF, the pages are merely compressed using CCITT Group IV and encapsulated into a PDF file structure (instead of TIFF) without being recompressed. Therefore, PDF file sizes for bitonal scanned documents are at least as large as TIFF-G4.

Bitonal documents can be compressed to DjVu in lossy mode or lossless mode. In lossy mode, DjVu documents are between 3 and 10 times smaller than with G4. Cleanly scanned multipage documents mostly containing text will compress close to 10 times better than with TIFF-G4, because DjVu can take advantage of repeating character shapes in the documents. For other documents (single-page documents, noisy documents, low-resolution documents, documents with lots of drawings or halftone pictures) the advantage over G4 will typically be between 3 and 5 times, and occasionally as low as 2 times. DjVu also has a lossless bitonal mode which produces files typically about half the size of TIFF-G4 files. However, the "lossy" mode is visually lossless, and should be used for most applications.
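
The ranges in the paragraph above can be turned into a rough size estimate. The sketch below takes the size of a TIFF-G4 page and applies the reduction factors quoted in the text. The category names, the function name, and the 60KB example page are assumptions chosen for illustration, and the result is only a rule-of-thumb range, not a prediction for any particular document.

```python
# Reduction factors relative to TIFF-G4, as quoted in the text: (low, high).
REDUCTION_VS_G4 = {
    "lossy": (3, 10),            # overall range in lossy mode
    "lossy_difficult": (3, 5),   # single-page, noisy, low-resolution, or
                                 # drawing/halftone-heavy documents
    "lossless": (2, 2),          # about half the size of TIFF-G4
}


def estimate_djvu_kb(g4_kb, mode):
    """Return the (smallest, largest) expected DjVu size in KB."""
    low_factor, high_factor = REDUCTION_VS_G4[mode]
    return g4_kb / high_factor, g4_kb / low_factor


# Hypothetical example: a bitonal page that takes 60KB as TIFF-G4.
for mode in REDUCTION_VS_G4:
    smallest, largest = estimate_djvu_kb(60, mode)
    print(f"{mode}: {smallest:.0f} to {largest:.0f} KB")
```

For the hypothetical 60KB page, the lossy estimate falls in the same 5 to 20KB range quoted earlier for bitonal pages of mostly printed text.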

DjVu vs JPEG

As stated above, DjVuPhoto produces files that are about 1/2 the size of JPEG for a similar quality. A live comparison of JPEG and DjVu is available here.

For lack of a better standard until now, scanned color documents are sometimes compressed with JPEG. Such images at 300 dpi are typically 300KB to 2MB (and often more), and take a lot of memory and a very long time to decode and display on a standard PC configuration. For these reasons, JPEG is rarely used for images on the web larger than about a million pixels (or resolutions larger than about 100 dpi).

By separating the layers, DjVu can keep the text at 300 dpi while downsampling the backgrounds to 100 dpi. File sizes are typically 30 to 200KB, or 5 to 10 times smaller than JPEG.
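
The effect of coding the layers at different resolutions can be seen with simple arithmetic, before any compression is applied. The sketch below compares the raw data in a flat 300 dpi color scan with the raw data in a 300 dpi one-bit mask plus a 100 dpi color background. The US Letter page size, the 24-bit color depth, and the variable names are assumptions for illustration; the optional foreground layer is ignored.

```python
def layer_pixels(width_in, height_in, dpi):
    """Number of pixels in a layer of the given page size and resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


PAGE = (8.5, 11)  # assumed US Letter page, in inches

mask_px = layer_pixels(*PAGE, 300)        # text mask kept at 300 dpi
background_px = layer_pixels(*PAGE, 100)  # background downsampled to 100 dpi

flat_bytes = mask_px * 3                  # flat 24-bit scan at 300 dpi
mask_bytes = mask_px // 8                 # 1 bit per pixel
background_bytes = background_px * 3      # 24 bits per pixel
layered_bytes = mask_bytes + background_bytes

print(f"background has {mask_px // background_px}x fewer pixels than the mask")
print(f"flat scan:     {flat_bytes / 1e6:.1f} MB raw")
print(f"mask + bg:     {layered_bytes / 1e6:.1f} MB raw")
print(f"raw reduction: {flat_bytes / layered_bytes:.1f}x before compression")
```

Dividing the resolution by 3 in each direction divides the background pixel count by 9, so the layered representation starts from far less data than the flat scan even before DjVuText and DjVuPhoto compress each layer.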

DjVu vs PDF and PostScript

As we said above, for scanned documents, particularly color documents, there is simply no practical alternative to DjVu. While PDF can be used for scanned documents, the file sizes are large for bitonal documents and totally impractical for color documents.

The case of digitally produced documents is slightly different. DjVu can replace Adobe's PDF and PostScript advantageously in certain cases. The advantages of DjVu over PDF are:

  • much smaller file size than scanned documents encapsulated in PDF
  • smaller file size than digitally produced PDF if the document contains pictures.
  • total portability/compatibility (no font problems).
  • somewhat smaller file size for purely textual digital documents.
  • better integration with the Web's navigation paradigm
  • fast display, panning and zooming.
  • very lightweight plug-in/viewer (600KB for DjVu versus 6MB for Acrobat Reader) with self-installation capability.

Various tools are available to convert PDF or PostScript documents to DjVu. They are described in the compression section.

The main advantages of using DjVu over PostScript (or gzipped PostScript) for distributing digital documents are:

  • smaller file size than PostScript if the document contains pictures
  • integration with the Web's navigation paradigm (no need to download the document before viewing it).
  • immediate display with fast rendering, panning and zooming.
  • high-quality rendering of text on computer screens.
  • no compatibility problems across platforms (no fonts, headers, or memory problems).
  • multiplatform viewers with effortless installation procedure.

While PostScript is appropriate for "download and print" applications, DjVu is far superior when it comes to browsing and reading directly from the screen.

DjVu versus GIF

For small images and logos with a limited number of colors, the free DjVu compressor "cpaldjvu" distributed with the libdjvu++ 3.0 open source library will produce files that are about half the size of the equivalent GIF. Although this mode can be used advantageously to embed small images into web pages, the absolute size reduction (in terms of KB) will be small. So this may not be worth the trouble unless the original GIF is over 30KB.

cpaldjvu can also be used to convert digitally produced documents to DjVu (more on this in the compression section).

The Main Features of DjVu

(under construction)





Copyright 1998-2000 AT&T All rights reserved.
DjVu and the LizardTech logo are trademarks of LizardTech Inc.