
Digitizing CUA's Student Newspaper


Digitization and Best Practices

What is Digitization?

Over the years, technology has advanced to allow objects, materials and resources to be represented in or converted into digital form. One can turn the pages of an e-book on a Kindle or listen to live recordings of folk songs from the 1960s on the Internet Archive. This wasn't possible 10 or 15 years ago, but evolving technologies now make this kind of access to information possible. But how do these objects, materials and resources, once in an analog or tangible form, become digital?

Digitization is the process of converting information or data into a digital format so that it can be processed by a computer. The purpose of digitizing objects, materials and resources is to provide electronic access to information for people who would not normally have access to the original, to share information easily, and to preserve data over the long term. Once in a digital format, objects can be stored on a computer, manipulated, transmitted electronically, printed, or displayed on a computer monitor or television screen. Text or images (photographs, illustrations, maps and manuscripts) are scanned using scanners or digital cameras that convert the information into a binary signal. Computers use binary signals as a kind of language for encoding the parts of the digitized object in a recognizable digital format. In layman's terms, this means the computer can store the original information and display a digital representation of it on a monitor. Digitization of audio and video materials (moving images, musical recordings) requires a different set of machines and devices to convert analog media, such as video and cassette tapes, into digital format.
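To make the idea of a binary signal concrete, the short Python sketch below turns a few brightness readings into 8-bit grayscale values and prints the bits that would be stored. The function names and the sample brightness values are illustrative assumptions, not part of any scanner's actual software.

```python
def quantize(brightness, bit_depth=8):
    """Map a brightness from 0.0 (black) to 1.0 (white) onto an integer level."""
    levels = 2 ** bit_depth                      # 256 levels for 8-bit grayscale
    return min(int(brightness * levels), levels - 1)


def to_bits(level, bit_depth=8):
    """Return the level as a string of binary digits, padded to the bit depth."""
    return format(level, f"0{bit_depth}b")


# Hypothetical brightness readings for three points on a page.
for brightness in (0.0, 0.5, 1.0):
    level = quantize(brightness)
    print(brightness, level, to_bits(level))
```

Each point on the page ends up as one small number, and the complete image is simply a long sequence of such numbers.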

In libraries, museums and archives, large and small digitization projects have been undertaken for a variety of reasons on any number of collections in different formats. In the past few years, the best-known mass digitization project has been Google Book Search. Google has partnered with a number of academic institutions to digitize entire collections and holdings. Once digitized, books that are out of copyright or in the public domain are available through the universities' respective catalogs as well as on Google Book Search. This mass digitization project is far different from a small-scale project such as The Tower project, but it gives some indication that electronic access to information once available only in analog form is extremely important in this digital age.

Digitization Best Practices and Standards

Given that this project is geared toward a specific collection and format, the following information is a survey of some best practices and standards for digitizing text and photographs. Where applicable, the practice or standard is compared to the WRLC digitization program. See Resources for the specific institutions and digitization programs surveyed.

  • It is highly recommended to make the best master image possible and to make all further images from this one master image. Studies show that once an object has been scanned, a library or archive will not rescan it, even if the image turns out to be insufficient for their purposes. Also, scanning exposes the object to light four times as intense as photocopying does, and repeated scanning could damage the material.

  • Some of the newspapers are bound together; bound objects can create an obstacle when scanning if they do not open wide enough, or if the binding is too close to the text.

  • Digitizing microfilm is more cost-effective than digitizing print material. This is only true, however, if the microfilm is a good quality version of the original. It is best to use silver positive copies from the master negative.

  • Master images should be made with a high resolution and stored in an uncompressed format, usually TIFF. Images that will be used for access should be made from this master image. These access images will have a lower resolution. Because of the high resolution, the master images will need a great amount of storage space. WRLC seems to collaborate with the school on the issue of storage.

  • CD-R is an inexpensive and portable format for archival storage, but is also easily susceptible to damage.

  • Network servers are a recommended non-portable format for digital preservation.

  • WRLC uses one color flatbed scanner (EPSON Expression 1640XL Graphic Arts, 42-bit color up to 1600 dpi) and one grayscale flatbed scanner (Fujitsu M3096GX, 256 grayscale levels up to 400 dpi). Flatbed scanners are the most widely used for various reasons, but keep in mind that they tend to streak.

  • Since newspapers are characteristically not in a simple 8.5 x 11 inch format, it is extremely important to keep their size in mind, because the size of the object being digitized affects the digital image. Looking at the newspapers that WRLC has already digitized, it appears that they make intermediate reductive copies of the material, focusing on individual sections of the paper. This may require more intricate and therefore more expensive equipment.

  • A master text image should be scanned at 200-300 dpi in grayscale and stored as an uncompressed TIFF in Intel (IBM) byte order with a bit depth of 8. The access image should be an 8 bit grayscale JPEG at quality 4-6 on a 1/10 scale (medium), with a file resolution of 200 dpi and unaltered image size.

  • A master photograph image should be scanned at 4000 pixels on the long side or at 600 dpi and stored as an uncompressed TIFF in Intel (IBM) byte order. A color scan should be 24 bit RGB color, and a black and white scan should be 8 bit grayscale. The access image should be 8-12 bit grayscale or 24-36 bit color, JPEG at quality 8-10 on a 1/10 scale (high), with a file resolution of 300 dpi and unaltered image size. A thumbnail image should be 4 bit grayscale or 8 bit color, JPEG at quality 4-5 on a 1/20 scale (medium), at 72 dpi.

  • If images are used strictly for web access, the quality does not need to be as high as it does if you are interested in archival preservation.

  • EAD is the recommended metadata standard for finding aids, while Dublin Core is the baseline standard that is used to describe a plethora of resources. WRLC is in compliance with both of these recommendations.

  • A plan for digital preservation should be made. The digital collection should be backed up in a remote location, separate from the database in use, in case of disaster. Digital migration should be planned at scheduled intervals or on an as-needed basis, in the most cost-effective manner.

  • Preservation metadata should be maintained for each image with an established file naming convention that can be extended with unique ID numbers or accession numbers.
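The master-image specifications above imply large files, which is why storage needs planning. The following Python sketch estimates the size of one uncompressed master image from its physical dimensions, resolution and bit depth. The page and photograph dimensions are hypothetical examples, not measurements of the actual collection, and the estimate ignores the small overhead of TIFF headers.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size of an uncompressed raster image, ignoring file headers."""
    width_px = round(width_in * dpi)      # pixels across
    height_px = round(height_in * dpi)    # pixels down
    return width_px * height_px * bits_per_pixel // 8


# Assumed sizes for illustration only: an 11 x 17 inch newspaper page
# scanned as text, and an 8 x 10 inch photograph scanned in color.
text_page = uncompressed_size_bytes(11, 17, dpi=300, bits_per_pixel=8)
color_photo = uncompressed_size_bytes(8, 10, dpi=600, bits_per_pixel=24)

print(f"Text master:  {text_page / 1_000_000:.1f} MB")
print(f"Photo master: {color_photo / 1_000_000:.1f} MB")
```

Multiplying the per-page figure by the number of pages in a run of newspapers gives a rough total for the master files. The JPEG access images will be much smaller.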