Why public libraries should digitize local history
Protecting the community's collective memory is part of the library's
service mission. The skills to do so are easily acquired, and the equipment
is either already available or inexpensive to purchase. Libraries own or can
readily obtain the rare items, and no one else will bother. The number of
available items is finite, and the work will benefit the community.
Copyright
This is a concern, but it should not paralyze you. Susan Kornfield has produced
a guide for libraries that digitize. Some important points: copyright is not
forever; it does NOT exist to protect the author; copying or digitizing does
not confer copyright; and libraries acting in good faith have some safeguards.
Standards and Best Practices
The goal of standards and best practices is to handle the item once. The
problems with standards are: they will change, they offer only temporary
protection and they are beyond the means of most
public libraries. The answer may be
responsible minimalism. Don't let the perfect be the enemy of the
possible.
Access and Preservation
Digitizing a document is different from preserving it. The goal of
preservation is to keep the original item accessible. Digitization
complements preservation by protecting the original and providing far superior access.
Some Concepts and Definitions
File Formats and Display Types
Masters and derivatives - A master file is saved in the highest quality
possible. Derivatives are files created from masters for display purposes.
TIFF or TIF (Tagged Image File Format) - A widely used bitmapped
graphics file format that handles monochrome, grayscale, and 8- and 24-bit
color. Master files from scanning are often saved in this format.
GIF (Graphics Interchange Format) - A popular bitmapped graphics
file format that is widely used on the Web because the files compress
well without losing any image data (lossless compression).
JPEG or JPG (Joint Photographic Experts Group) - A format that is
becoming very popular due to its high compression capability. It provides lossy
compression. The rate of compression can be controlled, resulting in high
or low quality images.
Thumbnails - Small versions of an image that are linked to a larger version.
TXT (plain text) - Text that is in a raw (ASCII) form. It includes no
formatting, making it very portable.
HTML (HyperText Markup Language) - The document format used on the World
Wide Web. HTML files are small, making them easy
to store, transmit and download. Browsers allow the end user to select the typeface and
font size for displaying HTML, so that it is user friendly and accessible. HTML which includes graphics is sometimes called Illustrated HTML.
PDF (Portable Document Format) - A format that allows documents created
on one platform to be displayed and printed exactly the same on another. Adobe
Acrobat Reader is free software used to read and display PDF files.
Page Views - Pages can be presented as a graphic. GIF is often
used.
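The relationship between a master and its derivatives can be illustrated with a little arithmetic. The sketch below is only an illustration, not a description of any particular scanning package: the function name fit_within, the display and thumbnail box sizes, and the letter-size 600 dpi master are all assumptions chosen for the example. It works out the pixel dimensions of a display copy and a thumbnail that keep the proportions of the master.

```python
def fit_within(width, height, max_width, max_height):
    """Return (w, h) scaled to fit inside a box, preserving aspect ratio.

    Images that are already smaller than the box are returned unchanged.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical master: a letter-size page scanned at 600 dpi (5100 x 6600 pixels).
master = (5100, 6600)

display = fit_within(*master, 800, 1000)   # assumed size for on-screen viewing
thumb = fit_within(*master, 150, 150)      # assumed size for a thumbnail

print("display copy:", display)
print("thumbnail:", thumb)
```

Most graphics programs do this calculation automatically when asked to resize an image; the point is that the master is never altered, and every smaller version is made from it.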
Scanner jargon
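DPI (Dots Per Inch) - The measurement of the resolution of
printing and display systems. It is measured along one dimension, so the
total number of dots rises with the square of the DPI: doubling the DPI
quadruples the dots. The number of dots helps determine file size.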
Bi-Tonal - Scanning done in black and white is called bi-tonal. Also
referred to as line drawing.
Grayscale - A series of shades from black to white, usually 256 of
them.
Color - Color usually ranges from 256 shades (8 bit) to 16.7 million
shades (24 bit).
OCR (Optical Character Recognition) - After a page of text has been
scanned, OCR software examines the resulting dots, converts the dots to letters and so recreates the
text.
Other definitions can be found at TechEncyclopedia.
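The DPI and bit-depth definitions above can be combined into a rough estimate of how large an uncompressed scan will be. This is a minimal sketch under stated assumptions: a letter-size page (8.5 by 11 inches), 1 bit per dot for bi-tonal, 8 bits for grayscale, 24 bits for color, and a megabyte counted as 1,000,000 bytes. Real files will differ because of compression and file-format overhead.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size in bytes of a scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Bits per dot for each scanning mode, taken from the definitions above.
MODES = {"bi-tonal": 1, "grayscale": 8, "color": 24}

# Assumed page size: 8.5 x 11 inches.
for dpi in (300, 600):
    for name, bits in MODES.items():
        size = uncompressed_bytes(8.5, 11, dpi, bits)
        print(f"{dpi} dpi {name:9s} {size / 1_000_000:6.1f} MB")
```

Going from 300 to 600 dpi multiplies every figure by four, and going from bi-tonal to grayscale multiplies it by eight, which is why text pages are usually scanned bi-tonally.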
Types of Digital Projects
An Existing Digital Document. Local historians, genealogists and others
often have interesting documents already in electronic format. These documents
can be easily converted to HTML for web publication. The Index
to Elsie's Scrapbook and Lincoln High
School Graduates 1904-2000 are examples of this kind of document.
Retyping an Older Document. An older document can
simply be entered into a word processing program, checked for accuracy,
converted to HTML and then web published. A
Short History of Wisconsin Rapids is an example of this method. This mimeographed
document from the 1950s was retyped by library staff. This method is suitable for text-only
documents, where original format is less important than the content.
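Most word processors can save a retyped document as HTML directly. For staff who prefer to start from a plain text file, the conversion can also be sketched in a few lines of Python. This is an illustration only: the function name text_to_html and the sample text are invented for the example, and the page it produces is deliberately bare.

```python
import html
import re


def text_to_html(title, text):
    """Wrap plain text in a minimal HTML page, one <p> per paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


# Invented sample text standing in for a retyped document.
sample = "Founded on the river,\nthe town grew quickly.\n\nMills & dams followed."
page = text_to_html("A Sample History", sample)
print(page)
```

The call to html.escape matters because characters such as & and < have special meaning in HTML and would otherwise display incorrectly.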
Scan and OCR. Text can be digitized using OCR. The resulting text will need to be proofed thoroughly, especially if the original is
not laser quality. This proofing can be more time consuming than simply
re-entering the document. Graphics can
be scanned separately and combined with the text. If the text is used to create
HTML, the resulting files are small
and can be viewed in any browser. Since this method changes the format of the
document, it is best used when the format is relatively unimportant. Centennial
Story 1890-1990 is an example of OCR text with a separate graphic section, which
reflects the original format. Each chapter of the
book has been placed in a separate file for ease of access. The Appendix was updated to include
additional information.
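Proofing OCR text still requires a human reader, but a simple script can point the proofreader at likely trouble spots. The sketch below is one possible heuristic, not a feature of any OCR package: it flags words that mix letters and digits, a common result when the software confuses characters such as l and 1. The function name and the sample sentence are invented for illustration, and legitimate words such as "2nd" will also be flagged.

```python
import re


def suspicious_tokens(text):
    """Return words that mix letters and digits, a common sign of OCR misreads."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_alpha = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_alpha and has_digit:
            flagged.append(token)
    return flagged


# Invented OCR output with two typical misreads.
ocr_output = "The v1llage was platted in l890 and incorporated in 1900."
print(suspicious_tokens(ocr_output))
```

A check like this catches only one kind of error. Misreads that produce real words will pass unnoticed, so it supplements a full read-through rather than replacing it.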
Page views. When the format or feel of the original document is
important, page views allow that to be replicated. Since the master TIFF files
are difficult to display, smaller GIF files or some other format is used for
display. The Making of America
site uses page views, with a large custom database to manage them.
Adobe Acrobat. If the original format of a document is important, Acrobat
(not Acrobat Reader) can be used to replicate it. A page is scanned, usually creating a
TIF file. Acrobat converts the TIF file into a PDF file. Acrobat can also take a series of scanned pages and combine
them in one document (file), with important advantages in terms of display. Acrobat sells for about $250 and is a
sophisticated program, but not beyond the ability of a dedicated amateur.
Wood County place names is a 130 page book scanned bi-tonally at 600 dpi. It is available as a 21 MB file or broken into sections. Acrobat gathers the scanned pages together and retains the flavor of the original, but the resulting file is much larger than an unformatted HTML file of the text would be.
Rules and Regulations of the T. B. Scott Free Public Library, a four page pamphlet, was scanned in grayscale because it was originally printed on colored paper. The format of the original is an important part of the charm of this document, warranting its publication in PDF format. The text of this document would require only 17K in plain HTML, but takes 1.7 MB in grayscale TIFF or PDF.
Official Historical Program - Wood County Centennial is a 28 page document with dozens of photographs scattered throughout the text. Its small size made it appropriate for in-house digitization. The text was scanned bi-tonally, with grayscale photographs pasted in. Acrobat was used to gather the scanned pages.
Scan and Post. If an item is mainly graphical (such as photographs), the
graphics can be scanned and web published. Scanners are inexpensive and usually
include graphic software that will help clean up the files and save them in the
best file format. The Young
Postcards are an example of this method. Note that thumbnails (small
graphics linked to the larger version) have been used to make it easy to browse.
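A thumbnail page of this kind is simple HTML: each small image is wrapped in a link to its larger version. The following sketch shows one way such a page could be generated. The function name gallery_html, the file names and the captions are all hypothetical, and the thumbnail files themselves are assumed to have been created already with graphics software.

```python
import html


def gallery_html(title, items):
    """Build a page of thumbnails, each linked to its full-size image.

    items is a list of (thumbnail_file, full_file, caption) tuples.
    """
    links = []
    for thumb, full, caption in items:
        links.append(
            f'<a href="{html.escape(full, quote=True)}">'
            f'<img src="{html.escape(thumb, quote=True)}" '
            f'alt="{html.escape(caption, quote=True)}"></a>'
        )
    return (
        f"<html>\n<head><title>{html.escape(title)}</title></head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        + "\n".join(links)
        + "\n</body>\n</html>\n"
    )


# Hypothetical file names and captions.
items = [
    ("thumbs/card01.gif", "full/card01.jpg", "Main Street, looking north"),
    ("thumbs/card02.gif", "full/card02.jpg", "Bridge & dam"),
]
page = gallery_html("Sample Postcards", items)
print(page)
```

The caption is placed in the alt attribute so that the page remains usable for people who browse with images turned off or with a screen reader.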
Databases. Databases can be used to track and display photographs. Kansas
City Public Library uses a database to provide access to its collection of
14,000 photographs. This is too large a collection to be browsed, so databases
organize it for display. Collections of data such as obituary indexes and cemetery
records are also candidates for digitization. Identifying copyright can be a problem,
since these kinds of records are usually compiled by groups. McMillan has loaded
databases as static pages, but it is only slightly more difficult to load a
database as a dynamic document.
Index to the 1928 Standard Atlas of Wood County, Wisconsin / compiled by Marlys Manley Steckler. This was originally a database created with Excel. Simply using SAVE AS HTML created the static pages.
Stevens Point Area Obituary Index. This on-line database can be searched and updated.
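The static-page approach does not depend on any one spreadsheet program. If an index can be exported as a comma-separated (CSV) file, a short script can turn it into an HTML table. The sketch below is an illustration under assumptions: the function name csv_to_html_table, the column headings and the sample names are invented, and the first row of the file is assumed to contain the headings.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Convert CSV text to an HTML table, treating the first row as headings."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, *data = rows
    lines = ["<table>"]
    lines.append(
        "<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>"
    )
    for row in data:
        lines.append(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


# Invented sample index; a real one would be exported from the spreadsheet.
sample = 'Surname,Given name,Page\nDoe,John,12\n"Roe, Jr.",Richard,40\n'
table = csv_to_html_table(sample)
print(table)
```

Using the csv module rather than splitting on commas by hand means that entries containing commas, such as "Roe, Jr.", stay in a single cell.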
Professional services. Some jobs are too complicated or time-consuming for library staff. Artwork of the Wisconsin River Valley, part 1 is part of a photographic series originally published in nine parts. Because it combines text and graphics, it was sent to a commercial vendor (Northern Micrographics) for conversion. The project was funded by a grant from Consolidated Papers Foundation, Inc.
Out-Source or In-House?
When a library uses an outside professional:
When a library does a project in-house:
What Next?
Once you have a digital text, there are two main options to provide access:
Digitizing on a Budget (or without a Budget)
Many public libraries that might want to digitize documents have little or no budget for the process. They do not have to sit on the sidelines and wait for some indefinite future when funding is plentiful or digitization is free. It is possible to move forward without committing significant staff time or funding.
Use someone else's money. Local foundations are a good source of funds for this kind of project. After all, they (or their ancestors) loom large in the history of the community. Develop a limited proposal and outsource it. This involves a minimum of staff time and can include several entire books. Grants as low as $1,000 can generate positive results. LSTA money is also currently available for digitization.
Use someone else's time. As mentioned above, some digital projects are as simple as using SAVE AS. Others in your community have done the work and are looking for someone to provide web space, stability and publicity. Genealogy and historical societies are great places to recruit volunteers for OCR projects. Schools have classes that study local history, writing and computer science. It only takes one teacher to bring a whole class of volunteers with her. Teens can also be involved individually as volunteers.
Fit it in along the edges. Scanning in a set of photographs is something that can fill a slow afternoon. Typists can retype a valuable document in a few minutes. Staff can proofread text to stay busy at desks. Employees on light duty can be trained to scan photographs. Even if a project is assigned a low priority, it will eventually get done.
Projects do not need to be massive. In most cases, anything you do will be the largest such project in your community. The more you do, the more support and assistance you will find. Demonstrate the library's interest and competence. Additional opportunities will present themselves.
Recommended Internet Sites
Digitization in Public Libraries Web Site - Handouts and advice from the
2001 PLA Spring Symposium on digitization. Be sure to check the Resource list.
The On-Line Books Page - Probably the most complete of several attempts at
listing on-line books, with over 13,000 listings.
This page was prepared by Andy Barnett, Assistant Director of McMillan Memorial Library. All examples used are from the Library's collection of digital historical titles. He welcomes comments and suggestions.
This page is located at http://www.mcmillanlibrary.org/programs/digitize.html
Last updated August 30, 2006