The process of digitizing archival documents for presentation on the web requires a diverse team of skilled individuals to photograph, transcribe, and encode each document.
The digital imaging team images the collections following industry standards for the digital capture, processing, and archiving of archival documents.
Master archival image files are created in full colour (24-bit RGB) at a resolution of 300 dots per inch (dpi). Tonal scale and colour balance controls are set prior to image capture to create digital surrogates that are true to the appearance of the original documents. During image processing, files are sharpened as needed with an unsharp mask algorithm to approximate the appearance of the original. Master image files are stored as uncompressed TIFF files (Intel byte order, header version 6).
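As a rough illustration of what these capture settings imply for storage, the following Python sketch estimates the pixel-data size of one uncompressed master image. The 8.5 x 11 inch page is an assumed example size, not a figure from the project, and the estimate ignores TIFF header and tag overhead.

```python
def uncompressed_size_bytes(width_in, height_in, dpi=300, bits_per_pixel=24):
    """Approximate pixel-data size of an uncompressed scan.

    Ignores TIFF header and tag overhead, which is small by comparison.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size for illustration only: 8.5 x 11 inches.
size = uncompressed_size_bytes(8.5, 11)
print(f"{size:,} bytes (~{size / 2**20:.1f} MiB)")
# prints: 25,245,000 bytes (~24.1 MiB)
```

At roughly 24 MiB per page of this size, a 650 to 700 MB CD-R would hold on the order of a few dozen master images.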
To improve networked access to the images, reduced-resolution web surrogates are derived from the master archival TIFF files. Thumbnail- and full-size web surrogates are created at a resolution of 72 dpi and stored as full colour (24-bit RGB) JPEG files.
Master images (TIFFs) are archived to CD-R, while web surrogates (JPEGs) are uploaded to a Unix/Apache web server that is backed up nightly.
Descriptive metadata is created at the document and component-image levels according to the Electronic Text Centre's extended Dublin Core metadata schema. The schema combines the Dublin Core framework with relevant terminology standards and controlled vocabularies to produce rich, highly portable metadata records.
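To make the idea of document-level and image-level records concrete, here is a minimal Python sketch that builds two records from standard Dublin Core elements. It uses only the fifteen-element Dublin Core namespace; the Electronic Text Centre's extensions are not reproduced here, and the `record` wrapper element, the identifiers, and all field values are placeholders assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat record whose children are Dublin Core elements.

    The <record> wrapper is a hypothetical container, not part of Dublin Core.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Placeholder values for illustration only.
document_record = build_dc_record([
    ("title", "Example letter"),
    ("creator", "Example Author"),
    ("type", "Text"),            # term from the DCMI Type Vocabulary
    ("identifier", "doc-0001"),  # hypothetical identifier scheme
])

# A component image record points back to its document via dc:relation.
image_record = build_dc_record([
    ("title", "Example letter, page 1"),
    ("type", "StillImage"),      # term from the DCMI Type Vocabulary
    ("format", "image/tiff"),
    ("identifier", "doc-0001-p001"),
    ("relation", "doc-0001"),
])

print(ET.tostring(document_record, encoding="unicode"))
print(ET.tostring(image_record, encoding="unicode"))
```

Drawing `type` values from a controlled vocabulary, as above, is one way records stay comparable across collections.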
Using a word processing application, transcribers type document text following editorial guidelines developed to assist automatic XML encoding of the resulting transcriptions. Transcriptions are proofread using the two-person, read-aloud technique.
The initial encoding of document transcriptions is automated with a Perl script that maps textual structures and features identified in the transcription text to TEI elements. Once mapped, the transcribed text is encoded with the appropriate TEI markup.
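The mapping step can be sketched as follows. This is a Python illustration rather than the project's Perl script, and the transcription conventions it recognises (blank lines between paragraphs, a `[page N]` line for page breaks, `_word_` for underlined text) are hypothetical stand-ins for the project's editorial guidelines. The TEI elements used (`body`, `p`, `pb`, `hi`) are standard.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical transcription conventions, assumed for this sketch.
PAGE_BREAK = re.compile(r"^\[page (\d+)\]$")
UNDERLINE = re.compile(r"_(.+?)_")


def transcription_to_tei(text):
    """Map plain-text transcription blocks to a TEI <body> fragment."""
    parts = ["<body>"]
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # collapse line wraps inside a block
        if not block:
            continue
        page = PAGE_BREAK.match(block)
        if page:
            parts.append(f'<pb n="{page.group(1)}"/>')
        else:
            # Escape XML special characters before inserting markup.
            content = UNDERLINE.sub(r'<hi rend="underline">\1</hi>', escape(block))
            parts.append(f"<p>{content}</p>")
    parts.append("</body>")
    return "\n".join(parts)


sample = """[page 1]

Dear Sir,

I received your letter & was _very_ glad
to hear from you.
"""
print(transcription_to_tei(sample))
```

Because the output of a script like this is only a first pass, it would still need the manual proofreading and correction that follows.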
Project encoders then proofread and edit the initial XML encoding using oXygen XML Editor. In addition to correcting errors and omissions in the document text or markup, encoders have several main tasks:
Encoders ensure all metadata fields are completed correctly. Each XML document includes descriptive metadata about the document itself, the source document on which it is based, and the conventions used in its encoding.
Using image galleries and metadata created by the digital imaging team, encoders locate and link source document images to the encoded text.
Encoders assign unique keys to personal names and geographic place names so that names with variant spellings, identified by the archive research team, are indexed consistently.
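The effect of keying names can be illustrated with a short Python sketch. The variant spellings and the key `smith-john` are invented for the example, and the lookup table stands in for whatever authority list the research team maintains. The `key` attribute on `persName` is standard TEI; the same approach would apply to `placeName`.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

# Hypothetical authority list: every variant spelling maps to one key.
PERSON_KEYS = {
    "John Smith": "smith-john",
    "Jno. Smith": "smith-john",
    "J. Smyth": "smith-john",
}


def tag_person(name, keys=PERSON_KEYS):
    """Wrap a name in <persName>, adding a key when the name is known."""
    key = keys.get(name)
    if key is None:
        return f"<persName>{escape(name)}</persName>"
    return f"<persName key={quoteattr(key)}>{escape(name)}</persName>"


def build_index(names, keys=PERSON_KEYS):
    """Group the spellings found in the text under their shared key."""
    index = defaultdict(set)
    for name in names:
        if name in keys:
            index[keys[name]].add(name)
    return {key: sorted(variants) for key, variants in index.items()}


print(tag_person("Jno. Smith"))
# prints: <persName key="smith-john">Jno. Smith</persName>
print(build_index(["John Smith", "J. Smyth", "Unknown Person"]))
# prints: {'smith-john': ['J. Smyth', 'John Smith']}
```

The transcribed spelling is preserved in the text, while the key lets an index or search treat all variants as one person.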