Atlantic Canada Virtual Archives
About ACVA: About the Project
The production team for the Atlantic Canada Virtual Archives (ACVA) was divided into groups according to digitization and production tasks: Imaging, Transcription, Text Encoding, Text Storage and Manipulation (including database design), Instructional Design, Graphic Design, Web Development and Project Management.
Digital Imaging
The project team imaged the collections following applicable industry standards for the digital capture, processing and archiving of archival documents:
Master archival image files - Master archival image files were created in full colour (24-bit RGB) at a resolution of 300 pixels or dots per inch (ppi/dpi). Following the recommended best practices of leading cultural heritage preservation institutions (for example, Cornell University and the National Archives and Records Administration), tonal scale and colour balance controls were set prior to image capture in order to create digital surrogates that are true to the appearance of the original documents and to minimize adjustments during processing. Files were sharpened as needed during image processing to approximate the appearance of the original. All sharpening was done with an unsharp mask algorithm. Master image files were stored as uncompressed TIFF files (Intel byte order, header version 6). File naming follows established conventions at the University of New Brunswick for effective management of digital image collections.
Web surrogate files - Web surrogates for use in on-line delivery are derived from the master archival TIFF files. The format for these files is JPEG (24 bit RGB), a flexible, compressed format and recognized industry standard for the Web presentation of textual and photographic documents. In order to improve networked access and use of the images, resolution was reduced to 72 dpi. Additional surrogates in the form of thumbnails are used for Web access. Thumbnails are in JPEG format at a resolution of 72 dpi with reduced dimensions of 150 pixels in width for landscape images and 150 pixels in height for portrait images. As with the master archival image files, file naming follows established conventions for effective management of University of New Brunswick digital image collections. The university has ensured that image files can be identified with a persistent URL to enable reliable citation, cross-linking, and integrated access.
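The thumbnail sizing rule described above (150 pixels wide for landscape images, 150 pixels high for portrait images) can be illustrated with a short sketch. This is not the project's own tooling: the function name thumbnail_size, the treatment of square images as landscape, and the sample dimensions are all assumptions made for illustration.

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, long_edge: int = 150) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer edge equals long_edge pixels.

    Hypothetical helper: landscape images get a fixed width, portrait images
    a fixed height. Square images are treated as landscape (an assumption).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width >= height:
        # Landscape: fix the width, scale the height proportionally.
        return long_edge, max(1, round(height * long_edge / width))
    # Portrait: fix the height, scale the width proportionally.
    return max(1, round(width * long_edge / height)), long_edge


# Sample dimensions are placeholders, not measurements from the collection.
print(thumbnail_size(3000, 2000))
print(thumbnail_size(2000, 3000))
```

Keeping the aspect ratio and fixing only the longer edge means every thumbnail fits inside a 150 by 150 pixel box regardless of orientation.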
Image archiving - Master images (TIFFs) are archived to CD-R (three copies each: one for Nova Scotia Archives and Records Management and two for the University of New Brunswick), while surrogates (JPEGs) have been uploaded to a Unix (Linux) server running Apache Web server software.
Cataloguing and metadata creation - Cataloguing and metadata descriptions have been created at collection, document and component image levels according to the Electronic Text Centre's extended Dublin Core metadata schema. The project follows a Dublin Core framework with relevant terminology standards and controlled vocabularies in creating rich and highly portable metadata records.
Metadata repository: database design and implementation - Project cataloguers have created metadata descriptions using custom Web-accessible editors that interface with a MySQL database. MySQL is an open-source database designed for speed and flexibility in heavy load use. The ETC's MySQL image database resides on a Unix (Linux) Server running Apache Web server software and is used for storing and delivering Dublin Core-compliant metadata records as well as linking them to associated image files.
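As a rough illustration of what a Dublin Core-style metadata record for a single image might look like, the sketch below builds one with Python's standard library. It uses only elements from the simple Dublin Core element set; the project's extended schema is not documented here, so the wrapper element name "record", the function build_dc_record, and all field values are placeholders and assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an XML record with one dc:* child per (name, value) pair.

    The wrapper element name "record" is hypothetical.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Placeholder values only; these are not taken from the project's database.
record = build_dc_record({
    "title": "Example letter (placeholder)",
    "creator": "Example author (placeholder)",
    "type": "Text",
    "format": "image/jpeg",
})
print(ET.tostring(record, encoding="unicode"))
```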
Text Transcription, Markup and Manipulation
The project team transcribed and encoded the collections following applicable industry standards for the transcription, encoding and rendering of primary source textual documents:
Transcription - All transcriptions were originally keyed in transcriber-selected word processing file formats. A number of editorial conventions were applied to the text so that baseline text encoding could be automated. For example, square brackets with a question mark, [ ? ], were used to indicate unclear text; a proofreader replaced them once the text had been interpreted. [gap] was used when text was entirely illegible. Additions were noted using one of two conventions: [add: N. York "inline"] marked text added inline, and [add: N. York "sup"] marked text added above the line (supralinear).
Text Encoding - All texts were encoded in the eXtensible Markup Language (XML) and in accordance with the Text Encoding Initiative (TEI) Document Type Definitions (DTD). TEI is an internationally developed data standard currently expressed in XML for creating, interchanging, and representing simple and complex electronic texts.
For this project, baseline encoding was automated using the Perl programming language. Scripts were written to match and map textual structures and editorial conventions in the transcription text to TEI textual and structural elements. Baseline encoding included the TEI Header, main letter-based structural elements, additions, deletions, gaps, lineation, paragraphs, and superscripted text. All texts were validated both before and after project encoders worked on the files.
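The mapping from transcription conventions to TEI elements can be sketched as follows. The project's scripts were written in Perl and are not reproduced here; this is a Python illustration covering only the [gap] and [add: ...] conventions. The function name encode_baseline and the place attribute values ("inline", "supralinear") are assumptions rather than the project's documented encoding.

```python
import re
from xml.sax.saxutils import escape

# Matches e.g. [add: N. York "sup"] and captures the added text and its placement.
ADD_PATTERN = re.compile(r'\[add:\s*(?P<text>.*?)\s*"(?P<place>inline|sup)"\]')

# Assumed mapping from the transcribers' keywords to attribute values.
PLACE_VALUES = {"inline": "inline", "sup": "supralinear"}


def encode_baseline(line: str) -> str:
    """Convert bracketed editorial conventions in one line to TEI-style tags."""
    # Escape &, < and > first so the transcribed text is safe inside XML.
    line = escape(line)

    def replace_add(match):
        place = PLACE_VALUES[match.group("place")]
        text = match.group("text")
        return f'<add place="{place}">{text}</add>'

    line = ADD_PATTERN.sub(replace_add, line)
    # Entirely illegible text becomes an empty gap element.
    return line.replace("[gap]", "<gap/>")


print(encode_baseline('went to [add: N. York "sup"] and [gap]'))
```

Escaping before substitution matters: it keeps any stray angle brackets or ampersands in the transcription from being mistaken for markup once the tags are inserted.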
Using XMLSpy software, project encoders proofed and edited the baseline encoding, encoded person and place names, and encoded structural and textual elements that the automated pass had missed. A copy of the project's encoding manual is available online.
Text manipulation and delivery - Texts are delivered to the Web using the XSL transformation language (XSLT), an associated XML data standard. The default reading of the texts is the diplomatic version, but because the texts were also silently normalized in the encoding, they can be read in a normalized (regularized) mode. Readers can also choose to read the text with or without transcription and biographical notes, and with or without the original lineation.
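The idea of switching between diplomatic and regularized readings of the same encoded text can be sketched in a few lines. The project does this with XSLT; the Python below is only a stand-in for illustration. It assumes that a regularized form is carried in a reg attribute on an orig element, which may not match the project's actual markup, and the sample sentence is invented.

```python
import xml.etree.ElementTree as ET


def render(element, mode="diplomatic"):
    """Return the text of an element in diplomatic or regularized mode.

    Assumption: <orig reg="..."> holds the original spelling as content
    and the regularized spelling in its reg attribute.
    """
    if element.tag == "orig" and mode == "regularized" and element.get("reg"):
        return element.get("reg")
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, mode))
        # Text following the child element belongs to the parent's flow.
        parts.append(child.tail or "")
    return "".join(parts)


# Invented sample, not a quotation from the collection.
sample = ET.fromstring('<p>I <orig reg="received">recd</orig> your letter</p>')
print(render(sample, "diplomatic"))
print(render(sample, "regularized"))
```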
Document Storage - Document storage is provided by a MySQL database, which is loaded from XML files parsed with a Perl XML SAX parser. Database fields are populated with information parsed from XML files. At the document (text) level, information is captured about the document's creation date, spatial and temporal coverage, and source description.
Documents are stored as one or more XML objects (e.g., a page, paragraph). Each XML object is associated with one parent document. An XML object stores information about its source file and the XML encoded portion of the parent document represented by the object. Each object also stores several text fields derived from its source XML (e.g., TEI header, diplomatic full text) to assist document searching (see below).
Each XML object references one or more images of its parent document. An image may be captured in one or more image formats (e.g., JPEG, TIFF). Each image is associated with its referencing XML object, its storage format(s), and the name and media type of its storage location.
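The relationships described above (a document has one or more XML objects, and each XML object references one or more images) can be sketched as a small relational schema. The project uses MySQL; the example below uses Python's built-in sqlite3 module purely so that it is self-contained. All table names, column names, and sample rows are hypothetical and do not reproduce the project's actual database design.

```python
import sqlite3

# Hypothetical schema: one row per document, XML object, and stored image format.
SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    creation_date TEXT,
    coverage TEXT,
    source_description TEXT
);
CREATE TABLE xml_object (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    source_file TEXT NOT NULL,
    xml_fragment TEXT NOT NULL,
    full_text TEXT
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    xml_object_id INTEGER NOT NULL REFERENCES xml_object(id),
    format TEXT NOT NULL,
    location_name TEXT NOT NULL,
    location_media TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows only.
doc_id = conn.execute(
    "INSERT INTO document (title) VALUES (?)", ("Placeholder letter",)
).lastrowid
obj_id = conn.execute(
    "INSERT INTO xml_object (document_id, source_file, xml_fragment, full_text) "
    "VALUES (?, ?, ?, ?)",
    (doc_id, "letter001.xml", "<p>Placeholder text</p>", "Placeholder text"),
).lastrowid
conn.executemany(
    "INSERT INTO image (xml_object_id, format, location_name, location_media) "
    "VALUES (?, ?, ?, ?)",
    [(obj_id, "JPEG", "letter001_p1.jpg", "web server"),
     (obj_id, "TIFF", "letter001_p1.tif", "CD-R")],
)

# Follow the chain document -> XML object -> image.
rows = conn.execute(
    "SELECT d.title, i.format, i.location_media FROM document d "
    "JOIN xml_object o ON o.document_id = d.id "
    "JOIN image i ON i.xml_object_id = o.id ORDER BY i.format"
).fetchall()
print(rows)
```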
Document Updates - Batch updates to the database may be made by re-loading updated source XML files. An administrative web interface, written in PHP, interacts with the MySQL database to permit a project administrator to modify individual documents. The administrative interface also provides an export utility, which recreates source XML files in TEI or Dublin Core format.
Document Retrieval - Documents may be located by browsing indices organized by document title, date, author and recipient. To help locate a specific document, a search index is generated from the XML files; it records and scores individual words in each document in one or more search classes (e.g., TEI header, full text, person names, place names). The search index is then used to retrieve documents matching search text in the selected search class.
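A per-class word index of the kind described above can be sketched as a simple inverted index. The project's actual scoring method is not documented here, so this sketch assumes the score is the number of times a word occurs in a document within a search class. The function names, class names, and sample documents are illustrative.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Build index[search_class][word][doc_id] = score.

    documents maps doc_id -> {search_class: text}. The score is a plain
    occurrence count (an assumption, not the project's formula).
    """
    index = defaultdict(lambda: defaultdict(dict))
    for doc_id, classes in documents.items():
        for search_class, text in classes.items():
            for word in WORD.findall(text.lower()):
                postings = index[search_class][word]
                postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


def search(index, search_class, query):
    """Return doc ids matching any query word, highest total score first."""
    scores = defaultdict(int)
    for word in WORD.findall(query.lower()):
        for doc_id, score in index.get(search_class, {}).get(word, {}).items():
            scores[doc_id] += score
    return sorted(scores, key=lambda d: (-scores[d], d))


# Invented sample documents.
docs = {
    "doc1": {"full_text": "The ship sailed; the ship returned.", "place_names": "Halifax"},
    "doc2": {"full_text": "A ship arrived.", "place_names": "Fredericton"},
}
index = build_index(docs)
print(search(index, "full_text", "ship"))
print(search(index, "place_names", "halifax"))
```

Keeping a separate index per search class is what lets a query for a place name ignore incidental matches in the full text.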
Instructional Design
The ACVA Learning Resources website was designed with two main goals:
- To provide students in grades 9-12 with an easy introduction to the large amount of material in the Atlantic Canada Virtual Archives.
- To provide teachers with ways to use the Atlantic Canada Virtual Archives in the classroom.
All web pages in the ACVA Learning Resources website are valid XHTML. All formatting is performed with CSS1 and CSS2. The ACVA Learning Resources website has been designed to lose none of its essential functionality in older non-CSS Web browsers.
Client-side scripting is accomplished with JavaScript. The ACVA Learning Resources website has been designed to lose none of its essential functionality in non-JavaScript Web browsers. Server-side scripting is accomplished with PHP.
All text in the ACVA Learning Resources website was created in ASCII format with the BBEdit text editor and then marked up in XHTML for delivery to the Web, with the following two exceptions:
- The narrative content in the "Snoop" section was marked up in XML and then delivered using two different methods: XSLT and Macromedia Flash.
- The lesson plan, student handout and assessment criteria in the "Teachers" section were reproduced and adapted from the National Library of Canada website according to the terms outlined in the "Educational materials" section of the National Library of Canada's Important Notices page.
All interactive pieces in the ACVA Learning Resources website are in Macromedia Flash 6 format. The "Snoop" Flash movie is linked to an XHTML equivalent.
Web Design
The ACVA website conforms to current W3C Web standards: Web pages are valid XHTML 1.0 Transitional and also validate against the specifications for CSS 1 and 2. The design follows the W3C Web Content Accessibility Guidelines.
