Fast and slow digitization

You rarely hear of a library or archive without a backlog of items that need to be cataloged or digitized. With limited funding, staff, and time, and with acquisitions increasing, libraries have had to either ramp up the speed and efficiency of their digitization efforts or be much choosier about what gets digitized. This dilemma is the focus of literature like Erway and Schaffner’s (2007) “Shifting Gears: Gearing Up to Get into the Flow,” which argues that users are better served by a large quantity of digitized collections than by a small number of items digitized at high quality (2). They acknowledge that this principle may not apply in every situation (3). The speed and level of detail in digitization can and should vary depending on the value of the items, the volume of documents, and, most importantly, the way researchers will work with the digitized item.

The level of detail depends on the kind of access that users will demand. For items in high demand, librarians can create digital editions that work like facsimiles, accompanied by expert commentary and interpretation and by tools like transcription. A slower digitization process is worthwhile for valuable, in-demand manuscripts that are difficult to read without a transcription.

Faithfully replicating manuscripts requires some text encoding skills – specifically, XML markup encoded according to TEI guidelines. The “NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials” (2002-2003) describes how basic TEI encoding “can be applied nearly automatically using scripts, but detailed encoding requires additional staff, training, and time” (86-7).
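To give a feel for what "nearly automatic" basic encoding might look like, here is a minimal sketch in Python. It is my own illustration, not something taken from the NINCH Guide: the function name, the placeholder header text, and the rule of treating blank lines as paragraph breaks are all assumptions. It wraps a plain-text transcription in a bare-bones TEI document and marks line breaks, which is roughly the level of encoding a script can manage without a human editor.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace


def plain_text_to_tei(title, transcription):
    """Wrap a plain-text transcription in a minimal TEI document.

    Hypothetical helper for illustration only. Blank lines are assumed
    to separate paragraphs; single newlines become <lb/> line breaks.
    """
    ET.register_namespace("", TEI_NS)

    def el(parent, tag, text=None):
        # Create a child element in the TEI namespace.
        node = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}")
        node.text = text
        return node

    root = ET.Element(f"{{{TEI_NS}}}TEI")

    # The header sections TEI requires, filled with placeholder text.
    file_desc = el(el(root, "teiHeader"), "fileDesc")
    el(el(file_desc, "titleStmt"), "title", title)
    el(el(file_desc, "publicationStmt"), "p", "Unpublished working transcription.")
    el(el(file_desc, "sourceDesc"), "p", "Transcribed from a digitized page image.")

    body = el(el(root, "text"), "body")
    for block in transcription.split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        para = el(body, "p")
        para.text = lines[0]
        for ln in lines[1:]:
            # An empty <lb/> marks where the manuscript line begins.
            el(para, "lb").tail = ln

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(plain_text_to_tei("Sample letter", "First line\nsecond line\n\nNew paragraph"))
```

Everything beyond this, such as recording deletions, additions, hands, or editorial notes, is where the "additional staff, training, and time" come in.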

I want to walk through some examples of TEI projects to show how they differ from basic digitization.

#1 : Electronic Beowulf

This online edition of Beowulf provides full translations and helps with definitions and grammar. The website brings together multiple restorations and editions of the text, allowing scholars to make comparisons between all of them. It is the work of editor Kevin Kiernan, a Beowulf scholar, and software engineer Emil Iacob.

#2 : Emily Dickinson Archive

Harvard and Amherst, which both own parts of Dickinson’s archive, collaborated to make this website. Transcriptions on this site show how Dickinson’s editors made changes to her work, like standardizing the arrangement of words on the pages. This site lets users contribute to transcriptions by typing or uploading TEI-encoded documents. I love seeing how libraries use crowdsourcing to engage users – for example, check out the British Library’s crowdsourcing projects.

#3 : Jane Austen’s Fiction Manuscripts


This project is the work of English scholar Kathryn Sutherland and digital humanities scholar Elena Pierazzo. The manuscripts come from the Bodleian Library in Oxford, the British Library in London, the Pierpont Morgan Library in New York, King’s College in Cambridge, and private owners. This image gives a good example of diplomatic transcription; you can see how the transcription reflects Austen’s edits. Sutherland and Pierazzo wrote a whole article about their experience making this project, entitled “The Author’s Hand: From Page to Screen.”
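To show what encoding an author's edits can involve, here is a small Python sketch built around TEI's del and add elements, which mark deleted and added text. The sample sentence is invented for illustration and is not from Austen's manuscripts, the bracket notation is my own choice, and I have left out the TEI namespace to keep the example short. I am not claiming this is how the Austen project renders its pages.

```python
import xml.etree.ElementTree as ET

# Invented sample: a writer strikes out "very" and writes "uncommonly" above it.
SAMPLE = (
    '<p>The morning was <del rend="strikethrough">very</del>'
    '<add place="above">uncommonly</add> fine.</p>'
)


def render(node, diplomatic):
    """Flatten an encoded passage to plain text.

    diplomatic=True keeps the revisions visible, using a made-up
    notation: [-deleted-] and {+added+}.
    diplomatic=False gives a clean reading text of the final version.
    """
    inner = (node.text or "") + "".join(
        render(child, diplomatic) + (child.tail or "") for child in node
    )
    if node.tag == "del":
        return f"[-{inner}-]" if diplomatic else ""
    if node.tag == "add":
        return f"{{+{inner}+}}" if diplomatic else inner
    return inner


if __name__ == "__main__":
    passage = ET.fromstring(SAMPLE)
    print(render(passage, diplomatic=True))
    print(render(passage, diplomatic=False))
```

The point is that one encoded file can serve two audiences: readers who want the finished text, and researchers who want to see every revision.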

These projects almost always seem to be collaborative, dividing up the work involved and frequently bringing together related items from around the world. Some projects are the work of a library or archive, but many are completed by scholars, with libraries just providing copies of the items. To me, that raises the question: should libraries and archives be concerned about making elaborate digital editions of certain manuscripts, or should that work be left to scholars who might know more about the text? There is a lot of opportunity for collaboration, and librarians have a lot of technical support to offer.

Speed and efficiency are important for ensuring access to more of an institution’s collections. For certain items that are popular and demand more access, this kind of attention to detail can produce wonderful research tools. I am also intrigued by the potential of crowdsourcing. I wonder if libraries could let users generate transcriptions for a wider collection of items. For A/V materials, users could generate captions. The ultimate question is, should libraries and archives involve themselves in more projects like these, or are they too time-consuming? I think, realistically, that most institutions don’t have the resources to commit to these big projects, but they could reach out to and collaborate with scholars who undertake them. My main point is that libraries can’t always take a cookie-cutter approach to digitization. Some items need just basic detail, while others deserve more description to ensure access.


Thanks for reading!



Erway, R. and Schaffner, J. (2007). “Shifting Gears: Gearing Up to Get Into the Flow.” OCLC Programs and Research. Retrieved from

The Humanities Advanced Technology and Information Institute, University of Glasgow and the National Initiative for a Networked Cultural Heritage (2002-2003). “The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials.” National Initiative for a Networked Cultural Heritage. Retrieved from

Sutherland, Kathryn and Pierazzo, Elena (2012). “The Author’s Hand: From Page to Screen.” Collaborative Research in the Digital Humanities. Marilyn Deegan and Willard McCarty, eds. Farnham, Surrey, GBR: Ashgate Publishing Group. 191-212.