Fast and slow digitization

You rarely hear of a library or archive without a backlog of items that need to be cataloged or digitized. With limited funding, staff, and time, and with acquisitions increasing, libraries have had to either ramp up the speed and efficiency of their digitization efforts or be much choosier about what gets digitized. This dilemma is the focus of literature like Erway and Schaffner's (2007) "Shifting Gears: Gearing Up to Get into the Flow," which argues that users are better served by a larger quantity of digitized collections than by a smaller number of items digitized at high quality (2). The authors acknowledge that this principle may not apply in every situation (3). The speed and level of detail in digitization can and should vary depending on the value of the items, the volume of documents, and, most importantly, the way researchers will work with the digitized items.

The level of detail depends on the kind of access that will be demanded. For items in high demand, librarians can create digital editions that act like facsimiles, accompanied by expert knowledge, interpretation, and tools such as transcriptions. A slower digitization process is important for items like valuable, in-demand manuscripts that are difficult to read without a transcription.

Faithfully replicating manuscripts requires some text encoding skills: specifically, XML markup encoded according to the TEI (Text Encoding Initiative) guidelines. The "NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials" (2002-2003) explains that basic TEI encoding "can be applied nearly automatically using scripts, but detailed encoding requires additional staff, training, and time" (86-87).
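To make the NINCH distinction a little more concrete, here is a minimal sketch of what "nearly automatic" basic encoding might look like, using Python's standard `xml.etree.ElementTree`. It assumes plain-text transcriptions in which blank lines separate paragraphs; the function name `plain_text_to_tei` and the header text are hypothetical placeholders, not part of any real project. All it produces is a bare TEI skeleton (a header plus `<p>` elements). Encoding deletions, additions, or an author's line breaks is the "detailed encoding" a script like this can't do on its own.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return a namespace-qualified TEI tag name for ElementTree."""
    return f"{{{TEI_NS}}}{tag}"


def plain_text_to_tei(title, raw_text):
    """Wrap a plain-text transcription in a minimal TEI document.

    Hypothetical helper: blank lines are assumed to separate paragraphs,
    and the header values are placeholders, not real catalog metadata.
    """
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication statement, source description.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished draft (placeholder)"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Transcribed from a digitized item (placeholder)"

    # Body: one <p> per blank-line-separated block, internal whitespace collapsed.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for block in raw_text.split("\n\n"):
        if block.strip():
            ET.SubElement(body, tei("p")).text = " ".join(block.split())

    return ET.tostring(root, encoding="unicode")


# Example with an invented two-paragraph transcription.
sample = "First paragraph of a letter.\n\nSecond   paragraph,\nwrapped over two lines."
encoded = plain_text_to_tei("Sample letter", sample)
print(encoded)
```

Because the script only recognizes paragraph breaks, anything a human editor would notice on the page (a crossed-out word, a marginal note) is flattened into plain text, which is why the detailed work still needs trained staff.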

I want to walk through some examples of TEI projects to show how they differ from basic digitization.

#1 : Electronic Beowulf

This online edition of Beowulf provides full translations along with help on definitions and grammar. The website brings together multiple restorations and editions of the text, allowing scholars to compare them. It is the work of editor Kevin Kiernan, a Beowulf scholar, and software engineer Emil Iacob.

#2 : Emily Dickinson Archive

Harvard and Amherst, which each own part of Dickinson's archive, collaborated to make this website. Its transcriptions show how Dickinson's editors changed her work, for example by standardizing the arrangement of words on the page. The site also lets users contribute transcriptions by typing them in or uploading TEI-encoded documents. I love seeing how libraries use crowdsourcing to engage users; the British Library's crowdsourcing projects are another good example.

#3 : Jane Austen’s Fiction Manuscripts


This project is the work of English scholar Kathryn Sutherland and digital humanities scholar Elena Pierazzo. The manuscripts come from the Bodleian Library in Oxford, the British Library in London, the Pierpont Morgan Library in New York, King's College in Cambridge, and private owners. The project is a good example of diplomatic transcription: you can see how each transcription reflects Austen's own edits. Sutherland and Pierazzo also wrote an article about their experience building the project, "The Author's Hand: From Page to Screen" (2012).
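A diplomatic transcription records an author's revisions instead of silently correcting them. TEI provides elements such as `<del>` for struck-out text and `<add>` for inserted text. The sketch below builds one such line with Python's standard `xml.etree.ElementTree`; the sentence is invented for illustration (it is not Austen's text), the attribute values are just examples, and the helper names `encode_revision` and `reading_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return a namespace-qualified TEI tag name for ElementTree."""
    return f"{{{TEI_NS}}}{tag}"


def encode_revision(before, deleted, added, after):
    """Encode one authorial revision diplomatically (hypothetical helper).

    The struck-out word goes in <del>, the replacement in <add>,
    and the surrounding words are kept as element text and tail.
    """
    p = ET.Element(tei("p"))
    p.text = before
    ET.SubElement(p, tei("del"), {"rend": "strikethrough"}).text = deleted
    add = ET.SubElement(p, tei("add"), {"place": "above"})
    add.text = added
    add.tail = after
    return p


def reading_text(p):
    """Derive a clean reading text: keep additions, drop deletions."""
    parts = [p.text or ""]
    for child in p:
        if child.tag != tei("del"):
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)


# Invented example sentence, not taken from any manuscript.
line = encode_revision("The house was ", "large", "spacious", " and quiet.")
print(ET.tostring(line, encoding="unicode"))
print(reading_text(line))
```

Keeping the deletion and the addition in the same file means one transcription can support both a diplomatic view of the page and a clean reading text, which is part of what makes this slower, detailed encoding valuable.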

These projects almost always seem to be collaborative, dividing up the work and frequently bringing together related items from around the world. Some projects are the work of a library or archive, but many are completed by scholars, with libraries simply providing copies of the items. To me, that raises a question: should libraries and archives take on the work of making elaborate digital editions of certain manuscripts, or should that work be left to scholars who may know the texts better? There is plenty of opportunity for collaboration, and librarians have a lot of technical support to offer.

Speed and efficiency are important for ensuring access to more of an institution's collections. For the items that are popular and in high demand, this kind of attention to detail can produce wonderful research tools. I am also intrigued by the potential of crowdsourcing. I wonder if libraries could let users generate transcriptions for a wider range of items; for A/V materials, users could generate captions. The ultimate question is whether libraries and archives should involve themselves in more projects like these, or whether such projects are too time-consuming. Realistically, I think most institutions don't have the resources to commit to these big projects, but they could reach out to and collaborate with the scholars who undertake them. My main point is that libraries can't always take a cookie-cutter approach to digitization. Some items need only basic description, while others deserve more detail to ensure access.


Thanks for reading!



Erway, R. and Schaffner, J. (2007). "Shifting Gears: Gearing Up to Get into the Flow." OCLC Programs and Research. Retrieved from

The Humanities Advanced Technology and Information Institute, University of Glasgow and the National Initiative for a Networked Cultural Heritage (2002-2003). “The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials.” National Initiative for a Networked Cultural Heritage. Retrieved from

Sutherland, K. and Pierazzo, E. (2012). "The Author's Hand: From Page to Screen." In Marilyn Deegan and Willard McCarty (eds.), Collaborative Research in the Digital Humanities. Farnham, Surrey: Ashgate Publishing Group. 191-212.


3 thoughts on "Fast and slow digitization"

  1. I was really struck by your comment that attention to detail in digital archives can produce a good research resource, and it reminded me of this project: a digital collection of materials about the United States Civil War that was many years in the making. For years I knew the project was in progress but didn't think much of it, since I don't spend much time studying United States history, especially wartime history. But for anyone who needs this information, it is an incredible resource. It has images of documents, clothing, tools, and more that were found at Wilson's Creek, near Springfield, MO. In the not-too-distant past, these materials could only have been viewed by traveling to a museum dedicated to this particular battlefield; now, one only needs an internet connection and a little time with Google to find them.

    As for your question about whether libraries should involve themselves in projects like this, I think they definitely should. But, as you noted, some libraries are too small to work on projects like this by themselves. The project I linked to notes at the bottom that it was a joint effort among three different groups, so finding partners for these projects probably isn't too far-fetched. Additionally, because we are all connected via the internet, libraries aren't limited to help that is geographically close to them. In this case the groups involved are geographically close, but if needed the library could have reached out to national groups focused on war history, preservation, or any number of categories the project falls into. So, basically, I think libraries should do whatever they can to keep creating digital libraries alongside physical ones, and for smaller libraries, I think there are plenty of groups that would want to help with the process.


  2. Collaboration is great for expediting a seemingly insurmountable workload, and I envy collections that can do that. In our case, the collection here is too specific and too unique to collaborate on. As a result, as you mentioned, we've had to make sacrifices in how much time, attention, and detail we can apply to certain items, often focusing only on a rudimentary set of metadata and some digital processing before adding the material to our exhibition. Speaking of exhibitions, a lot also depends on the aims of collections and institutions; in our case, we don't really have an inherent focus on indexing and cataloging, but rather on showcasing our collection online.

