Writing in 2004, Brian Lavoie and Lorcan Dempsey of OCLC Research explained that digital information environments were only beginning to consider preservation an essential activity of digital stewardship (paras. 3-5). Early digital preservation efforts were motivated in large part by anxiety about the potential for catastrophic loss of resources (para. 6). And fair enough: the fragility of digital storage media and the high degree of technology dependence do make deferring preservation decisions riskier for digital materials than for analog items (para. 9). Today, digital librarians and archivists recognize that digital preservation is best approached as a process and is most effective when undertaken preemptively (para. 10). In short, contemporary approaches to the long-term preservation of digital resources entail ongoing “digital asset management practices” at all stages of the information life cycle (paras. 6-7).
Multiple guidelines and standards show how best to manage specific digital resources for preservation. The National Digital Stewardship Alliance (NDSA) Levels of Digital Preservation, for example, is a tiered set of guidelines for preserving content at four progressive levels (protect, know, monitor, and repair data) across five functional areas: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats (Phillips, Bailey, Goethals, & Owens, 2013, pp. 1-3). In this model, preservation guidelines become stricter as records (and/or institutional priorities) move from the first to the fourth level. For instance, going from protect to repair in the storage and location area requires keeping multiple copies of records in different places to reduce the risk of loss from bit rot, system failure, and local or regional disasters (p. 4). Similarly, requirements for verifying file fixity and data integrity increase up the levels, from a check at initial ingest (protect) to repeated, ongoing content checking (repair) (p. 4).
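To make the fixity requirement concrete, the following is a minimal sketch of a repeated content check across several stored copies. It assumes SHA-256 as the message digest, and the function names (sha256_of, check_fixity, damaged_copies) are hypothetical; the NDSA guidelines do not prescribe any particular algorithm or tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(copies: list[Path], recorded_digest: str) -> dict[str, bool]:
    """Map each copy's path to True if it still matches the digest recorded at ingest."""
    return {str(copy): sha256_of(copy) == recorded_digest for copy in copies}


def damaged_copies(report: dict[str, bool]) -> list[str]:
    """List the copies that failed the check and would need repair from a good copy."""
    return sorted(path for path, ok in report.items() if not ok)
```

At the protect level, a digest like this might be computed once at ingest; at the higher levels, a check such as check_fixity would be run on a schedule against every copy, with any failures replaced from a copy that still matches.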
Guidelines like those provided by NDSA are part of responsible digital stewardship, which is a documented process that preserves materials, access to them, and intellectual property rights (Lavoie & Dempsey, 2004, paras. 44-46). Standards for creating high-quality preservation metadata also help to facilitate these outcomes (Phillips, Bailey, Goethals, & Owens, 2013, p. 3). Preservation metadata includes descriptive, rights, structural, and technical properties and conveys information about copyright status, licenses, and institutional use restrictions for materials in custody (Cornell University Library, 2004, Metadata). Technical information about medium, storage, size, encoding, and location, for example, allows files to be rendered and helps ensure that items remain available for use (Cornell University Library, 2004, File Formats and Hardware, para. 4).
Furthermore, preservation metadata supports the evidentiary value of digital records by documenting events and actions. Digital preservation entails converting files from one format or storage medium to another (e.g., refreshing or migrating), so digital archivists need to be able to verify that information isn’t lost and files aren’t altered during conversion (Phillips, Bailey, Goethals, & Owens, 2013, p. 4; Cornell University Library, 2004, Digital Preservation Strategies, para. 2). Using a technique called canonicalization, they can determine whether the essential characteristics of a document remain intact and record the results in the metadata for that resource (Cornell University Library, 2004, Digital Preservation Strategies, paras. 12, 15).
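As a rough illustration of canonicalization, the sketch below reduces a plain-text document to a canonical form and compares digests of that form before and after a migration. Which characteristics count as essential is a local policy decision; here it is assumed, purely for illustration, that the characters and line structure matter while the character encoding, line-ending style, and trailing whitespace do not. The function names are hypothetical.

```python
import hashlib
import unicodedata


def canonical_form(raw: bytes, encoding: str) -> bytes:
    """Reduce a text document to the characteristics assumed to be essential."""
    text = raw.decode(encoding)
    text = unicodedata.normalize("NFC", text)               # one form for accented characters
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # one line-ending style
    lines = [line.rstrip() for line in text.split("\n")]    # ignore trailing whitespace
    return "\n".join(lines).strip("\n").encode("utf-8")


def canonical_digest(raw: bytes, encoding: str) -> str:
    """Digest of the canonical form, not of the stored bytes."""
    return hashlib.sha256(canonical_form(raw, encoding)).hexdigest()


def survives_migration(before: bytes, before_encoding: str,
                       after: bytes, after_encoding: str) -> bool:
    """True if both versions share the same canonical form."""
    return (canonical_digest(before, before_encoding)
            == canonical_digest(after, after_encoding))
```

Under these assumptions, a Latin-1 file with Windows line endings and its UTF-8 migration would compare as equal even though their raw bytes differ, while a migration that dropped an accent would not. The outcome of the comparison is what would be recorded in the preservation metadata.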
Digital archivists can create high-quality, interoperable preservation metadata for digital resources using the PREMIS 3.0 data model, which is informed by the Open Archival Information System (OAIS) reference model (PREMIS Editorial Committee, 2015, p. 2). PREMIS defines “semantic units” and maps them to Object, Agent, Event, and Rights entities (p. 8). Semantic units are used to uniquely identify instances of entities, to express their associated properties, and to represent relationships between Objects and between different types of entities (pp. 17-21). For example, the information needed to verify fixity, as described above, is indicated by a set of semantic components under the semantic unit objectCharacteristics, and running a fixity check program on an object to detect unauthorized changes is an Event. PREMIS thus allows fixity to be tested against stored message digest information and the testing itself to be recorded. Similarly, provenance can be documented by defining semantic units associated with Events and linking Object entities to Event entities (p. 258). This metadata, along with digital signatures, can be stored and accessed later to support the trustworthiness of the digital records in custody.
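The relationship between an Object's stored fixity information and a fixity check Event can be sketched as follows. This is a simplified illustration using plain Python dictionaries keyed by PREMIS semantic unit names; it is not a schema-valid PREMIS serialization. The function names, the "local" identifier type, and the "pass"/"fail" outcome values are assumptions made for the example.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def describe_object(identifier: str, content: bytes) -> dict:
    """Build a simplified Object record with fixity stored under objectCharacteristics."""
    return {
        "objectIdentifier": {
            "objectIdentifierType": "local",  # assumed identifier scheme
            "objectIdentifierValue": identifier,
        },
        "objectCharacteristics": {
            "fixity": {
                "messageDigestAlgorithm": "SHA-256",
                "messageDigest": hashlib.sha256(content).hexdigest(),
            },
            "size": len(content),
        },
    }


def record_fixity_check(obj: dict, current_content: bytes) -> dict:
    """Test current content against the stored digest and return an Event record."""
    stored = obj["objectCharacteristics"]["fixity"]["messageDigest"]
    observed = hashlib.sha256(current_content).hexdigest()
    return {
        "eventIdentifier": {
            "eventIdentifierType": "UUID",
            "eventIdentifierValue": str(uuid.uuid4()),
        },
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcomeInformation": {
            "eventOutcome": "pass" if observed == stored else "fail",  # assumed values
        },
        # Link the Event back to the Object it was performed on.
        "linkingObjectIdentifier": {
            "linkingObjectIdentifierType": obj["objectIdentifier"]["objectIdentifierType"],
            "linkingObjectIdentifierValue": obj["objectIdentifier"]["objectIdentifierValue"],
        },
    }
```

Keeping the Event as a separate record that points back to the Object, rather than overwriting a status field on the Object, is what lets a chain of such records serve as provenance.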
Finally, the PREMIS data model can be expressed in the Web Ontology Language (OWL), a highly interoperable, machine-readable semantic language. In fact, the most recent version of the PREMIS OWL ontology (3.0, released in 2018) incorporates “emerging Linked Data best practices and connections to other relevant RDF ontologies” (such as the PROV Ontology (PROV-O), Dublin Core terms, and Library of Congress preservation vocabularies) to make information about digital resource preservation available to off-site users as linked open data (LOD).
Bibliography
Cornell University Library. (2004). Digital preservation management: Implementing short-term strategies for long-term solutions [Tutorial]. Retrieved from https://web.archive.org/web/20180626145944/http://www.dpworkshop.org/dpm-eng/eng_index.html
Lavoie, B., & Dempsey, L. (2004, July/August). Thirteen ways of looking at…digital preservation. D-Lib Magazine, 10(7/8). Retrieved from http://www.dlib.org/dlib/july04/lavoie/07lavoie.html
Library of Congress. (2018, October 12). PREMIS 3 ontology. Retrieved from http://id.loc.gov/ontologies/premis.html
Library of Congress. (2018, December 20). PREMIS ontology for PREMIS data dictionary 3. Retrieved from https://www.loc.gov/standards/premis/ontology/owl-version3.html
McGuinness, D. L., & van Harmelen, F. (2004, February). OWL Web Ontology Language Overview. W3C Recommendation. Retrieved from https://www.w3.org/TR/owl-features/
Phillips, M., Bailey, J., Goethals, A., & Owens, T. (2013). The NDSA levels of digital preservation: An explanation and uses. Retrieved from https://ndsa.org/documents/NDSA_Levels_Archiving_2013.pdf
PREMIS Editorial Committee. (2015, November). PREMIS data dictionary for preservation metadata. Retrieved from https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf