A Model for Digital Preservation

     Digital preservation is a complex process that can be carried out in a variety of ways. While each institution should develop its own policies and procedures for carrying out digital preservation projects, Heather Moulaison Sandy and Edward M. Corrado (2018) have proposed a three-part model. The three prongs of this model are (1) management, (2) technology, and (3) content.

     Moulaison Sandy and Corrado place management first in the triad because the other two parts depend on it: without proper management to guide them, content-related and technological efforts will not remain functional over time. Management decisions will most often be used to address “1) workflow and procedural obstacles, 2) resource limitations, and 3) lack of buy-in” (Moulaison Sandy and Corrado 2018, 5). One way to address these issues is through thorough documentation and well-planned policies governing an institution’s digital preservation efforts. Another important piece is using human resources to meet the technological, descriptive (metadata), and content (appraisal and selection of items for the collection) needs of digital preservation projects.

     Technology makes up the second part of this model. This part can often be the most intimidating to practitioners who are new to the digital side of digital preservation. One issue that arises on the technological side is choosing which file type and format to use when preserving items digitally. This choice becomes more important as time passes, because digital items must be migrated or updated so that they can be accessed by whatever software is in use at the time. Additionally, many (if not most) digital files are somewhat fragile and can degrade over time, just as physical materials can. If a file format can no longer be opened by available programs, an emulator may be needed. An emulator is a program that recreates an older computing environment so that files can be opened with the software originally used to read them, even when that software no longer runs on current systems.

     The final part of Moulaison Sandy and Corrado’s model is content. The authors view this part as “the linchpin,” as they believe that content is what “motivates the digital preservation project’s documentation and personnel, drives outreach, is the basis of securing funding, and influences the technology on a variety of levels, from file format to digital preservation repository” (Moulaison Sandy and Corrado 2018, 8-9). Appraisal and selection policies are key in determining exactly which content should be preserved. These decisions will vary based on many factors, so the policies must be flexible enough to encompass many types of content. The selected content then feeds into the technology part of the model, since the best format and method of preservation must be decided for it. All of these decisions are guided by the policies and procedures put in place by the management part of the model.

     The three-part model proposed by Moulaison Sandy and Corrado touches on what seem to be the three biggest factors in digital preservation. If any one part of the model is taken away, digital preservation becomes much more difficult (if not impossible) to complete successfully. This model appears to be in line with much of the information covered in this course. I believe that a tripartite model focused on management, technology, and content is structured in a way that promotes successful and efficient digital preservation while still allowing for the varied needs of different institutions.

Moulaison Sandy, Heather, and Edward M. Corrado. “Bringing Content into the Picture: Proposing a Tri-Partite Model for Digital Preservation.” Journal of Library Administration 58, no. 1 (2018): 1-17. https://doi.org/10.1080/01930826.2017.1385988

 

 


Oh, What a Tangled Web We Weave: Linked Data & Digital Libraries

By Melissa Anthony

     As explained in the YouTube video we watched in Week 6, linked data allows any concept, thing, or entity to have a unique identity on the Web, expressed through a Uniform Resource Identifier (URI). This URI points to information about the concept, thing, or entity, including its relationships to other concepts, things, or entities (which each have their own URIs). The relationship function is what changes “data” into “linked data”: the data about one concept, thing, or entity becomes linked to data about others based on the relationship(s) between them. To see a great example of linked data in action, check out this interactive Marvel Cinematic Universe (MCU) infographic, which was discovered and shared on Facebook by LS 562’s own Amy An. The infographic as a whole is an expansive web of points (images representing characters) joined by colored lines that show the relationship between the characters they connect, with the color of each line signifying the type of relationship (family, work, enemies, etc.). If you take a closer look at Steve Rogers’s profile (because he is the original AND the best Avenger!), you can see that it has an image linking it to the profile for Captain America (Rogers’s superhero alter ego), followed by a short biographical section and then a collection of all profiles that are linked with Steve Rogers in some way (friend, romance, enemy, work, and movies). This allows users to find related characters or determine which films Steve Rogers/Captain America has appeared in. Hopefully, this infographic serves as a concrete example of what linked data is and how it can be used to communicate information about related concepts.
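     To make the idea of URIs and relationships more concrete, here is a minimal Python sketch of how a few connections in a profile like Steve Rogers’s could be stored as linked data triples (subject, relationship, object). It assumes a made-up namespace (http://example.org/mcu/) and made-up relationship names such as friendOf and appearsIn; these are hypothetical and are not taken from the infographic or from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical namespace: example.org is a placeholder, not a real dataset.
EX = "http://example.org/mcu/"

# Each fact is a (subject, predicate, object) triple, and every node is a URI.
# These illustrative triples are NOT copied from the infographic.
triples = [
    (EX + "SteveRogers", EX + "alterEgo", EX + "CaptainAmerica"),
    (EX + "SteveRogers", EX + "friendOf", EX + "BuckyBarnes"),
    (EX + "SteveRogers", EX + "romanceWith", EX + "PeggyCarter"),
    (EX + "SteveRogers", EX + "enemyOf", EX + "RedSkull"),
    (EX + "SteveRogers", EX + "appearsIn", EX + "CaptainAmericaTheFirstAvenger"),
    (EX + "SteveRogers", EX + "appearsIn", EX + "TheAvengers"),
    (EX + "PeggyCarter", EX + "appearsIn", EX + "CaptainAmericaTheFirstAvenger"),
]


def links_from(subject, data):
    """Collect every object linked to `subject`, grouped by the linking predicate."""
    grouped = defaultdict(list)
    for s, p, o in data:
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)


def local_name(uri):
    """Drop the namespace so a URI prints as a readable label."""
    return uri.rsplit("/", 1)[-1]


# Build a "profile" for Steve Rogers: one line per type of relationship.
profile = links_from(EX + "SteveRogers", triples)
for predicate, objects in profile.items():
    print(local_name(predicate), [local_name(o) for o in objects])
```

Grouping by relationship mirrors the profile page’s friend, romance, enemy, and movie sections, and looking up the appearsIn group answers the “which films” question. Because Peggy Carter’s film link points to the same URI as Steve Rogers’s, the two records connect without either one repeating the other’s data.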

     While that website is a really cool example of linked data, I was curious about how linked data might be used in digital libraries, so I found two such examples. The first is the Canadian Linked Data Initiative (CLDI), which formed in 2015 as “a collaboration between five of Canada’s largest research libraries, Library and Archives Canada, Bibliothèque et Archives nationales du Québec, and Canadiana.org” (Van Ballegooie, Borie, and Senior 2017, 207). The CLDI seeks to lead Canadian libraries in the shift from the MARC format to the BIBFRAME model, which the Library of Congress developed as a way to incorporate linked data into library cataloging. While this initiative covers many different departments and goals within libraries, one subgroup of the CLDI focuses solely on digital resources. The Digital Projects Working Group (DPWG) aims to implement linked data for digital collections. One specific project it is working on is a collection of digital and digitized resources related to Canada’s history, celebrating the 150th anniversary of Confederation in 2017. The DPWG plans to use linked data to build a user interface that will provide visual representations of the relationships between the digital items in the collection (Van Ballegooie, Borie, and Senior 2017, 207-210). This project, in particular, echoes the MCU guide in its visualization of relationships among data.

     Wenige and Ruhland (2018) investigate how digital libraries can use Linked Open Data (LOD), meaning linked data published under open licenses, to enhance information retrieval and give users better recommendations of resources relevant to their interests. Their study examined “how bibliographic datasets from LOD endpoints can be utilized for recommender systems in digital libraries” (267). While the study had a limited scope, it did indicate that implementing LOD in such systems may benefit users searching digital collections, particularly when full text is not available for generating text-based recommendations. The researchers also suggest two approaches that may help users find relevant resources: flexible similarity detection and constraint-based recommendations. Flexible similarity detection assists in browsing and exploring digital collections by allowing users to refine results using “different levels of inter-concept similarities until results best fit their needs” (Wenige and Ruhland 2018, 267). Constraint-based recommendations could replace advanced search options by combining user profile data and search histories with similarity calculations and graph pattern matching (both of which use LOD) to return the most relevant resources when a user performs a basic search (Wenige and Ruhland 2018, 267).
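     The idea of adjustable similarity levels can also be sketched in code. The Python example below is not Wenige and Ruhland’s method; it is a simplified stand-in that assumes each record is described by a set of concept URIs (such as subject headings drawn from an LOD source) and scores pairs of records with Jaccard similarity, the share of concepts they have in common. The record names, the example.org URIs, and the min_similarity parameter are all hypothetical.

```python
# Hypothetical namespace and records: each record is described by a set of
# concept URIs, such as subject headings that might come from an LOD source.
EX = "http://example.org/concept/"

records = {
    "record1": {EX + "LinkedData", EX + "Libraries", EX + "Metadata"},
    "record2": {EX + "LinkedData", EX + "SemanticWeb"},
    "record3": {EX + "Libraries", EX + "Metadata", EX + "Cataloging"},
    "record4": {EX + "Astronomy"},
}


def jaccard(a, b):
    """Shared concepts divided by all concepts across both records (0.0 to 1.0)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recommend(seed, catalogue, min_similarity=0.2):
    """Rank the other records by similarity to `seed`, dropping any below the threshold.

    `min_similarity` is a hypothetical knob: raising it returns fewer, closer
    matches; lowering it returns more, looser ones.
    """
    seed_concepts = catalogue[seed]
    scored = [
        (other, jaccard(seed_concepts, concepts))
        for other, concepts in catalogue.items()
        if other != seed
    ]
    hits = [(other, score) for other, score in scored if score >= min_similarity]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(recommend("record1", records))                      # [('record3', 0.5), ('record2', 0.25)]
print(recommend("record1", records, min_similarity=0.3))  # [('record3', 0.5)]
```

Raising the threshold from 0.2 to 0.3 drops record2, loosely imitating a user tightening how closely related the results must be. Jaccard similarity was chosen only because it is simple; a real LOD-based system could also compare the relationships between concepts, not just a flat overlap count.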

     These two examples show only a small part of what linked data can do. As more libraries adopt linked data and LOD, staff members can be expected to keep developing new uses for it. As with many things, the ways linked data can be used in libraries, digital and otherwise, seem to be limited only by our imagination. It is certainly an exciting time for those of us who love data, whether we are collecting, organizing, storing, retrieving, or using it!

References

Ker, Billy, Chee Wei Xian, and Denise Chong. “A Who’s Who Guide to the Marvel Cinematic Universe.” Published April 24, 2018. str.sg/MarvelWhosWho

OCLCVideo. Linked Data for Libraries. Video, 00:14:13. Published August 9, 2012. https://www.youtube.com/watch?v=fWfEYcnk8Z8

Ontotext. “What Are Linked Data and Linked Open Data?” Accessed October 27, 2018. https://ontotext.com/knowledgehub/fundamentals/linked-data-linked-open-data/

Van Ballegooie, Marlene, Juliya Borie, and Andrew Senior. “The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future.” The Serials Librarian 72, no. 1-4 (2017): 207-213. Accessed October 27, 2018. https://doi.org/10.1080/0361526X.2017.1292751

Wenige, Lisa, and Johannes Ruhland. “Retrieval by Recommendation: Using LOD Technologies to Improve Digital Library Search.” International Journal on Digital Libraries 19, no. 2-3 (September 2018): 253-269. Accessed October 27, 2018. https://doi.org/10.1007/s00799-017-0224-8

What makes a search “diligent”?

According to Wilkin (2011), it seems that a large percentage of materials could be considered “orphaned,” in other words still under copyright but with a copyright holder who cannot be identified or located. This could substantially expand the number of works that could be included in digital libraries. I am concerned, however, with exactly how one is to determine whether a work is, indeed, orphaned. Hirtle (2012) mentions conducting a “diligent search” in the hopes of “clarifying copyright status,” in other words determining whether a work is still under copyright and, if it is, either finding the copyright holder or declaring the work orphaned. Hirtle (2012) shares two instances of researchers trying to track down rights holders for particular collections. In one case, they spent about 4 hours per book trying to identify copyright holders. In the other case, dealing with 8,434 items, it took 14 weeks of full-time work to find a handful of rights holders. These cases seem to illustrate that there are no standard guidelines for what constitutes a “diligent search.” Hoping to gain a better understanding of what length and depth of searching is expected to legally protect an institution if it misidentifies items as orphaned, I went straight to the horse’s mouth (as it were): the U.S. Copyright Office.

The U.S. Copyright Office (n.d.) has a page dedicated to orphaned works, which includes a short discussion of what orphaned works mean for copyright policy. The Office seems to have a low opinion of orphaned works, going so far as to declare that “the uncertainty surrounding the ownership status of orphan works does not serve the objectives of the copyright system” (U.S. Copyright Office, n.d.). It further disparages orphaned works by asserting that “[f]or good faith users, orphan works are a frustration, a liability risk, and a major cause of gridlock in the digital marketplace” (U.S. Copyright Office, n.d.). These statements, however, did not address the idea of a diligent search at all, so I dug a bit deeper by examining a report linked from the page.

The document Orphan Works and Mass Digitization: A Report of the Register of Copyrights was published in June 2015 and is a lengthy discussion and evaluation of the state of copyright law, policy, and procedures, particularly as related to recent mass digitization efforts such as Google Books and HathiTrust. While the report is extensive, the table of contents pointed me to a section specifically addressing what it terms “Good Faith Diligent Search.” This section defines a “diligent” search as one in which “users search or utilize: (1) Copyright Office online records; (2) reasonably available sources of copyright authorship and ownership information, including licensor information where appropriate; (3) technology tools and, where reasonable, expert assistance (such as a professional researcher or attorney); and (4) appropriate databases, including online databases” (U.S. Copyright Office, 2015, p. 57). The report goes on to mention taking any other “reasonable” steps and using any other database or information resource that seems “reasonable” (U.S. Copyright Office, 2015, pp. 57-58). Thus, a diligent search has a certain baseline of rigor, as outlined by the four points above, yet what it requires still varies on a case-by-case basis.

Given the prevalence of orphaned works, as evidenced by Wilkin (2011), digital libraries must be prepared to deal with items of orphaned status, regardless of the frustration and risk associated with them. While there is no one-size-fits-all method of conducting a diligent search, it seems that best practice would be to follow the minimum guidelines outlined above while following up on any other reasonable leads that arise. In the end, like so much in LIS, orphaned works and diligent searches are subjective, and we as LIS professionals must do the best we can with each situation we are given.

References

Hirtle, P. (2012). Learning to live with risk. Art Libraries Journal, 32(2). http://ecommons.cornell.edu/bitstream/1813/24519/2/ARLIS%20UK%20final.pdf

U.S. Copyright Office. (n.d.). Orphan works. Retrieved from https://www.copyright.gov/orphan/

U.S. Copyright Office. (2015, June). Orphan works and mass digitization: A report of the register of copyrights. Retrieved from https://www.copyright.gov/orphan/reports/orphan-works2015.pdf 

Wilkin, J. P. (2011). Bibliographic indeterminacy and the scale of problems and opportunities of “rights” in digital collection building. Retrieved from https://www.clir.org/pubs/ruminations/wilkin/