Fandom, Copyright, and Digital Archives

When I first ventured into the fledgling world of mid-2000s social media, one of the forces that shaped my experience – and that has come up in my LIS classes many, many more times than I expected it to – was online fandom culture. I was fourteen when I started my LiveJournal, and I probably read more fanfiction than original work during my high school years. I’m … not going to comment on whether or not that trend has continued into my adult life.

What definitely has continued into my adult life is an interest in the phenomenon of fandom: its evolution through history, its implications as a broader social model, and the ethics of fanwork produced and shared without the explicit blessing of the creators of the source material from which those works derive. That third thing is the one that tends to come up as part of my LIS studies, and it’s the first thing I think of whenever anyone mentions copyright and intellectual property. Dedicated fanwork hosting websites like Archive of Our Own are digital libraries in their own right, sourcing their content from thousands of creators, maintaining searchable databases through custom metadata schemas, and doing their best to protect the legal rights of their contributors in the rocky terrain of derivative work.

One of the most interesting pieces of scholarship I’ve come across about fandom is Brittany Johnson’s piece about fanfiction and copyright law titled Live Long and Prosper. In addition to its excellent title, the article comes from a legal scholar who can speak with authority on the implications of fanwork. What used to be a purely nonprofit enterprise has become more complicated with the phenomenon of ‘pulling to publish,’ when works like Fifty Shades of Grey are plucked from their fandom roots by publishers and turned into works of ‘original’ fiction. This upsets the plausible deniability that has protected fandom in the past – that fanwork was fair use because it was not and would never be a source of profit, much less a source of profit that would pull income from the original title. Fair use comes after clearing four hurdles – the intent of the new work, the nature of the copyrighted work, the amount of the copyrighted work used in the new work, and the impact of the new work on the potential market of the copyrighted work. Some fanworks borrow more heavily from source material than others, but the precedent of pulling to publish places even the most derivative of derivative works in a tenuous position if the copyright holder of the source material decides they’re not happy with something they perceive as a threat to their own revenue.

The fandom conversation came to mind as I read Peter Hirtle’s Learning to Live with Risk and its discussion of copyright in libraries and archives. Hirtle discusses institutions that are “unintentionally violat[ing] copyright law with the best intentions” and that distribute intellectual property in ways that may not be protected under the banner of fair use. As applied to fandom, while the hosting archive has full permission from the creator of the fanwork, it might run into trouble with the copyright holder of the source material from which the fanwork derives, which puts it in a position similar to the situations Hirtle discusses. Fandom spheres are the ones in which I have the most personal experience, but nontraditional archives, particularly digital nontraditional archives, will likely have a slew of similar problems. If a library holds digitizations of an artist’s collage pieces, for example, is it beholden to the original copyright holders of the assets used in each collage? It’s difficult to anticipate when copyright holders will decide to crack down; some unexpected creators have swept through fandom spaces to purge fanwork of their intellectual property for a number of dubious reasons.

Johnson proposes an amendment to existing copyright law that would protect derivative works, as long as the hosts of those works (digital fandom libraries) had contributors meet non-commercialization and attribution standards agreed upon by the hosts and the copyright holders of the source material. It would be a huge shift in fandom culture, perhaps even more so than the shift already happening around pulling to publish; fandom has always been a clandestine thing, a space primarily occupied by unpaid creative women sharing their work amongst themselves, but as it becomes a more mainstream phenomenon, the way that creators (particularly female creators) occupy those spaces is changing. An amendment to copyright law that would protect both creators and the hosts of their work might be necessary as fandom moves into its new digital era. No matter what happens, however, the owners and curators of digital fandom archives will be in a unique position as the bridge between fandom creators and copyright holders.

Hirtle, P. (2012). Learning to live with risk. Art Libraries Journal, 37(2), 1-15. http://ecommons.cornell.edu/bitstream/1813/24519/2/ARLIS%20UK%20final.pdf.

Johnson, B. (2016). Live long and prosper: How the persistent and increasing popularity of fan fiction requires a new solution in copyright law. Minnesota Law Review, (4), 1645. http://www.minnesotalawreview.org/wp-content/uploads/2016/04/Johnson_ONLINEPDF.pdf.

Crowdsourcing Transcriptions in Digital Libraries

Scanning images is only half the battle. With all the discussion of quantity over quality, I thought it might be useful to delve into the task of transforming digitized images into useful sources of information for the typical researcher. In the case of images with accompanying text, this often requires the costly and time-consuming task of transcription. The question, then, is how do institutions with limited resources and staffing accommodate both digitizing and transcribing images? Many institutions have come to the same conclusion and achieved surprising results through volunteer crowdsourcing.

The first question that most institutions ask themselves when considering crowdsourcing projects is “will the public want to do this?” As the Getty Research Institute wrote in a popular blog post on crowdsourcing in the digital humanities, “While there are plenty of examples of successful crowdsourcing projects…we simply weren’t sure people would want to do this kind of work on our collections.” Yet, the results overwhelmingly show that when a project is thoughtfully crafted, the public is willing to help.

One recent example of this is a project at the Alabama Department of Archives and History (ADAH) that I was lucky to be involved with during an internship. In celebration of the centennial of World War I, ADAH scanned 111,000 index cards that contained service records of Alabamian soldiers and civilian employees of military bases within the state. A potential treasure trove for historians and genealogists, the cards contain a wealth of personal information, including name, race, age, date and place of birth, home address, date and location of induction, units served in, rank, engagements, wounds, dates and locations of service, and date of discharge. Despite this wealth of information, however, ADAH recorded only minimal metadata during the scanning process.

Instead, ADAH chose to engage the public with a crowdsourcing project meant to bring local volunteers face-to-face with their state’s World War I history. ADAH began the project on April 7, 2018, with an orientation session for twelve volunteers. The session worked out any potential issues and led to a streamlined system that included a volunteer user’s guide with detailed instructions and field descriptions for the standard text boxes used to capture the transcriptions. The result was almost instant—within twelve days, 11,000 cards were transcribed. The full collection was finished in only three and a half months, half the time ADAH optimistically estimated the project would last. In all, 82 active volunteers donated 3,521 hours of work. The transcriptions are now being migrated to ADAH’s digital collections.
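To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers reported above; the function names are my own, and the averages assume the work was spread evenly across cards, volunteers, and days, which it almost certainly was not.

```python
# Figures reported for the ADAH project, as quoted in this post.
TOTAL_CARDS = 111_000
VOLUNTEER_HOURS = 3_521
ACTIVE_VOLUNTEERS = 82
FIRST_BATCH_CARDS = 11_000
FIRST_BATCH_DAYS = 12


def minutes_per_card(hours: float, cards: int) -> float:
    """Average volunteer time spent on one card, in minutes."""
    return hours * 60 / cards


def hours_per_volunteer(hours: float, volunteers: int) -> float:
    """Average hours donated by each active volunteer."""
    return hours / volunteers


def cards_per_day(cards: int, days: int) -> float:
    """Average number of cards transcribed per day."""
    return cards / days


if __name__ == "__main__":
    print(f"{minutes_per_card(VOLUNTEER_HOURS, TOTAL_CARDS):.1f} minutes per card")
    print(f"{hours_per_volunteer(VOLUNTEER_HOURS, ACTIVE_VOLUNTEERS):.1f} hours per volunteer")
    print(f"{cards_per_day(FIRST_BATCH_CARDS, FIRST_BATCH_DAYS):.0f} cards per day in the first twelve days")
```

By this estimate each card took roughly two minutes of volunteer time on average, which suggests how much the structured fields and the user's guide streamlined the work.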

There are several crowdsourcing platforms available for digital library projects, and some institutions opt to create their own. Perhaps the most popular platform is Zooniverse, which was developed at the University of Oxford in 2007. Zooniverse has the benefit of a built-in volunteer community and allows users to work collaboratively, with multiple users transcribing the same document. ADAH, however, chose to work with the open-source platform FromThePage by Brumfield Labs and entered an agreement with the company to fund the development of structured data fields and enhanced compatibility with CONTENTdm.

Furthermore, ADAH chose to forgo collaborative transcriptions by closing the (typically open-access) platform to all but registered volunteers, who requested a login from the project manager, and by instructing volunteers to work individually on records. This allowed ADAH to maintain control over the project while gradually ramping up the number and activity of volunteers as the institution became comfortable with the progress and direction of the project.

The project was ADAH’s first foray into “academic crowdsourcing,” which Victoria Van Hyning at Zooniverse describes as members of the public working with specialists to conduct research. ADAH is now planning future projects, which may include World War II and Korean War service records, licensure records, and legislative directories.

References:

Alabama Department of Archives and History (2018). Case Study: Crowdsourcing the Alabama WWI Service Records. Retrieved from https://www.statearchivists.org/files/3515/3487/7545/ExchangeDay_CaseStudy_AlabamaCrowdsourcing.pdf

Deines, N., Gill, M., Lincoln, M., & Clifford, M. (2018, February 7). Six Lessons Learned from Our First Crowdsourcing Project in the Digital Humanities [web log comment]. Retrieved from http://blogs.getty.edu/iris/six-lessons-learned-from-our-first-crowdsourcing-project-in-the-digital-humanities/

Van Hyning, V., Blickhan, S., Trouille, L., & Lintott, C. (2017). Transforming Libraries and Archives through Crowdsourcing. D-Lib Magazine, 23. Retrieved from http://www.dlib.org/dlib/may17/vanhyning/05vanhyning.html

 

 

Fast and slow digitization

You rarely hear of a library or archive without a backlog of items that needs to be cataloged or digitized. With limited funding, staff, time, and increasing acquisitions, libraries have had to either ramp up the speed and efficiency of their digitization efforts or be much choosier about what gets digitized. This dilemma is the focus of literature like Erway and Schaffner’s (2007) “Shifting Gears: Gearing Up to Get into the Flow,” which argues that users are better served by a higher quantity of digitized collections than a high quality of fewer items (2). They acknowledge that this principle may not apply in every situation (3). The speed and level of detail in digitization can and should vary depending on the value of the items, the volume of documents, and most importantly, the way researchers will work with the digitized item.

The level of detail depends on the kind of access that will be demanded. For items in high demand, librarians can create digital editions that work like facsimiles, accompanied by expert knowledge and interpretation and by tools like transcription. A slower digitization process is important for items like valuable and in-demand manuscripts that are difficult to read without transcription.

Faithfully replicating manuscripts requires some text encoding skills – specifically, XML markup encoded according to TEI guidelines. The “NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials” (2002-2003) describes how basic TEI encoding “can be applied nearly automatically using scripts, but detailed encoding requires additional staff, training, and time” (86-7).
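To give a sense of what “nearly automatic” basic encoding might look like, here is a minimal sketch in Python. It is my own illustration rather than anything from the NINCH Guide: it assumes a plain-text transcription with blank lines between paragraphs, and it produces only a TEI body fragment (a complete TEI document would also need a header and other wrapping elements). The function name basic_tei_body and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the elements used below (<body>, <p>, <lb/>) are standard TEI.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def basic_tei_body(plain_text: str) -> ET.Element:
    """Wrap blank-line-separated paragraphs in <p>, marking line breaks with <lb/>."""
    body = ET.Element(f"{{{TEI_NS}}}body")
    for block in plain_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # skip empty blocks caused by extra blank lines
        paragraph = ET.SubElement(body, f"{{{TEI_NS}}}p")
        paragraph.text = lines[0]
        for line in lines[1:]:
            # <lb/> marks the start of a new line; the line's text follows it.
            line_break = ET.SubElement(paragraph, f"{{{TEI_NS}}}lb")
            line_break.tail = line
    return body


if __name__ == "__main__":
    sample = "First line of a letter\nsecond line of the letter\n\nA new paragraph"
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    print(ET.tostring(basic_tei_body(sample), encoding="unicode"))
```

A script like this can handle the mechanical structure, but the detailed layer (for instance, recording an author's deletions and additions with TEI's del and add elements) depends on someone reading the manuscript closely, which is where the additional staff, training, and time come in.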

I want to walk through some examples of TEI projects to show how they differ from basic digitization.

#1 : Electronic Beowulf

This online edition of Beowulf provides full translations and helps with definitions and grammar. The website brings together multiple restorations and editions of the text, allowing scholars to make comparisons between all of them. It is the work of editor Kevin Kiernan, a Beowulf scholar, and software engineer Emil Iacob.

#2 : Emily Dickinson Archive

Harvard and Amherst, which both own parts of Dickinson’s archive, collaborated to make this website. Transcriptions on this site show how Dickinson’s editors made changes to her work, like standardizing the arrangement of words on the pages. This site lets users contribute to transcriptions by typing or uploading TEI-encoded documents. I love seeing how libraries use crowdsourcing to engage users – for example, check out the British Library’s crowdsourcing projects.

#3 : Jane Austen’s Fiction Manuscripts

JAFM

This project is the work of English scholar Kathryn Sutherland and digital humanities scholar Elena Pierazzo. The manuscripts come from the Bodleian Library in Oxford, the British Library in London, the Pierpont Morgan Library in New York, King’s College in Cambridge, and private ownership. This image gives a good example of diplomatic transcriptions; you can see how the transcription reflects Austen’s edits. Sutherland and Pierazzo actually wrote a whole article about their experience making this project, entitled “The Author’s Hand: From Page to Screen.”

These projects almost always seem to be collaborative, dividing up the work involved and frequently bringing together related items from around the world. Some projects are the work of a library or archive, but many are completed by scholars, with libraries just providing copies of the items. To me, that raises the question: should libraries and archives be concerned about making elaborate digital editions of certain manuscripts, or should that work be left to scholars who might know more about the text? There is a lot of opportunity for collaboration, and librarians have a lot of technical support to offer.

Speed and efficiency are important for ensuring access to more of an institution’s collections. For certain items that are popular and demand more access, this kind of attention to detail can produce wonderful research tools. I am also intrigued by the potential of crowdsourcing. I wonder if libraries could let users generate transcriptions on a wider collection of items. For A/V materials, users could generate captions. The ultimate question is, should libraries and archives involve themselves in more projects like these, or are they too time-consuming? I think, realistically, that most institutions don’t have the resources to commit to these big projects, but could reach out to and collaborate with scholars who undertake such projects. My main point is that libraries can’t always take a cookie-cutter approach to digitization. Some items need just basic detail, while some deserve more description to ensure access.

 

Thanks for reading!

 

References:

Erway, R. and Schaffner, J. (2007). “Shifting Gears: Gearing Up to Get Into the Flow.” OCLC Programs and Research. Retrieved from www.oclc.org/programs/publications/reports/2007-02.pdf

The Humanities Advanced Technology and Information Institute, University of Glasgow and the National Initiative for a Networked Cultural Heritage (2002-2003). “The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials.” National Initiative for a Networked Cultural Heritage. Retrieved from http://chnm.gmu.edu/digitalhistory/links/pdf/chapter1/1.17.pdf

Sutherland, Kathryn and Pierazzo, Elena (2012). “The Author’s Hand: From Page to Screen.” Collaborative Research in the Digital Humanities. Marilyn Deegan and Willard McCarty, eds. Farnham, Surrey, GBR: Ashgate Publishing Group, 2012. 191-212.

Organizing a Digital Music Library

Since I was 12, I’ve maintained a fairly impressive iTunes collection. What began as a place for me to digitize my CDs has grown into a fifty-plus gigabyte collection that is, frankly, a bit unwieldy to browse through. I’ve been thinking about the differences between having a digital music library and having a physical music library–one where you can browse through CDs or other methods of music storage, such as vinyl or cassette tapes–and I’ve noticed a few differences overall.

  1. In a digital music space, organization is more fluid and more customizable.
    David Weinberger notes in his article “The New Order of Order” that iTunes taught us “the natural unit of music is the track” (para. 5). When you input thousands of songs into a digital repository, you can easily organize them however you wish–by name, by artist, by album, even by genre. This runs counter to having a physical music collection, where everything must be taken as a unit. If you have a copy of Coldplay’s A Rush of Blood to the Head, for example, you can’t just remove one song from the album and place it with others. Likewise, if you have a compilation album of various artists, such as the charity compilation Dark Was the Night, you can’t put songs by individual artists with other work by those artists; the album must be consumed as a unit. Both approaches have their advantages: digital organization helps users find similar individual items fairly easily, while physical collections allow users to see tracks as part of a whole, separate product.
    In addition, digital music libraries allow users to customize playlists–i.e. different groupings of songs based on how the user wishes to group them. I can have a “Workout” playlist that features songs I find good for running. What I think is good running music, however, might not be what someone else considers to be good running music. Playlists end up being hugely subjective and sometimes require guidance from the person who created them; a casual user who is not familiar with the library may need certain groupings explained to them, as the logic may not be obvious at first, which makes the library difficult to navigate alone.

    I know why I’ve grouped these songs together–for exercising–but a stranger would not be able to find a connection inherent between these songs.
  2. Audio quality can vary wildly.
    Like most young people who grew up in the 2000s, I did a good share of illegal downloading as a broke teenager. (I don’t advise resorting to piracy the way I did.) If I couldn’t buy materials from the iTunes Store, I would resort to LimeWire, a filesharing service, with mixed results. LimeWire was a true crapshoot in terms of what you could download, and as a result I have files that are .MP3, .WMA, .MP4, .AAC, and .WAV. To say they’re not consistent is definitely one way of looking at it. Some of these tracks are lossier than others; you can hear digital artefacts, like pixellated noise, or the audio gets tinny. In addition, some tracks that have been part of my music library for over ten years, and that were converted into Apple’s “lossless” format (.AAC), have begun to glitch and erode over time. By contrast, the CDs that I have, for the most part, sound as good as they did the day I got them. I’ve run the risk of scratching them if I played them too many times in my stereo, but otherwise the audio quality has remained the same. It seems that digital music libraries may require constant upkeep and refreshing of files to keep them from degrading over time, in a way that physical forms of music may not.
  3. Metadata becomes more customizable–and potentially nightmarish.
    When you digitize music, there’s usually a good deal of metadata that arrives with the music itself. If you buy from the iTunes store, for instance, the names of songs will already be filled out, as will the name of the artist, the album title, any relevant composers, and even genre. There is usually a space for album artwork and lyrics as well, should you choose to add those. But is there a good, coherent way to organize music digitally by genre in a way that will make sense for casual users?
    Musical genre can sometimes be a minefield. For music aficionados, it becomes a question of whether music should be classified broadly (e.g. “rock”) or by narrower subgenres (e.g. “alt-folk,” “acid jazz”). Genre distinctions can be difficult to pin down, especially in a formal sense. Kulczak and Lennertz Jetton (2011), for instance, note that MARC (a formal library cataloguing structure) is geared towards classifying classical music rather than pop music, and that LCSH subheadings for newer music genres, like EDM, may not exist yet (218). In addition, users may not agree as to what genre(s) a piece of music may belong to–there’s usually a general consensus about the broader category, but the level of detail is subjective (230). This presents an additional challenge for users who may be navigating an iTunes library with little knowledge of genre, especially if the person who has organized the music library decides to use narrower terms for genres. The controlled vocabulary for a person’s individual music library is going to differ based on their preferences and how much of a music nerd they are; however, it can also make searching within the library difficult if a casual user has no idea what to look for.
    In contrast, a physical music collection only has the metadata that is presented on the CD/vinyl case–nothing more, nothing less. To some degree, that makes it standardized; a hard copy of an Adele album will always have the same tracks listed in the same order and will probably include the composition credits in a leaflet inside the CD without additional frills. Browsing through a physical music library may help someone who is fairly new to music and who does not want to be overwhelmed with information all at once, while browsing a digital music library may be better suited for those who are more familiar with the collection.
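The points above about track-level organization and genre vocabulary can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the Track fields, the placeholder track names, and the BROADER_GENRE mapping are my own stand-ins, not how iTunes actually stores its metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    album: str
    genre: str  # the narrow genre tag chosen by the library's owner


# Hypothetical mapping from an owner's narrow subgenres to broader terms
# that a casual user is more likely to search for.
BROADER_GENRE = {"alt-folk": "folk", "acid jazz": "jazz"}


def broad_genre(track: Track) -> str:
    """Return the broader genre for a track, falling back to its own tag."""
    return BROADER_GENRE.get(track.genre, track.genre)


def group_by(tracks, key):
    """Group track titles under whatever key function the user chooses."""
    groups = defaultdict(list)
    for track in tracks:
        groups[key(track)].append(track.title)
    return dict(groups)


# Placeholder data; the names are invented.
library = [
    Track("Song One", "Artist A", "Album X", "alt-folk"),
    Track("Song Two", "Artist A", "Compilation Y", "folk"),
    Track("Song Three", "Artist B", "Compilation Y", "acid jazz"),
]

if __name__ == "__main__":
    print(group_by(library, lambda t: t.album))   # the physical shelf's view
    print(group_by(library, lambda t: t.artist))  # regrouped by artist
    print(group_by(library, broad_genre))         # narrow tags rolled up
```

Because the track is the unit, the same three songs can be regrouped by album, artist, or genre on demand, and a simple lookup from narrow to broad terms is one way to make an owner's idiosyncratic genre tags searchable by someone else.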

Although I know we will be building digital libraries that are different from an iTunes library, I cannot help but think about how I’ve organized my own digital music library over the past few years. Other people must also have music libraries that only make sense to them. It’s fine for personal use, obviously, but not an ideal way to organize a formal digital library for unfamiliar users to come in and browse. Clearly, I’ll need to take a different approach when I begin designing my digital library for this course.

References:

Kulczak, D. E. & Lennertz Jetton, L. (2011). “Lexicon of love”: Genre description of popular music is not as simple as ABC. Music Reference Services Quarterly, 14(4), 210-238.

Weinberger, D. (2007). The new order of order. In Everything is miscellaneous: The power of the new digital disorder. Retrieved from http://arola.kuurola.com/356/spring12/readings/unit1/weinberger_ch1.pdf

Transcribing Audiovisual Recordings

Last week, we read an article from the website for the Oral History in the Digital Age project about digitizing analog video (Pennington & Rehberger, 2012). Analog to digital migration is vital to preserving audiovisual records, but it can be complicated and expensive, especially for libraries and archives with limited resources. It got me thinking: what is the role of transcription in the digital preservation of audiovisual material? Can transcription of audiovisual recordings be a viable alternative or supplement to digital migration? I decided to find out.

I have some experience with transcribing audio-recorded interviews. During the senior year of my undergraduate program, I took an introductory family history class. My final project for the semester was conducting an oral history interview with my grandma, after which I transcribed most of the interview and used it to write a short narrative history. I also transcribed a dozen or more interviews I conducted as part of an ethnographic research project for my anthropology degree. I learned quickly that transcription was difficult and time-consuming, even when using specialized transcription software. I didn’t have time to transcribe the entire interview I conducted with my grandma and haven’t gone back to finish it since then, despite my good intentions. I’m glad I have as much of that interview transcribed as I do–I continue to fear for the long-term status of the audio recordings and haven’t actively sought out a way to preserve them yet.

So, what do the oral history experts think of transcription as a practice? Were my hours of transcribing those interviews all in vain?

And the answer is–it’s complicated.

Digging around the Oral History in the Digital Age project website, I discovered the article “Transcribing Oral History in the Digital Age” by Linda Shopes (2012), who conducts a fascinating deep dive into the intricacies of oral history transcription.

According to Shopes, transcriptions can be great for some things. Certainly it’s easier to preserve a few paper copies of an interview over several decades than it is to seek out obsolete technology to play old recordings or to constantly transfer audio files to the newest, most up-to-date formats. Transcripts can also make interviews more accessible to more people–not everyone has the time or resources to watch or listen to a recording, and written transcriptions are usually easier to browse for one piece of information than AV recordings, making transcripts particularly valuable for research.

On the other hand, Shopes explains, transcription is time-consuming, expensive, and difficult to do well, or even correctly. Tone of voice, crosstalk, speaker rapport, and personality are difficult if not impossible to fully convey through transcriptions. In other words, they take the “oral” out of “oral history.” Listening to or watching an oral history recording is really the only way to get the full picture. Transcriptions can create the illusion of greater accessibility, but they don’t solve the problems of preserving AV material long term. Additionally, transcribing oral history interviews brings up legal and ethical questions, including issues of copyright, privacy, and consent.

Essentially, the consensus seems to be that while transcription has its place and can be useful for some purposes, it cannot replace actual audiovisual recordings. Ultimately, whether and how much to rely on transcriptions as part of an audiovisual preservation methodology depends on a person’s or institution’s needs and capabilities (like almost every other digital preservation decision, I find). Transcription can be useful–I never would have been able to complete my ethnographic research without it. Additionally, I can share the transcript of my interview with my grandma with family much more easily than I can share the audio file with them. With enough funding, training, and time, transcripts can be useful supplements to audiovisual collections–but the priority should almost always be on the recordings themselves.

Hopefully someday I’ll finish the transcript of the interview with my grandma, but I think for now I’ll focus on backing up the audio file in twenty different places. Hearing her voice is much more satisfying than reading it.

Pennington, S., and Rehberger, D. (2012). The preservation of analog video through digitization. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/preservation-of-analog-video-through-digitization/.

Shopes, L. (2012). Transcribing oral history in the digital age. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/transcribing-oral-history-in-the-digital-age/.

The Ethics of Digitizing

 

“Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” – Dr. Ian Malcolm, from Jurassic Park

Lately, there has been a great rush to digitize every possible item, often with speed as the priority, not critical examination. While the digitization of all these materials certainly makes information more freely available to people, it can also be a cause for concern when it comes to cultural items and personal information. Many times, the image painted of a culture will depend upon which items are chosen. This means that the selection process for digitization can be deeply important in such contexts, and that curators may inadvertently take power away from these groups by choosing to represent their cultures in a certain way. Even if a curator digitizes every available item indiscriminately, it is entirely possible to pass on an image that the related community does not welcome. Especially in America, there is a common problem of representing foreign works with Western bias (in how items are chosen, arranged, and displayed). This can be particularly destructive to communities for a variety of reasons, such as causing discord within the group or giving outsiders a false impression.

How, then, do we go about digitizing cultural items that aren’t very familiar?

In this scenario, accurate interpretation is the key. One way to better judge the ‘story’ of cultural items is to bring in a professional who has studied the particular community in depth. This will allow for better perspective by putting each item into historical and cultural context, and by creating a story that more accurately represents the community. However, even a professional may not fully grasp a community that they are not a part of. A better way is to invite a representative of that community to support the digitization process. This allows for direct communication about the items, and can help build trust between that community and the institution that is digitizing. In fact, many groups have begun to use their online collections ‘as a platform for making their voices and to regain control over their heritage.’

Another cultural concern is that some communities have objects that are supposed to be limited to certain groups (defined by age, gender, or rank, for example) within that community. For example, a sacred text might be reserved for certain religious leaders of a community. In this case, the digitization of such objects can be neglectful of traditions, or even inspire great disdain from the community. It is important for those digitizing to be mindful of such restrictions, and not to digitize without first learning about the cultural items in question. Perhaps the largest concern is that digitized materials can reinforce discrimination against a group. This often happens when an item is associated with a certain group by mistake, or when an item attempting to depict a community was created outside of that community. For example, 19th and 20th century Western art of African peoples often depicts them through negative stereotypes, and it would not be appropriate to include such material by tagging it in association with those communities. When falsely associated items are included, important items are excluded, or false or biased images are painted, many groups lose control of their own image.

It’s important that the digitization of cultural items be done with caution, and that the communities involved have a say in how they are represented.

 

Sources

  1. Manzuch, Zinaida. (2017). Ethical Issues In Digitization of Cultural Heritage. Journal of Contemporary Archival Studies volume 4: 1-13. Retrieved from https://elischolar.library.yale.edu/cgi/viewcontent.cgi?article=1036&context=jcas
  2. The Humanities Advanced Technology and Information Institute and the National Initiative for a Networked Cultural Heritage. The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials. National Initiative for a Networked Cultural Heritage. Retrieved from http://chnm.gmu.edu/digitalhistory/links/pdf/chapter1/1.17.pdf
  3. Kirschenbaum, Matthew G., Ovenden, Richard, and Redwine, Gabriela. (2010). Digital Forensics and Born-Digital Content in Cultural Heritage Collections. Council on Library and Information Resources. Retrieved from https://www.clir.org/wp-content/uploads/sites/6/pub149.pdf

Authors Guild, Inc. v. Google, Inc.: Are LIS Professionals Any Better Off?

Since the early 2000s, internet juggernaut Google has been on a mission to digitize all the books held in five major American libraries, which hold a combined total of 20 million volumes. Though the most notable, Google is not alone in its quest. Microsoft is another big player in the mass digitization game. It began a project to digitize 100,000 books from the British Library. Microsoft has also partnered with Yahoo to establish the Open Content Alliance (OCA).

It’s important to understand what mass digitization is and what it is not. Mass digitization is not simply a “large-scale project.” Coyle aptly describes it as “the conversion of materials on an industrial scale. That is, conversion of whole libraries without making a selection of individual materials.” In other words, Google and other companies are attempting to digitize everything without discrimination. This contrasts sharply with a non-mass digitization project like Project Gutenberg. The most obvious difference is the sluggish pace that Gutenberg has maintained since 1971 compared to Google’s astonishing rate. However, another important distinction is that Project Gutenberg (like the Open Content Alliance) only digitizes works that are in the public domain. Google, on the other hand, decided to digitize both public domain works and copyrighted books.

There are a lot of unique challenges that come with a mass digitization project of this magnitude. Books of unusual sizes or books that include folded maps are not included in the project (which is problematic for a project whose aim is to digitize virtually everything). Yet the greatest challenge to Google’s efforts to digitize the world’s books came from a lawsuit. Not long after the ambitious project began, the Authors Guild and numerous writers filed a lawsuit against Google in 2005.

At the heart of this case was whether or not Google’s mass digitization project could be classified as “fair use.” Fair use allows unlicensed use of copyrighted works in special situations. Unlike its treatment of works in the public domain, Google was not making entire copyrighted works available to the public for free. By using Google Books, one could search a work and peruse brief excerpts of that book. The Authors Guild argued that this was an infringement on the creator’s copyright and allowed individuals to use the digitized book without buying a copy. Google countered that this was not an infringement because only select parts were available for viewing. Furthermore, Google argued that its digitization efforts would actually increase book sales, as text queries could lead an interested user to a book that they might not otherwise have located without Google Books.

The courts ultimately sided with Google and declared that its mass digitization was not an infringement of copyright law. The Authors Guild appealed the case all the way to the Supreme Court, which declined to hear it. As Hahn aptly suggests, “We should be grateful to Google for sticking out its neck—for pushing the envelope on technological innovations, copyright, and other important aspects of digitization.” With its deep pockets, Google could afford to fight the Authors Guild in the courts for over a decade.

The real question for us as LIS professionals is what impact this case really has on our digitization efforts. Most librarians and archivists are not interested in digitizing copyrighted books, but there are so many other types of materials that we do wish to make digitally available to the public. Are we emboldened by Google’s victory to digitize more materials and stick by our decision to make those copies available for free on the web? Judging by my conversations with others in the field, it sounds like practitioners are still hesitant to make those digitized copies available. In the cases where they do upload a digitized copy and a complaint is made, they virtually always remove the object from the digital library. Librarians and archivists do not have the luxury of Google’s budget. We are still scared of lawsuits. So how much of a victory was Authors Guild, Inc. v. Google, Inc. for LIS professionals, really? While it was a great victory for Google, I would argue that nothing has really changed for digital librarians and archivists. The question we must now ask ourselves is what we can do to ensure greater legal protection for our own digitization projects.

Sources

Alter, A. (2015, October 16). Google’s digital library wins court of appeals ruling. New York Times. Retrieved from https://www.nytimes.com/2015/10/17/business/media/googles-digital-library-wins-court-of-appeals-ruling.html

Coyle, K. (2006). Mass digitization of books. Journal of Academic Librarianship, 32(6), 641-45. Retrieved from http://www.kcoyle.net/jal-32-6.html

Hahn, T. B. (2008). Mass digitization: Implications for preserving the scholarly record. Library Resources and Technical Services, 52(1), 18-26. Retrieved from https://eds-b-ebscohost-com.libdata.lib.ua.edu/eds/pdfviewer/pdfviewer?vid=4&sid=06a5f53d-91e7-41e4-86a5-dcfbb0c81ac0%40sessionmgr102

Liptak, A., & Alter, A. (2016, April 18). Challenges to Google Books is declined by Supreme Court. New York Times. Retrieved from https://www.nytimes.com/2016/04/19/technology/google-books-case.html