Digital Memory VI. Abstracts
"Web Archiving: Preserving the History of Data-Driven Society"
28 January 2015
Main Conference Hall, National Library of Estonia
(Tõnismägi 2, Tallinn)
From Print to Digital: Legal Deposit Principles in the Digital Era
Our cultural heritage has now largely gone digital: all kinds of cultural artifacts – books, newspapers, audiovisual material and artworks – have online equivalents, while the internet has created its own forms of intellectual and artistic expression, from blogs to social networks. The traditional missions of libraries (collecting, describing, preserving and giving access to cultural holdings) are challenged by this dramatic switch. This contribution presents the kinds of solutions that libraries, and especially national libraries, are experimenting with worldwide to address these issues. It will focus in particular on the legislative and scientific aspects: this overview will point out the objectives, the advantages and the shortcomings of the different archiving models developed by heritage institutions.
Clément Oury is head of Digital Legal Deposit at the National Library of France (BnF). This service is in charge of collecting and preserving a large part of BnF's born-digital heritage: web archives, e-newspapers and e-books. Clément Oury also serves as Treasurer of the International Internet Preservation Consortium (IIPC). He holds a PhD in early modern history from the University of Paris-Sorbonne.
The Document Cycle: from Selection to Access, Curatorial Practices
This short presentation will point out how curatorial practices are regularly adapted to take into account the specificities of web documents. Because they use the same tools (NetarchiveSuite and Heritrix), BnF and Netarkivet have a similar workflow, although they have different collection policies.
Annick Le Follic has worked at the Bibliothèque nationale de France as a digital curator since 2008. She took part in preparing the first in-house broad crawl in 2010 and now manages the planning of broad and focused crawls.
Sabine Schostag is a web curator at Netarkivet, Statsbiblioteket. She has worked with web archiving, mainly the curation of selective crawls, since 2005, when the internet became part of Legal Deposit in Denmark.
Estonian Web Archive as a Source for Researchers of Today and Tomorrow
We have the privilege of experiencing major changes in communication of a kind that has occurred only a couple of times before in human history. The data on the web describes society and individuals in an unprecedented way. This forms a new type of information source for researchers in different disciplines. The presentation offers an insight into the challenges the National Library of Estonia faces in building the Estonian Web Archive and making its collections accessible to researchers as well as the general public.
Jaanus Kõuts is the senior specialist on web archiving at the National Library of Estonia.
Long and Winding Road to the Spanish Web Archive
Nowadays there is a growing concern about the loss of online information and documentation in Spain. Since a new Legal Deposit Law was enacted in 2011 to include online publications, a royal decree has been drafted that will regulate the management of the non-print legal deposit, and it is about to be published. But we had to travel a long and sometimes difficult road to get to this point. The first Spanish web collection (a bulk crawl of the .es domain) was crawled in 2009 by the Internet Archive on behalf of the National Library of Spain. Internal library procedures, national collaboration, technical decisions and test crawls are all part of the steps I’ll share with the audience.
Mar Pérez Morillo is Chief of Web Service at the National Library of Spain. She has been in charge of the web archiving project at the Library since 2009 and was a member of the group that drafted the royal decree that will regulate the non-print legal deposit. She is the coordinator of the Working Group “Legal Deposit and Digital Preservation” at the Council of Library Cooperation of the Ministry of Education and Culture. She represents the National Library of Spain on the Steering Committee of the IIPC.
Probing a Nation’s Web Domain: the Development of the Danish Web 2005-2015
This presentation sets out to outline an analytical design as well as discuss the methodological questions involved in making a historical study of a national web. Our main research questions are: How to study a nation’s web sphere and its developments over time? And more specifically: What has the entire Danish web looked like in the past, and how has it developed?
We will also briefly introduce the Danish Netarchive, the Danish national research infrastructure for internet studies, NetLab, and the European network RESAW (http://resaw.eu/), an acronym for ‘A Research infrastructure for the Study of Archived Web materials’, which is a collaboration between leading European web archives and research communities studying the archives.
Ditte Laursen (Ph.D.) is senior researcher and managing curator at the Danish Netarchive.
Language Identification During Crawling
During the project “Finno-Ugric Languages and the Internet” the web will be crawled to find sites written in small Uralic languages. For this purpose, a prototype of an automated system is being built to maintain a list of links to the discovered sites. From this list, it is possible to build web portals through which the sites written in distinct languages can be reached more easily. In this way, we can help the users of endangered languages to find each other and to uphold their linguistic culture.
Tommi Jauhiainen is a doctoral student in the Finno-Ugric Languages and the Internet project at the Department of Modern Languages at the University of Helsinki.
etTenTen13, the First Web Corpus for Estonian
In this presentation we will introduce the first Estonian web corpus, etTenTen13, which was compiled by the Institute of the Estonian Language in collaboration with Lexical Computing Ltd. and Filosoft LLC from 2013 to 2014. The corpus contains 260 million words. We will describe the procedure of corpus compilation and show in detail the corpus structure and metadata. We will focus on the use of etTenTen13 in lexicographical work in particular and in linguistic research in general.
Jelena Kallas is a senior lexicographer at the Institute of the Estonian Language, Tallinn, Estonia. Her research interests include corpus linguistics, corpus lexicography and pedagogical lexicography.
Maria Tuulik is a lexicographer at the Institute of the Estonian Language and a doctoral student at the University of Tartu. Her research interests are corpus lexicography, pedagogical lexicography and lexical semantics.
Reimagining the Future of Libraries as Conveners of Information and Innovation
Imagine a world in which libraries and archives had never existed. No institutions had ever systematically collected or preserved our cultural past: Every book, letter and document was created, read and immediately thrown away. Alternatively, what if everything had been kept and the Library of Alexandria had survived to the present day, archiving all societal knowledge through the millennia? How would life be different in these two worlds, one of no history and one of all our history, and what can this suggest to us of the future role of libraries in society? Today both of these worlds have become reality: Libraries ship the physical book world of our history off to storage, eliminating the serendipitous discovery of browsing, while the Web simultaneously creates a virtual Library of Alexandria that unifies societal knowledge. No longer do libraries serve as gatekeepers to the world’s information: The Web has democratized access to information and with a single mouse click provides far more than any single library could ever offer. Have libraries truly been rendered obsolete in the digital world? What if we could bring scholars, citizens and journalists together, along with computers, digitization and “big data”, to reimagine libraries as centers of information innovation that help us make sense of the oceans of data confronting society today? This talk will focus on reimagining the role of libraries in the 21st century, and especially on how the vast emerging archives of the web can enable fundamentally new kinds of research, with libraries playing central roles as conveners and enablers.
Kalev H. Leetaru is the past 2013-2014 Yahoo! Fellow in Residence for International Values, Communications Technology and the Global Internet and a former Adjunct Assistant Professor in the Edmund A. Walsh School of Foreign Service at Georgetown University, and is currently a Council Member of the World Economic Forum's Global Agenda Council on the Future of Government. He holds three US patents (cited by a combined 51 other issued US patents) and his work has been profiled in Nature, the New York Times, The Economist, BBC, Discovery Channel and the presses of more than 100 nations. He has recently concluded projects with NBC Universal’s SyFy channel, United States Institute of Peace, World Bank and United States Army, exploring how “big data” can reshape the way we understand the world around us. Kalev founded and leads the GDELT Project, supported by Google Ideas, which monitors a considerable cross-section of the world's broadcast, print, and web news media from nearly every country each day in over 100 languages and identifies the people, organizations, locations, themes, and events driving global society. His work focuses on how innovative applications of the world's largest datasets, computing platforms, algorithms and mind-sets can reimagine the way humanity understands and interacts with the global world. In December 2013 he was named as one of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013. More on his latest projects can be found on his website at http://www.kalevleetaru.com/ or http://gdeltproject.org