DIVE: Dynamically Linking Collections on the Basis of Events

In this digital cultural heritage project, we provide innovative access to heritage objects from heterogeneous online collections. We use historical events and event narratives as a context for searching and browsing, as well as for the presentation of individual objects and groups of objects. Semantics from existing collection vocabularies and linked data vocabularies are used to link objects to the events, people, locations and concepts that are depicted in or associated with those objects. An innovative interface allows for browsing this network of data in an intuitive fashion. The main focus in DIVE is to support (1) digital humanities scholars and (2) the general audience in their online explorations.

Cultural heritage collections are being made available digitally in increasing numbers. Online access to cultural heritage provides new opportunities for heritage institutions to have users engage with their objects. This includes access for professionals as well as the general public, searching for objects in either a single collection or in multiple collections at the same time. Many museums and archival institutions that collect objects and audio-visual resources have embraced information and communication technology and given it a central place in both their research agendas and their public programmes. The implementation and development of online applications for access to cultural heritage are accompanied by many challenges, both technical and intellectual. Currently, information access is often provided in the same way: a user of an information system performs a simple or complex keyword search to arrive at a selection of items matching the query. Often filters are provided to manipulate the search results. More complex interfaces allow for faceted browsing to further select subsets of the resulting cultural heritage objects. However, research has shown that many users seek more exploratory forms of browsing [1].

Recent technical innovations, including Linked Data, make it possible to create interactive access to cultural heritage collections not only through direct textual keyword search, but also through structured links between cultural heritage objects and related events, persons, places and concepts. In the Agora project (http://agora.cs.vu.nl), browsing of cultural heritage collections through events has proved successful, opening up so-called 'digital hermeneutics' [2]. Building on the results and experiences of the Agora project, DIVE was started at the end of 2013 and will run until the end of 2014. It involves two content partners, the Netherlands Institute for Sound and Vision (NISV) and the Dutch National Library (KB). VU University Amsterdam is a research partner, and Frontwise and Zimmerman and Zimmerman are involved in developing the demonstrator frontend and user management modules.

Project Goals
The DIVE project builds on the results and experiences of the Agora project by allowing for event-centric browsing of cultural heritage objects from multiple heterogeneous collections. Within the Agora project we performed a number of user studies with history students and cultural heritage researchers, which indicated the high value and effectiveness of event-centric collection browsing. Within DIVE, innovative interaction concepts for events and event-based narratives are developed and explored. We explicitly support multiple user groups, including Digital Humanities researchers, professional (commercial) users and the general public. Multiple heterogeneous collections are made available through the DIVE demonstrator, and they are closely interlinked in a common data network. For this, we employ Linked Data standards and practices [3]. This interconnected network of events, persons, places and concepts provides context to the cultural heritage objects, which are represented in the same network. Thus, the objects are contextualised with events and narratives, which is crucial for their findability and for hermeneutics.

Within the scope of the DIVE project, the collections of the two cultural heritage institutions are enriched, linked and made available for search and browsing through the DIVE demonstrator.
- The Netherlands Institute for Sound and Vision (http://www.beeldengeluid.nl) archives Dutch broadcasting content, including television and radio content. Within the project, a subset of the NISV collection, consisting of videos of news broadcasts, was made available using the OAI-PMH protocol. Descriptive metadata is available for these videos, including free-text content descriptions.
- The Dutch National Library provides access to historical newspapers. These have been made public through a Web interface and API, Delpher (http://www.delpher.nl). Through this interface, more than 1 million newspapers from the 17th, 18th, 19th and 20th centuries are available. For these, the scanned images, OCRed content and descriptive metadata are available.
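
As an illustration of the OAI-PMH access mentioned above, the following minimal Python sketch builds a ListRecords request and extracts the free-text description from a response. The verb and parameter names come from the OAI-PMH 2.0 specification; the repository URL, the use of the oai_dc metadata format and the sample record are assumptions made for illustration and do not describe the actual NISV service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and description from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "description": rec.findtext(
                ".//dc:description", default="", namespaces=NS),
        })
    return records


# Hypothetical response with a single record (not real NISV data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:video-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Afsluitdijk gesloten</dc:title>
          <dc:description>Nieuwsbericht over de afsluiting van de Zuiderzee.</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# "http://example.org/oai" is a placeholder, not the NISV endpoint.
url = list_records_url("http://example.org/oai")
records = parse_records(SAMPLE)
```

The description field is the part that would be passed on to the enrichment step; a real harvester would also have to follow the resumption tokens that OAI-PMH uses for paging.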

Data Enrichment
The textual descriptions and descriptive metadata are enriched so that structured metadata in the form of events, places, persons, etc. are linked to the cultural heritage objects. For this, we employ an ensemble of enrichment methods. These include Natural Language Processing (NLP): various Named Entity Recognition and event extraction tools for Dutch text are employed. Existing tools such as the xTas (http://xtas.net) and Opener (http://www.opener-project.eu) toolkits are utilised and refined to produce the structured data. Crowdsourcing techniques are also employed to obtain human-recognised entities and to refine the results from NLP. The results from the different tools and from crowdsourcing are combined to arrive at high-quality extracted data.
These are then consolidated as Linked Data using the Simple Event Model (SEM) [4]. This model allows for the representation of events, actors, locations and temporal descriptions. We also use other widely used Linked Data schemas, including SKOS and Dublin Core, to represent other types of resources. Links to external sources, including Wikipedia, DBpedia and Europeana, are also established. The resulting dataset is stored in an RDF triple store, which provides a SPARQL endpoint.
The general setup of the project, the data ingestion and enrichment pipeline, and the interface layers are organised as follows (https://www.dropbox.com/s/dokpfe43w7hu6qg/20140414-DIVE-project.outlines...): cultural heritage content is collected and enriched through the event generators. The resulting graph representation is stored in a triple store. The data is accessed through a SPARQL interface, on top of which an innovative and intuitive event-centric browsing interface is developed. In the following figure (https://www.dropbox.com/s/jwfchakjtthof8s/20140815_Browser.png) we show the current version of the interface, which is optimised for tablets and modern web browsers.
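
The combination and consolidation steps described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's implementation: the text does not specify how tool and crowd results are combined, so simple vote counting is used here as a stand-in; the example.org namespace, the helper names and the sample extractions are hypothetical. The SEM class and property names (sem:Event, sem:Actor, sem:Place, sem:hasActor, sem:hasPlace) are those of the Simple Event Model [4].

```python
from collections import Counter

SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/dive/"  # hypothetical namespace, not the project's

# Mapping from an extracted entity kind to SEM class and linking property.
SEM_CLASS = {"actor": "Actor", "place": "Place"}
SEM_LINK = {"actor": "hasActor", "place": "hasPlace"}


def combine(extractions, min_votes=2):
    """Keep (label, kind) pairs proposed by at least `min_votes` sources.

    Assumption: plain vote counting stands in for the unspecified
    combination method. Each source (tool or crowd) votes once per pair.
    """
    votes = Counter()
    for source_result in extractions:
        votes.update(set(source_result))
    return sorted(pair for pair, n in votes.items() if n >= min_votes)


def uri(label):
    """Mint a URI from an ASCII label (illustrative only)."""
    return BASE + label.lower().replace(" ", "_")


def to_ntriples(event_label, entities):
    """Serialise one event with its actors and places as N-Triples lines."""
    event = uri(event_label)
    lines = [f"<{event}> <{RDF_TYPE}> <{SEM}Event> ."]
    for label, kind in entities:
        if kind not in SEM_LINK:
            continue  # kinds without a SEM mapping are skipped in this sketch
        entity = uri(label)
        lines.append(f"<{entity}> <{RDF_TYPE}> <{SEM}{SEM_CLASS[kind]}> .")
        lines.append(f"<{event}> <{SEM}{SEM_LINK[kind]}> <{entity}> .")
    return lines


# Hypothetical output of two NLP tools and one crowdsourcing task.
tool_a = [("Lely", "actor"), ("Afsluitdijk", "place")]
tool_b = [("Lely", "actor"), ("Zuiderzee", "place")]
crowd = [("Lely", "actor"), ("Afsluitdijk", "place"), ("Wilhelmina", "actor")]

agreed = combine([tool_a, tool_b, crowd])
triples = to_ntriples("Closing of the Zuiderzee", agreed)
```

Only entities proposed by at least two sources survive the vote; the resulting lines could be loaded into any RDF triple store that accepts N-Triples.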
The live interface can be seen at http://dive.frontwise.com. The interface allows for browsing the linked data graph using visual representations of cultural heritage objects, persons, places, etc. When a user is inspecting a cultural heritage object, other objects that are related to it through shared events are also shown. For example, when inspecting a video about the closing off of the Zuiderzee, newspaper articles related to the engineer Lely (who was the architect of this effort) or Queen Wilhelmina (who performed the ceremonial closing) are shown, as well as other newspaper items or news videos about the same location. In this way, the existing video is placed in a narrative context and explorative browsing is enabled. Further context is provided through external links to, for example, Europeana (http://www.europeana.eu/), opening up the wider cultural heritage context. Professional and general users can also add their own metadata, providing further enrichment of the data through this crowdsourcing effort.
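
The way related objects are found through events can be illustrated with a small in-memory Python sketch. It is a simplification under assumptions: the demonstrator queries a triple store over SPARQL, whereas plain dictionaries are used here, and all identifiers and the function name are hypothetical.

```python
def related_objects(start, object_events, event_entities):
    """Find objects that share an event, or an event's entity, with `start`.

    object_events:  dict mapping object id -> set of event ids
    event_entities: dict mapping event id -> set of entity ids
                    (persons, places, concepts)
    Returns a dict mapping each related object id to the set of shared
    events and entities that connect it to `start`.
    """
    def context_of(obj):
        # An object's context: its events plus the entities of those events.
        context = set(object_events.get(obj, ()))
        for event in object_events.get(obj, ()):
            context |= event_entities.get(event, set())
        return context

    start_context = context_of(start)
    related = {}
    for obj in object_events:
        if obj == start:
            continue
        shared = context_of(obj) & start_context
        if shared:
            related[obj] = shared
    return related


# Hypothetical data modelled on the Zuiderzee example in the text.
object_events = {
    "video:closing": {"event:closing_zuiderzee"},
    "news:lely": {"event:lely_appointed"},
    "news:football": {"event:football_match"},
}
event_entities = {
    "event:closing_zuiderzee": {
        "person:lely", "person:wilhelmina", "place:zuiderzee"},
    "event:lely_appointed": {"person:lely"},
    "event:football_match": {"place:amsterdam"},
}

links = related_objects("video:closing", object_events, event_entities)
```

In this example the newspaper article about Lely is connected to the video through the shared person, while the unrelated football article is not returned. Keeping the connecting entities in the result is what would let an interface explain why an object is shown.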

Current Status
At the time of writing (August 2014), data ingestion from one partner (NISV, the audiovisual archive of the Netherlands) is operational, as is the initial enrichment of the textual descriptions using NLP and crowdsourcing. Sample data has been made available through a ClioPatria triple store. A first version of the interface has been developed and is currently under evaluation. In the coming phases, data from the Dutch National Library will also be ingested, enriched and linked. The interface will be evaluated by Digital Humanities scholars, professional users and members of the general public.

Data Reusability and Sustainability
The raw enriched data will be made available as Linked Open Data to the general public through the public triple store, allowing for SPARQL queries, Linked Data retrieval and direct download of data dumps. The triple store and the interface will be maintained by the Netherlands Institute for Sound and Vision, ensuring sustainability of the data.

[1] Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G. Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadat...

[2] Chiel van den Akker, Susan Legêne, Marieke van Erp, Lora Aroyo, Roxane Segers, Lourens van der Meij, Jacco van Ossenbruggen, Guus Schreiber, Bob Wielinga, Johan Oomen, and Geertje Jacobs. 2011. Digital hermeneutics: Agora and the online understanding of cultural heritage. In Proceedings of the 3rd International Web Science Conference (WebSci '11). ACM, New York, NY, USA, Article 10, 7 pages. DOI=10.1145/2527031.2527039 http://doi.acm.org/10.1145/2527031.2527039

[3] Tim Berners-Lee. Linked data - design issues. http://www.w3.org/DesignIssues/LinkedData.html, 2006.

[4] Willem Robert van Hage, Véronique Malaisé, Roxane Segers, Laura Hollink, Guus Schreiber: Design and use of the Simple Event Model (SEM). J. Web Sem. 9(2): 128-136 (2011)


