High Throughput Humanities e-Research (HiTHeR) and FReSH (Forging Restful Services for e-Humanities)

High Throughput Humanities e-Research (HiTHeR) aimed to create a prototype system for analysing the Nineteenth Century Serials Edition (NCSE) corpus. The NCSE contains around 430,000 articles that originally appeared in roughly 3,500 issues of six nineteenth-century periodicals.

The project investigated the use of grid technologies and high throughput computing to provide more intuitive ways of searching the large NCSE corpus. Specifically, the project set up a prototype campus grid and used it to run text processing on this corpus. The project was linked to campus grid activities at King’s and to the National Grid Service.

A follow-on project, FReSH, integrated the HiTHeR textual analysis agent into a wider digital ecosystem for e-Humanities research in an effective and lightweight way. In the earlier HiTHeR implementation, people wishing to use the services had to collect the results of document similarity calculations from a text file and manually integrate them into their own web resource. FReSH improved on this by using ReSTful web services to deliver the outputs of the document similarity services in machine-readable formats, so that they can be integrated into web environments more easily.
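To illustrate what consuming such machine-readable output might look like, the following is a minimal sketch in Python that parses an Atom feed of similar documents. The feed content, the `urn:example` identifiers, the `example.org` links, and the use of the `summary` element to carry a similarity score are all assumptions made for illustration; the actual FReSH response format is not described here.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed: titles, ids, links and scores are invented for this
# sketch and do not reproduce real FReSH output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Documents similar to article-001</title>
  <id>urn:example:similar:article-001</id>
  <updated>2009-12-01T00:00:00Z</updated>
  <entry>
    <title>Article 017</title>
    <id>urn:example:article-017</id>
    <updated>2009-12-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/articles/017"/>
    <summary>0.52</summary>
  </entry>
  <entry>
    <title>Article 042</title>
    <id>urn:example:article-042</id>
    <updated>2009-12-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/articles/042"/>
    <summary>0.87</summary>
  </entry>
</feed>"""


def parse_similar_documents(feed_xml):
    """Return the feed entries as dicts, most similar document first."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        results.append({
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "href": link.get("href") if link is not None else None,
            # Assumption: the similarity score is carried in <summary>.
            "score": float(entry.findtext("atom:summary", namespaces=ATOM_NS)),
        })
    # Sort by descending similarity score.
    return sorted(results, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    for doc in parse_similar_documents(SAMPLE_FEED):
        print(doc["score"], doc["title"], doc["href"])
```

Because the result is a plain list of dictionaries, a web resource could render it directly as a "related articles" list, which is the kind of integration that previously required copying values out of a text file by hand.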

Project

Academic field

Librarianship

Information & Museum Studies

Linguistics

Affiliation

King's College London

National Grid Service

Project link

High Throughput Humanities e-Research (HiTHeR) and FReSH (Forging Restful Services for e-Humanities)

Funders

Joint Information Systems Committee (JISC)

TaDiRAH research objects

Textual interaction (asynchronous)

arts-humanities.net

Principal project staff

Richard Palmer, Tobias Blanke, Mark Hedges

Start date

Monday, September 1, 2008

Completion date

Tuesday, December 1, 2009

Source material

The Nineteenth Century Serials Edition (NCSE - http://www.ncse.ac.uk/) was the source material to which text mining/analysis techniques were applied. A follow-on project, FReSH (http://fresh.cerch.kcl.ac.uk/), used two additional resources: People's War (http://www.bbc.co.uk/ww2peopleswar/) and Serving Soldier (http://www.kcl.ac.uk/iss/archives/servingsoldier/).

Data formats

Extensible Hypertext Markup Language (XHTML)

Atom

Publications

- Blanke, Brey, Hedges, Palmer, ‘Text analysis of large corpora using High Throughput Computing’, Proceedings of Digital Humanities 2009, Maryland, 2009.
- Blanke, Hedges, Palmer, ‘Restful services for the e-Humanities – web services that work for the e-Humanities ecosystem’, IEEE Digital Ecosystems, Istanbul, 2009.