Archaeotools: Data mining, facetted classification and E-archaeology

This two-year project built on previous ADS work (the Common Information Environment - Archaeobrowser project) to develop tools that use advanced data mining and knowledge capture technologies, allowing archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into digital frameworks. The project had three interrelated objectives, each represented by a distinct work package.

Principal investigator
Professor Julian Richards
Principal project staff
Prof. Julian Richards, Dr Stuart Jeffrey, Prof. Fabio Ciravegna, Stewart Waller, Ziqi Zhang, Sam Chapman, Tony Austin
Start date
Saturday, September 1, 2007
Completion date
Tuesday, September 1, 2009
Era
Place
Source material
The project consists of three work packages, each dealing with a particular type of data.

Work Package 1 - The underlying dataset comprises over 1,000,000 records (held in an Oracle RDBMS) aggregated from the National Monuments Records of Scotland, Wales and England, Historic Environment Records from numerous local authorities, and the ADS’s own archive holdings. The facets selected are the standard hierarchical ‘What’, ‘Where’ and ‘When’ facets, plus a ‘Media’ facet to allow the selection of particular subsets of resources. The facets are populated from existing thesauri (e.g. the Thesaurus of Monument Types) in XML format, extended and integrated to allow for geographical differences, such as terminological differences in monument and period types between Scotland and England. The Archaeotools project also integrates thesauri served in XML by Simple Knowledge Organisation System (SKOS) based web services developed by the AHRC-funded Semantic Tools for Archaeology project (STAR), based at the University of Glamorgan.

Work Package 2 - This work package deals primarily with unpublished archaeological reports (grey literature): approximately 1000 reports in total, ranging from 10 to 500 pages in length. These reports are produced by a wide range of archaeological organisations. For example, the OASIS project actively gathers digital versions of grey literature fieldwork reports and currently holds around 2300. This total grows by around 50-100 reports a month; all reports can be downloaded, free of charge, from the ADS.

Work Package 3 - The system is extended to capture metadata from legacy historical documents, using the PSAS (the annual Proceedings of the Society of Antiquaries of Scotland, from 1851 to 1999) as an exemplar corpus. It uses the University of Edinburgh’s geoXwalk service to recast place names and locations extracted from the text as national grid references (NGRs), allowing enhanced geospatial searching of the data.
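The hierarchical 'What', 'Where', 'When' and 'Media' facets described for Work Package 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny thesaurus, the record titles, and the names `BROADER`, `Record`, `falls_under` and `facet_search` are hypothetical and are not taken from the Archaeotools system, which drew its facets from XML thesauri and an Oracle database.

```python
from dataclasses import dataclass

# Hypothetical miniature thesauri: each term maps to its broader (parent) term.
# A real system would load these hierarchies from thesaurus files instead.
BROADER = {
    "what": {"HILLFORT": "DEFENCE", "BROCH": "DEFENCE", "DEFENCE": None,
             "CAIRN": "FUNERARY", "FUNERARY": None},
    "where": {"Orkney": "Scotland", "Scotland": None,
              "Yorkshire": "England", "England": None},
    "when": {"IRON AGE": "PREHISTORIC", "BRONZE AGE": "PREHISTORIC",
             "PREHISTORIC": None},
}


@dataclass(frozen=True)
class Record:
    title: str
    what: str
    where: str
    when: str
    media: str  # flat (non-hierarchical) facet, e.g. "report" or "image"


# Illustrative records only; not real data.
RECORDS = [
    Record("Record A", "BROCH", "Orkney", "IRON AGE", "report"),
    Record("Record B", "HILLFORT", "Yorkshire", "IRON AGE", "image"),
    Record("Record C", "CAIRN", "Orkney", "BRONZE AGE", "report"),
]


def falls_under(facet, term, selected):
    """Return True if term equals selected or has it as an ancestor."""
    while term is not None:
        if term == selected:
            return True
        term = BROADER[facet].get(term)  # step up to the broader term
    return False


def facet_search(records, media=None, **selections):
    """Keep records matching every selected facet value (logical AND)."""
    results = []
    for record in records:
        if media is not None and record.media != media:
            continue
        if all(falls_under(facet, getattr(record, facet), selected)
               for facet, selected in selections.items()):
            results.append(record)
    return results


if __name__ == "__main__":
    for hit in facet_search(RECORDS, what="DEFENCE", where="Scotland"):
        print(hit.title)
```

Selecting a broad term such as DEFENCE also matches records indexed under narrower terms such as BROCH, which is what lets a user drill down through a hierarchy one facet at a time.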
Publications

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008. Philosophical Transactions of the Royal Society A, 2009, 367, 2507-2519. doi: 10.1098/rsta.2009.0038

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. When ontology and reality collide: the Archaeotools project, facetted classification and natural language processing in an archaeological context. In 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology: On the Road to Reconstructing the Past (2008).

Zhang, Z. & Iria, J. A Novel Approach to Automatic Gazetteer Generation using Wikipedia. In Proceedings of the ACL'09 Workshop on Collaboratively Constructed Semantic Resources, Singapore, August 2009.