A Shape Retrieval System for Watermark Images

The Institute for Image Data Research and the Conservation Unit, School of Humanities, within the University of Northumbria at Newcastle, were awarded funds by the Arts and Humanities Research Board to undertake this project, which ran from 1st October 2000 to 31st March 2002.

AIMS AND OBJECTIVES

The overall aim of the project was to research a variety of techniques designed to improve the accessibility of historical watermark images in paper to researchers and scholars.

Our specific objectives were:

1. To develop and evaluate automatic shape retrieval techniques for historical watermark images;
2. To compare and contrast the effectiveness of different techniques for capturing watermark images from the original papers;
3. To set up a content-accessible watermarks archive which will form a resource for teaching and research in art and paper conservation.

This project had two parallel strands, which were distinct but complementary:

* The first strand entailed the development and evaluation of a content-based retrieval system for historical watermark images, to be known as SHREW (SHape REtrieval of Watermarks), which would allow users to search for images on the basis either of overall shape similarity or similarity of constituent parts. The software drew heavily on our existing ARTISAN system for trademark image retrieval [Eakins et al, 1998], though an important part of the project was the modification and enhancement of our existing image retrieval algorithms to meet the special needs of historical watermark images. In particular, we needed to extend the search modules to provide greater flexibility for matching image components rather than whole images.
* The second strand of the project involved the setting up of a test collection of digitised watermark images. This included a systematic comparison of the different methods for the reproduction of watermarks from original papers. These images were digitised and catalogued in accordance with the IPH (International Association of Paper Historians) standard for the registration of papers, forming a computerised database. With the additional functionality provided by the SHREW system, the database now forms the Northumbria Watermarks Archive. Once complete, the Archive will consist of collections of digitised watermark images reproduced by a variety of methods. Searchers will be able to select a query watermark image to put to the database and retrieve matching images in order of similarity to the query.
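
The two search modes described above (overall shape similarity, or similarity of constituent parts) can be illustrated with a minimal sketch. This is not the SHREW or ARTISAN algorithm: the `Watermark` record, its feature vectors and the use of Euclidean distance are all assumptions made purely for illustration, and real shape features would be extracted from the digitised images.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Watermark:
    """Hypothetical archive record; feature values are assumed precomputed."""
    image_id: str
    whole: List[float]        # shape features describing the whole image
    parts: List[List[float]]  # one feature vector per image component


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_image_distance(query: Watermark, candidate: Watermark) -> float:
    """Dissimilarity based on overall shape only."""
    return distance(query.whole, candidate.whole)


def component_distance(query: Watermark, candidate: Watermark) -> float:
    """Mean, over the query's components, of the distance to the
    closest component in the candidate image."""
    if not query.parts or not candidate.parts:
        return float("inf")
    total = sum(min(distance(q, c) for c in candidate.parts) for q in query.parts)
    return total / len(query.parts)


def rank(query: Watermark, archive: List[Watermark], by_parts: bool = False) -> List[Watermark]:
    """Return archive images ordered from most to least similar to the query."""
    score = component_distance if by_parts else whole_image_distance
    return sorted(archive, key=lambda w: score(query, w))
```

Calling `rank(query, archive)` orders the archive by whole-image similarity, while `rank(query, archive, by_parts=True)` favours images containing a component that resembles a component of the query. The second mode is the kind of matching that matters when only a fragment of a watermark is available.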

The Northumbria Watermarks Archive, which will become an important resource for teaching and research in art conservation within UNN, has the potential to be a major reference tool for researchers in archives, libraries, museums and art galleries as well as departments of art history, and to support teaching in paper history, historical bibliography and the history and conservation of fine art.

arts-humanities.net

Principal investigator
Professor John Eakins
Principal project staff
Professor John Eakins; Mrs Margaret Graham; Miss Jean Brown
Start date
Sunday, October 1, 2000
Completion date
Friday, March 1, 2002
Source material
BACKGROUND TO THE RESEARCH AREA

Watermarks are permanent pictorial devices introduced into paper during sheet formation. The device is formed from copper-bronze wire sewn to the surface of the perforated papermaker's mould. Each sheet of paper taken from the same mould is identical. The combination of watermark and perforations provides important indicators towards the period of production, the location of the mill responsible, and the origins and authentication of specific papers. Watermark analysis is used by historians, art historians, criminologists and others in order to identify and date paper found in documents, books, manuscripts, prints, drawings, maps, and so on. Watermark identification is based on the comparison of an unknown watermark with known watermarks, whose data are recorded (such as place and date of use, attribution to a paper mill and/or a papermaker).

Capturing the watermark images from the original papers can be problematic since many can be obscured by various types of media on top of the paper, such as overwriting, printing or paint, as well as by the secondary supports underneath the paper. A number of reproduction methods are in use, including tracings, transmitted light photography, Dylux 503 (a photosensitive paper technique), beta radiography, and microfocus radiography. Tracings are cheap and easy to produce but are subjective and often lack accuracy. Transmitted light photography and the Dylux method of contact prints can produce a clear image of the paper structure, but the media layer is also recorded and may obscure the watermark details. Beta radiography has been the most popular method, although it is expensive and slow. Microfocus radiography is a quicker method than beta radiography but can only capture a relatively small site. Recent technological advances, applying image processing techniques, offer alternative solutions [see Stewart et al, 1995; Wenger et al, 1995], but there is still much work to be done in this area.

Currently, matching an unknown watermark to known ones is a difficult and time-consuming task. Researchers must work manually and painstakingly through thousands of drawings of marks and their supporting text [see, for example, Briquet, 1923]. It is very difficult to provide any sort of referencing system for this process since the watermarks are often quite similar but with distinct pictorial differences or alterations which may be associated with specific dates of manufacture, changes in production technique, or even fraudulent production. It is quite common to find that only part of a watermark is available, which may be due to the rest of the image being obscured (by the media or secondary support) or lost due to cropping or wear and tear.

Research into watermarks in paper is hampered by the lack of comprehensive reference sources. Whilst Briquet's work is a standard reference on some 16,000 European watermarks from 1282 to 1600, its source material excludes the UK and some other areas. Work to document watermarks tends to be restricted to particular collections, e.g. the Digital Watermark and Ornament Catalogue, or to particular artists, such as Rembrandt [Ash, 1986]. The Archive of Papers and Watermarks in Greek Manuscripts is a significant collection of watermarks which has acted as a prototype for a more comprehensive watermarks database, the Watermark Initiative. Access to the Greek Manuscripts Archive is via a comprehensive set of textual search terms. The Thomas L Gravell Watermark Archive contains 6,500 watermarks in paper made between 1400 and 1835 and is also searchable by traditional textual terms.

Researchers often need to identify watermark images by similarity of appearance, an application for which content-based image retrieval (CBIR) techniques [Gudivada and Raghavan, 1995] are eminently suitable. It is perhaps surprising that only one instance of the use of CBIR has been reported.

Rauber et al [1997] have developed a CBIR system for historical watermark images which can use either manually-assigned codes such as Briquet's classification or the developing IPH (International Association of Paper Historians) code, or automatically extracted features such as the number, shape and relative locations of image regions (see the demonstration system, SWIC (Search Watermark Images by Content)). Rauber et al report promising results on their small test database of 3,000 images, though it is not clear how they judge retrieval success. Their current range of shape features seems quite limited, and their report does not make it clear how (if at all) different kinds of feature can be combined for retrieval purposes.

Our project sought to apply techniques developed for our ARTISAN trademark image retrieval system to the development of a workable similarity retrieval system for historical watermarks. There were two reasons to believe that our approach would prove successful:

* While there are obvious stylistic differences between modern trademark images and historical watermarks, their similarities from an image processing point of view are quite striking. Both are monochrome images made up of a number of individual components, and both rely on shape elements (rather than colour or texture) to give them visual impact and distinctiveness. It was therefore reasonable to hypothesise that techniques which have proved successful for trademark image retrieval would also prove effective for watermark image retrieval.
* The key element which distinguishes ARTISAN from most other image retrieval systems is its exploitation of principles from Gestalt psychology [outlined in Lowe, 1985]. There is independent evidence that Gestalt principles play a significant part in human judgements of image similarity [Goldmeier, 1972], and there is no reason to suppose that watermark images are an exception to this.

We felt that a system which retrieves images on the grounds that they look similar to human observers was likely to prove more effective than one which simply looks for the same colour pixels in corresponding positions.

METHODOLOGY AND TIMESCALE

1. The software development programme, which was carried out in the Institute for Image Data Research, aimed to follow an iterative process of development, evaluation and refinement. An initial set of digitised watermark images was analysed by ARTISAN software modules, and expert users were invited to compare ARTISAN's interpretation of the images with their own. Cases of discrepancy were investigated in detail, and modifications made to the software as appropriate. Additional search facilities were developed, and an initial prototype search system was evaluated with the assistance of experts in the field, both on our own test collection and on watermark images available on the Web. Our evaluation methodology was based on that used for our ARTISAN project [Eakins et al, 1997], but expanded to include measures of user satisfaction as well as retrieval effectiveness. Cases of retrieval failure were analysed in detail, in order to pinpoint their causes. These might include poor image quality, differences in image complexity from trademark images, or differences in the way human experts view such images. A modified version of the software was then developed, taking into account the results from the above analysis, and mechanisms for improving search efficiency were incorporated, enabling rapid retrieval in a networked environment.
2. The second strand of the research involved a programme of work to set up the watermarks archive. This work was undertaken in the Conservation Unit. Firstly, we conducted empirical research into the most acceptable method of image capture to provide high quality digital images, by systematically comparing and contrasting four different methods for the capturing of watermark images (namely transmitted light, Dylux 503, beta radiography, and microfocus radiography) using a sample of about 50 objects extracted from the collection of papers held in the Conservation Unit. We used flat, single leaf objects in this exercise, rather than bound volumes of paper. The collection consisted of works of art (e.g. watercolours, prints, drawings), archival material, and a set of historic papers. For each object, we captured the watermark image using each of the four methods and then, using a panel of experts, we conducted blind tests to find out which method(s) consistently produced the best watermark images for subsequent digitisation. The captured images were digitised using either a flat bed scanner or a digital camera, forming the basis of the test collection of watermark images for the parallel software development phase. Each object was catalogued using appropriate standards for the registration of papers, and a database was created linking the descriptions with the sets of images. Subsequently, further watermarks will be captured, using the 'best' methods identified in the previous stage, and added to the database. Now that the work is completed, the demonstrator system and database will be mounted on the web as the Northumbria Watermarks Archive. This content-accessible archive will form a live and dynamic resource for teaching and research in art conservation and will complement existing web resources in the area of watermarks.

PROJECT MANAGEMENT

The programme of work was carried out by two researchers, one based in the Institute for Image Data Research (IIDR) and the other based in the Conservation Unit, under the guidance of Prof. John Eakins, Director, IIDR; Miss Jean Brown, Senior Lecturer, Conservation Unit, School of Humanities; and Mrs Margaret Graham, Principal Lecturer, School of Computing and Mathematics, UNN and Honorary Research Fellow, IIDR.
Publications

A. Jean E. Brown and Richard Mulholland, "The Development of a Digital Archive of Watermark Images", presented at Digital Resources for the Humanities 2001: University of London, July 2001

A. Jean E. Brown & Richard Mulholland "An AHRB Research Project - A Shape Retrieval System for Watermark Images" presented at MUTEC 4th International Trade Fair for Museums, Collections, Restoration and Exhibition Technology, Munich, June 2001

John P Eakins, A. Jean E. Brown, Margaret Graham, Richard Mulholland, Jonathan Edwards, Jonathan Riley, "A Shape Retrieval System for Watermark Images" poster at ICOM-CC Working Group on Graphic Documents, Vantaa, Finland, March 2001

A. Jean E. Brown, Richard Mulholland & Jon Riley, "Watermarks on the Web" presented at 17th Annual CHArt Conference, November 2001

K Jonathan Riley and John P Eakins "Content-based retrieval of historical watermark images: I – tracings" presented at CIVR2002, London, July 2002

A. Jean E. Brown & Richard Mulholland, "The Northumbria Watermark Archive: Using Microfocus X-Radiography and Other Techniques to Create a Digital Watermark Database" submitted to Works of Art on Paper, Techniques and Conservation, IIC 19th International Congress, Baltimore 2002

A. Jean E. Brown & Richard Mulholland, "When images work faster than words - The Integration of Content-Based Image Retrieval with the Northumbria Watermark Archive", submitted to ICOM-CC 13th Triennial Meeting, Rio de Janeiro, Brazil, September 2002.

Brown, A J E, Mulholland, R, Graham, M E & Riley, J ‘Watermarks on the Web’. In Digital Resources for the Humanities 2001-2002: an edited selection of papers. Anderson, J et al. (Eds.) Office for Humanities Communication, 2003. p.19-33.