The Parsed Corpus of Early English Correspondence

The Parsed Corpus of Early English Correspondence is a syntactically annotated version of 2.2 million words of the Corpus of Early English Correspondence (created by the Sociolinguistics and Language History project team at the Department of English, University of Helsinki). It includes 84 letter collections, consisting of 4790 letters dating from 1410 to 1695. The corpus is annotated with the grammatical and sociolinguistic information necessary for extensive (socio-)linguistic analysis. It can be searched automatically for abstract grammatical structures (such as relative clauses, subject-verb inversion and expletive subjects) as well as for words and strings of words, allowing quick and easy access to the data needed to investigate virtually any aspect of the language of the period. Each sentence is accompanied by searchable information on the writer and recipient (name, gender, relationship to sender or receiver, date of birth, age at time of writing) and on the letter (date, authenticity), allowing sociolinguistic investigations of the type commonly carried out on modern languages. Moreover, the genre of the corpus, personal letters, yields language closer to the spoken idiom, and thus supplies a valuable corrective to work based on the more usual literary data. As part of a series of annotated corpora which together cover the entire history of English, the corpus can also be used to study long-term changes in the language.

Project

Academic field

English Language and Literature

Linguistics

Affiliation

University of York

University of Helsinki

Project link

The Parsed Corpus of Early English Correspondence

Funders

Academy of Finland

Arts and Humanities Research Council (AHRC)

TaDiRAH

Meta: Project Management

arts-humanities.net

Principal investigator

Dr Ann Taylor

Principal project staff

Professor Anthony Warner; Professor Susan Pintzuk; Dr Ann Taylor

Start date

Thursday, May 1, 2003

Completion date

Thursday, December 1, 2005

Era

Modern

Place

England

Source material

The base text of the corpus is the Corpus of Early English Correspondence (CEEC), compiled by the Sociolinguistics and Language History project team at the Department of English, University of Helsinki (http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm). The base text is enhanced with grammatical and sociolinguistic annotation.

Data formats

Text file (TXT)