The Parsed Corpus of Early English Correspondence

The Parsed Corpus of Early English Correspondence is a syntactically annotated version of 2.2 million words of the Corpus of Early English Correspondence (created by the Sociolinguistics and Language History project team at the Department of English, University of Helsinki). It includes 84 letter collections, comprising 4790 letters dating from 1410 to 1695. The corpus is annotated with the grammatical and sociolinguistic information necessary for extensive (socio-)linguistic analysis. It can be searched automatically for abstract grammatical structures (such as relative clauses, subject-verb inversion and expletive subjects) as well as for words and strings of words, giving quick and easy access to the data needed to investigate virtually any aspect of the language of the period. Each sentence is also accompanied by searchable information on the writer and recipient (name, gender, relationship to sender/receiver, date of birth, age at time of writing) and on the letter (date, authenticity), allowing sociolinguistic investigations of the type commonly carried out on modern languages. Moreover, the genre of the corpus, personal letters, yields language closer to the spoken idiom and thus supplies a valuable corrective to work based on the more usual literary data. As one of a series of annotated corpora that together cover the entire history of English, the corpus can also be used to study long-term changes in the language.

arts-humanities.net

Principal investigator
Dr Ann Taylor
Principal project staff
Professor Anthony Warner; Professor Susan Pintzuk; Dr Ann Taylor
Start date
Thursday, May 1, 2003
Completion date
Thursday, December 1, 2005
Era
Place
Source material
The base text of the corpus is the Corpus of Early English Correspondence (CEEC), compiled by the Sociolinguistics and Language History project team at the Department of English, University of Helsinki (see http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm). The text is enhanced with grammatical and sociolinguistic annotation.
Data formats