Martin Durrell; Paul Bennett; Silke Scheible; Richard J. Whitt


Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. This is a permissive license that grants certain rights for non-commercial use, in particular your right to use GerManC for your own research, while reserving certain rights for the original creators of GerManC.

Editorial Practice

Encoding format: TEI Lite P5 XML; GATE XML; GATE column format; plain text

Creation Date

The corpus was constructed between 2008 and 2011.

Source Description

Expanded and revised version of

Various: see documentation in the download package.

Following the model of the ARCHER corpus, and with the aim of achieving representativeness, the GerManC corpus consists of text samples of about 2,000 words each from eight genres. Drama, newspapers, sermons and personal letters represent orally oriented registers; narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts represent more print-oriented registers. To make historical developments easier to trace, the whole period covered (1650-1800) was divided into fifty-year sections (1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods.

The complete corpus thus consists of 360 samples, comprising approximately 800,000 words. Appendix 1 in the download package lists the files in the corpus, with full documentation, in an Excel spreadsheet.
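The sampling design described above can be checked with a little arithmetic. The sketch below treats the corpus as a grid of eight genres by three fifty-year sub-periods and derives the number of samples per genre/sub-period cell from the stated total of 360. It also derives the implied average sample length from the approximate word count. The short genre labels are illustrative assumptions only; they are not the corpus's own file names or metadata identifiers.

```python
# Sketch of the GerManC sampling frame as described in the text:
# eight genres x three fifty-year sub-periods, with an equal number
# of samples in every genre/sub-period cell.
# NOTE: the short genre labels are hypothetical, chosen for illustration.
from itertools import product

GENRES = [
    "drama", "newspapers", "sermons", "letters",      # orally oriented registers
    "narrative", "scholarly", "scientific", "legal",  # print-oriented registers
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360          # stated total number of samples
APPROX_TOTAL_WORDS = 800_000  # stated approximate corpus size

# Every (genre, sub-period) combination forms one cell of the design.
cells = list(product(GENRES, PERIODS))

# Equal allocation: the total must divide evenly across all cells.
per_cell, remainder = divmod(TOTAL_SAMPLES, len(cells))
assert remainder == 0

# Implied mean sample length (the text says "about 2,000 words").
avg_words = APPROX_TOTAL_WORDS / TOTAL_SAMPLES

print(len(cells), per_cell)  # 24 15
print(round(avg_words))      # 2222
```

Under these assumptions each genre contributes 15 samples per sub-period (45 across the whole period). The implied mean of roughly 2,200 words per sample is consistent with the approximate figures given for sample length and corpus size.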