GerManC. A Historical Corpus of German Newspapers 1650-1800 (expanded version)
Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. The license permits non-commercial use, including use of GerManC in your own research, provided that the original creators are credited and any derived work is shared under the same terms. It also reserves certain rights for the original creators of GerManC.
Encoding format: TEI Lite P5 XML; GATE XML; GATE column format; plain text
Expanded and revised version of http://purl.ox.ac.uk/ota/2537
Various: see documentation in the download package.
Following the model of the ARCHER corpus and given the aim of representativeness, the GerManC corpus consists of text samples of about 2000 words from eight genres: drama, newspapers, sermons and personal letters (to represent orally oriented registers) and narrative prose (fiction or non-fiction), scholarly (i.e. humanities), scientific and legal texts (to represent more print-oriented registers). To make it easier to trace historical developments, the whole period was divided into fifty-year sections (in this case 1650-1700, 1700-1750 and 1750-1800), and an equal number of texts from each genre was selected for each of these sub-periods.
The complete corpus thus consists of 360 samples, comprising approximately 800,000 words. Appendix 1 in the download package contains a list of the files in the corpus, with full documentation in an Excel spreadsheet.
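The sampling design described above can be checked with a little arithmetic. The sketch below is only an illustration: the genre and period labels are taken from the description, but the variable names are hypothetical, and the per-cell figure is derived here rather than stated in the record. The documentation in the download package may describe further subdivisions that this sketch does not model.

```python
from itertools import product

# Genre and period labels as given in the corpus description.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360  # total number of samples stated in the description

# Every genre is paired with every sub-period.
cells = list(product(GENRES, PERIODS))

# An equal number of texts per genre and sub-period implies an even split.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, len(cells))

print(f"{len(cells)} genre/period cells")
print(f"{samples_per_cell} samples per cell (remainder {remainder})")
```

Under these assumptions, the 360 samples divide evenly across the 24 genre and sub-period combinations, which is consistent with the statement that each genre contributes an equal number of texts to each sub-period.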
Publications based on the data include: