British National Corpus, Baby edition
Encoding format: TEI XML
Instant secure online access: http://ota.oerc.ox.ac.uk/secure/newota/2553.zip (currently available only to UK users with Shibboleth, and via eduGAIN)
The British National Corpus is a snapshot of British English in the early 1990s.
BNC Baby consists of four one-million-word genre-based subsets (academic, fiction, newspaper and conversation), in XML with added lemma information and additional, simplified POS-tags for each word. The corpus is described in full at http://www.natcorp.ox.ac.uk/corpus/babyinfo.html.
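The per-word lemma and simplified POS information can be read with any XML parser. The following is a minimal sketch, not an official loader. It assumes the markup follows the BNC XML edition convention of `<w>` elements carrying `hw` (lemma), `pos` (simplified tag) and `c5` (full tag) attributes, and that the elements are not namespace-qualified. The sample sentence is invented for illustration and is not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Invented fragment in the assumed BNC XML style (not real corpus data).
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="corpus" pos="SUBST">corpora </w>'
    '<w c5="VBB" hw="be" pos="VERB">are </w>'
    '<w c5="AJ0" hw="small" pos="ADJ">small</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def extract_tokens(xml_text):
    """Return (word form, lemma, simplified POS) for every <w> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for w in root.iter("w"):
        # Word forms are assumed to carry trailing whitespace, so strip it.
        form = (w.text or "").strip()
        tokens.append((form, w.get("hw"), w.get("pos")))
    return tokens


tokens = extract_tokens(SAMPLE)
for form, lemma, pos in tokens:
    print(form, lemma, pos)
```

For a full corpus file, `ET.parse(path)` or `ET.iterparse(path)` could replace `ET.fromstring`. If the files turn out to declare an XML namespace, the tag name passed to `iter` would need to be qualified accordingly.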