British National Corpus, Baby edition

Title
  • British National Corpus, Baby edition
  • BNC Baby
Author

BNC Consortium

Availability

Distributed by the University of Oxford under the BNC User Licence. Clicking to download implies acceptance of the licence conditions.

Download: zip

Languages

English

Editorial Practice

Encoding format: TEI XML

OTA keywords

Linguistic corpora
Corpus

LC keywords

Linguistics
Linguistic analysis (Linguistics)

Extent
  • designation: CollectionText
  • size: 182 files, ca. 195 MB
Source Description

Instant secure online access: http://ota.oerc.ox.ac.uk/secure/newota/2553.zip (currently available only to UK users with Shibboleth, and via eduGAIN)

Notes

The British National Corpus (BNC) is a snapshot of British English in the early 1990s. It is:
  • a sample corpus: composed of text samples generally no longer than 45,000 words.
  • a synchronic corpus: the corpus includes imaginative texts dating from 1960 and informative texts dating from 1975.
  • a general corpus: not specifically restricted to any particular subject field, register or genre.
  • a monolingual British English corpus: it comprises text samples which are substantially the product of speakers of British English.
  • a mixed corpus: it contains examples of both spoken and written language.

BNC Baby consists of four one-million-word genre-based subsets (academic, fiction, newspaper and conversation), encoded in XML, with lemma information and an additional, simplified part-of-speech (POS) tag for each word. The corpus is described in full at http://www.natcorp.ox.ac.uk/corpus/babyinfo.html.
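
As a rough illustration of the per-word annotation described above, the following minimal sketch reads the word form, lemma and simplified POS tag from a short XML fragment. It assumes that each word is a `<w>` element carrying `hw` (lemma), `pos` (simplified tag) and `c5` (full tag) attributes, as in the BNC XML edition. The fragment itself is invented for illustration and is not taken from the corpus, and the function name `read_words` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment in the style of the BNC XML edition (assumption:
# <w> elements with c5, hw and pos attributes); not real corpus data.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="corpus" pos="SUBST">corpora </w>'
    '<w c5="VBD" hw="be" pos="VERB">were </w>'
    '<w c5="AJ0" hw="small" pos="ADJ">small</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def read_words(xml_text):
    """Return a (form, lemma, simplified POS) tuple for each <w> element."""
    root = ET.fromstring(xml_text)
    return [
        ((w.text or "").strip(), w.get("hw"), w.get("pos"))
        for w in root.iter("w")
    ]


words = read_words(SAMPLE)

# Tally the simplified POS tags across the fragment.
pos_counts = Counter(pos for _, _, pos in words)

for form, lemma, pos in words:
    print(f"{form}\t{lemma}\t{pos}")
```

For a whole corpus file, the same loop could be applied to the root returned by `ET.parse(path).getroot()`, where `path` points at one of the files in the downloaded archive.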

Permanent URL

http://purl.ox.ac.uk/ota/2553