British National Corpus, XML edition

Title
  • British National Corpus, XML edition
  • BNC XML
Author

BNC Consortium

Availability

Distributed by the University of Oxford under the BNC User Licence.

Download: permission to download must be applied for, as required by the licensing restrictions (the application form opens on another page).

Languages

English

Editorial Practice

Encoding format: TEI XML

OTA keywords

Linguistic corpora
Corpus

LC keywords

Linguistics
Linguistic analysis (Linguistics)

Extent
  • designation: CollectionText
  • size: 4049 files: c. 515 Mb
Source Description

Instant secure online access: http://ota.oerc.ox.ac.uk/secure/newota/2554.zip (currently available only to UK users with Shibboleth).

By downloading the corpus, you agree to abide by the terms of the BNC User Licence, and you are obliged to ensure that all users to whom you grant access to the corpus also abide by these terms.

Notes

The British National Corpus is a snapshot of British English in the early 1990s. It is:
  • a sample corpus: composed of text samples generally no longer than 45,000 words.
  • a synchronic corpus: the corpus includes imaginative texts from 1960 onwards and informative texts from 1975 onwards.
  • a general corpus: not specifically restricted to any particular subject field, register or genre.
  • a monolingual British English corpus: it comprises text samples which are substantially the product of speakers of British English.
  • a mixed corpus: it contains examples of both spoken and written language.

The corpus is described in full in the Users Reference Guide at http://www.natcorp.ox.ac.uk/docs/URG/.

Some XSL files for reformatting the XML texts in various ways are also available from the BNC web site.
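
As an alternative to XSL, the XML texts can also be processed with a general-purpose XML parser. The following is a minimal Python sketch, not part of the BNC distribution. It assumes that sentences are marked with s elements, words with w elements carrying c5 (part-of-speech code) and hw (headword) attributes, and punctuation with c elements; the sample fragment is invented for illustration, and the helper names are hypothetical. The Users Reference Guide remains the authority on the actual markup.

```python
import xml.etree.ElementTree as ET

# Invented fragment imitating the assumed BNC XML markup (not taken from the corpus).
SAMPLE = (
    '<bncDoc><wtext type="FICTION"><div><p>'
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
    '</p></div></wtext></bncDoc>'
)


def sentences(root):
    """Yield (sentence number, plain text) for every s element."""
    for s in root.iter("s"):
        # itertext() concatenates the text of all nested w and c elements.
        yield s.get("n"), "".join(s.itertext()).strip()


def tagged_words(root):
    """Return (word form, c5 code, headword) for every w element."""
    return [
        ((w.text or "").strip(), w.get("c5"), w.get("hw"))
        for w in root.iter("w")
    ]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for number, text in sentences(root):
        print(number, text)
    for form, code, headword in tagged_words(root):
        print(form, code, headword)
```

To apply the same functions to a downloaded text, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path points to one of the corpus XML files.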