British National Corpus Sampler

Title
  • British National Corpus Sampler
  • BNC Sampler
Author

BNC Consortium

Availability

Distributed by the University of Oxford under the BNC User Licence. Clicking to download implies acceptance of the licence conditions.

Download: zip

Languages

English

Editorial Practice

Encoding format: TEI XML

OTA keywords

Linguistic corpora
Corpus

LC keywords

Linguistics
Linguistic analysis (Linguistics)

Extent
  • designation: CollectionText
  • size: 185 files: ca. 62.1 MB
Source Description

Instant secure online access: http://ota.oerc.ox.ac.uk/secure/newota/2551.zip (currently available only to UK users with Shibboleth).

Notes

The BNC Sampler is a subset of the full BNC. It comprises two samples, one of written and one of spoken material, of one million words each, compiled to mirror the composition of the full BNC as far as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first compiled at Lancaster University while the BNC was being created. More information about the Sampler can be found in the user's reference guide for the BNC Sampler: XML Edition (PDF).
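As a rough illustration of how the word-class annotation might be read from the TEI XML files, the following Python sketch counts tags in a small fragment. It assumes that each word is a `w` element carrying its word-class tag in a `c5` attribute and that the files use no XML namespace; these are assumptions about the markup, not details taken from this record, and the fragment itself is invented for the example. The function name `count_word_classes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment for illustration only; element and attribute names
# (w, c, c5, hw, pos) are assumptions about the Sampler's markup.
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="small" pos="ADJ">small</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def count_word_classes(xml_text, tag_attr="c5"):
    """Count word-class tags on <w> elements in an XML string."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter("w") visits every <w> element; punctuation (<c>) is skipped.
    for word in root.iter("w"):
        tag = word.get(tag_attr)
        if tag is not None:
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    for tag, n in sorted(count_word_classes(SAMPLE).items()):
        print(tag, n)
```

For a file from the downloaded archive, the same function could be applied to the file's contents once the actual element and attribute names have been checked against the reference guide.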

The British National Corpus (BNC) is a snapshot of British English in the early 1990s. It is:
  • a sample corpus: composed of text samples generally no longer than 45,000 words.
  • a synchronic corpus: it includes imaginative texts dating from 1960 onwards and informative texts dating from 1975 onwards.
  • a general corpus: not specifically restricted to any particular subject field, register or genre.
  • a monolingual British English corpus: it comprises text samples which are substantially the product of speakers of British English.
  • a mixed corpus: it contains examples of both spoken and written language.

The corpus is described in full in the BNC User Reference Guide.

Permanent URL

http://purl.ox.ac.uk/ota/2551