Digital language resources in Oxford

1. Resources available online in Oxford

Below is a list of some of the resources for which groups within Oxford have licences, and to which students and staff have access.

The University of Oxford holds Linguistic Data Consortium (LDC) licences for the years 1997, 2008, 2009, 2010, 2013 and 2015. Take a look at the LDC catalogue; if something there interests you and it does not appear in the list below, please get in touch with Martin Wynne. Thanks go to OUP, which paid in full for the University's 2009 licence, to the Department of Computer Science, which paid for the 2010 and 2015 licences, and to the Phonetics Laboratory, which paid for the 1997 and 2013 licences.

The following resources have been downloaded from the LDC and are now available online from IT Services for Oxford users. Consult the LDC catalogue for the full list of what is available, and get in touch with ota at

Please note that you are bound by the terms and conditions of the user agreements associated with each of these resources, which can be found on the LDC website.

Please visit the LDC website for more information about these resources, and to consult the relevant licence agreements. Note that these resources are for use by members of the University of Oxford, and you are not permitted to redistribute them.

If the files are too big for you to download over the web, get in touch via ota at

The following have also been downloaded by the Phonetics Laboratory and might be available by arrangement.

  • LDC94S13B CSR-II (WSJ1) Sennheiser
  • LDC96L17 CALLHOME Japanese Lexicon
  • LDC96T18 CALLHOME Japanese Transcripts
  • LDC94S13A CSR-II (WSJ1) Complete

2. Resources available via CLARIN

The UK is a member of the CLARIN European Research Infrastructure Consortium, which offers easy access to language data and tools for research in the humanities and social sciences. The latest information on activities and resources can be found on the CLARIN website. The University of Oxford co-ordinates the CLARIN-UK Consortium.

Certain resources have restricted access but are now accessible to authenticated users from Oxford - see the following page:

  • CLARIN protected resources, including resources in Czech, Danish, Dutch, English, German and Norwegian, and online interfaces to a number of other languages via the Corpuscle archive at the University of Bergen, among them Abkhazian, Bulgarian, Older Scots, Persian and Slovenian. In most cases you log in to these sites by following the link labelled 'Log in via your institution', 'EduGAIN' or simply 'Log in', after which you will be redirected to WebAuth.
  • Virtual Language Observatory is the gateway to a larger number of resources. The VLO is a resource discovery service aggregating records for resources held in most of the major archives world-wide.
  • CLARIN Resource Showcases is a way to explore a small selection of open access online corpora and lexical resources offered by CLARIN, including resources in Czech, Dutch, Finnish, German and Swedish.
  • CLARIN-UK also makes available to users in Oxford a number of important resources, provided by the members of the CLARIN-UK Consortium.
  • Oral History & Technology: a new website which will feature tools for processing audio data, including speech synthesis and alignment.

3. Other corpus resources at the University of Oxford

There are further corpora, copies of which may be available in Oxford under a variety of licensing and access arrangements (often on optical disk). Please get in touch to add to the list. For these resources, contact Martin Wynne unless otherwise stated.

  • BNC XML version, BNC Baby (sampler on one CD)
  • Corpus of Spoken Dutch
  • Corpus of Spoken Japanese
  • IPI-PAN corpus of Polish
  • COLT Corpus of London Teenagers' Speech
  • Gesprochenes Jiddisch Textzeugen einer Europäisch-jüdischen Kultur
  • ICAME corpus collection
  • East meets West: a compendium of multilingual resources (the TELRI CD, parallel aligned corpora in many European languages)