French Oral Narrative Corpus


Janice Carruthers


Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. This license permits non-commercial use, including use of the French Oral Narrative Corpus for your own research, provided that the creators are credited and any derived work is shared under the same terms. Other rights are reserved for the original creators of the corpus.

Download: zip



Editorial Practice

Encoding format: TEI P4 XML

Encoding format: TEI P5 XML

Encoding format: PDF

Encoding format: HTML

OTA keywords

Linguistic corpora

LC keywords

Linguistic analysis (Linguistics)

  • designation: CollectionText
  • size: 520 files: 22.7 MB
Creation Date


Source Description

The corpus contains 87 stories told by 18 different storytellers. The recordings belong to the collection of the Conservatoire contemporain de littérature orale in Vendôme, France, and amount to around 1,000 minutes of speech in total. The stories cover a wide range of types, including contes merveilleux (marvellous tales) and contes facétieux (jokes or anecdotes), as well as myths and legends from a wide variety of sources. All stories were recorded in authentic storytelling contexts, with both storyteller and audience present. The storytellers come from a variety of regions in France and all have French as their first language. Full details of the recording, the storytelling context (venue, audience, timing), the storyteller (age, gender, regional background, educational background, etc.) and the story type are given in the header of each XML file.

For each story, there is a sound recording, a fully encoded XML version (available in both TEI P4 and TEI P5), encoded PDF and HTML versions, and stripped PDF and HTML versions. The XML files are annotated, using TEI markup, for a range of contextual phenomena (such as laughter and sighs) and for a number of linguistic phenomena of key interest for research on oral discourse: speech and thought presentation, syntactic detachment, subject-verb inversion, and the retention or loss of negative ‘ne’.
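As a rough illustration of how the TEI P5 files might be explored, the Python sketch below tallies the element names that occur in the transcribed part of a document while skipping the header. It does not assume which tags the corpus uses for any given phenomenon: the sample document, and the use of `u`, `vocal`, `desc` and `said` in it, are hypothetical and are there only to make the example runnable. The only fixed value is the standard TEI P5 namespace.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace

# Hypothetical miniature document; the real corpus files may use different tags.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <u>Il était une fois <vocal><desc>rire</desc></vocal> un roi.</u>
    <u>Il dit : <said>je pars</said>.</u>
  </body></text>
</TEI>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_text: str) -> Counter:
    """Tally element names inside the top-level <text>, ignoring <teiHeader>."""
    root = ET.fromstring(xml_text)
    text_el = root.find(f"{{{TEI_NS}}}text")
    counts = Counter()
    if text_el is None:
        return counts  # no transcribed text in this document
    for el in text_el.iter():  # includes <text> itself and all descendants
        counts[local_name(el.tag)] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_elements(SAMPLE).most_common():
        print(f"{name}\t{n}")
```

Running the same function over a real corpus file (read with `open(path, encoding="utf-8").read()`) would give an inventory of the tags actually present, which can then be checked against the annotation documentation on the project website. The TEI P4 files are not namespaced in the same way, so this sketch applies to the P5 versions only.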

The website contains more information on the annotation and on the background to the corpus, including discussion of the research context and the digital context. It also offers search tools for users working on particular structures.


The project was the beneficiary of two research grants:

  • Arts and Humanities Research Council Grant no AH/E000649/1
  • British Academy Grant no SG39350

The corpus creator also acknowledges the input of others in the creation of this resource:

  • James Cummings (University of Oxford), consultancy on digitisation and text encoding
  • Dehra Scott (Queen's University, Belfast), transcription of data
  • Amélie Rougeot (Queen's University, Belfast), checking of the transcribed data
  • Gavin Mitchell (Queen's University, Belfast), web page design and consultancy

Permanent URL