Welcome to the French Oral Narrative Corpus, the first digitised corpus of oral storytelling in French. The project was funded by the Arts and Humanities Research Council and the British Academy. It was created in partnership with the Conservatoire contemporain de Littérature Orale in Vendôme (one of France's leading centres for oral storytelling) and the Oxford Text Archive, a digitised collection of thousands of texts in over twenty-five languages, including a range of linguistic corpora.

You can listen to, view, download and search the corpus files on this website, where you will also find information on the methodology employed in the project and on the broader linguistic and digital context. The corpus also forms part of the Oxford Text Archive, where it is available to view and download.

Who might be interested in the corpus?

This is a transcribed and annotated corpus which may be used by researchers working in a variety of fields. For those working in French linguistics, the corpus can serve as a dataset for the analysis of a wide variety of linguistic phenomena, including syntactic, lexical, pragmatic or discoursal features. It will be of particular interest to those whose research is on oral French and specifically to those interested in the language of oral narrative. Folklorists, anthropologists or literary scholars could use the corpus for projects on phenomena such as motifs, formulae, images, story variants or narrative structure.

What does the corpus contain?

The corpus contains 87 stories told by 18 different storytellers. For each story, there is a sound recording, a fully encoded XML version using Text Encoding Initiative markup (available in both TEI P5 and TEI P4), encoded PDF and HTML versions, and stripped PDF and HTML versions.

The XML files are annotated, using TEI markup, for a range of contextual phenomena (such as laughter and sighs) and for a number of linguistic phenomena (speech and thought presentation, syntactic detachment, subject-verb inversion and the retention or loss of negative 'ne') that are of key interest for research on oral discourse. For researchers working on these phenomena, the encoded XML files provide the best opportunity for complex linguistic searches which could form the basis of qualitative or quantitative analyses. The XML files could be further annotated and searched for other phenomena, linguistic or otherwise: new tagsets within the TEI conventions could easily be devised and deployed. Simple lexical searches can also be carried out. For those who do not use XML, the annotated PDF and HTML versions show the texts with linguistic markup. The stripped PDF and HTML files show the texts with minimal markup in order to make the core text accessible in a readable format. These could be used by researchers interested exclusively in the text itself (e.g. literary scholars or folklorists) and not in any of the linguistic markup.
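
For readers who would like a starting point for searching the XML files, the following is a minimal sketch in Python of a markup-based count and a simple lexical search. It assumes a TEI P5 file in the standard TEI namespace. The sample text, the use of <seg> elements, and the attribute values "negation", "ne_present" and "ne_absent" are hypothetical illustrations; they are not the corpus's actual tagset, which is described in the methodology pages.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace

# Hypothetical sample: neither the text nor the type/subtype values
# are taken from the corpus; they only illustrate the approach.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <u who="#storyteller">il était une fois un roi
      <seg type="negation" subtype="ne_absent">il savait pas</seg> quoi faire</u>
    <u who="#storyteller"><seg type="negation" subtype="ne_present">il ne voulait pas</seg> partir</u>
  </body></text>
</TEI>"""


def count_segments(root, seg_type):
    """Count <seg> elements of the given type, grouped by subtype."""
    counts = Counter()
    for seg in root.iter(f"{{{TEI_NS}}}seg"):
        if seg.get("type") == seg_type:
            counts[seg.get("subtype", "unspecified")] += 1
    return counts


def lexical_search(root, word):
    """Return the whitespace-normalised text of each <u> containing word as a token."""
    hits = []
    for u in root.iter(f"{{{TEI_NS}}}u"):
        text = " ".join("".join(u.itertext()).split())
        if word.lower() in text.lower().split():
            hits.append(text)
    return hits


root = ET.fromstring(SAMPLE)
negation_counts = count_segments(root, "negation")
roi_hits = lexical_search(root, "roi")
print(dict(negation_counts))
print(roi_hits)
```

To work with a downloaded file rather than the inline sample, ET.parse(path).getroot() could replace ET.fromstring(SAMPLE), with the element and attribute names adjusted to match the corpus's own encoding.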

To view, download or search the corpus files, or to find out about any aspect of the methodology or context, scroll down the navigation bar on the right-hand side of each page.

Janice Carruthers
Principal Investigator