Digital Context

How does this corpus relate to other linguistic corpora of spoken French?

Since pioneering work on sociolinguistic methodology by William Labov in the 1970s, corpora of spoken language have proliferated around the world. For the French language, such corpora have been developed and exploited in metropolitan France, in Belgium and in French-speaking Canada. Amongst the best-known early corpora were the Sankoff-Cedergren corpus of Montréal French, the Ottawa-Hull corpus and the Orléans corpus. More recent digitised corpora include the DELIC corpus of oral French (http://sites.univ-provence.fr/delic/corpus/index.html), the corpus developed by the project 'Phonologie du français contemporain' (http://www.projet-pfc.net/), and the Valibel corpus (http://www.uclouvain.be/valibel-corpus.html). In recent years, digitised corpora of this kind have been brought together for online access, for example via the 'Corpus de langues parlées en interaction' (CLAPI: http://clapi.univ-lyon2.fr/V3_Accueil_Corpus.php), the 'Délégation générale à la langue française et aux langues de France' (DGLFLF: http://www.dglflf.culture.gouv.fr/recherche/corpus_parole/corpus_en_ligne.htm), and IRCOM (Inventaire des corpus oraux et multimédiaux: http://ircom.corpus-ir.fr/wiki/doku.php?id=wiki:enquete).

The spoken data captured in existing corpora consist mainly of informal conversational discourse, although interviews and television and radio material are also represented. The need for corpora and analyses covering a wider range of oral 'genres' has been increasingly recognised (see Bilger and Cappeau, 2004), since register, medium and discourse type play a major role in speakers' linguistic behaviour. As far as narratives are concerned, existing corpora contain many examples of conversational narratives, but there is no digitised corpus of contemporary oral narrative.

Digital methodologies for data transcription, annotation and preservation have developed extremely rapidly over the past ten years. Across different languages and cultures, there is an increasingly recognised need to develop shared good practice and, ultimately, to achieve interoperability between annotation systems. In France, many important current developments come under the aegis of the national Digital Humanities infrastructure, ADONIS (www.tge-adonis.fr), or, more specifically in the case of oral corpora, under the remit of IRCOM (http://www.typologie.cnrs.fr/spip.php?rubrique5&lang=fr). The choice of the Text Encoding Initiative (TEI) methodology for this project is therefore in line with current moves towards interoperability at an international level. A recent French corpus of written material annotated using TEI methodology is the one produced by the ANNODIS project in Toulouse le Mirail.