Media: CD
Recorded on location by the Conservatoire Contemporain de Littérature Orale
(CLIO). All recordings are held in the CLIO
This is the first transcribed and annotated corpus of oral storytelling in French. The corpus consists of 87 stories, from a range of story types, told by 18 different storytellers. All recordings are owned by the CLIO and used with the kind permission of its Artistic Director, Bruno de La Salle. There are just under 1000 minutes of talk and the stories vary in length from 2 minutes to over 48 minutes, with an average of around 11 minutes. The stories are transcribed and marked up using Text Encoding Initiative (TEI) methodology, including encoding for several linguistic features, i.e. speech and thought presentation, left and right detachment, inversion, and negation (retention or loss of negative ne-), as outlined in the Interpretation section below and listed in the Taxonomy.
The stories included in the corpus were chosen according to a number of criteria. The primary goal was that the recordings would represent storytelling in an authentic range of naturally-occurring contexts. The idea was to obtain a variety of types of story (contes merveilleux/marvellous tales, contes facétieux/jokes or anecdotes, contes d’animaux/animal stories etc.) from a range of storytellers who draw on a multiplicity of sources. All the storytellers are ‘new storytellers’, in that they are highly literate, educated speakers who have acquired their stories primarily from written sources rather than in oral form as part of an inter-generational or community-based oral tradition. All have French as their first language and come from different regions of France; where they have indicated that they speak other languages these are noted in the ‘particDesc’ section of the Header. All are adults, ranging in age from 30+ to 70+, including both male and female storytellers. Full sociolinguistic information relating to the storytellers was requested from them and is given in the ‘particDesc’ section of the Header below (gender, age bracket [45- or 45+], regional origins and residences, professional and educational background). All recordings were made in a natural context of live storytelling, some in a more intimate format (i.e. ‘Perce Oreille’, where the audience numbers would be under 50) and some with a larger audience (e.g. ‘Rencontres d’Eté’ or ‘Rencontres d’Hiver’, where there could be several hundred in the audience), in order to compare the two; information relating to the venue is also noted in the ‘particDesc’ section of the Header. Information on the broad story type (e.g. Marvellous Tale, Animal Tale, Greek Legend) and the Aarne-Thompson typological classification, in cases where it was possible to identify it, was supplied by the CLIO (courtesy of Camille Coursault) and is given in the ‘textClass’ section of the Header.
Although this information is recorded in the Header and can be used by researchers to inform the analysis in important respects, there was no attempt to stratify the corpus according to any of the above criteria; indeed, in some areas, the corpus reflects certain real imbalances, e.g. there are more older storytellers than younger ones and there are more females than males. In some instances, there are two different recordings of the same story; these were included in order to compare the linguistic features of the two versions.
An initial draft of sections of the transcriptions was completed by Dehra Scott, with funding from a British Academy Small Grant (SG39350) awarded to Janice Carruthers. Janice Carruthers was responsible for producing all the full transcriptions. Amélie Rougeot was responsible for checking the transcriptions and clarifying problematic issues. James Cummings of Oxford University Computing Services (OUCS) provided technical expertise and advice on using Text Encoding Initiative markup; he also prepared the XML template and the DTD. Janice Carruthers designed the markup taxonomy used throughout the corpus for encoding speech and thought presentation, detachment, inversion and negation. She carried out all markup and checking of the corpus. This was funded by a Research Leave Award from the Arts and Humanities Research Council (AHRC).
The corpus is transcribed orthographically and, for the most part, without punctuation. A full-stop is used in the transcription where there is any need for clarity about the position of the end of a clause (this need can arise for syntactic or phonetic/supra-segmental reasons). In the default case of no full-stop between clauses, adverbials appearing to the left of verb phrases can be assumed to be initial adverbials occurring pre-verbally. A question mark is used to indicate a question, primarily because questions in oral French often take the same syntactic form as declaratives, the distinction being intonational rather than syntactic. The exceptions to orthographic transcription are proper nouns and names, where initial capital letters are used, as well as a small number of very specific contexts where a non-orthographic transcription is necessary to reflect the context in which a negative construction is found, i.e. ‘t’as pas’ for ‘tu as pas’, ‘j’sais pas’ for ‘je sais pas’, ‘y a pas’ where ‘il’ is not pronounced. The transcription is literal rather than normative, reflecting the nature of oral discourse, and therefore includes repetitions, ellipses, false starts etc. Errors relative to standard French will appear where they occur (e.g. ‘s’est mit’). Truncated words are transcribed by giving an approximation to the initial sounds followed by ‘-’, e.g. ‘str-’. In the case of proper names, an attempt was made to establish attested spellings and the spelling adopted was the one that best reflected the storyteller’s pronunciation. In some cases it was necessary to make a small modification to an attested spelling in order to convey the pronunciation in French. Where it is not absolutely clear whether an item is articulated or not, the item is given in brackets; this is particularly common with ‘et’.
Where there are two possibilities in a given context, either (i) the version considered most likely by the researcher appears with the other possibility in brackets after a forward slash, e.g. ‘et (/est)’, or (ii) in the case of singular/plural agreement, the second possibility is given in brackets, e.g. ‘il(s) pense(nt)’. Where a semi-lexical form is used, it is transcribed where possible, e.g. ‘euh’, ‘mmm’. Less clear forms are given as part of a vocal ‘seg’ with an approximation to spelling if possible. Unclear sections are marked up using the ‘unclear’ TEI function and, where possible, a reason is given and/or a best guess at content. Sung, incanted or whispered passages within the narrative are normally transcribed and marked up using the TEI ‘rend’ function. Songs at the beginning or end of narratives are not normally transcribed if they do not form part of the narrative itself; these and non-vocal events such as instrumental music are marked up using the TEI ‘event’ function. Pauses are not normally indicated because they are numerous and normally short, but where there is an exceptionally long pause, the TEI ‘pause’ tag is used.
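The two bracket conventions for alternative readings can be recovered mechanically from the transcribed text. The following is a minimal Python sketch, assuming the conventions appear in the plain text exactly as in the examples given (‘et (/est)’ and ‘il(s) pense(nt)’); the function names and regular expressions are illustrative and are not part of the corpus tooling.

```python
import re

# 'likely (/alternative)' convention, e.g. "et (/est)".
ALTERNATIVE = re.compile(r"(\S+) \(/([^)]+)\)")
# Optional agreement marking attached to a word, e.g. "il(s)" or "pense(nt)".
OPTIONAL_SUFFIX = re.compile(r"(\w+)\((\w+)\)")


def alternative_readings(text):
    """Return (preferred, alternative) pairs for every 'x (/y)' span."""
    return ALTERNATIVE.findall(text)


def number_variants(token):
    """Return the possible readings of a token such as 'il(s)'.

    Tokens without a bracketed suffix are returned unchanged as a 1-tuple.
    """
    match = OPTIONAL_SUFFIX.fullmatch(token)
    if match is None:
        return (token,)
    stem, suffix = match.groups()
    return (stem, stem + suffix)
```

For instance, `number_variants` could be applied to each whitespace-separated token of an utterance to list the singular and plural readings before any agreement-sensitive search is run.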
The discourse is divided into utterances, each of which is encoded for a form of speech and thought presentation. The speaking subject in the vast majority of utterances is the narrator but a small number are uttered by the audience and encoded accordingly. Where the form of speech and thought presentation changes within an utterance, the embedded form is encoded within a ‘seg’. Three types of syntactic phenomena are marked up in ‘segs’ within utterances, i.e. detachment, inversion and negation. There follows an explanation of the rationale behind the markup for each phenomenon; the full taxonomy of codes is given below.
Speech and thought presentation (STP) is marked up for a number of core categories. Utterances where the narrator is recounting the story are marked up (NS), as are cases where the narrator provides material either at the beginning or end of the story which is not part of the story itself (NF) and cases where the narrator addresses the audience directly (NA). Forms of reported discourse that are marked up include direct discourse (DD), indirect discourse (ID), free direct discourse (FDD), cases which formally resemble free direct discourse but function as direct discourse (i.e. direct discourse where there are no speech verbs: fDD) and free indirect discourse (FID). Encoded examples of free indirect discourse normally contain at least one linguistic element that is clearly characteristic of free indirect discourse, such as deictics relating to the character’s ‘here and now’, subjective vocabulary or expressives that reflect the character’s rather than the narrator’s perspective (including questions), or intonation patterns that strongly suggest the character is the enunciator rather than the narrator. Utterances (or parts of utterances) that are ambiguous with respect to STP, i.e. cases where the discourse could be read as representing two possible STP categories, are marked up using portmanteau tags; for example, NS-FID is used where it is not clear whether the narrator is recounting the events of the narrative or whether an utterance is the speech or thought of a character through free indirect discourse. All reports of discourse are marked up (i.e. verbs indicating speech or thought processes, normally those introducing or following reported discourse).
The overarching principles adopted with respect to reported discourse are (i) that there has to be a segment of discourse that is identifiable as the reported discourse for it to be marked up for STP: structures containing an infinitive rather than a finite verb (e.g. il a décidé de partir) are not marked up, nor are speech or thought acts without the reported discourse clause (e.g. il a décrit la situation), representations of discourse without any segments of discourse or reported discourse (e.g. il a fait un discours excellent), or reporting devices (e.g. selon...); (ii) the ‘report of discourse’ must denote a speech or thought process in context: verbs of cognition or emotion that might be considered to be borderline cases of STP (e.g. savoir, vouloir, sentir, ressentir) are not normally included except in particular cases where there is a clear speech or thought process; (iii) negatives are included if the speech or thought process takes place, but otherwise not. In a small number of stories, there are complex examples of embedded narratives. These vary in form and in complexity of speech and thought patterns. In general, the principles adopted are (i) where the embedding occurs at the outset and the embedded narrative is the main one, then the embedded narrative is the one encoded for STP and the introductory material is encoded as NF; (ii) where the embedded narrative (or narratives) is quite substantial, but is not the main narrative, then the embedded narrative is ‘doubly’ encoded so that there is some flexibility around the statistical calculations. So, for example, some utterances in an embedded narrative could be encoded as both narrator recounting the story (NS) and as Direct Discourse (DD), and Direct Discourse within the embedded narrative will be encoded as ‘DD emb’; (iii) where the embedded section is very short, it is not encoded as a separate narrative.
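To illustrate how the STP categories might be tallied, here is a minimal Python sketch. It assumes, purely for illustration, that each utterance is a ‘u’ element and each embedded form a ‘seg’ element, with the STP code stored in an ‘ana’ attribute; the attribute name and the sample fragment are hypothetical and do not reproduce the actual corpus files. Only the core codes and hyphenated portmanteau tags are handled; embedded-narrative codes such as ‘DD emb’ are outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: element and attribute choices are assumptions.
SAMPLE = """
<text>
  <u ana="NF">il était une fois</u>
  <u ana="NS">le roi arrive <seg ana="DD">je suis fatigué</seg></u>
  <u ana="NS-FID">que faire maintenant ?</u>
  <u ana="NA">vous me suivez ?</u>
</text>
"""

# Core STP codes named in the documentation.
STP_CODES = {"NS", "NF", "NA", "DD", "ID", "FDD", "fDD", "FID"}


def is_stp(code):
    """True for a core code or a portmanteau tag such as 'NS-FID'."""
    return all(part in STP_CODES for part in code.split("-"))


def count_stp(xml_text):
    """Count STP codes on utterances and on segs embedded within them."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        code = element.get("ana")
        if element.tag in ("u", "seg") and code and is_stp(code):
            counts[code] += 1
    return counts
```

Keeping portmanteau tags as separate keys, rather than splitting them into their two components, leaves the decision on how to treat ambiguous utterances to the researcher at the analysis stage.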
Detached structures are encoded as ‘segs’ within utterances for a number of factors that research has shown to be relevant for an analysis of their usage in discourse. These are whether the detached element is to the right or left, whether it is a pronoun (and if so, which pronoun), lexical noun or other item, and the nature of the ‘replacement’ element within the main clause. There are also a number of possible relevant complicating factors which are encoded where they occur, i.e. cases where the replacement pronoun is not straightforwardly co-referential with the detached element, cases where there is no replacement pronoun but where there are ‘effets de co-référence’ (i.e. where there is a semantic or pragmatic link between the detached element and the main clause but not a clear syntactic one), cases of double and triple detachment, cases where there are two detached elements in apposition or repeated detached elements, cases where material is inserted between the detached element and the main clause and those where the detached clause is an interrogative. Examples are given in the detachment taxonomy below. [Note that cases of detached pronouns with ‘aussi’, e.g. ‘moi aussi’, are not marked up because of the distinctive semantic function of the combination of these two elements.]
Inversion of subject and verb is encoded in ‘segs’ within utterances for factors that research on inversion has shown to be significant. The possible ‘trigger’ for the inversion is marked up, i.e. the syntactic or discourse context, or the nature of the element that precedes the inversion, or the presence of a syntactically ‘heavy’ subject. It is also noted if the inversion occurs at or near the beginning or end of the main narrative or of an embedded narrative.
Negative constructions are encoded as ‘segs’ within utterances for whether the ‘ne’ is retained or dropped, or whether the status of the ‘ne’ is ambiguous (i.e. where it is not possible to tell whether ‘ne’ is retained or not). They are further marked up for a number of factors that research has shown to be relevant for an analysis of whether the ‘ne’ is retained or dropped. These include the grammatical subject of the negative construction (noun, pronoun etc.), including cases where there is no surface subject (such as infinitives), and those where the negative is part of a relative clause (the type of relative is given). The negative particle involved is also marked up (pas, rien etc.) and the construction is further annotated for other relevant factors if they are attested, e.g. cases where the negative particle precedes the verb, intervocalic contexts for the ‘ne’ such as ‘tu as’, contexts of interrogation involving inversion, cases where the negative follows a subordinating conjunction or a relative, cases where a non-subject clitic is inserted before the verb, cases where other material is inserted before the verb, cases where the verb is one of the modals ‘devoir’ or ‘pouvoir’, cases where the negative clause is hypothetical and cases where the construction is a ‘frequently used expression’ such as ‘je ne sais pas’. Negative elements and/or structures which are not relevant for an analysis of ‘ne’ retention or deletion (e.g. where ‘ne’ is attested with no negative particle such as ‘pas’, cases of expletive ‘ne’, examples where ‘ne’ is compulsory) are marked up separately so that they can be excluded from key statistics.
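The reason for marking up ambiguous and non-relevant negatives separately becomes clear when a retention rate is calculated. The Python sketch below assumes, hypothetically, that each negative ‘seg’ carries type="neg" and a ‘subtype’ of "retained", "dropped", "ambiguous" or "excluded"; these attribute names and values are invented for the example and should be replaced by the codes listed in the Taxonomy.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: attribute names and values are assumptions.
SAMPLE = """
<text>
  <u>il <seg type="neg" subtype="retained">ne vient pas</seg></u>
  <u><seg type="neg" subtype="dropped">j'sais pas</seg></u>
  <u><seg type="neg" subtype="dropped">y a pas</seg> de pain</u>
  <u><seg type="neg" subtype="ambiguous">on (n') a rien</seg></u>
  <u>avant qu'il <seg type="neg" subtype="excluded">ne parte</seg></u>
</text>
"""


def ne_retention_rate(xml_text):
    """Share of negatives retaining 'ne' among clear retained/dropped cases.

    Returns None when there are no clear cases to count.
    """
    root = ET.fromstring(xml_text)
    retained = dropped = 0
    for seg in root.iter("seg"):
        if seg.get("type") != "neg":
            continue
        status = seg.get("subtype")
        if status == "retained":
            retained += 1
        elif status == "dropped":
            dropped += 1
        # Ambiguous and excluded cases (e.g. expletive 'ne') are skipped.
    total = retained + dropped
    return retained / total if total else None
```

In the sample, only the three clear cases enter the calculation, so the ambiguous token and the expletive ‘ne’ do not distort the rate.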
Argentan, salle municipale: large audience