You can search the XML files for different types of encoded linguistic structures using the method below. You can also carry out simple lexical searches across the corpus.

Searches for encoded linguistic structures in the XML files

You can search for any of the encoded structures inside any type of utterance and for any combination of features in the encoding. Users are strongly recommended to opt for the TEIP5 versions of the files, as these conform to the most recent TEI guidelines. If you prefer to use the TEIP4 versions, you need to download a DTD document and place it in the same folder as the TEIP4 XML file. Click here to download the DTD file.

The encoded structures in the corpus involve speech and thought presentation, detachment, inversion and negation. For full details of the annotation methodology, click here.

This search method uses XPath, the standard query language for XML, and therefore requires an XML editor (one such editor is oXygen, which can be downloaded here). You need to generate an XPath expression to insert in the XPath search box in the top left-hand corner of the file you wish to search. The method below offers an easy way to do this. Note that the tagset for each type of segment will appear automatically once you click on the type you are searching for, together with the definition of each tag. Full details are given in the Header of each file. For a handy reference list of tags, click here.

Creating an XPath search

Follow the steps outlined below. You do not have to undertake all four steps; Step 2 and Step 3 are optional, depending on your search criteria.

Step 1

In the table below, first select the type of utterance you wish to interrogate in the discourse. To search all utterance types, click ALL; otherwise, select the utterance types you wish to search. You can select more than one type. (None of the utterance types listed includes audience speech.)

For example, if you wish only to analyse utterances that involve the narrator narrating the story itself (with, for example, no direct discourse, free direct discourse or sections where the narrator addresses the audience), select only NS. If you wish to include the narrator narrating the story, addressing the audience and recounting introductory or concluding material (but with no direct discourse or free direct discourse), select NS NA NF. If you want to analyse direct discourse only, select DD.

Note that the dominant types of speech and thought presentation in utterances are those involving the narrator's voice (NS, NA, NF) and forms of direct discourse involving characters' voices (DD, fDD). If you are searching for reports of discourse (RD), indirect discourse (ID), free indirect discourse (FID), free direct discourse (FDD) or ambiguous categories, search also under segments in Step 2 and under detail in Step 3, as these categories are often found in segments within utterances.

Step 2

Within the type(s) of utterance you have selected, select which type(s) of segment you would like to search for.

For example, if you would like to analyse detached constructions, choose 'det'; if you would like to analyse inversion, choose 'inv'; if you would like to analyse negation, choose 'neg'.

If you wish to analyse reports of discourse (RD), or segments of reported discourse such as indirect discourse (ID), free indirect discourse (FID), free direct discourse (FDD) or ambiguous categories such as those where it is difficult to tell whether the narrator or character is the enunciator (NS-FID), choose the option of speech and thought presentation (STP).

Step 3

In the 'detail' column, you will be offered a menu of tags available in the tagset for your chosen segment type.

You can search for a set of tags occurring together (choose 'combine details with and') or for any occurrence of one or more tags (choose 'combine details with or'). For example, to view all examples of left-detachment, choose 'combine details with or' and select all eight types of left-detached element at the top of the tag list. To view only left-detached pronouns which are replaced within the clause by 'il', choose 'combine details with and' and select the tags 'lpn' and 'cp3si'. Both the 'or' and the 'and' searches will also return examples that contain other tags in addition to those searched for.
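As a rough illustration of how the 'and' and 'or' options could translate into an XPath predicate, the Python sketch below builds the test string. It assumes, purely for illustration, that a segment's tags are stored as space-separated tokens in an attribute called `ana`. The attribute name and both helper functions are hypothetical, and the expressions produced by the XPath Generator may look different.

```python
def tag_test(tag, attr="ana"):
    """Return an XPath 1.0 test that is true when `tag` is one of the
    space-separated tokens in the attribute.

    ASSUMPTION: the attribute name "ana" is hypothetical, not taken from
    the corpus documentation.
    """
    # Padding both sides with spaces makes the match apply to whole tokens,
    # so 'zn' does not accidentally match inside 'zne'.
    return f"contains(concat(' ', normalize-space(@{attr}), ' '), ' {tag} ')"


def combine_details(tags, mode="and", attr="ana"):
    """Join one test per tag with 'and' (all tags) or 'or' (any tag)."""
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    if not tags:
        raise ValueError("at least one tag is required")
    return f" {mode} ".join(tag_test(tag, attr) for tag in tags)


# Left-detached pronouns replaced within the clause by 'il'
print(combine_details(["lpn", "cp3si"], mode="and"))

# Any of several detachment tags
print(combine_details(["rlex", "rpn"], mode="or"))
```

Because each test only checks that a tag is present, a segment carrying further tags still matches, which is consistent with the behaviour described above for both kinds of search.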

Step 4

Steps 1-3 will have generated an XPath search for you. Open the XML file you wish to search, then copy and paste the XPath search into the XPath box in the top left of the file. Press Return and, if you are offered XPath builder, click 'yes' followed by 'execute'.
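If you would rather run a comparable search outside an XML editor, a small script can do a similar job. The sketch below uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document is invented, and its element names (`u`, `seg`) and attribute names (`type`, `ana`) are assumptions made for illustration; check the Header of each file for the actual encoding. The namespace shown is the one used by TEI P5 documents.

```python
import xml.etree.ElementTree as ET

# Invented toy document. ASSUMPTION: the element names (u, seg) and the
# attributes (type, ana) are hypothetical, not the corpus's documented markup.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <u type="NS">
    <seg type="det" ana="lpn p1ss">Moi, je suis parti.</seg>
    <seg type="neg" ana="zne pas">Je sais pas.</seg>
  </u>
  <u type="DD">
    <seg type="det" ana="rlex">Il est parti, le loup.</seg>
  </u>
</text>"""

# TEI P5 elements live in this namespace, so paths need a prefix mapping.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_segments(root, utterance_types, segment_type):
    """Collect segments of one type inside any of the given utterance types."""
    hits = []
    for u_type in utterance_types:
        path = f".//tei:u[@type='{u_type}']//tei:seg[@type='{segment_type}']"
        hits.extend(root.findall(path, NS))
    return hits


root = ET.fromstring(SAMPLE)
for seg in find_segments(root, ["NS", "NA", "NF"], "det"):
    print(seg.get("ana"), "|", seg.text)
```

To search a downloaded file rather than an inline string, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`. Filtering on individual tags would then be done in Python, since `ElementTree` does not implement XPath functions such as `contains()`.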

Worked Examples

  • You want to search in the narrator's speech for all examples of left-detached pronouns that are replaced by 'je'.
    Utterance = NS, NA, NF; Segment = det; Detail = combine with 'and', lpn, p1ss
  • You want to search in the characters' direct discourse (with and without verbs of speech) for all cases of right detachment.
    Utterance = DD, fDD; Segment = det; Detail = combine with 'or', rlex, rpn, rnm, radj, radv, rdmp, roth
  • You want to search for all examples of negatives in the narrator's speech where the 'ne' or 'n' is omitted, the negative particle is 'pas' and the verb is the modal 'pouvoir' or 'devoir'. Two searches are necessary, one for examples where 'ne' is omitted and one for examples where 'n' is omitted:
    1. Utterance = NS, NA, NF; Segment = neg; Detail = combine with 'and', zne, pas, mod
    2. Utterance = NS, NA, NF; Segment = neg; Detail = combine with 'and', zn, pas, mod
  • You want to search for examples of subject-verb inversion in the narrator's speech occurring at or near the end of a narrative or an embedded narrative.
    Utterance = NS, NA, NF; Segment = inv; Detail = combine with 'and', end
  • You want to view all examples of free indirect speech (unambiguous cases only). Search under both 'Utterance' and 'Segment':
    1. Utterance = FID
    2. Utterance = NS, NA, NF; Segment = STP; Detail = FID
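The negation example above needs two searches because a single 'and' search cannot express "zne or zn". The Python sketch below models the two combination modes on a handful of invented tag sets to show why. The segment identifiers, the tag combinations and the placeholder tag `xtag` are made up for illustration and are not corpus data.

```python
def matches(segment_tags, wanted, mode):
    """Mimic the two 'combine details with' options on one segment's tags."""
    present = set(segment_tags)
    if mode == "and":
        return all(tag in present for tag in wanted)
    if mode == "or":
        return any(tag in present for tag in wanted)
    raise ValueError("mode must be 'and' or 'or'")


# Invented segments; 'xtag' is a placeholder for any additional tag.
SEGMENTS = {
    "s1": ["zne", "pas", "mod"],
    "s2": ["zn", "pas", "mod", "xtag"],
    "s3": ["zne", "pas"],         # no modal verb
    "s4": ["zn", "mod", "xtag"],  # particle is not 'pas'
}


def search(wanted, mode):
    """Return the ids of all segments that satisfy the search."""
    return {sid for sid, tags in SEGMENTS.items() if matches(tags, wanted, mode)}


# The two 'and' searches from the worked example, merged afterwards.
two_searches = search(["zne", "pas", "mod"], "and") | search(["zn", "pas", "mod"], "and")
print(sorted(two_searches))

# A single 'and' search over all four tags finds nothing in this toy data,
# because no segment here carries both zne and zn.
print(sorted(search(["zne", "zn", "pas", "mod"], "and")))

# A single 'or' search is too broad: it also returns s3 and s4.
print(sorted(search(["zne", "zn", "pas", "mod"], "or")))
```

Merging the results of the two 'and' searches gives the set described in the example: segments with either omission tag that also carry both 'pas' and 'mod'.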

XPath Generator




