Why opt for TEI methodology?
The TEI offers a form of extensible markup that is flexible enough to be applied to a range of text types, including oral data, across different languages. The TEI tagset can be refined, enhanced and tailored to almost any individual project. As is the case with this corpus, the researcher can use a subset of the core TEI tagset and devise a new customised set of tags in line with TEI conventions. Projects on other phenomena (linguistic or otherwise) could develop their own tagsets within TEI and annotate the corpus for their own purposes. The TEI uses XML for all aspects of the encoding: not only the main text itself, but also the Header, which stores the metadata (analytic, editorial, descriptive and administrative). There are, of course, many different encoding systems available, but systems based on XML (or the related SGML or HTML) are increasingly dominant internationally and are the most robust in terms of long-term preservation and possibilities for interoperability. As Leech puts it, 'the triumph of the more advanced SGML/HTML/XML style of encoding is in the long run assured' (2005: 23).
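To illustrate the point that the Header and the main text share a single XML encoding, the following is a minimal sketch rather than an extract from this corpus. The sample document, its title, the speaker identifiers and the function name `summarise` are all invented for the illustration; only the TEI namespace and the standard TEI elements `teiHeader`, `fileDesc`, `text`, `body` and `u` (utterance) are taken from the TEI itself. The sketch assumes Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; all TEI elements live in it.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# A hypothetical, minimal TEI document: metadata in the Header,
# transcribed speech in the text. The content is invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical interview transcript</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <u who="#SPK1">an invented utterance</u>
      <u who="#SPK2">a second invented utterance</u>
    </body>
  </text>
</TEI>"""


def summarise(tei_xml):
    """Return the Header title and a list of (speaker, utterance) pairs."""
    root = ET.fromstring(tei_xml)
    ns = {"tei": TEI_NS}
    # Metadata and text are reached with the same XML tooling.
    title = root.find(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", ns
    ).text
    utterances = [(u.get("who"), u.text) for u in root.iterfind(".//tei:u", ns)]
    return title, utterances


if __name__ == "__main__":
    title, utterances = summarise(SAMPLE)
    print(title)
    for speaker, words in utterances:
        print(speaker, words)
```

Because both layers are plain XML, the same query mechanism serves the metadata and the transcription alike, and a project adding its own customised tags could, in principle, query them in the same way.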