• The project “Lexica and Corpora for Speech-to-Speech Translation Components” (LC-STAR) aims to develop lexica for automatic speech recognition and text-to-speech synthesis for thirteen languages, and multilingual corpora for speech-centered translation applications for nine languages. The project is led by a consortium comprising two universities and several industrial companies. All resources to be developed are encoded in the Extensible Markup Language (XML). This paper describes XML-related issues in the LC-STAR project from three perspectives: the XML encoding of the lexica, the XML encoding of the multilingual corpora, and issues in validating XML encodings such as that of the LC-STAR lexica.

