The Corpus of Academic Lithuanian (Lietuvių mokslo kalbos tekstynas)

Aurelija Usonienė, Jonė Grigaliūnienė, Birutė Ryvitytė, Linas Būtėnas, Erika Jasionytė




The paper describes the initial stages of the design of the corpus of academic Lithuanian. Given the increasing worldwide interest in academic discourse and the numerous corpus-based studies of it (especially of academic English), there is an obvious need for easily accessible electronic resources of academic Lithuanian to facilitate modern linguistic research, interdisciplinary studies, lexicographical practice, and terminology studies in theory and practice. The Corpus of Academic Lithuanian (CorALit) is being compiled at the University of Vilnius (Faculty of Philology and Faculty of Mathematics and Informatics). The corpus is being built within the framework of the 2007–2013 national high-tech development programme launched by the Government of Lithuania and supervised by the Lithuanian State Science and Study Foundation. The main issue in corpus design is representativeness, which is determined by the following factors: the number of research and study fields represented, the range of genres included (i.e. balance), and the way text chunks for each genre are selected (i.e. sampling).

The Corpus of Academic Lithuanian aims to represent the main fields of study and research developed in Lithuania and listed in Order No. 30 of the Minister of Education and Science of 9 January 1998 “Concerning the Classification of Study and Research Areas, Fields and Branches”, as well as the most typical genres that the academic community uses for the creation, dissemination and evaluation of new knowledge and for internal communication. Since at present there is no reliable scientific measure of corpus balance, the project team will have to rely on intuition and best estimates based on the experience of academic language corpora already compiled in other countries (the UK, USA, etc.). Compiling the corpus also involves negotiations with publishers and authors over copyright, which are sometimes rather time-consuming. Finally, the paper touches upon technical aspects of corpus design. The main purpose of corpus compilation is to make the corpus easily accessible to large numbers of users, and this means converting the computer file formats and text encoding to comply with the TEI P5 Guidelines. The TEI P5 format will allow users to access the first synchronic corpus of written academic Lithuanian as a major resource of authentic language data via a simple internet search.

DOI: 10.15388/baltistica.43.1.1212

Full text: PDF

Creative Commons License
The content of this website may be used for non-commercial purposes in accordance with the terms of the CC-BY-NC-4.0 international license.