Language corpora of the border areas of Finland and Russia
(SILK: Suomen itäpuolisten lähialueiden kielikorpukset)

Kone Foundation, 2013–2015

The aim of this project is to compile and develop language corpora of the border areas of Finland and Russia, especially of the dialects of Karelian and Ingrian Finnish and of the variety of Finnish spoken in the Petrozavodsk region. Compilation entails building standard digital corpora that will be made available for traditional linguistic research; development involves new methods of corpus annotation.

The project is part of the Kone Foundation's Language Program, which aims to promote the multidisciplinary preservation and documentation of Finnish and its lesser-spoken cognate languages. The project spans Finnish, Karelian, Estonian and Russian linguistics, translation studies, and language technology, and it draws on several areas of expertise at the University of Eastern Finland (UEF). The project is conducted jointly with the State University of Petrozavodsk.

The project will, firstly, improve the accessibility and usability of existing dialect recordings, compiled by the Institute for the Languages of Finland and at UEF since the 1960s. Secondly, it will compile a new corpus of modern Karelian Finnish media texts, which have never before been documented or archived. Thirdly, it will benefit language technology research by developing tools for annotating, analyzing and modeling colloquial speech and dialects.