Elias Lönnrot's collection of Finnic oral poetry in 1848 when composing the Kalevala epic.

2025-02-03, 2025-02-03
dataset
A pipeline for creating a dataset of the poems that Elias Lönnrot, the compiler of the Finnic Kalevala epic, presumably had access to in 1848 when creating the Kalevala (1849). The pipeline is a fork of Maciej Janicki's work in the FILTER project, which enables the calculation of poem similarities and clusters in Finnic oral poetry. The resulting tool, also based on Janicki's work, is accessible online for searching and viewing poems, poem clusters, and poems with similar verses side by side.

The corpus currently combines the following collections:

- Suomen Kansan Vanhat Runot (Old Poems of the Finnish People; SKVR), including only the poems collected before 1849.
- The Finnish Literature Society's corpus of unpublished poems (JR), including only the poems collected before 1849.
- A corpus of literary poems (KR) based on public domain sources, including only those published before 1849, with these exceptions:
  - Elias Lönnrot: Kalevala (1849)
  - Elias Lönnrot: Suomen Kansan arwoituksia ynnä 189 Wiron arwoituksen kanssa (1851)
  - Suomen Kansan Muinaisia Loitsurunoja toimittanut Elias Lönnrot (1880)
  - Elias Lönnrot: Lisiä Vanhaan Kalevalaan (Niemi, A. R., 1895)

The aim of this project is to harness the tools created by Maciej Janicki as part of the FILTER project to analyse a subset of the FILTER corpus, consisting of poems recorded before the publication of the Kalevala in 1849. The tool enables side-by-side comparison of highly similar poems, in order to reveal how Elias Lönnrot used, altered and merged variants of poems when compiling the Kalevala epic. The dataset contains 21919 poems, which is roughly 10% of the whole FILTER corpus.

The need for a computational tool to critically assess E.
Lönnrot's work arose when Venla Sykäri, a researcher at the Finnish Literature Society (SKS), began contemporary research on how the Kalevala epic came to exist. While SKS provides an online tool for viewing and sorting the collected poetry that became the Kalevala, there was no tool for viewing similarities and differences among the poems in this particular data. As it happens, the Runoregi tool created in the FILTER project does just that, and could with "minor" modifications use and present this custom dataset.

For details on using the modified pipeline, database creation, and the web interface, see: https://github.com/jakobytes/elias-1848. A demonstration of the interface and end result can currently be viewed here: https://elias-1848-a4b9ac0b37f7.herokuapp.com/

### Origins of the source-material

The earliest documented verses and poems of Finnic oral poetry are from the mid-16th century, and the majority of the poems collected before the Kalevala were collected in the early 19th century. The oral poems were collected and written down by scholars and clergymen: the latter as cautionary examples of heresy, the former out of curiosity and to find the _original source_, as was the custom at the time in linguistic studies. Once the collections began to accumulate, a need arose to index and maintain them, which, liberally simplifying, became the basis of the Finnish Literature Society. There are 197 collectors named before the Kalevala, but the majority of the corpus before Lönnrot was collected by D. E. D. Europaeus (1820-1884) and a handful of others.

An article on the sources of the Kalevala by members of the FILTER project (K. Kallio, E. Mäkelä and M. Janicki) is forthcoming and will give a much better, wider and more elaborate description of the process. I will link to it once it has been published.
Preliminary results suggest that the default similarity and clustering thresholds might be too tight, as the dataset is now less heterogeneous. Lowering the threshold better reveals what we wish to see: poems that share multiple verses with the Kalevala but are different variants. It can also be seen how some characters in the Kalevala are composites of many characters in the source poems. We already know this from earlier research based on close reading and expertise; this tool seems to make it easier to see what was previously accessible only to those with extensive knowledge of the poems. The jury's still out on this though, and adjustments will surely be made once there's more feedback.

### Findings

One interesting approach I found while testing was looking at the coloured column to the left of the verses when viewing the opening poem of the Kalevala, “Ensimmäinen runo”. The colouring of the block next to each verse corresponds to the density of similar verses in the corpus. This view shows which verses do not seem to have any similarity with other verses in the corpus. From that one might conclude that these verses were created by Lönnrot for the 1849 Kalevala, as they do not appear in his earlier compilation of the Kalevala (the “Wanha Kalevala” from 1835) or in the source material.

The same view also shows, in the top right, that the opening poem has only a 20% similarity to the opening poem of the “Wanha Kalevala”. Clicking on “vkalevala01”, which refers to the old compilation, opens the comparison view, where both opening poems can be seen side by side with the similar sections aligned. This does indeed show that Lönnrot made significant modifications and rearrangements for the newer compilation we know as the Kalevala epic.
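To make the verse-density colouring and the poem-level similarity percentage a little more concrete, here is a minimal sketch of how such numbers could be computed. It assumes cosine similarity over character bigrams as the verse similarity, which is a stand-in and not necessarily what Runoregi computes. The function names, the thresholds 0.75 and 0.3, and the toy verses (the first two lines of the Kalevala plus invented comparison lines) are all illustrative assumptions, not part of the pipeline or the corpus.

```python
from collections import Counter
from math import sqrt


def bigrams(verse: str) -> Counter:
    """Count character bigrams of a lower-cased verse, padded with spaces."""
    text = f" {verse.lower().strip()} "
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verse_density(poem, corpus, threshold=0.75):
    """For each verse in `poem`, count the corpus verses at or above `threshold`.

    The default threshold is a hypothetical value chosen for this sketch.
    """
    corpus_vectors = [bigrams(v) for v in corpus]
    densities = []
    for verse in poem:
        vector = bigrams(verse)
        densities.append(
            sum(1 for cv in corpus_vectors if cosine(vector, cv) >= threshold)
        )
    return densities


def matched_share(poem, other, threshold=0.75):
    """Share of verses in `poem` that have at least one similar verse in `other`."""
    if not poem:
        return 0.0
    density = verse_density(poem, other, threshold)
    return sum(1 for d in density if d > 0) / len(poem)


# First two verses of the Kalevala.
poem = ["Mieleni minun tekevi", "aivoni ajattelevi"]
# Comparison lines invented for illustration; they are NOT corpus data.
other = ["mieleni minun tekewi", "kokonaan toinen rivi"]

print(verse_density(poem, other))       # matches per verse at the default threshold
print(matched_share(poem, other))       # share of matched verses, default threshold
print(matched_share(poem, other, 0.3))  # same share with a looser threshold
```

A verse with a density of zero would correspond to an uncoloured block in the view, and lowering the threshold lets looser variants count as matches, which is the effect described above for the less heterogeneous dataset.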
While most of the verses, roughly 97%, have a direct correspondence in the source material, it becomes clear that the patchwork and timeline were created by Lönnrot, who took significant liberties in arranging the poems into what became the Finnish national epic. It’s also notable that the poem sections might differ so much that they aren’t categorised as similar by the poem-clustering algorithm and are hence not visible in the cluster network at all. A cross-reference to the whole corpus of the FILTER project shows that the phenomenon, or lack of it, is replicated in both corpora, which rules out the possibility of a processing error in this project. If there is an error, it is inherited from earlier processing in the FILTER project. The interface currently lacks the ability to present similar poems in chronological order, a shortcoming that will be remedied in the future.

It’s worth noting that a major part of the source material was collected from Karelia, which from a modern perspective should be viewed as having its own language and cultural identity. The critique of the Kalevala being presented as solely Finnish is valid, and has been acknowledged by the research community for a century. In defence of Lönnrot it must be said that in his time the sovereign nation of Finland did not exist, and all Finnic languages were viewed as dialects of one language and, in a Herderian sense, one folk. The question of cultural appropriation became relevant after his time, when Finland became independent and separate from the Karelians. The Kalevala as a symbol of a national epic is hence problematic, but the work of Lönnrot less so. It’s said that the form of the epic, and how it was presented, was influenced by the independence movement in the later 19th century.