Psycholinguistic Descriptives

dc.contributor.affiliation: University of Helsinki - Tatu Huovilainen
dc.contributor.author: Tatu Huovilainen
dc.date.accessioned: 2025-04-29T12:54:59Z
dc.description: The material is available at the Language Bank of Finland (Kielipankki) download service, access location http://urn.fi/urn:nbn:fi:lb-2018081602.

This material comprises a dataset and a query tool for acquiring commonly used psycholinguistic descriptives for Finnish words. The dataset is based on six large corpora from sources such as magazines, newspapers, movie and TV series subtitles, encyclopedia topics and Internet discussions. The material includes word surface form frequencies, lemma frequencies, syllable frequencies and letter n-gram frequencies. In addition, the query tool can be used to acquire descriptives such as orthographic neighbors for lists of words. More information on the datasets and the query tool can be found in the readme file.

Descriptives:
Word lemma and surface form tokens: 2500 million
Unique lemmas: 0.7 million
Unique surface forms: 1.5 million

The corpora used:
The Suomi24 Corpus: http://urn.fi/urn:nbn:fi:lb-2017021630
Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version (KLK, only from 1980 onwards): http://urn.fi/urn:nbn:fi:lb-2016050302
Finnish Magazines and Newspapers from the 1990s and 2000s, Version 2: http://urn.fi/urn:nbn:fi:lb-2017091901
Finnish Wikipedia 2017: http://urn.fi/urn:nbn:fi:lb-2018060401
Finnish Opensubtitles 2017: http://urn.fi/urn:nbn:fi:lb-2018060403
Unpublished corpus source: Comments made to the Finnish discussions of the Reddit forum https://old.reddit.com/r/Suomi/ between January 2012 and December 2017

Change log: This description was replaced on December 12, 2018.
dc.discipline: Languages
dc.identifier: http://urn.fi/urn:nbn:fi:lb-2018081601
dc.identifier.uri: https://datakatalogi.helsinki.fi/handle/123456789/520
dc.rights: Open
dc.rights.license: Creative Commons Attribution 4.0 International (CC BY 4.0)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Psycholinguistic Descriptives
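
The description mentions acquiring orthographic neighbors for lists of words. The record does not say how the query tool defines or computes them, so the following is only a minimal sketch under one common assumption: a neighbor is a word of the same length that differs from the target in exactly one letter position. The function name `orthographic_neighbors` and the toy word list are hypothetical and are not taken from the dataset or its query tool.

```python
from collections import defaultdict


def orthographic_neighbors(targets, lexicon):
    """Return, for each target word, the same-length lexicon words that
    differ from it in exactly one letter position (an assumed definition,
    not necessarily the one used by the actual query tool)."""
    # Index every lexicon word under each of its one-letter wildcard patterns,
    # e.g. "kala" -> "*ala", "k*la", "ka*a", "kal*".
    index = defaultdict(set)
    for word in lexicon:
        for i in range(len(word)):
            index[word[:i] + "*" + word[i + 1:]].add(word)

    result = {}
    for target in targets:
        found = set()
        for i in range(len(target)):
            pattern = target[:i] + "*" + target[i + 1:]
            found |= index.get(pattern, set())
        found.discard(target)  # a word is not its own neighbor
        result[target] = sorted(found)
    return result


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["kala", "pala", "sala", "kana", "talo", "kalat"]
neighbors = orthographic_neighbors(["kala", "talo"], toy_lexicon)
print(neighbors)
```

Indexing by wildcard pattern avoids comparing every target against every lexicon entry, which may matter for a lexicon on the order of the 1.5 million unique surface forms reported above.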

Files