Tokenization for Occitan (Gascon and Lengadocian)
dc.contributor.affiliation | University of Helsinki-Miletic, Aleksandra | |
dc.contributor.author | Miletic, Aleksandra | |
dc.date.accessioned | 2025-04-29T14:02:34Z | |
dc.date.issued | 2024-06-24 | |
dc.description | A rule-based Python program for tokenising Occitan texts. To run it: python3 tokenizer_occitan.py < input.txt > output.conllu (the input file is read on standard input and the result is written to standard output). The script takes as input a text file with one sentence per line; each line starts with a sentence ID, followed by a tab character, followed by the sentence itself. The current version of the tool was developed during the projects DIVITAL (funded by the ANR) and CorCoDial (funded by the Academy of Finland). | |
dc.identifier | https://doi.org/10.5281/zenodo.12515136 | |
dc.identifier.uri | https://datakatalogi.helsinki.fi/handle/123456789/5211 | |
dc.rights.license | cc-by-4.0 | |
dc.subject | tokenization | |
dc.subject | occitan | |
dc.title | Tokenization for Occitan (Gascon and Lengadocian) | |
dc.type | software |
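
The input format given in dc.description (one sentence per line: sentence ID, a tab character, then the sentence) can be prepared with a short helper. The following is a minimal sketch and is not part of the distributed tool: the function names `format_input_lines` and `parse_input_line`, the sentence IDs, and the sample sentences are all hypothetical and serve only as illustration.

```python
from typing import Iterable, List, Tuple


def format_input_lines(sentences: Iterable[Tuple[str, str]]) -> List[str]:
    """Build 'ID<TAB>sentence' lines (hypothetical helper, not part of the tool)."""
    lines = []
    for sent_id, text in sentences:
        if "\t" in sent_id or "\n" in sent_id:
            raise ValueError(f"sentence ID must not contain tabs or newlines: {sent_id!r}")
        # Collapse all whitespace runs (including tabs and newlines) to single
        # spaces so that each sentence stays on exactly one line.
        clean = " ".join(text.split())
        lines.append(f"{sent_id}\t{clean}")
    return lines


def parse_input_line(line: str) -> Tuple[str, str]:
    """Split one input line back into (sentence ID, sentence) at the first tab."""
    sent_id, sep, text = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError(f"missing tab separator in line: {line!r}")
    return sent_id, text


if __name__ == "__main__":
    # Illustrative IDs and sentences only.
    sample = [("s1", "Bonjorn a totes."), ("s2", "L'ostal es grand.")]
    with open("input.txt", "w", encoding="utf-8") as handle:
        for line in format_input_lines(sample):
            handle.write(line + "\n")
```

Running this sketch writes an `input.txt` that can then be passed to the command quoted in dc.description: python3 tokenizer_occitan.py < input.txt > output.conllu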