OcWikiAnnot: Annotated Wikipedia Corpus of Occitan

Restricted Availability

Date

2023-04-20

Publication Type

dataset

Description

OcWikiAnnot is a corpus of Occitan Wikipedia text that has been tokenized, PoS-tagged, and lemmatized. It contains 100 000 sentences totaling 2 037 723 tokens and is based on the Occitan Wikipedia corpus of the Leipzig Corpora Collection.
