Deltacorpus 1.1
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
Online
academicJournal
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), limited to the first 1,000,000 tokens per language and tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1:
1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset.
2. The SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0.
3. Balto-Slavic, Germanic and Romance languages were each tagged by a classifier trained only on the respective language group. All other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
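The group-based classifier selection described in change 3 could be sketched roughly as follows. This is only an illustration: the language codes listed, the `LANGUAGE_GROUPS` table, the `select_classifier` function and the classifier names are all hypothetical and are not taken from the Deltacorpus release.

```python
# Hypothetical, deliberately incomplete mapping from language code to group.
# Only a few well-known examples are listed for illustration.
LANGUAGE_GROUPS = {
    "ces": "balto-slavic",  # Czech
    "lit": "balto-slavic",  # Lithuanian
    "deu": "germanic",      # German
    "spa": "romance",       # Spanish
}


def select_classifier(lang_code: str) -> str:
    """Return the name of the (hypothetical) classifier for a language."""
    group = LANGUAGE_GROUPS.get(lang_code)
    # Languages outside the three groups fall back to the classifier
    # trained on all available languages.
    return f"svm-{group}" if group else "svm-all"
```

Under these assumptions, a Czech text would be routed to the Balto-Slavic classifier, while a language missing from the table would fall back to the all-languages classifier.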
Title: | Deltacorpus 1.1 |
---|---|
Author / Contributor: | Mareček, David ; Yu, Zhiwei ; Zeman, Daniel ; Žabokrtský, Zdeněk |
Link: | |
Publication: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016 |
Media type: | academicJournal |
Keyword: | |
Other: | |