This is a copy of the current German UD documentation (UD for German), which we generally follow for Swiss German. This introduction explains the most important differences that influence the annotation. Please check the readme in the GitHub repository of the GSW treebank for further and current information.

Differences to German UD Guidelines

As in German, words are generally delimited by white space. However, Swiss German allows far more freedom in merging words together, and such merged forms usually cannot be split in an easy way. We therefore use the German tokenization and introduce a separate tag for merged words (see the meta tag TAG+ described further down).

The POS annotations are generally based on the German guidelines, namely the Stuttgart-Tübingen-TagSet (STTS), with some changes according to the TIGER annotation scheme. Furthermore, Swiss German requires an additional POS tag, PTKINF, which is not present in the STTS tagset, as well as the “meta tag” TAG+.

PTKINF

PTKINF is an infinitive particle which does not exist in Standard German but is frequently used in dialects. It comes in forms such as go, cho, goge and lo, as in Si gönd go poschte. (They go shopping.) In the Standard German translation, Sie gehen einkaufen, there is no equivalent. Concerning dependencies, it is treated as a marker introducing a finite clause subordinate to another clause (mark), because these particles usually appear in contexts that correspond to um … zu (in order to) constructions in German.

TAG+

To handle merged words, we introduced the “+” sign, which can be added to any POS tag. In the STTS there is one tag of this kind: APPRART, used for combinations of prepositions and articles such as im, consisting of in + dem (in the). However, in Swiss German such merges occur with any kind of word, and simply combining the tags would result in a large tagset. We therefore decided to use the tag of the “head” of the word, i.e. the first word, and simply add a plus to show that this word incorporates another one (Hollenstein and Aepli, 2014). This way, merged words can easily be found and, if needed, manually expanded. Frequent examples include hemmer (haben + wir), häts (hat + es) and sinz (sind + sie), for we have, it has and they are.

The Universal Dependencies POS (UPOS) tags are converted according to the mapping provided by Universal Dependencies.

UD for German

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters.
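The TAG+ convention and the UPOS conversion described above can be illustrated with a short Python sketch. It assumes that tags are plain strings such as VAFIN+, and that the UPOS of a merged token is derived from its head tag alone. The table is a small, hypothetical subset written for this example, not the official UD mapping, and mapping PTKINF to PART is an assumption made only here. The names parse_tag, to_upos and find_merged are illustrative and not part of any treebank tooling.

```python
from dataclasses import dataclass

# Hypothetical, illustrative subset of an STTS -> UPOS table. It is NOT the
# official UD mapping; consult the UD documentation for the full table.
# PTKINF -> PART is an assumption made only for this sketch.
STTS_TO_UPOS = {
    "NN": "NOUN",
    "NE": "PROPN",
    "ADJA": "ADJ",
    "ADV": "ADV",
    "APPR": "ADP",
    "APPRART": "ADP",
    "ART": "DET",
    "PPER": "PRON",
    "VVFIN": "VERB",
    "VVINF": "VERB",
    "VAFIN": "AUX",
    "KOUS": "SCONJ",
    "KON": "CCONJ",
    "PTKZU": "PART",
    "PTKINF": "PART",
}


@dataclass(frozen=True)
class Tag:
    head: str     # STTS tag of the first ("head") word, e.g. "VAFIN"
    merged: bool  # True if the token carried the "+" (TAG+) marker


def parse_tag(tag: str) -> Tag:
    """Split a tag such as 'VAFIN+' into its head tag and the merge marker."""
    if len(tag) > 1 and tag.endswith("+"):
        return Tag(head=tag[:-1], merged=True)
    return Tag(head=tag, merged=False)


def to_upos(tag: str) -> str:
    """Map a tag to UPOS using only its head tag; unknown tags become 'X'."""
    return STTS_TO_UPOS.get(parse_tag(tag).head, "X")


def find_merged(tokens: list[tuple[str, str]]) -> list[str]:
    """Return the word forms whose tag carries TAG+, e.g. for manual expansion."""
    return [form for form, tag in tokens if parse_tag(tag).merged]


if __name__ == "__main__":
    # "Si gönd go poschte." (They go shopping.)
    sentence = [("Si", "PPER"), ("gönd", "VVFIN"),
                ("go", "PTKINF"), ("poschte", "VVINF")]
    for form, tag in sentence:
        print(form, tag, to_upos(tag))

    # Merged forms tagged with the head word's tag plus "+".
    merged = [("hemmer", "VAFIN+"), ("häts", "VAFIN+"), ("go", "PTKINF")]
    print(find_merged(merged))
```

Because the “+” only records that a token incorporates another word, the sketch keeps it as a separate flag instead of building combined tags, which mirrors the reasoning for avoiding a large tagset.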