This page pertains to UD version 2.

Tokenization and Word Segmentation

TODO: what counts as a word in spoken treebanks?

@sylvainkahane:

Elements that are generally not part of the syntactic construction (they can be as [SILENT]):

[LAUGH]…: punct + PUNCT, Pause=Yes, NonVerbal=Yes (very important for searches).

[INCOMPREHENSIBLE]: Do we want to indicate the approximate number of syllables (SyllablesNumber=X)? What relation? Indicate it if you can infer it; otherwise use dep.
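To illustrate why such features matter for searches, here is a minimal sketch of how a [LAUGH] token annotated as punct + PUNCT could be retrieved from a CoNLL-U file. Several things are assumptions and not part of the proposal above: the sample sentence is invented, the feature is placed in the FEATS column (it could equally be put in MISC), only NonVerbal=Yes is shown on the token, and the helper names `parse_feats` and `find_tokens` are hypothetical.

```python
# Hypothetical sample: the tokens, features, and heads are illustrative only
# and are not taken from any released treebank.
SAMPLE = "\n".join([
    "# text = yes [LAUGH] I know",
    "1\tyes\tyes\tINTJ\t_\t_\t4\tdiscourse\t_\t_",
    # Assumption: the proposed feature sits in the FEATS column (column 6).
    "2\t[LAUGH]\t[LAUGH]\tPUNCT\t_\tNonVerbal=Yes\t4\tpunct\t_\t_",
    "3\tI\tI\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "4\tknow\tknow\tVERB\t_\t_\t0\troot\t_\t_",
])


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'A=1|B=2' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def find_tokens(conllu, feature, value="Yes"):
    """Return (ID, FORM) for every token line carrying feature=value."""
    hits = []
    for line in conllu.splitlines():
        # Skip blank lines and sentence-level comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if parse_feats(cols[5]).get(feature) == value:
            hits.append((cols[0], cols[1]))
    return hits


print(find_tokens(SAMPLE, "NonVerbal"))  # [('2', '[LAUGH]')]
```

With a dedicated feature, a query of this kind finds non-verbal events without having to match on the bracketed form, which may differ from one treebank to another.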