# **SUD_Skolt_Sami-Giellagas**: SUD treebank conversion for the corpus UD_Skolt_Sami-Giellagas version 2.17
This treebank was automatically generated from the UD treebank:
[UD_Skolt_Sami-Giellagas](https://github.com/UniversalDependencies/UD_Skolt_Sami-Giellagas/releases/tag/r2.17).

See [SUD data page](https://surfacesyntacticud.org/data/) for more details about the conversion process.

The rest of this file is a copy of the original README associated with **UD_Skolt_Sami-Giellagas** and therefore refers to UD.

---
---

# Summary

The UD Skolt Sami Giellagas treebank is based almost entirely on spoken Skolt Sami corpora.


# Introduction

UD Skolt Sami is the original annotation (CoNLL-U) for texts in the Skolt Sami language.
It originally consisted of twenty [sentences](http://ilazki.thinkgeek.co.uk/brat/#/uralic/sms) translated by Hilkka Fofonoff from
[Finnish texts](http://ilazki.thinkgeek.co.uk/brat/#/uralic/fin) with UD 1. dependencies.
Subsequent sentences come from the Giellagas Corpus of Spoken Saami Languages of the University of Oulu, Finland, which, in part,
includes research materials transferred from the Institute for the Languages of Finland (Kotimaisten kielten keskus, «Kotus»).

Treebank sentences whose text id begins with [kotus-skak2010] originate from the publication Sääʹmǩiõll, äʹrbbǩiõll; its publisher, the Institute for the Languages of Finland (Kotimaisten kielten keskus), has granted written permission for their inclusion in the treebank. Cite the original publication whenever the treebank is used (see the References section below).
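For users who need to single out the sentences covered by this citation requirement, the following is a minimal sketch of filtering a CoNLL-U file by an id prefix. It is not part of the treebank tooling. The helper names (`read_sentences`, `comment_value`, `select_by_prefix`) are hypothetical, and the comment key `text_id` and the exact prefix string `kotus-skak2010` are assumptions; check the comment lines in the actual `.conllu` files and adjust both parameters as needed.

```python
from typing import Iterable, Iterator, List, Optional


def read_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each CoNLL-U sentence as a list of its lines (comments and tokens)."""
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current sentence.
            yield block
            block = []
    if block:
        yield block


def comment_value(block: List[str], key: str) -> Optional[str]:
    """Return the value of a '# key = value' comment line, or None if absent."""
    for line in block:
        if line.startswith("#") and "=" in line:
            name, value = line[1:].split("=", 1)
            if name.strip() == key:
                return value.strip()
    return None


def select_by_prefix(
    lines: Iterable[str],
    prefix: str = "kotus-skak2010",  # assumed form of the prefix
    key: str = "text_id",            # assumed comment key; may be e.g. sent_id
) -> List[List[str]]:
    """Keep only the sentences whose `key` comment starts with `prefix`."""
    return [
        block
        for block in read_sentences(lines)
        if (comment_value(block, key) or "").startswith(prefix)
    ]
```

An open file handle can be passed directly as `lines`, since iterating over it yields one line at a time.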

[https://github.com/rueter/erme-ud-skolt-sami](https://github.com/rueter/erme-ud-skolt-sami)



# Acknowledgments

The original annotations were performed by Jack Rueter at the University of Helsinki and Markus Juutinen at the
Giellagas Institute (University of Oulu, Finland), using morphological tools developed in a project funded by the Kone Foundation
«Language Programme»: «Skolt Sami Revitalization through Intelligent Computer-assisted Language Learning
means and the development of guidelines for transferring these methods to other threatened languages» (2015–2018), with
the linguistic consultation of Merja Fofonoff and Eino Koponen.
The tools used have been facilitated through the open-source Giella infrastructure at the Norwegian Arctic University in Tromsø.

Work with the Skolt Sami treebank builds upon previous experience with the UD_Erzya-JR treebank as well as growing discussions
with Francis Tyers, Tommi Pirinen, Jonathan Washington, Mika Hämäläinen and Niko Partanen. Without the Skolt Sami speakers and writers themselves,
however, we would be nowhere…



## References

* Markus Juutinen. 2023: Koltansaamen kielikontaktit, Vähemmistökieli muuttuvassa kieliympäristössä. Oulun yliopiston tutkijakoulu; Oulun yliopisto, Humanistinen tiedekunta, Giellagas-Instituutti.
* Eino Koponen, Jouni Moshnikoff & Satu Moshnikoff. 2010: Sääʹmǩiõll, äʹrbbǩiõll. Helsinki: (Kotimaisten kielten keskus) Institute for the Languages of Finland. Dommjânnmlaž ǩiõli tuʹtǩǩeemkõõskõs. Online publications of the Institute for the Languages of Finland, 14. ISSN 1796-041X. URL: http://scripta.kotus.fi/www/verkkojulkaisut/julk14/
* Pekka Sammallahti, Jouni Moshnikoff. 1991: Suomi-Koltansaame sanakirja / Lääʹdd-sääʹm sääʹnnǩeʹrjj [Finnish-Skolt Sámi Dictionary]. Girjegiisá Oy. Ohcejohka.
* Satu da Jouni Moshnikoff, Eino Koponen, Miika Lehtinen. 2020: Sääʹmǩiõl ǩiõllvueʹppes / Koltansaamen kielioppi. Sääʹmteʹǧǧ-Saamelaiskäräjät.



# Changelog

* 2025-10-30
  * Add VerbForm attributes for words with Mood
  * Connegatives are, illogically, tagged as VerbForm=Fin, following Northern Sami and Finnish treebank practice.
  * Add PronType feature attributes for DET and PRON.
* 2025-04-30
  * Introduce PartType=Int
  * Adjust use of :tmod to accusative that is not obj
  * Adjust use of :lmod
  * Remove :eval, :mmod, :nec, :foc, :tense, :tcl, :deg
* 2024-11-01
  * Use nmod:poss with possessive pronouns
  * Degree with value Dim cleanup
* 2024-04-29
  * Continue adding
* 2023-10-29
  * Add CCC variant to train.conllu for minimal data split
* 2023-04-30
  * Continued annotation of stories from Sääʹmǩiõll, äʹrbbǩiõll.
  * Added some example sentences from Sääʹmǩiõl ǩiõllvueʹppes
  * Adjustments to diminutive, valency.
* 2022-10-31
  * Continued annotation of stories from Sääʹmǩiõll, äʹrbbǩiõll.
* 2022-04-29
  * Deprel correction and documentation
  * Troubleshooting in dependencies
  * Continued annotation of stories from Sääʹmǩiõll, äʹrbbǩiõll.
* 2021-10-31
  * Auxiliary, feature and deprel documentation
  * Continued annotation of stories from Sääʹmǩiõll, äʹrbbǩiõll.
* 2021-04-29
  * Auxiliary, feature and deprel documentation
  * Continued annotation of stories from Sääʹmǩiõll, äʹrbbǩiõll.
  * Language tag systematization.
* 2020-11-15 v2.7
  * Adding stories from Sääʹmǩiõll, äʹrbbǩiõll
* 2020-05-15 v2.6
  * Adding two stories from Sääʹmǩiõll, äʹrbbǩiõll: Kämmǥa Maainâs, Kååʹdd Maainâs
  * Expanding advmod:mmod, :lmod, :tmod and adding NameTypes.
* 2019-11-15 v2.5
  * Initial release in Universal Dependencies.


<pre>
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: nonfiction news spoken
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Rueter, Jack; Juutinen, Markus; Tyers, Francis; Pirinen, Tommi A; Hämäläinen, Mika
Contributing: here
Contact: rueter.jack@gmail.com
===============================================================================
</pre>
