KP-7936 Update FIN-CLARIN-recommendation.xml (#1)
* Added domains of recommendations in centre pages (close clarin-eric#240)

* Implemented multiple curators (clarin-eric#238)

* Updated the style of multiple curators and added an example (clarin-eric#238)

* Make base URL semi dynamic (close clarin-eric#248)

* Added domains of recommendations in format pages (clarin-eric#240)

* addresses clarin-eric#238

* KP-7936 Update FIN-CLARIN-recommendation.xml

We went through all functional domains and added formats where we saw them as relevant. We skipped domains we deemed not relevant to Kielipankki.

* add Jussi as curator + stub for <info>

* make "centre" optional (for now) in the header - addresses clarin-eric#247, change the former "centre" to "centreID" - references clarin-eric#249

* change "centre" to "centreID" in the filter field - addresses clarin-eric#249

* Fixed centreID.

* Added centre elements in recommendation files (clarin-eric#247)

* Updated references from filter/centre to filter/centreID (clarin-eric#249)

* Updated centre model to use recommendation files instead of centres.xml (clarin-eric#247)

* Added SAW recommendation (clarin-eric#247)

* add "centreID" as an optional child of "format", with an annotation stating its purpose; see clarin-eric#249 (comment)

* KP-7936 add PDF for documentation, add review date

* KP-7936 remove PDF* for textual src lang data

* KP-7936 add info text based on Språkbanken

* KP-7936 fix indent

---------

Co-authored-by: margaretha <[email protected]>
Co-authored-by: piotr <piotr@bodysek>
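The centre-model change (clarin-eric#247) means a centre's details are read from the `<centre>` element in each recommendation file's header instead of from centres.xml. A minimal Python sketch of such a lookup follows, assuming only the header structure visible in the FIN-CLARIN diff below. The `Centre` class and `load_centres` function are hypothetical illustrations, not the SIS implementation, and `deposition` is kept as the raw attribute string because its meaning is not documented here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Header excerpt copied from the FIN-CLARIN diff in this commit;
# the <recommendation> root wrapper is an assumption of this sketch.
RECOMMENDATION_XML = """
<recommendation>
  <header>
    <filter>
      <centreID>FIN-CLARIN</centreID>
    </filter>
    <centre id="FIN-CLARIN" deposition="1">
      <name>The Language Bank of Finland</name>
      <a href="https://centres.clarin.eu/centre/17"/>
      <nodeInfo>
        <ri status="B-centre">CLARIN</ri>
      </nodeInfo>
    </centre>
  </header>
</recommendation>
"""


@dataclass
class Centre:
    """Hypothetical record for one header/centre element."""
    id: str
    name: str
    deposition: str  # raw attribute value; semantics not assumed
    registry_url: Optional[str]
    ri: Dict[str, str] = field(default_factory=dict)  # <ri> text -> status


def load_centres(xml_text: str) -> List[Centre]:
    """Collect every header/centre element of one recommendation file."""
    root = ET.fromstring(xml_text.strip())
    centres = []
    for c in root.findall("./header/centre"):
        link = c.find("a")
        centres.append(
            Centre(
                id=c.get("id", ""),
                name=(c.findtext("name") or "").strip(),
                deposition=c.get("deposition", ""),
                registry_url=link.get("href") if link is not None else None,
                # Key each <ri> entry by its text, value is its status attribute.
                ri={(ri.text or "").strip(): ri.get("status", "")
                    for ri in c.findall("./nodeInfo/ri")},
            )
        )
    return centres


if __name__ == "__main__":
    for centre in load_centres(RECOMMENDATION_XML):
        print(centre.id, centre.name, centre.ri)
```

Because the commit makes `centre` optional in the header, a loader along these lines would simply return an empty list for files that do not yet carry one.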
3 people authored Feb 15, 2024
1 parent c2df408 commit f1792b8
Showing 1 changed file with 118 additions and 19 deletions.
SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml: 118 additions, 19 deletions (137 changes)
@@ -5,22 +5,76 @@
<filter>
<centreID>FIN-CLARIN</centreID>
</filter>
<centre id="FIN-CLARIN" deposition="1">
<centre id="FIN-CLARIN" deposition="1">
<name>The Language Bank of Finland</name>
<a href="https://centres.clarin.eu/centre/17"/>
<nodeInfo>
<ri status="B-centre">CLARIN</ri>
</nodeInfo>
</centre>
</header>
<respStmt>
<curator>Jussi Piitulainen</curator>
<github>https://github.com/jpiitula</github>
<reviewDate>2024-02-15</reviewDate>
</respStmt>
</header>
<info xml:lang="en">
<p>The following measures are taken to enhance the chance of future interpretability of the
data.</p>
<p>The number of accepted file formats is small and well documented to make future conversions
to other formats more feasible. Open (non-proprietary) file formats
are strongly preferred.
The Language Bank of Finland recommends formats listed in the CLARIN
<a href="https://standards.clarin.eu/sis/">Standards Information System</a>.</p>
<p>The Language Bank's participation in relevant networks like
<a href="https://www.clarin.eu/content/overview-clarin-centres">CLARIN</a>
keeps us steadily informed about recent developments in file formats and encodings. Plans
to migrate or convert files will be developed if new standards arise.</p>
<p>For more information, see the <a href="https://www.kielipankki.fi/language-bank/">Language Bank of Finland's Portal</a>.</p>
<p>Data to be deposited might need to be converted to accepted or recommended formats for long-term preservation.</p>
<p>Plain text and XML files will normally only be accepted in Unicode character encoding, preferably UTF-8.</p>
<p>As a general guideline, we believe that the file formats best suited for long-term
sustainability and accessibility are those that:</p>
<ul>
<li>Are frequently used</li>
<li>Have open specifications</li>
<li>Are independent of specific software, developers or vendors</li>
</ul>
</info>
<formats>
<format id="fCHAT">
<domain>Audiovisual Annotation</domain>
<level>discouraged</level>
<comment>Consider using <formatRef ref="fTEISpoken"/> instead.</comment>
</format>
<format id="fCHAT-XML">
<domain>Audiovisual Annotation</domain>
<level>discouraged</level>
<comment>Consider using <formatRef ref="fTEISpoken"/> instead.</comment>
</format>
<format id="fDOCX">
<domain>Audiovisual Annotation</domain>
<level>discouraged</level>
<comment>Consider using <formatRef ref="fPDFA"/> instead.</comment>
</format>
<format id="fELAN">
<domain>Audiovisual Annotation</domain>
<level>acceptable</level>
</format>
<format id="fTEISpoken">
<domain>Audiovisual Annotation</domain>
<level>recommended</level>
<comment>See <a href="http://jtei.revues.org/142">format description</a>.</comment>
</format>
<format id="fWave">
<domain>Audiovisual Source Language Data</domain>
<level>recommended</level>
<comment>PCM-WAV, 48 kHz, 16 bit</comment>
</format>
<format id="fWave">
<domain>Audiovisual Source Language Data</domain>
<level>acceptable</level>
<comment>PCM-WAV above 22 kHz/16 bit</comment>
</format>
<format id="fCMDI">
<domain>Catalogue Metadata</domain>
@@ -30,22 +84,26 @@
<domain>Documentation</domain>
<level>recommended</level>
</format>
<format id="fTEI">
<format id="fPDF">
<domain>Documentation</domain>
<level>recommended</level>
<level>acceptable</level>
</format>
<format id="fXML">
<format id="fTEI">
<domain>Documentation</domain>
<level>recommended</level>
</format>
<format id="fJP2">
<domain>Image Source Language Data</domain>
<level>recommended</level>
<format id="fTextPlain">
<domain>Audiovisual Annotation</domain>
<level>discouraged</level>
</format>
<format id="fJPEG">
<domain>Image Source Language Data</domain>
<level>recommended</level>
</format>
<format id="fPraat">
<domain>Audiovisual Annotation</domain>
<level>acceptable</level>
</format>
<format id="fPNG">
<domain>Image Source Language Data</domain>
<level>recommended</level>
@@ -58,6 +116,10 @@
<domain>Lexical Resource</domain>
<level>recommended</level>
</format>
<format id="fTSV">
<domain>Lexical Resource</domain>
<level>recommended</level>
</format>
<format id="fLMF">
<domain>Lexical Resource</domain>
<level>recommended</level>
@@ -66,20 +128,16 @@
<domain>Text Annotation</domain>
<level>recommended</level>
</format>
<format id="fTEI">
<format id="fCWB-VRT">
<domain>Text Annotation</domain>
<level>recommended</level>
</format>
<format id="fXML">
<format id="fTEI">
<domain>Text Annotation</domain>
<level>recommended</level>
</format>
<format id="fELAN">
<domain>Textual Source Language Data</domain>
<level>recommended</level>
</format>
<format id="fPDFA">
<domain>Textual Source Language Data</domain>
<format id="fXML">
<domain>Text Annotation</domain>
<level>acceptable</level>
</format>
<format id="fPraat">
@@ -89,18 +147,59 @@
<format id="fTextPlain">
<domain>Textual Source Language Data</domain>
<level>recommended</level>
<comment>UTF-8 encoded</comment>
</format>
<format id="fMP4">
<domain>Audiovisual Source Language Data</domain>
<level>acceptable</level>
</format>
<format id="fFLAC">
<domain>Audiovisual Source Language Data</domain>
<level>acceptable</level>
</format>
<format id="fMP3">
<domain>Audiovisual Source Language Data</domain>
<level>discouraged</level>
<comment>lossy formats should be avoided if possible</comment>
</format>
<format id="fMPEG-4-AVC">
<domain>Audiovisual Source Language Data</domain>
<level>recommended</level>
<comment>25 fps, 1920×1080, constant bit rate </comment>
</format>
<format id="fGZIP">
<domain>Tool Support</domain>
<level>recommended</level>
<level>acceptable</level>
</format>
<format id="fTAR">
<domain>Tool Support</domain>
<level>recommended</level>
<level>acceptable</level>
</format>
<format id="fZIP">
<domain>Tool Support</domain>
<level>recommended</level>
</format>
<format id="fMarkdown">
<domain>Documentation</domain>
<level>recommended</level>
</format>
<format id="fTextPlain">
<domain>Documentation</domain>
<level>recommended</level>
<comment>e.g. as README.txt</comment>
</format>
<format id="fJSON">
<domain>Metadata</domain>
<level>acceptable</level>
<comment>regular and structured; consider using <formatRef ref="fJSONLD"/> with a schema</comment>
</format>
<format id="fCSV">
<domain>Metadata</domain>
<level>acceptable</level>
</format>
<format id="fALTO">
<domain>Text Annotation</domain>
<level>acceptable</level>
</format>
</formats>
</recommendation>
</recommendation>
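Each `<format>` entry names a format ID, a functional domain and a level (`recommended`, `acceptable` or `discouraged`), and the same ID may appear more than once in a domain, as `fWave` does with two quality thresholds. The sketch below, a hypothetical helper assuming only the element names visible in this diff, groups an excerpt of those entries by domain and level, which makes such repeated IDs easy to spot.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Excerpt of <format> entries copied from this diff; the wrapper
# elements follow the <recommendation>/<formats> nesting shown above.
FORMATS_XML = """
<recommendation>
  <formats>
    <format id="fWave">
      <domain>Audiovisual Source Language Data</domain>
      <level>recommended</level>
      <comment>PCM-WAV, 48 kHz, 16 bit</comment>
    </format>
    <format id="fWave">
      <domain>Audiovisual Source Language Data</domain>
      <level>acceptable</level>
      <comment>PCM-WAV above 22 kHz/16 bit</comment>
    </format>
    <format id="fMP3">
      <domain>Audiovisual Source Language Data</domain>
      <level>discouraged</level>
      <comment>lossy formats should be avoided if possible</comment>
    </format>
    <format id="fCHAT">
      <domain>Audiovisual Annotation</domain>
      <level>discouraged</level>
      <comment>Consider using <formatRef ref="fTEISpoken"/> instead.</comment>
    </format>
    <format id="fTEISpoken">
      <domain>Audiovisual Annotation</domain>
      <level>recommended</level>
    </format>
  </formats>
</recommendation>
"""

# The three level values used in this file.
LEVELS = ("recommended", "acceptable", "discouraged")


def formats_by_domain(xml_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Map domain -> level -> format IDs, in document order, without duplicates."""
    root = ET.fromstring(xml_text.strip())
    table = defaultdict(lambda: {level: [] for level in LEVELS})
    for fmt in root.findall("./formats/format"):
        domain = (fmt.findtext("domain") or "").strip()
        level = (fmt.findtext("level") or "").strip()
        if level not in LEVELS:
            raise ValueError(f"unexpected level {level!r} for {fmt.get('id')}")
        fmt_id = fmt.get("id", "")
        if fmt_id not in table[domain][level]:
            table[domain][level].append(fmt_id)
    return dict(table)


if __name__ == "__main__":
    for domain, levels in formats_by_domain(FORMATS_XML).items():
        print(domain, levels)
```

Rejecting unknown level values, rather than silently bucketing them, is a design choice of this sketch: a misspelled level then surfaces as an error instead of a format that quietly disappears from the overview.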
