From f1792b82c36aa7632083ff91179330ea69df52b8 Mon Sep 17 00:00:00 2001
From: Martin Matthiesen
Date: Thu, 15 Feb 2024 14:28:40 +0200
Subject: [PATCH] KP-7936 Update FIN-CLARIN-recommendation.xml (#1)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Added domains of recommendations in centre pages (close #240)
* Implemented multiple curators (#238)
* Updated the style of multiple curators and added an example (#238)
* Make base URL semi dynamic (close #248)
* Added domains of recommendations in format pages (#240)
* addresses #238
* KP-7936 Update FIN-CLARIN-recommendation.xml
  We went through all functional domains and added formats as we saw relevant.
  We skipped domains we deemed not relevant to Kielipankki.
* add jussi + stub for
* make "centre" optional (for now) in the header - addresses #247 , change the former "centre" to "centreID" - references #249
* change "centre" to "centreID" in the filter field - addresses #249
* Fixed centreID.
* Added centre elements in recommendation files (#247)
* Updated references from filter/centre to filter/centreID (#249)
* Updated centre model to use recommendation files instead of centres.xml (#247)
* Added SAW recommendation (#247)
* add "centreID" as an optional child of "format", with an annotation stating its purpose; see https://github.com/clarin-eric/standards/issues/249#issuecomment-1866315102
* KP-7936 add PDF for documentation, add review date
* KP-7936 remove PDF* for textual src lang data
* KP-7936 add info text based on Språkbanken
* KP-7936 Update FIN-CLARIN-recommendation.xml
  We went through all functional domains and added formats as we saw relevant.
  We skipped domains we deemed not relevant to Kielipankki.
* add jussi + stub for
* KP-7936 add PDF for documentation, add review date
* KP-7936 remove PDF* for textual src lang data
* KP-7936 add info text based on Språkbanken
* KP-7936 fix indent

---------

Co-authored-by: margaretha
Co-authored-by: piotr
---
 .../FIN-CLARIN-recommendation.xml | 137 +++++++++++++++---
 1 file changed, 118 insertions(+), 19 deletions(-)

diff --git a/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml b/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
index 40cb47b6..c642aba4 100644
--- a/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
+++ b/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
@@ -5,22 +5,76 @@ FIN-CLARIN - + The Language Bank of Finland CLARIN - + + Jussi Piitulainen + https://github.com/jpiitula + 2024-02-15 + + + +

The following measures are taken to enhance the chance of future interpretability of the + data.

+

The number of accepted file formats is small and well documented to make future conversions + to other formats more feasible. Open (non-proprietary) file formats + are strongly preferred. + The Language Bank of Finland recommends formats listed in the CLARIN + Standards Information System.

+

The Language Bank's participation in relevant networks like + CLARIN + enables steady information about recent developments in file formats and encodings. Plans + to migrate or convert files will be developed if new standards arise.

+

For more information, see the Language Bank of Finland's Portal.

+

Data to be deposited might need to be converted to accepted or recommended formats for long-term preservation.

+

Plain text and XML files will normally only be accepted in Unicode character encoding, preferably UTF-8.

+

As a general guideline we believe that the file formats best suited for long-term + sustainability and accessibility:

+
    +
  • Are frequently used
  • +
  • Have open specifications
  • +
  • Are independent of specific software, developers or vendors
  • +
+
+ + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + acceptable + Audiovisual Annotation recommended + See format description. Audiovisual Source Language Data recommended + PCM-WAV, 48 kHz, 16 bit + + + Audiovisual Source Language Data + acceptable + PCM-WAV above 22 kHz/16 bit Catalogue Metadata @@ -30,22 +84,26 @@ Documentation recommended - + Documentation - recommended + acceptable - + Documentation recommended - - Image Source Language Data - recommended + + Audiovisual Annotation + discouraged Image Source Language Data recommended + + Audiovisual Annotation + acceptable + Image Source Language Data recommended @@ -58,6 +116,10 @@ Lexical Resource recommended + + Lexical Resource + recommended + Lexical Resource recommended @@ -66,20 +128,16 @@ Text Annotation recommended - + Text Annotation recommended - + Text Annotation recommended - - Textual Source Language Data - recommended - - - Textual Source Language Data + + Text Annotation acceptable @@ -89,18 +147,59 @@ Textual Source Language Data recommended + UTF-8 encoded + + + Audiovisual Source Language Data + acceptable + + + Audiovisual Source Language Data + acceptable + + + Audiovisual Source Language Data + discouraged + lossy formats should be avoided if possible + + + Audiovisual Source Language Data + recommended + 25 fps, 1920×1080, constant bit rate Tool Support - recommended + acceptable Tool Support - recommended + acceptable Tool Support recommended + + Documentation + recommended + + + Documentation + recommended + e.g. as README.txt + + + Metadata + acceptable + regular and structured; consider using with a schema + + + Metadata + acceptable + + + Text Annotation + acceptable + - \ No newline at end of file +