diff --git a/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml b/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
index 40cb47b6..c642aba4 100644
--- a/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
+++ b/SIS/clarin/data/recommendations/FIN-CLARIN-recommendation.xml
@@ -5,22 +5,76 @@ FIN-CLARIN - + The Language Bank of Finland CLARIN - + + Jussi Piitulainen + https://github.com/jpiitula + 2024-02-15 + + +
+ The following measures are taken to enhance the chance of future interpretability of the
+ data.

+ The number of accepted file formats is small and well documented to make future conversions
+ to other formats more feasible. Open (non-proprietary) file formats
+ are strongly preferred.
+ The Language Bank of Finland recommends formats listed in the CLARIN
+ Standards Information System.

+ The Language Bank's participation in relevant networks like
+ CLARIN
+ enables steady information about recent developments in file formats and encodings. Plans
+ to migrate or convert files will be developed if new standards arise.

+ For more information, see the Language Bank of Finland's Portal.

+ Data to be deposited might need to be converted to accepted or recommended formats for long-term preservation.

+ Plain text and XML files will normally only be accepted in Unicode character encoding, preferably UTF-8.

+ As a general guideline we believe that the file formats best suited for long-term
+ sustainability and accessibility:

+ + + + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + discouraged + Consider using instead. + + + Audiovisual Annotation + acceptable + Audiovisual Annotation recommended + See format description. Audiovisual Source Language Data recommended + PCM-WAV, 48 kHz, 16 bit + + + Audiovisual Source Language Data + acceptable + PCM-WAV above 22 kHz/16 bit Catalogue Metadata
@@ -30,22 +84,26 @@ Documentation recommended - + Documentation - recommended + acceptable - + Documentation recommended - - Image Source Language Data - recommended + + Audiovisual Annotation + discouraged Image Source Language Data recommended + + Audiovisual Annotation + acceptable + Image Source Language Data recommended
@@ -58,6 +116,10 @@ Lexical Resource recommended + + Lexical Resource + recommended + Lexical Resource recommended
@@ -66,20 +128,16 @@ Text Annotation recommended - + Text Annotation recommended - + Text Annotation recommended - - Textual Source Language Data - recommended - - - Textual Source Language Data + + Text Annotation acceptable
@@ -89,18 +147,59 @@ Textual Source Language Data recommended + UTF-8 encoded + + + Audiovisual Source Language Data + acceptable + + + Audiovisual Source Language Data + acceptable + + + Audiovisual Source Language Data + discouraged + lossy formats should be avoided if possible + + + Audiovisual Source Language Data + recommended + 25 fps, 1920×1080, constant bit rate Tool Support - recommended + acceptable Tool Support - recommended + acceptable Tool Support recommended + + Documentation + recommended + + + Documentation + recommended + e.g. as README.txt + + + Metadata + acceptable + regular and structured; consider using with a schema + + + Metadata + acceptable + + + Text Annotation + acceptable + -
\ No newline at end of file
+