Commit

towards new model
xrotwang committed Nov 14, 2023
1 parent 480e307 commit 5409a27
Showing 20 changed files with 1,703 additions and 1,537 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "raw/glottolog-cldf"]
path = raw/glottolog-cldf
url = https://github.com/glottolog/glottolog-cldf
11 changes: 10 additions & 1 deletion .zenodo.json
@@ -1,5 +1,5 @@
{
"title": "CLDF dataset derived from 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'",
"title": "D-PLACE dataset derived from Bertolo et al. 2023 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'",
"access_right": "open",
"keywords": [
"cldf:StructureDataset",
@@ -32,6 +32,15 @@
{
"name": "Robert Forkel",
"type": "DataCurator"
},
{
"name": "Robert Forkel",
"type": "Editor"
}
],
"communities": [
{
"identifier": "dplace"
}
]
}
2 changes: 1 addition & 1 deletion CONTRIBUTORS.md
@@ -6,4 +6,4 @@ Mila Bertolo | | author
Martynas Snarskis | @msnarskis | author, DataCurator
Manvir Singh | | author
Samuel Mehr | | author
Robert Forkel | @xrotwang | DataCurator
Robert Forkel | @xrotwang | DataCurator, Editor
8 changes: 7 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
# CLDF dataset derived from 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'
# D-PLACE dataset derived from Bertolo et al. 2023 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'

## How to cite

@@ -17,6 +17,10 @@ This dataset is licensed under a CC-BY-4.0 license
Available online at https://doi.org/10.5281/zenodo.8223168




![](map.png)

### Coverage

![](map.png)
@@ -28,6 +32,8 @@ entity-relationship diagram below for how they relate.

![](erd.svg)



## CLDF Datasets

The following CLDF datasets are available in [cldf](cldf):
2 changes: 1 addition & 1 deletion RELEASING.md
@@ -10,7 +10,7 @@ pytest
```

```shell
cldfbench cldfviz.map cldf --parameters song --pacific-centered --format png --width 20 --output map.png --with-ocean
cldfbench cldfviz.map cldf --parameters CCMC1 --pacific-centered --format png --width 20 --output map.png --with-ocean
```

```shell
77 changes: 53 additions & 24 deletions cldf/README.md
@@ -1,6 +1,6 @@
<a name="ds-structuredatasetmetadatajson"> </a>

# StructureDataset CLDF dataset derived from 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'
# StructureDataset D-PLACE dataset derived from Bertolo et al. 2023 'Cross-cultural music corpus: The Expanded Natural History of Song Discography'

**CLDF Metadata**: [StructureDataset-metadata.json](./StructureDataset-metadata.json)

@@ -13,13 +13,17 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://doi.org/10.5281/zenodo.8223168
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/D-PLACE/dplace-dataset-ccmc
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/D-PLACE/dplace-dataset-ccmc/tree/cd90206">D-PLACE/dplace-dataset-ccmc cd90206</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.8">Glottolog v4.8</a></li></ol>
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/D-PLACE/dplace-dataset-ccmc/tree/480e307">D-PLACE/dplace-dataset-ccmc 480e307</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.8">Glottolog v4.8</a></li><li><a href="https://github.com/glottolog/glottolog-cldf/tree/v4.8">glottolog/glottolog-cldf v4.8</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.10.12</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | ccmc
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | dplace-dataset-ccmc
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution


## <a name="table-valuescsv"></a>Table [values.csv](./values.csv)
## <a name="table-datacsv"></a>Table [data.csv](./data.csv)

Values are coded datapoints, i.e. measurements of a variable for a society.

**Note:** Missing data is signaled by an empty Value column.

property | value
--- | ---
@@ -31,16 +35,23 @@ property | value

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv)
[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` |
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Soc_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [societies.csv::ID](#table-societiescsv)
[Var_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [variables.csv::ID](#table-variablescsv)
[Value](http://cldf.clld.org/v1.0/terms.rdf#value) | `string` | Values for categorical and ordinal variables reference the corresponding code via the Code_ID column. Values for continuous variables have the measured number in the Value column and an empty Code_ID.
[Code_ID](http://cldf.clld.org/v1.0/terms.rdf#codeReference) | `string` | References [codes.csv::ID](#table-codescsv)
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
[Source](http://cldf.clld.org/v1.0/terms.rdf#source) | list of `string` (separated by `;`) |
`sub_case` | `string` | More specific description of the population the data refer to in terms of society or area.
`year` | `string`<br>Regex: `-?[0-9]{1,4}(-[0-9]{4})?` | Focal year, i.e. the time period to which the data refer.
`source_coded_data` | `string` | The source of the coded data, which was aggregated in this dataset.
`admin_comment` | `string` |
[Song_ID](http://cldf.clld.org/v1.0/terms.rdf#mediaReference) | `string` | References [media.csv::ID](#table-mediacsv)
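
The rules described for `Value` and `Code_ID` can be sketched in a few lines of Python. This is only an illustration, not part of the dataset tooling: the sample rows, the IDs (`d1`, `soc1`, `VAR2`, `CCMC1-dance`) and the helper names `read_rows` and `interpret` are all hypothetical, and the sketch assumes that coded rows carry a non-empty `Value` alongside their `Code_ID`.

```python
import csv
import io

# Hypothetical sample data in the shape of data.csv and codes.csv;
# the IDs and values below are invented for illustration only.
DATA_CSV = """ID,Soc_ID,Var_ID,Value,Code_ID
d1,soc1,CCMC1,Dance,CCMC1-dance
d2,soc2,CCMC1,,
d3,soc1,VAR2,1200,
"""
CODES_CSV = """ID,Var_ID,Name
CCMC1-dance,CCMC1,Dance
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def interpret(row, code_names):
    """Return None for missing data, the code name for coded values, else a float."""
    if not row["Value"]:
        return None  # an empty Value column signals missing data
    if row["Code_ID"]:
        return code_names[row["Code_ID"]]  # categorical or ordinal variable
    return float(row["Value"])  # continuous variable: Value holds the number


code_names = {row["ID"]: row["Name"] for row in read_rows(CODES_CSV)}
interpreted = [interpret(row, code_names) for row in read_rows(DATA_CSV)]
print(interpreted)
```

Checking for an empty `Value` first keeps missing data from being mistaken for a coded or numeric value.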

## <a name="table-languagescsv"></a>Table [languages.csv](./languages.csv)
## <a name="table-societiescsv"></a>Table [societies.csv](./societies.csv)

We use the term “society” to refer to cultural groups. In most cases, a society can be understood to represent a group of people at a focal location with a shared language that differs from that of their neighbors. However, in some cases multiple societies share a language.
Note that a society's name and location in this dataset are taken from the corresponding language or dialect in Glottolog.

property | value
--- | ---
@@ -52,16 +63,27 @@ property | value

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Macroarea](http://cldf.clld.org/v1.0/terms.rdf#macroarea) | `string` |
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal` |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal` |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string` |
[ISO639P3code](http://cldf.clld.org/v1.0/terms.rdf#iso639P3code) | `string` |
`region` | `string` | indicates an approximate geographical location where the song was recorded, using Human Relations Area Files categories (see https://ehrafworldcultures.yale.edu)

## <a name="table-parameterscsv"></a>Table [parameters.csv](./parameters.csv)
[Latitude](http://cldf.clld.org/v1.0/terms.rdf#latitude) | `decimal`<br>&ge; -90<br>&le; 90 |
[Longitude](http://cldf.clld.org/v1.0/terms.rdf#longitude) | `decimal`<br>&ge; -180<br>&le; 180 |
[Glottocode](http://cldf.clld.org/v1.0/terms.rdf#glottocode) | `string`<br>Regex: `[a-z0-9]{4}[1-9][0-9]{3}` |
`Name_and_ID_in_source` | `string` | Society names identified as pejorative have been replaced with a preferred, English-language ethnonym. The name (and ID) as given in the source dataset is kept in this field.
`xd_id` | `string` | “cross-data-set” identifier, used to link societies present in different datasets, if they share a focal location. Note: If this field is empty, other fields such as Name, Glottocode, focal year and location may be used to identify societies across datasets if appropriate.
`alt_names_by_society` | list of `string` (separated by `; `) | A list of ‘alternate’ names for the society; includes, where available, one or more autonyms in the society’s own language, as well as other commonly encountered ethnonyms.
`main_focal_year` | `integer` | Focal year specifying the time period to which the data refer, given as number of years BCE - if negative - or CE.
`HRAF_name_ID` | `string` | Name(s) and ID(s) of the corresponding society in HRAF (the Human Relations Area Files)
`HRAF_ID` | `string` | ID of the corresponding society in HRAF
`origLat` | `decimal`<br>&ge; -90<br>&le; 90 | Uncorrected latitude as given in the source.
`origLong` | `decimal`<br>&ge; -270<br>&le; 180 | Uncorrected longitude as given in the source.
[comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
`glottocode_comment` | `string` | Comment on the Glottocode assignment.
`region` | `string` | World Geographical Scheme for Recording Plant Distributions level2 region
`HRAF_region` | `string` | indicates an approximate geographical location where the song was recorded, using Human Relations Area Files categories (see https://ehrafworldcultures.yale.edu)
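
The constraints listed for `Glottocode`, `Latitude` and `Longitude` can be checked with a short sketch like the following. It assumes the regular expression is meant to match the whole value, and the function name `society_problems` and the sample rows are hypothetical. Note that `origLong` has a wider lower bound (-270) and would need its own range.

```python
import re
from decimal import Decimal

# Pattern copied from the Glottocode column description above.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[1-9][0-9]{3}")

# (column, lower bound, upper bound) as listed in the table above.
BOUNDS = (("Latitude", -90, 90), ("Longitude", -180, 180))


def society_problems(row):
    """Return the names of columns whose values violate the documented constraints."""
    problems = []
    code = row.get("Glottocode", "")
    if code and not GLOTTOCODE.fullmatch(code):
        problems.append("Glottocode")
    for column, low, high in BOUNDS:
        raw = row.get(column, "")
        # Empty cells are treated as "not given" and are not flagged.
        if raw and not low <= Decimal(raw) <= high:
            problems.append(column)
    return problems


# Hypothetical rows, invented for illustration.
good = {"Glottocode": "abcd1234", "Latitude": "12.5", "Longitude": "-70"}
bad = {"Glottocode": "abcd0234", "Latitude": "91", "Longitude": "10"}
print(society_problems(good), society_problems(bad))
```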

## <a name="table-variablescsv"></a>Table [variables.csv](./variables.csv)

Variables are cultural features or practices, or environmental descriptors.

property | value
--- | ---
@@ -73,14 +95,20 @@ property | value

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[A-Za-z.0-9_]+([0-9]+)?` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
[ColumnSpec](http://cldf.clld.org/v1.0/terms.rdf#columnSpec) | `json` |
`category` | list of `string` (separated by `, `) |
`type` | `string`<br>Valid choices:<br> `Continuous` `Categorical` `Ordinal` | Variables may be categorical, in which case they must be accompanied by a list of possible ‘codes’, i.e. rows in CodeTable. Variables can also be continuous (e.g. population size) or ordinal. Ordinal variables are, like categorical variables, accompanied by a list of codes; the order of the codes is encoded in the `ord` column of CodeTable.
`unit` | `string` | The unit of measurement
`source_comment` | `string` | A note about the source of this variable.
`changes` | `string` | Notes about how a variable may have been derived from the source.
[comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |

## <a name="table-codescsv"></a>Table [codes.csv](./codes.csv)

The codes for the single parameter 'song' are the 10 categories, describing song type.
The codes for the single parameter 'CCMC1' are the 10 categories, describing song type.

property | value
--- | ---
@@ -92,10 +120,11 @@ property | value

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | The parameter or variable the code belongs to.<br>References [parameters.csv::ID](#table-parameterscsv)
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Var_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | The parameter or variable the code belongs to.<br>References [variables.csv::ID](#table-variablescsv)
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
`ord` | `integer` |
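
As a rough illustration of how the `ord` column might be used, the sketch below groups codes by `Var_ID` and sorts each group by `ord`. The sample codes, the variable ID `VAR3` and the helper name `ordered_code_names` are hypothetical; codes with an empty `ord` are assumed to have no defined position and are skipped.

```python
import csv
import io
from collections import defaultdict

# Hypothetical codes for an invented ordinal variable "VAR3".
CODES_CSV = """ID,Var_ID,Name,ord
VAR3-high,VAR3,High,3
VAR3-low,VAR3,Low,1
VAR3-mid,VAR3,Medium,2
"""


def ordered_code_names(rows):
    """Group codes by Var_ID and sort each group by its integer `ord` value."""
    grouped = defaultdict(list)
    for row in rows:
        if row["ord"]:  # skip codes without an explicit position
            grouped[row["Var_ID"]].append(row)
    return {
        var_id: [r["Name"] for r in sorted(group, key=lambda r: int(r["ord"]))]
        for var_id, group in grouped.items()
    }


ordered = ordered_code_names(csv.DictReader(io.StringIO(CODES_CSV)))
print(ordered)
```

Converting `ord` with `int` before sorting avoids the string ordering in which `"10"` would sort before `"2"`.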

## <a name="table-mediacsv"></a>Table [media.csv](./media.csv)

@@ -111,10 +140,10 @@ property | value

Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string`<br>Regex: `[a-zA-Z0-9_\-]+` | Primary key
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
[Media_Type](http://cldf.clld.org/v1.0/terms.rdf#mediaType) | `string` |
[Media_Type](http://cldf.clld.org/v1.0/terms.rdf#mediaType) | `string`<br>Regex: `[^/]+/.+` |
[Download_URL](http://cldf.clld.org/v1.0/terms.rdf#downloadUrl) | `anyURI` |
[Path_In_Zip](http://cldf.clld.org/v1.0/terms.rdf#pathInZip) | `string` |
