Musicbrainz 1st step: JSON conversion to RDF #221

Open: wants to merge 32 commits into base: main
Commits (32):
dfdd52b
feat: first draft of conversion script, need to be tested
Yueqiao12Zhang Nov 8, 2024
d11cd7a
refactor: apply to musicbrainz, need to be tested
Yueqiao12Zhang Nov 8, 2024
c479d68
feat: new script similar to get_relations.py for json
Yueqiao12Zhang Nov 8, 2024
d412340
Create namespace_mapping.json
Yueqiao12Zhang Nov 8, 2024
313dc2c
mapping: empty pred_mapping.json
Yueqiao12Zhang Nov 8, 2024
6897b03
test: recording.jsonl
Yueqiao12Zhang Nov 8, 2024
760c285
refactor: optimize ignore ambiguous column and Namespace matching
Yueqiao12Zhang Nov 8, 2024
d054437
mapping: complete Namespace uri
Yueqiao12Zhang Nov 8, 2024
a0b7924
refactor: ignore empty values optimization
Yueqiao12Zhang Nov 8, 2024
2dca21b
refactor: optimize namespace binding
Yueqiao12Zhang Nov 8, 2024
4c42eda
test: first stage test file for MB
Yueqiao12Zhang Nov 8, 2024
a58f315
test: example mapping.json
Yueqiao12Zhang Nov 8, 2024
33dc1c8
feat: temporary script for filling the mapping with arbitrary values
Yueqiao12Zhang Nov 8, 2024
bbc792c
doc: daily log
Yueqiao12Zhang Nov 8, 2024
a643a66
refactor: restructure the folder
Yueqiao12Zhang Nov 8, 2024
09cd09f
refactor: customize for testing
Yueqiao12Zhang Nov 15, 2024
40caacf
test: one record for testing reconciliation
Yueqiao12Zhang Nov 15, 2024
ce7e631
refactor: update ID URI
Yueqiao12Zhang Nov 22, 2024
cdae2c0
doc: update log.md for 11/15
Yueqiao12Zhang Nov 22, 2024
91393f9
feat: merge.py for reconciliation
Yueqiao12Zhang Nov 22, 2024
15d2fd5
refactor: fix blank node bug
Yueqiao12Zhang Nov 22, 2024
6ca9fc7
Update log.md
Yueqiao12Zhang Nov 22, 2024
dd37724
feat: relocating files and outputting correct rdf
Yueqiao12Zhang Dec 13, 2024
1da723f
test: testing for merge rdf and reconciled csv
Yueqiao12Zhang Dec 13, 2024
f432b27
feat: correctly merges the reconciled CSV into the raw RDF
Yueqiao12Zhang Jan 10, 2025
4a41944
Update log.md
Yueqiao12Zhang Jan 10, 2025
cc307d6
feat: recognize URI
Yueqiao12Zhang Jan 13, 2025
789a265
test: expanded test data set
Yueqiao12Zhang Jan 13, 2025
fdf5439
test: new complete output on 10 records
Yueqiao12Zhang Jan 13, 2025
d3f7cff
Update log.md
Yueqiao12Zhang Jan 13, 2025
5e205de
refactor: renew input name
Yueqiao12Zhang Jan 13, 2025
d290afb
Update log.md
Yueqiao12Zhang Jan 17, 2025
Update log.md
Yueqiao12Zhang committed Jan 13, 2025
commit d3f7cff373c1fbc59d52e870b852cdef16e96d32
11 changes: 10 additions & 1 deletion json_2rdf/log.md
@@ -57,4 +57,13 @@
**Difficulties Encountered**:
- The number of distinct predicates may be effectively unbounded, which makes reconciliation extremely difficult. One option is to categorize them by assigning a single URI to each group of similar predicates.
- Merging blank nodes is difficult because the internal identifier assigned to each blank node changes every time the RDF is read. The reconciled CSV therefore has to be traced while iterating over the raw RDF.
- Use a stack to iterate over the RDF structure so that blank nodes can be traced reliably.
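The three points above (grouping predicates, unstable blank node IDs, and stack-based traversal) can be illustrated with a minimal Python sketch. It is not the project's conversion script. The predicate groups, the `example.org` URIs, the helper names (`predicate_uri`, `bnode_label`, `json_to_triples`) and the toy record are all hypothetical. The sketch assumes that a blank node can be identified by the record's subject plus the JSON path that led to it, and it derives the label from a hash of that pair, so the same input yields the same labels on every run.

```python
import hashlib

# Hypothetical predicate grouping: several raw JSON keys share one URI.
# These keys and example.org URIs are illustrative, not the project's pred_mapping.json.
PREDICATE_GROUPS = {
    "name": "http://example.org/pred/label",
    "title": "http://example.org/pred/label",
    "begin-date": "http://example.org/pred/date",
    "end-date": "http://example.org/pred/date",
}
DEFAULT_NS = "http://example.org/pred/"  # hypothetical fallback namespace


def predicate_uri(key):
    """Return the shared URI for a key's group, or a per-key fallback URI."""
    return PREDICATE_GROUPS.get(key, DEFAULT_NS + key)


def bnode_label(subject, path):
    """Deterministic blank node label built from the record subject and JSON path."""
    digest = hashlib.sha1(f"{subject}|{path}".encode("utf-8")).hexdigest()[:12]
    return "_:b" + digest


def is_empty(value):
    """True for values that should not produce a triple."""
    return value is None or value == "" or value == [] or value == {}


def json_to_triples(record, subject):
    """Flatten one nested JSON record into (subject, predicate, object) tuples.

    An explicit stack replaces recursion. Every nested object becomes a blank
    node labelled by its JSON path, so re-running the conversion on the same
    record reproduces the same labels and they can be traced later.
    """
    triples = []
    stack = [(subject, record, "")]  # (RDF node, JSON object, JSON path)
    while stack:
        node, obj, path = stack.pop()
        for key, value in obj.items():
            if is_empty(value):
                continue
            pred = predicate_uri(key)
            # Lists get an index in the path so sibling objects stay distinct.
            if isinstance(value, list):
                entries = [(f"{path}/{key}/{i}", item) for i, item in enumerate(value)]
            else:
                entries = [(f"{path}/{key}", value)]
            for child_path, item in entries:
                if isinstance(item, dict):
                    bnode = bnode_label(subject, child_path)
                    triples.append((node, pred, bnode))
                    stack.append((bnode, item, child_path))  # visit it later
                elif not is_empty(item):
                    triples.append((node, pred, item))
    return triples


if __name__ == "__main__":
    # Toy record: field names are illustrative and the values are made up.
    record = {
        "title": "Example Song",
        "disambiguation": "",
        "artist-credit": [{"name": "Example Artist", "joinphrase": ""}],
    }
    subject = "https://musicbrainz.org/recording/example-id"  # hypothetical ID
    for triple in json_to_triples(record, subject):
        print(triple)
```

Hashing the subject together with the path keeps labels unique when many records are written to one file, since blank node labels are scoped to the whole document. Whether path-based labels fit the project's merge step against the reconciled CSV is an assumption that would need checking.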

### 01-13-2025

**Reconciliation discussion**:
- Countries and citizenships appear in some tags and genres for artists or recordings. Should they be interpreted as the language, the culture, or the citizenship of the artist?
- For "Death", "Hate", and similar genres, should we take their ordinary meaning, or treat them as special literary genres?
- Which item should be used for "Person": Q5 or Q215627?
- Should "Artist" map to musician (Q639669)?
- Should "Work" map to work (Q386724) or work (Q268378)?