Redundant entities generated for deletes #395

Closed
szarnyasg opened this issue Jun 23, 2022 · 2 comments · Fixed by #406

Member

szarnyasg commented Jun 23, 2022

For BI data sets, there are 8 delete types as described in the spec.

However, if the data set is generated with the --explode-edges flag, extra files are generated:

$ rm -rf out-sf${SF}/
$ tools/run.py \
    --cores 4 \
    --memory 8G \
    ./target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar \
    -- \
    --format csv \
    --scale-factor ${SF} \
    --explode-edges \
    --mode bi \
    --output-dir out-sf${SF}/
$ ls -1 out-sf0.003/graphs/csv/bi/composite-projected-fk/deletes/dynamic
Comment
Comment_hasCreator_Person
Comment_isLocatedIn_Country
Comment_replyOf_Comment
Comment_replyOf_Post
Forum
Forum_containerOf_Post
Forum_hasMember_Person
Forum_hasModerator_Person
Person
Person_isLocatedIn_City
Person_knows_Person
Person_likes_Comment
Person_likes_Post
Post
Post_hasCreator_Person
Post_isLocatedIn_Country

The extra files are redundant and not used by the client code implementing the deletes:

$ cat out-sf0.003/graphs/csv/bi/composite-projected-fk/deletes/dynamic/Person/batch_id=2012-12-11/part-*.csv
deletionDate|id
2012-12-11T23:45:12.518+00:00|37383395344409
$ cat out-sf0.003/graphs/csv/bi/composite-projected-fk/deletes/dynamic/Person_isLocatedIn_City/batch_id=2012-12-11/part-*.csv
deletionDate|PersonId|CityId
2012-12-11T23:45:12.518+00:00|37383395344409|1177
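
One way to read the two samples above: the Person_isLocatedIn_City row carries the same deletionDate and Person ID as the Person row, so it adds nothing the Person delete does not already say. A minimal sketch of that check in Python follows. The helper names (`read_rows`, `implied_by_node_deletes`) are hypothetical and not part of Datagen, and the assumption that "redundant" means "already covered by the endpoint's node delete" is an interpretation of this example, not something the spec is quoted on here.

```python
import csv
import io

# Sample rows copied from the two files shown above.
person_deletes = """deletionDate|id
2012-12-11T23:45:12.518+00:00|37383395344409
"""
edge_deletes = """deletionDate|PersonId|CityId
2012-12-11T23:45:12.518+00:00|37383395344409|1177
"""


def read_rows(text):
    """Parse a pipe-separated delete file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def implied_by_node_deletes(node_rows, edge_rows, endpoint_column):
    """For each edge delete, report whether a node delete with the same
    deletionDate and the same endpoint ID exists (hypothetical check)."""
    deleted = {(row["deletionDate"], row["id"]) for row in node_rows}
    return [
        (row["deletionDate"], row[endpoint_column]) in deleted
        for row in edge_rows
    ]


covered = implied_by_node_deletes(
    read_rows(person_deletes), read_rows(edge_deletes), "PersonId"
)
print(covered)  # [True]
```

This only covers the single batch shown; running the same comparison over all batches would be needed before concluding anything about the full data set.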
szarnyasg added the bug label Jun 25, 2022
szarnyasg added this to the Milestone 4 milestone Jun 25, 2022
Member

dszakallas commented Jul 2, 2022

Could you tell me which ones are extra? Are inserts required for them?

Member Author

szarnyasg commented Jul 2, 2022

Only these are needed; the rest are extra:

  - Comment.csv
  - Forum.csv
  - Person.csv
  - Post.csv
  - Forum_hasMember_Person.csv
  - Person_knows_Person.csv
  - Person_likes_Comment.csv
  - Person_likes_Post.csv

We have a separate issue for inserts: #402. There, the set of files is the same 8, but the remaining dynamic files (e.g. Forum_hasTag_Tag) have to be joined and aggregated into them as nested attributes. A DuckDB SQL script currently does this, but ultimately Datagen should implement it.
