-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate data size concerns for adding a new index to solo_scores
#17
Comments
peppy
changed the title
Test data size concerns for adding a new index
Investigate data size concerns for adding a new index to Aug 31, 2023
solo_scores
Based on investigations above, the change in structure I'm looking to apply is: --- old.sql 2023-09-01 18:17:38
+++ new.sql 2023-09-01 18:18:47
@@ -8,9 +8,9 @@
`preserve` tinyint(1) NOT NULL DEFAULT '0',
`created_at` datetime NOT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
- PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
- KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
- KEY `beatmap_id` (`beatmap_id`)
+ PRIMARY KEY (`id`,`preserve`,`created_at`),
+ KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+ KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,created_at)
(PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
This is a first checkpoint, I still have some further smaller changes to test: TODO:
|
Updated with TODO changes applied: --- old.sql 2023-09-01 18:17:38
+++ new.sql 2023-09-01 20:03:04
@@ -6,12 +6,12 @@
`data` json NOT NULL,
`has_replay` tinyint(1) DEFAULT '0',
`preserve` tinyint(1) NOT NULL DEFAULT '0',
- `created_at` datetime NOT NULL,
- `updated_at` timestamp NULL DEFAULT NULL,
- PRIMARY KEY (`id`,`ruleset_id`,`preserve`,`created_at`),
- KEY `user_ruleset_id_index` (`user_id`,`ruleset_id`,`id` DESC),
- KEY `beatmap_id` (`beatmap_id`)
+ `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
+ `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+ PRIMARY KEY (`id`,`preserve`,`updated_at`),
+ KEY `user_ruleset_index` (`user_id`,`ruleset_id`),
+ KEY `beatmap_user_index` (`beatmap_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
-/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,created_at)
+/*!50500 PARTITION BY RANGE COLUMNS(`preserve`,`updated_at`)
(PARTITION p0catch VALUES LESS THAN (0,MAXVALUE) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (MAXVALUE,MAXVALUE) ENGINE = InnoDB) */ |
Bonus round: compression testing to ensure we have things fine tuned Compression testingTime values are from # key_size=16
# Time: 268.081s
-rw-r----- 1 dean admin 6069157888 Sep 4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3179282432 Sep 4 12:35 solo_scores_p#p#p1.ibd
# key_size=8
# Time: 195.931s
-rw-r----- 1 dean admin 3036676096 Sep 4 11:33 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 1598029824 Sep 4 11:33 solo_scores_p#p#p1.ibd
# key_size=4
# Time: 204.383s
-rw-r----- 1 dean admin 1606418432 Sep 4 12:20 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 872415232 Sep 4 12:20 solo_scores_p#p#p1.ibd
# key_size=2
# Time: 249.902s
-rw-r----- 1 dean admin 1602224128 Sep 4 12:24 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 1019215872 Sep 4 12:24 solo_scores_p#p#p1.ibd
# row_format=compact
# Time: 122.149s
-rw-r----- 1 dean admin 6069157888 Sep 4 12:35 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3179282432 Sep 4 12:35 solo_scores_p#p#p1.ibd
# row_format=dynamic
# Time: 58.805s
-rw-r----- 1 dean admin 6480199680 Sep 4 12:42 solo_scores_p#p#p0catch.ibd
-rw-r----- 1 dean admin 3376414720 Sep 4 12:42 solo_scores_p#p#p1.ibd
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Database structure testing
As per #15, we need an index on
(user_id, ruleset_id, beatmap_id)
if using the database for operations like ppy/osu-queue-score-statistics#149 (and in the future, ranked score processing and probably more).I want to test on production data (scaled down slightly) to get a real-world idea of how adding/changing indices affect the size of this table.
Test whether changing primary key composition affects index size
TL;DR it doesn't seem to.
Test whether existing index needs
id DESC
at end (or whether it can use the implicity primary key)TL;DR it doesn't, it can use the primary key
Test how badly inserting
beatmap_id
into the existing index (KEY user_ruleset_id_index (user_id,ruleset_id,id DESC)
) breaks things (100% it will)TL;DR it does, as expected. We can get around this by changing the existing
KEY beatmap_id
to includeuser_id
at the end. This comes at almost zero storage cost due to the high cardinality of users per beatmap (almost completely unique). So we're just changing the ordering of the index rather than adding new overheads.Test whether including
ruleset_id
in the constructed partitions (as touched on in #16) means we can remove it from indicesTL;DR correct, but I'm avoiding this because of concerns for future extensibility (custom rulesets). Also there's no storage saving on disk.
Check why we have a
beatmap_id
indexapproved
state (osu-web
).Notes
LIST
partitioning, mysql can't determine it should be using specific partitions whenpreserve=
is not in the query:DATA_SIZE
by around 33%. But there's no documentation anywhere saying that this should be the case. What gives?(fixed by running
ALTER TABLE solo_scores FORCE
...)The text was updated successfully, but these errors were encountered: