[enhance](paimon) cache deletion vector in paimon native reader #46544

Draft. Wants to merge 2 commits into base: master

suxiaogang223
Contributor

What problem does this PR solve?

Related PR: #18074

Problem Summary:

  1. The FE divides each PaimonSplit into smaller splits, so a single file can be covered by several splits.
  2. The BE scans each split and, if the file has a deletion file, reads that file's deletion vector.
  3. If two splits belong to the same PaimonSplit, the deletion vector of that file is read twice.

Release note

Use ShardedKVCache to cache these deletion vectors. The KV cache belongs to a scan node, so all Paimon native readers that belong to that scan node share the same KV cache.
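
The caching idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchShardedCache`, `DeletionVector`, `get_or_load`, and the cache key string are all hypothetical names, and the real `ShardedKVCache` interface in Doris may look different. The sketch only shows that two splits of the same file, sharing one cache owned by the scan node, trigger a single read of the deletion file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a decoded deletion vector: positions of deleted rows.
struct DeletionVector {
    std::vector<uint32_t> deleted_rows;
};

// Illustrative sharded cache. NOT the actual Doris ShardedKVCache API.
class SketchShardedCache {
public:
    using Loader = std::function<std::shared_ptr<DeletionVector>()>;

    // Returns the cached value for `key`; calls `loader` only on the first miss.
    std::shared_ptr<DeletionVector> get_or_load(const std::string& key, const Loader& loader) {
        // Hash the key to pick a shard, so readers of different files rarely contend.
        Shard& shard = _shards[std::hash<std::string>{}(key) % kShardCount];
        std::lock_guard<std::mutex> guard(shard.mutex);
        auto it = shard.entries.find(key);
        if (it != shard.entries.end()) {
            return it->second;
        }
        // The shard lock is held while loading, so concurrent readers of the
        // same key wait instead of reading the deletion file a second time.
        auto value = loader();
        shard.entries.emplace(key, value);
        return value;
    }

private:
    static constexpr std::size_t kShardCount = 8;

    struct Shard {
        std::mutex mutex;
        std::unordered_map<std::string, std::shared_ptr<DeletionVector>> entries;
    };

    std::array<Shard, kShardCount> _shards;
};

// Simulates two splits of the same file sharing one scan-node-level cache
// and returns how many times the (fake) deletion file was read.
int count_deletion_file_reads() {
    SketchShardedCache cache;  // in the PR's design, one cache per scan node
    int reads = 0;
    auto loader = [&reads] {
        ++reads;  // stands in for the actual deletion file I/O
        return std::make_shared<DeletionVector>(DeletionVector{{3, 7, 42}});
    };
    // The key is a made-up example; both splits must derive the same key.
    auto first = cache.get_or_load("hypothetical/deletion-file#offset=0", loader);
    auto second = cache.get_or_load("hypothetical/deletion-file#offset=0", loader);
    assert(first == second);  // both splits see the same cached object
    return reads;
}
```

In this sketch the loader runs under the shard lock, which keeps the logic simple but blocks other keys in the same shard during I/O. Whether that trade-off is acceptable depends on how expensive the deletion file read is.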

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 7, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32357 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f8e6b88f678c823e210d7e253cfbec6a0601427a, data reload: false

------ Round 1 ----------------------------------
q1	17577	6156	6037	6037
q2	2039	302	174	174
q3	10414	1257	708	708
q4	10204	873	425	425
q5	7511	2197	1973	1973
q6	198	181	145	145
q7	894	727	608	608
q8	9244	1353	1109	1109
q9	5212	4862	4935	4862
q10	6765	2284	1836	1836
q11	495	283	271	271
q12	348	364	211	211
q13	17769	3614	3004	3004
q14	234	228	211	211
q15	560	526	482	482
q16	637	618	582	582
q17	574	845	334	334
q18	7041	6430	6390	6390
q19	1261	953	540	540
q20	305	328	197	197
q21	2786	2157	1948	1948
q22	362	326	310	310
Total cold run time: 102430 ms
Total hot run time: 32357 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6240	6325	6241	6241
q2	237	324	234	234
q3	2240	2637	2317	2317
q4	1401	1880	1365	1365
q5	4399	4758	4735	4735
q6	189	176	143	143
q7	2108	1990	1842	1842
q8	2596	2794	2660	2660
q9	7339	7198	7190	7190
q10	3047	3361	2836	2836
q11	596	518	497	497
q12	660	750	617	617
q13	3460	3864	3289	3289
q14	323	302	263	263
q15	566	501	506	501
q16	648	678	673	673
q17	1205	1717	1244	1244
q18	7840	7486	7136	7136
q19	797	1107	1013	1013
q20	1938	1981	1840	1840
q21	5361	5212	4786	4786
q22	613	634	589	589
Total cold run time: 53803 ms
Total hot run time: 52011 ms

@doris-robot

TPC-DS: Total hot run time: 191244 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f8e6b88f678c823e210d7e253cfbec6a0601427a, data reload: false

query1	1005	375	368	368
query2	6531	2448	2431	2431
query3	6709	216	218	216
query4	33936	23851	23583	23583
query5	4358	627	452	452
query6	292	213	200	200
query7	4636	499	302	302
query8	303	247	238	238
query9	9497	2762	2736	2736
query10	474	327	257	257
query11	18219	15574	15211	15211
query12	158	109	107	107
query13	1666	535	436	436
query14	10431	7473	7193	7193
query15	219	204	194	194
query16	7759	587	462	462
query17	1625	750	579	579
query18	1718	389	319	319
query19	226	173	150	150
query20	119	104	117	104
query21	203	120	105	105
query22	4386	4471	4440	4440
query23	34210	33209	33185	33185
query24	6325	2260	2297	2260
query25	455	438	373	373
query26	788	274	153	153
query27	1998	449	344	344
query28	5144	2440	2415	2415
query29	588	541	404	404
query30	237	185	146	146
query31	978	921	828	828
query32	72	59	63	59
query33	493	349	285	285
query34	762	835	517	517
query35	787	823	736	736
query36	1001	1020	947	947
query37	116	98	71	71
query38	4145	4116	4164	4116
query39	1524	1489	1511	1489
query40	200	113	102	102
query41	49	44	51	44
query42	118	99	127	99
query43	507	533	501	501
query44	1281	797	800	797
query45	180	167	161	161
query46	852	1038	646	646
query47	1883	1877	1846	1846
query48	378	406	327	327
query49	716	478	398	398
query50	621	678	378	378
query51	7021	7096	6831	6831
query52	102	98	92	92
query53	221	258	184	184
query54	473	487	393	393
query55	79	78	78	78
query56	258	257	245	245
query57	1224	1164	1110	1110
query58	229	225	228	225
query59	2908	3150	3062	3062
query60	261	258	273	258
query61	108	108	109	108
query62	895	807	749	749
query63	228	194	192	192
query64	3428	984	632	632
query65	3249	3162	3221	3162
query66	868	408	312	312
query67	15975	15824	15731	15731
query68	7060	758	516	516
query69	477	285	261	261
query70	1221	1162	1068	1068
query71	446	274	249	249
query72	6177	3861	3903	3861
query73	675	750	365	365
query74	10215	9162	9135	9135
query75	4065	3180	2686	2686
query76	3691	1166	774	774
query77	771	402	276	276
query78	9970	10046	9567	9567
query79	3634	803	600	600
query80	734	520	434	434
query81	491	262	229	229
query82	597	147	120	120
query83	198	168	147	147
query84	289	90	74	74
query85	777	366	306	306
query86	359	320	299	299
query87	4421	4543	4496	4496
query88	4521	2231	2214	2214
query89	417	326	307	307
query90	1924	192	188	188
query91	137	136	116	116
query92	67	56	54	54
query93	1797	860	536	536
query94	663	392	294	294
query95	339	263	257	257
query96	489	616	282	282
query97	2944	2913	2804	2804
query98	236	201	208	201
query99	1691	1616	1454	1454
Total cold run time: 289898 ms
Total hot run time: 191244 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f8e6b88f678c823e210d7e253cfbec6a0601427a, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.60	0.10	0.11
query5	0.41	0.43	0.41
query6	1.19	0.64	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.51	0.50
query10	0.56	0.58	0.57
query11	0.14	0.10	0.11
query12	0.13	0.10	0.11
query13	0.61	0.60	0.59
query14	2.74	2.73	2.81
query15	0.90	0.84	0.84
query16	0.40	0.38	0.39
query17	1.05	1.08	1.07
query18	0.23	0.21	0.21
query19	1.95	1.88	2.00
query20	0.02	0.01	0.01
query21	15.36	0.91	0.57
query22	0.75	0.92	0.88
query23	14.95	1.39	0.65
query24	3.09	0.48	2.02
query25	0.28	0.18	0.05
query26	0.32	0.16	0.15
query27	0.06	0.06	0.05
query28	13.22	1.52	1.05
query29	12.53	3.84	3.27
query30	0.26	0.09	0.07
query31	2.84	0.58	0.39
query32	3.23	0.54	0.47
query33	3.07	3.08	3.17
query34	16.69	5.14	4.43
query35	4.53	4.43	4.50
query36	0.84	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.18	0.14	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.46 s
Total hot run time: 31.11 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.86% (10130/26068)
Line Coverage: 29.90% (85685/286599)
Region Coverage: 29.01% (43732/150754)
Branch Coverage: 25.54% (22318/87370)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f8e6b88f678c823e210d7e253cfbec6a0601427a_f8e6b88f678c823e210d7e253cfbec6a0601427a/report/index.html

@suxiaogang223 suxiaogang223 changed the title [enahance](paimon) cache deletion vector in paimon native reader [enhance](paimon) cache deletion vector in paimon native reader Jan 7, 2025
@suxiaogang223 suxiaogang223 marked this pull request as draft January 27, 2025 01:18