
will NanoCaller support dorado basecaller with R10.4.1 flowcells? #48

Open

kerenzhou062 opened this issue Sep 30, 2024 · 4 comments

@kerenzhou062

kerenzhou062 commented Sep 30, 2024

The same question here. Given that Guppy has been deprecated, will NanoCaller support the dorado basecaller with R10.4.1 flowcells?

@mbhall88

I asked this a while ago in #42 and haven't had an answer, so I assume the answer is no?

@kaichop
Contributor

kaichop commented Oct 1, 2024

This issue is being addressed.

@kerenzhou062
Author

> This issue is being addressed.

That's great! Which model name should be used for R10.4.1 data basecalled by dorado?

@umahsn
Collaborator

umahsn commented Oct 1, 2024

Hi, our internal testing shows that the existing NanoCaller models perform extremely well on R10.4.1 data. This can be attributed to the fact that NanoCaller relies only on basecalls and not on the underlying signal data, so more accurate basecalls translate directly into more accurate variant calls. Below is an evaluation on the HG002 77X coverage R10.4.1 (4 kHz) dataset from https://registry.opendata.aws/ont-open-data/. I will perform a more rigorous test and upload the results to replace the R9.4.1 case study.

Given how well the provided models perform on R10.4.1 datasets, we do not plan to train and release R10.4.1-specific models in the near future. This is something we will explore later, but it is not a high-priority issue at the moment. Any new model release will likely be accompanied by the necessary upgrades to the core NanoCaller algorithm, to reflect the advances made in this field over the years.
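
For anyone wanting to try this on their own R10.4.1 data, here is a minimal sketch of the kind of invocation involved. It is not taken from this thread: the flag names (--bam, --ref, --preset, --output, --cpu) and the input file names are assumptions based on typical NanoCaller 3.x usage, so verify them against `NanoCaller --help` for your installed version.

```python
# A minimal sketch (assumptions, not from this thread): calling variants on a
# dorado-basecalled, aligned R10.4.1 BAM with the existing NanoCaller ONT preset.
# Flag names are written from typical NanoCaller 3.x usage; confirm with `NanoCaller --help`.
import subprocess

bam = "HG002_R10.4.1_dorado.bam"   # hypothetical input: aligned, indexed BAM
ref = "GRCh38.fa"                  # hypothetical reference FASTA (indexed)

subprocess.run(
    [
        "NanoCaller",
        "--bam", bam,
        "--ref", ref,
        "--preset", "ont",          # assumed preset selecting the released ONT models
        "--output", "nanocaller_r1041",
        "--cpu", "16",
    ],
    check=True,
)
```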

SNP Calling

HG002_GRCh38_alldifficultregions: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
97.243 620821.16 23579.00 620350.00 7129.84 0.9634 0.9886 0.9759
78.758 621584.22 24380.00 621114.00 6366.78 0.9622 0.9899 0.9759
50.025 622667.97 25713.00 622197.00 5283.03 0.9603 0.9916 0.9757
30.120 623307.00 26860.00 622834.00 4644.00 0.9587 0.9926 0.9753
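
To make the table columns explicit: the printed Precision, Recall, and F1 values are consistent with precision being computed on the call (query) side from TP_call and FP, and recall on the baseline (truth) side from TP_baseline and FN, as in hap.py-style benchmarking. A quick sketch (mine, not from the original post) that reproduces the first row above:

```python
# Reproduce the Precision/Recall/F1 columns from the TP/FP/FN counts,
# using the first row of the HG002_GRCh38_alldifficultregions SNP table.
def prf1(tp_baseline: float, fp: float, tp_call: float, fn: float):
    precision = tp_call / (tp_call + fp)       # query-side precision
    recall = tp_baseline / (tp_baseline + fn)  # truth-side recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Row with Quality_Cutoff 97.243 from the table above
p, r, f1 = prf1(tp_baseline=620821.16, fp=23579.00, tp_call=620350.00, fn=7129.84)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
# -> precision=0.9634 recall=0.9886 F1=0.9759, matching the printed row
```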

HG002_GRCh38_AllTandemRepeatsandHomopolymers_slop5: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
44.167 180371.81 5336.00 180247.00 2529.19 0.9712 0.9862 0.9787
32.585 180547.88 5520.00 180422.00 2353.12 0.9703 0.9871 0.9787
50.038 180271.49 5275.00 180147.00 2629.51 0.9716 0.9856 0.9785
30.141 180586.00 5578.00 180460.00 2315.00 0.9700 0.9873 0.9786

HG002_GRCh38_easy_regions: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
36.943 2732681.79 5646.00 2731980.00 4482.21 0.9979 0.9984 0.9982
30.127 2733232.00 5718.00 2732530.00 3932.00 0.9979 0.9986 0.9982
50.017 2731628.88 5543.00 2730927.00 5535.12 0.9980 0.9980 0.9980
30.127 2733232.00 5718.00 2732530.00 3932.00 0.9979 0.9986 0.9982

HG002_GRCh38_lowmappabilityall: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
187.646 188278.86 9153.00 188248.00 4361.14 0.9536 0.9774 0.9654
165.632 189052.49 9982.00 189021.00 3587.51 0.9498 0.9814 0.9654
50.025 190743.86 12708.00 190711.00 1896.14 0.9375 0.9902 0.9631
30.136 190915.00 13231.00 190882.00 1725.00 0.9352 0.9910 0.9623

HG002_GRCh38_MHC: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
33.864 20103.00 145.00 20020.00 62.00 0.9928 0.9969 0.9949
33.864 20103.00 145.00 20020.00 62.00 0.9928 0.9969 0.9949
77.183 20101.49 145.00 20018.00 63.51 0.9928 0.9969 0.9948
33.864 20103.00 145.00 20020.00 62.00 0.9928 0.9969 0.9949

HG002_GRCh38_segdups: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
221.195 115078.13 12390.00 114967.00 5869.87 0.9027 0.9515 0.9265
210.735 115709.82 13120.00 115598.00 5238.18 0.8981 0.9567 0.9265
50.025 118907.82 20084.00 118793.00 2040.18 0.8554 0.9831 0.9148
30.120 119069.00 20915.00 118954.00 1879.00 0.8505 0.9845 0.9126

HG002_minus_homopolymer_repeats: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
33.363 2457466.90 21604.00 2458017.00 5333.10 0.9913 0.9978 0.9946
30.120 2457699.00 21749.00 2458249.00 5101.00 0.9912 0.9979 0.9946
50.032 2456173.81 20954.00 2456723.00 6626.19 0.9915 0.9973 0.9944
30.120 2457699.00 21749.00 2458249.00 5101.00 0.9912 0.9979 0.9946

whole_genome: snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
38.377 3355582.41 31978.00 3353686.00 9532.59 0.9906 0.9972 0.9939
30.120 3356539.00 32578.00 3354641.00 8576.00 0.9904 0.9975 0.9939
50.017 3354297.13 31256.00 3352402.00 10817.87 0.9908 0.9968 0.9938
30.120 3356539.00 32578.00 3354641.00 8576.00 0.9904 0.9975 0.9939

Indel Calling

HG002_GRCh38_alldifficultregions: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
24.140 227068.44 122792.00 223973.00 141592.56 0.6459 0.6159 0.6306
23.890 227341.83 123382.00 224246.00 141319.17 0.6451 0.6167 0.6306
12.500 241493.13 158210.00 238410.00 127167.87 0.6011 0.6551 0.6269
2.240 246593.00 175702.00 243494.00 122068.00 0.5809 0.6689 0.6218

HG002_GRCh38_AllTandemRepeatsandHomopolymers_slop5: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
21.420 203911.79 126347.00 201241.00 134969.21 0.6143 0.6017 0.6080
20.200 205374.95 129662.00 202703.00 133506.05 0.6099 0.6060 0.6080
12.500 215190.70 154912.00 212522.00 123690.30 0.5784 0.6350 0.6054
2.240 220288.00 172358.00 217600.00 118593.00 0.5580 0.6500 0.6005

HG002_GRCh38_easy_regions: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
43.220 143232.41 12188.00 142548.00 13698.59 0.9212 0.9127 0.9170
32.500 143547.11 12557.00 142852.00 13383.89 0.9192 0.9147 0.9170
12.500 143797.77 13523.00 143086.00 13133.23 0.9137 0.9163 0.9150
2.270 143835.00 13946.00 143119.00 13096.00 0.9112 0.9165 0.9139

HG002_GRCh38_lowmappabilityall: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
20.410 8644.79 1920.00 8527.00 1762.21 0.8162 0.8307 0.8234
16.200 8666.40 1951.00 8548.00 1740.60 0.8142 0.8327 0.8234
12.530 8675.24 1976.00 8557.00 1731.76 0.8124 0.8336 0.8229
4.200 8689.00 2019.00 8569.00 1718.00 0.8093 0.8349 0.8219

HG002_GRCh38_MHC: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
15.690 1184.00 448.00 1162.00 502.00 0.7217 0.7023 0.7119
15.690 1184.00 448.00 1162.00 502.00 0.7217 0.7023 0.7119
14.510 1184.00 449.00 1162.00 502.00 0.7213 0.7023 0.7116
5.230 1184.00 455.00 1162.00 502.00 0.7186 0.7023 0.7103

HG002_GRCh38_segdups: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
12.190 9045.21 2409.00 8854.00 1771.79 0.7861 0.8362 0.8104
12.190 9045.21 2409.00 8854.00 1771.79 0.7861 0.8362 0.8104
12.550 9040.28 2409.00 8849.00 1776.72 0.7860 0.8357 0.8101
2.430 9060.00 2457.00 8867.00 1757.00 0.7830 0.8376 0.8094

HG002_minus_homopolymer_repeats: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
90.440 172628.55 26464.00 171087.00 34679.45 0.8660 0.8327 0.8491
77.150 172817.34 26716.00 171266.00 34490.66 0.8651 0.8336 0.8491
12.500 173315.66 28452.00 171713.00 33992.34 0.8579 0.8360 0.8468
2.270 173340.00 28695.00 171732.00 33968.00 0.8568 0.8361 0.8464

whole_genome: non_snp performance

Quality_Cutoff TP_baseline FP TP_call FN Precision Recall F1
53.440 342319.44 77989.00 337879.00 183089.56 0.8125 0.6515 0.7232
53.440 342319.44 77989.00 337879.00 183089.56 0.8125 0.6515 0.7232
12.500 385157.90 165653.00 380708.00 140251.10 0.6968 0.7331 0.7145
2.240 390289.00 183385.00 385819.00 135120.00 0.6778 0.7428 0.7088
