Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to fuzzer suite #290

Merged
merged 12 commits into from
Oct 15, 2024
Merged

Fix to fuzzer suite #290

merged 12 commits into from
Oct 15, 2024

Conversation

robomics
Copy link
Contributor

Well... it turns out that due to a typo (6c2a642) the tests covering .hic files were not working as intended.

Once the typo was fixed, it became clear that hic2cool.hic2ckkl_extractnorms (which is used as part of the script to generate matched datasets for fuzzing) is buggy.

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hic9 --resolution 50000 --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73337.7265625
chr1	0	50000	chr1	50000	100000	3788.1474609375
chr1	0	50000	chr1	150000	200000	15372.154296875
chr1	0	50000	chr1	250000	300000	863.2742919921875
chr1	0	50000	chr1	600000	650000	320.6240844726562
chr1	0	50000	chr1	800000	850000	181.2357482910156
chr1	0	50000	chr1	1200000	1250000	72.47455596923828
chr1	0	50000	chr1	3850000	3900000	49.82170867919922
chr1	0	50000	chr1	6400000	6450000	45.85749435424805
chr1	0	50000	chr1	10100000	10150000	55.68156051635742

user@dev:/tmp$ straw observed SCALE 4DNFIYECESRC.50000.hic9 chr1 chr1 BP 50000 | sort -V | head

0	0	73337.7265625
0	50000	3788.1474609375
0	150000	15372.154296875
0	250000	863.27429199219
0	600000	320.62408447266
0	800000	181.2357635498
0	1200000	72.474555969238
0	3850000	49.821708679199
0	6400000	45.857494354248
0	10100000	55.681560516357

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.cool --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73326.47095104467
chr1	0	50000	chr1	50000	100000	3787.671117122592
chr1	0	50000	chr1	150000	200000	15369.88791052104
chr1	0	50000	chr1	250000	300000	863.1946134691881
chr1	0	50000	chr1	600000	650000	320.5992394599785
chr1	0	50000	chr1	800000	850000	181.2220165658271
chr1	0	50000	chr1	1200000	1250000	72.46910509393064
chr1	0	50000	chr1	3850000	3900000	49.81796074382814
chr1	0	50000	chr1	6400000	6450000	45.8540617839616
chr1	0	50000	chr1	10100000	10150000	55.67740163217622

Indeed dumping the weights with hictk shows that the weights do not match:

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hic9 --resolution 50000 --table weights | head

GW_SCALE	GW_VC	INTER_SCALE	INTER_VC	SCALE	VC	VC_SQRT
0.026283109560608864	0.025766246020793915	0.05174751207232475	0.046523503959178925	0.01609581895172596	0.0024648411199450493	0.03344287723302841
0.09203214943408966	0.12651462852954865	0.20558251440525055	0.2271459698677063	0.049201834946870804	0.01302221417427063	0.07686905562877655
0.12427109479904175	0.18209722638130188	0.30621588230133057	0.3252060115337372	0.05639198422431946	0.019980482757091522	0.09521647542715073
0.1299455314874649	0.17877204716205597	0.26928457617759705	0.3199484348297119	0.06870701909065247	0.019129784777760506	0.09316744655370712
0.00203463202342391	0.004034347832202911	0.005303055513650179	0.007427865639328957	0.00011479311069706455	0.0002835658087860793	0.011343211866915226
0.1567896157503128	0.24953073263168335	0.3707886040210724	0.451296329498291	0.07196778059005737	0.02333964593708515	0.10290969163179398
nan	0	nan	0	nan	0	0
0.006109944544732571	0.006729166489094496	0.011837385594844818	0.012318640947341919	0.003740387037396431	0.0005235060816630721	0.015412391163408756
0.0015248165000230074	0.0027893735095858574	0.004137048963457346	0.00501304492354393	0.00026468493160791695	0.0002835658087860793	0.011343211866915226

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.cool --table weights | head

GW_KR	GW_SCALE	GW_VC	INTER_KR	INTER_SCALE	INTER_VC	KR	SCALE	VC	VC_SQRT	weight
nan	0.026283105302777974	0.02576624616980936	nan	0.05174750669380793	0.04652350389172373	0.016077572494580136	0.016097054281872813	0.002464841208774858	0.03344287517808673	nan
nan	0.09203213620927271	0.12651463256955936	nan	0.2055825168981054	0.22714596413889557	0.04944104865343332	0.049204246315782255	0.013022214173792832	0.07686905671137546	nan
nan	0.12427109352074223	0.18209723210528878	nan	0.3062158741143685	0.32520601701974294	0.05673619544218485	0.056392611603664815	0.019980482718918315	0.09521648015690835	nan
nan	0.1299455137404352	0.17877204681976597	nan	0.26928457985832055	0.319948433137104	0.068710662054193	0.06871187703332783	0.01912978531058009	0.09316744439070311	nan
nan	0.002034631888335365	0.004034348024141404	nan	0.005303055457039178	0.00742786560163526	nan	0.000114792916286769250.00028356580277940843	0.011343212348891936	nan
nan	0.15678961921691245	0.24953072896193357	nan	0.37078859850963075	0.4512963281586131	0.06307208106921801	0.0719689008334027	0.023339646844151306	0.1029096911932976	nan
nan	nan	0	nan	nan	0	nan	nan	0	0	nan
nan	0.006109943776999134	0.0067291664308921076	nan	0.011837385115403764	0.01231864130641568	nan	0.0037404472881887017	0.0005235060974389078	0.015412390603290858	nan
nan	0.0015248164032957328	0.0027893734385665177	nan	0.004137049070509011	0.005013045097399929	nan	0.0002646855687286078	0.00028356580277940843	0.011343212348891936	nan

However, converting the same file with hictk convert leads to the expected result:

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hictk.cool --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73337.72677796868
chr1	0	50000	chr1	50000	100000	3788.147462249335
chr1	0	50000	chr1	150000	200000	15372.15434075431
chr1	0	50000	chr1	250000	300000	863.2742998040474
chr1	0	50000	chr1	600000	650000	320.6240729395666
chr1	0	50000	chr1	800000	850000	181.2357592998512
chr1	0	50000	chr1	1200000	1250000	72.47455715122292
chr1	0	50000	chr1	3850000	3900000	49.82170744039169
chr1	0	50000	chr1	6400000	6450000	45.85749467000802
chr1	0	50000	chr1	10100000	10150000	55.68156241977521

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hictk.cool --resolution 50000 --table weights | head

GW_SCALE	GW_VC	INTER_SCALE	INTER_VC	SCALE	VC	VC_SQRT
0.026283109560608864	0.025766246020793915	0.05174751207232475	0.046523503959178925	0.01609581895172596	0.0024648411199450493	0.03344287723302841
0.09203214943408966	0.12651462852954865	0.20558251440525055	0.2271459698677063	0.049201834946870804	0.01302221417427063	0.07686905562877655
0.12427109479904175	0.18209722638130188	0.30621588230133057	0.3252060115337372	0.05639198422431946	0.019980482757091522	0.09521647542715073
0.1299455314874649	0.17877204716205597	0.26928457617759705	0.3199484348297119	0.06870701909065247	0.019129784777760506	0.09316744655370712
0.00203463202342391	0.004034347832202911	0.005303055513650179	0.007427865639328957	0.00011479311069706455	0.0002835658087860793	0.011343211866915226
0.1567896157503128	0.24953073263168335	0.3707886040210724	0.451296329498291	0.07196778059005737	0.02333964593708515	0.10290969163179398
nan	0	nan	0	nan	0	0
0.006109944544732571	0.006729166489094496	0.011837385594844818	0.012318640947341919	0.003740387037396431	0.0005235060816630721	0.015412391163408756
0.0015248165000230074	0.0027893735095858574	0.004137048963457346	0.00501304492354393	0.00026468493160791695	0.0002835658087860793	0.011343211866915226

So we need to regenerate most of the test datasets used for fuzzing... And I don't think there is a way to do that without involving hictk, which is not ideal.

Copy link

codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.33%. Comparing base (cda8f37) to head (364f7d4).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
...tk/hic/include/hictk/hic/impl/file_reader_impl.hpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #290   +/-   ##
=======================================
  Coverage   79.33%   79.33%           
=======================================
  Files         173      173           
  Lines       17003    17003           
  Branches     2309     2309           
=======================================
+ Hits        13489    13490    +1     
+ Misses       2513     2510    -3     
- Partials     1001     1003    +2     
Flag Coverage Δ
[tests integration](https://app.codecov.io/gh/paulsengroup/hictk/pull/290/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paulsengroup) 69.78% <0.00%> (+0.01%)
[tests unittests](https://app.codecov.io/gh/paulsengroup/hictk/pull/290/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paulsengroup) 77.32% <0.00%> (-0.09%)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robomics
Copy link
Contributor Author

Plot twist!
It is not hic2cool's fault, but rather Juicer/HiCTool's fault...
Basically the SCALE biases computed by JuicerTools and HiCTools (at least by the latest versions available at the time of writing) are different.
hic2cool cannot copy weights from .hic9 files, and so the SCALE weights that end up in the .cool files generated by create_datasets_for_fuzzer.py are those from .hic8 files (which are generated using JuicerTools).
Thus, when the fuzzer compares the outputs produced by hictk and cooler when fetching SCALE-balanced interactions, the results are not consistent, as hictk is reading weights from .hic9 files (which are computed with HiCTools), while cooler is reading weights taken from .hic8 files (which are computed with JuicerTools).

robomics added a commit to paulsengroup/hictkpy that referenced this pull request Oct 15, 2024
@robomics robomics merged commit 12e6b4f into main Oct 15, 2024
124 of 125 checks passed
@robomics robomics deleted the fix/fuzzer branch October 15, 2024 20:53
robomics added a commit to paulsengroup/hictkpy that referenced this pull request Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant