Fix to fuzzer suite #290

robomics · 2024-10-15T14:28:12Z

Well... it turns out that due to a typo (6c2a642) the tests covering .hic files were not working as intended.

Once the typo was fixed, it became clear that hic2cool.hic2ckkl_extractnorms (which is used as part of the script to generate matched datasets for fuzzing) is buggy.

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hic9 --resolution 50000 --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73337.7265625
chr1	0	50000	chr1	50000	100000	3788.1474609375
chr1	0	50000	chr1	150000	200000	15372.154296875
chr1	0	50000	chr1	250000	300000	863.2742919921875
chr1	0	50000	chr1	600000	650000	320.6240844726562
chr1	0	50000	chr1	800000	850000	181.2357482910156
chr1	0	50000	chr1	1200000	1250000	72.47455596923828
chr1	0	50000	chr1	3850000	3900000	49.82170867919922
chr1	0	50000	chr1	6400000	6450000	45.85749435424805
chr1	0	50000	chr1	10100000	10150000	55.68156051635742

user@dev:/tmp$ straw observed SCALE 4DNFIYECESRC.50000.hic9 chr1 chr1 BP 50000 | sort -V | head

0	0	73337.7265625
0	50000	3788.1474609375
0	150000	15372.154296875
0	250000	863.27429199219
0	600000	320.62408447266
0	800000	181.2357635498
0	1200000	72.474555969238
0	3850000	49.821708679199
0	6400000	45.857494354248
0	10100000	55.681560516357

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.cool --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73326.47095104467
chr1	0	50000	chr1	50000	100000	3787.671117122592
chr1	0	50000	chr1	150000	200000	15369.88791052104
chr1	0	50000	chr1	250000	300000	863.1946134691881
chr1	0	50000	chr1	600000	650000	320.5992394599785
chr1	0	50000	chr1	800000	850000	181.2220165658271
chr1	0	50000	chr1	1200000	1250000	72.46910509393064
chr1	0	50000	chr1	3850000	3900000	49.81796074382814
chr1	0	50000	chr1	6400000	6450000	45.8540617839616
chr1	0	50000	chr1	10100000	10150000	55.67740163217622

Indeed dumping the weights with hictk shows that the weights do not match:

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hic9 --resolution 50000 --table weights | head

GW_SCALE	GW_VC	INTER_SCALE	INTER_VC	SCALE	VC	VC_SQRT
0.026283109560608864	0.025766246020793915	0.05174751207232475	0.046523503959178925	0.01609581895172596	0.0024648411199450493	0.03344287723302841
0.09203214943408966	0.12651462852954865	0.20558251440525055	0.2271459698677063	0.049201834946870804	0.01302221417427063	0.07686905562877655
0.12427109479904175	0.18209722638130188	0.30621588230133057	0.3252060115337372	0.05639198422431946	0.019980482757091522	0.09521647542715073
0.1299455314874649	0.17877204716205597	0.26928457617759705	0.3199484348297119	0.06870701909065247	0.019129784777760506	0.09316744655370712
0.00203463202342391	0.004034347832202911	0.005303055513650179	0.007427865639328957	0.00011479311069706455	0.0002835658087860793	0.011343211866915226
0.1567896157503128	0.24953073263168335	0.3707886040210724	0.451296329498291	0.07196778059005737	0.02333964593708515	0.10290969163179398
nan	0	nan	0	nan	0	0
0.006109944544732571	0.006729166489094496	0.011837385594844818	0.012318640947341919	0.003740387037396431	0.0005235060816630721	0.015412391163408756
0.0015248165000230074	0.0027893735095858574	0.004137048963457346	0.00501304492354393	0.00026468493160791695	0.0002835658087860793	0.011343211866915226

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.cool --table weights | head

GW_KR	GW_SCALE	GW_VC	INTER_KR	INTER_SCALE	INTER_VC	KR	SCALE	VC	VC_SQRT	weight
nan	0.026283105302777974	0.02576624616980936	nan	0.05174750669380793	0.04652350389172373	0.016077572494580136	0.016097054281872813	0.002464841208774858	0.03344287517808673	nan
nan	0.09203213620927271	0.12651463256955936	nan	0.2055825168981054	0.22714596413889557	0.04944104865343332	0.049204246315782255	0.013022214173792832	0.07686905671137546	nan
nan	0.12427109352074223	0.18209723210528878	nan	0.3062158741143685	0.32520601701974294	0.05673619544218485	0.056392611603664815	0.019980482718918315	0.09521648015690835	nan
nan	0.1299455137404352	0.17877204681976597	nan	0.26928457985832055	0.319948433137104	0.068710662054193	0.06871187703332783	0.01912978531058009	0.09316744439070311	nan
nan	0.002034631888335365	0.004034348024141404	nan	0.005303055457039178	0.00742786560163526	nan	0.000114792916286769250.00028356580277940843	0.011343212348891936	nan
nan	0.15678961921691245	0.24953072896193357	nan	0.37078859850963075	0.4512963281586131	0.06307208106921801	0.0719689008334027	0.023339646844151306	0.1029096911932976	nan
nan	nan	0	nan	nan	0	nan	nan	0	0	nan
nan	0.006109943776999134	0.0067291664308921076	nan	0.011837385115403764	0.01231864130641568	nan	0.0037404472881887017	0.0005235060974389078	0.015412390603290858	nan
nan	0.0015248164032957328	0.0027893734385665177	nan	0.004137049070509011	0.005013045097399929	nan	0.0002646855687286078	0.00028356580277940843	0.011343212348891936	nan

However, converting the same file with hictk convert leads to the expected result:

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hictk.cool --balance SCALE --join --range chr1 --range2 chr1 | head

chr1	0	50000	chr1	0	50000	73337.72677796868
chr1	0	50000	chr1	50000	100000	3788.147462249335
chr1	0	50000	chr1	150000	200000	15372.15434075431
chr1	0	50000	chr1	250000	300000	863.2742998040474
chr1	0	50000	chr1	600000	650000	320.6240729395666
chr1	0	50000	chr1	800000	850000	181.2357592998512
chr1	0	50000	chr1	1200000	1250000	72.47455715122292
chr1	0	50000	chr1	3850000	3900000	49.82170744039169
chr1	0	50000	chr1	6400000	6450000	45.85749467000802
chr1	0	50000	chr1	10100000	10150000	55.68156241977521

user@dev:/tmp$ hictk dump 4DNFIYECESRC.50000.hictk.cool --resolution 50000 --table weights | head

GW_SCALE	GW_VC	INTER_SCALE	INTER_VC	SCALE	VC	VC_SQRT
0.026283109560608864	0.025766246020793915	0.05174751207232475	0.046523503959178925	0.01609581895172596	0.0024648411199450493	0.03344287723302841
0.09203214943408966	0.12651462852954865	0.20558251440525055	0.2271459698677063	0.049201834946870804	0.01302221417427063	0.07686905562877655
0.12427109479904175	0.18209722638130188	0.30621588230133057	0.3252060115337372	0.05639198422431946	0.019980482757091522	0.09521647542715073
0.1299455314874649	0.17877204716205597	0.26928457617759705	0.3199484348297119	0.06870701909065247	0.019129784777760506	0.09316744655370712
0.00203463202342391	0.004034347832202911	0.005303055513650179	0.007427865639328957	0.00011479311069706455	0.0002835658087860793	0.011343211866915226
0.1567896157503128	0.24953073263168335	0.3707886040210724	0.451296329498291	0.07196778059005737	0.02333964593708515	0.10290969163179398
nan	0	nan	0	nan	0	0
0.006109944544732571	0.006729166489094496	0.011837385594844818	0.012318640947341919	0.003740387037396431	0.0005235060816630721	0.015412391163408756
0.0015248165000230074	0.0027893735095858574	0.004137048963457346	0.00501304492354393	0.00026468493160791695	0.0002835658087860793	0.011343211866915226

So we need to regenerate most of the test datasets used for fuzzing... And I don't think there is a way to do that without involving hictk, which is not ideal.

See #290

codecov · 2024-10-15T15:05:52Z

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.33%. Comparing base (cda8f37) to head (364f7d4).
Report is 14 commits behind head on main.

Files with missing lines	Patch %	Lines
...tk/hic/include/hictk/hic/impl/file_reader_impl.hpp	0.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #290   +/-   ##
=======================================
  Coverage   79.33%   79.33%           
=======================================
  Files         173      173           
  Lines       17003    17003           
  Branches     2309     2309           
=======================================
+ Hits        13489    13490    +1     
+ Misses       2513     2510    -3     
- Partials     1001     1003    +2

Flag	Coverage Δ
[tests	integration](https://app.codecov.io/gh/paulsengroup/hictk/pull/290/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paulsengroup)	`69.78% <0.00%> (+0.01%)`
[tests	unittests](https://app.codecov.io/gh/paulsengroup/hictk/pull/290/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paulsengroup)	`77.32% <0.00%> (-0.09%)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

robomics · 2024-10-15T19:30:26Z

Plot twist!
It is not hic2cool's fault, but rather Juicer/HiCTool's fault...
Basically the SCALE biases computed by JuicerTools and HiCTools (at least by the latest versions available at the time of writing) are different.
hic2cool cannot copy weights from .hic9 files, and so the SCALE weights that end up in the .cool files generated by create_datasets_for_fuzzer.py are those from .hic8 files (which are generated using JuicerTools).
Thus, when the fuzzer compares the outputs produced by hictk and cooler when fetching SCALE-balanced interactions, the results are not consistent, as hictk is reading weights from .hic9 files (which are computed with HiCTools), while cooler is reading weights taken from .hic8 files (which are computed with JuicerTools).

See paulsengroup/hictk#290

robomics added 8 commits October 15, 2024 15:13

Fix typo...

6c2a642

Always default-initialize balancing::Weights::_type as UNKNOWN

456248c

Add --verbosity CLI option to hictk_fuzzer

8432596

Improve logging in hictk_fuzzer

db8b348

Fix bug discovered by hictk_fuzzer

6560543

Fix typos

5e70f12

Update create_datasets_for_fuzzer.py to workaround bug in hic2cool

4325c8c

See #290

Remove unused import

70cb856

robomics added 2 commits October 15, 2024 18:16

Add README for hictk_fuzzer

e7e2aa1

Bugfix [no ci]

f96f0e9

robomics mentioned this pull request Oct 15, 2024

Implement fuzzer paulsengroup/hictkpy#74

Merged

robomics added 2 commits October 15, 2024 21:23

Update create_datasets_for_fuzzer.py to use hic2cool instead of hictkpy

32a1360

Remove the problematic test case from the test suite

364f7d4

robomics added a commit to paulsengroup/hictkpy that referenced this pull request Oct 15, 2024

Remove the problematic test case from the test suite

88afc79

See paulsengroup/hictk#290

robomics merged commit 12e6b4f into main Oct 15, 2024
124 of 125 checks passed

robomics deleted the fix/fuzzer branch October 15, 2024 20:53

robomics added a commit to paulsengroup/hictkpy that referenced this pull request Oct 16, 2024

Remove the problematic test case from the test suite

dfc7dda

See paulsengroup/hictk#290

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix to fuzzer suite #290

Fix to fuzzer suite #290

robomics commented Oct 15, 2024

codecov bot commented Oct 15, 2024 •

edited

Loading

robomics commented Oct 15, 2024

Fix to fuzzer suite #290

Fix to fuzzer suite #290

Conversation

robomics commented Oct 15, 2024

codecov bot commented Oct 15, 2024 • edited Loading

Codecov Report

robomics commented Oct 15, 2024

codecov bot commented Oct 15, 2024 •

edited

Loading