Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleaning sam idx when using multiterators #1823

Closed
pd3 opened this issue Aug 15, 2024 · 1 comment
Closed

cleaning sam idx when using multiterators #1823

pd3 opened this issue Aug 15, 2024 · 1 comment
Assignees

Comments

@pd3
Copy link
Member

pd3 commented Aug 15, 2024

I encountered something what looks like a bug in cleaning structures after the use of multi-iterator. The question is, should I expect hts_close(fp) to auto-destroy the index created with sam_index_load(fp,fname)? If not, what is the recommended way?

Why I am asking:

Scenario 1) when I don't call hts_idx_destroy() explicitly, valgrind tells me it needs to be freed.

==849678== 16 bytes in 1 blocks are definitely lost in loss record 1 of 3
==849678==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==849678==    by 0x2243B9: index_load (sam.c:1654)
==849678==    by 0x2243B9: index_load (sam.c:1643)
==849678==    by 0x2243B9: sam_index_load (sam.c:1677)

Also it leaves behind the multi-iterator, although hts_itr_destroy() was called after hts_close()

==861675== 200 bytes in 1 blocks are still reachable in loss record 2 of 3
==861675==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==861675==    by 0x2A78A6: htscodecs_tls_alloc (utils.c:126)
==861675==    by 0x2A78FF: htscodecs_tls_calloc (utils.c:171)
==861675==    by 0x2A118F: rans_uncompress_O1 (rANS_static.c:607)
==861675==    by 0x274790: cram_uncompress_block (cram_io.c:1663)
==861675==    by 0x25BB31: cram_dependent_data_series (cram_decode.c:624)
==861675==    by 0x25C501: cram_decode_slice (cram_decode.c:2288)
==861675==    by 0x25F18B: cram_next_slice (cram_decode.c:3360)
==861675==    by 0x25F91A: cram_get_seq (cram_decode.c:3457)
==861675==    by 0x25F964: cram_get_bam_seq (cram_decode.c:3519)
==861675==    by 0x227F2B: cram_readrec (sam.c:1552)
==861675==    by 0x21132B: hts_itr_multi_next (hts.c:4486)

Scenario 2) when I try to clean after hts_close(fp) using hts_idx_destroy(idx), I get an error telling me it was already freed:

==857317== Invalid read of size 8
==857317==    at 0x26C8C4: cram_index_free (cram_index.c:376)
==857317==    by 0x20B524: hts_idx_destroy (hts.c:2643)
==857317==    by 0x1EBEDE: mpileup_destroy (mpileup.c:404)
==857317==    by 0x485EF9F: batch_profile_destroy (noise-profile.c:196)
==857317==    by 0x4860453: run (noise-profile.c:875)
==857317==    by 0x1EE259: main_plugin (vcfplugin.c:671)
==857317==    by 0x4ED2D8F: (below main) (libc_start_call_main.h:58)
==857317==  Address 0x5dd0508 is 34,936 bytes inside a block of size 35,352 free'd
==857317==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==857317==    by 0x27B21C: cram_close (cram_io.c:5678)
==857317==    by 0x20E71E: hts_close (hts.c:1652)
==857317==    by 0x1EBECC: mpileup_destroy (mpileup.c:403)
==857317==    by 0x485EF9F: batch_profile_destroy (noise-profile.c:196)
==857317==    by 0x4860453: run (noise-profile.c:875)
==857317==    by 0x1EE259: main_plugin (vcfplugin.c:671)
==857317==    by 0x4ED2D8F: (below main) (libc_start_call_main.h:58)
==857317==  Block was alloc'd at
==857317==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==857317==    by 0x27A8C5: cram_dopen (cram_io.c:5297)
==857317==    by 0x20F51D: hts_hopen (hts.c:1568)
==857317==    by 0x20F862: hts_open_format (hts.c:949)
==857317==    by 0x1EC2B3: mpileup_init (mpileup.c:584)
==857317==    by 0x485FF94: batch_profile_run1 (noise-profile.c:297)
==857317==    by 0x485FF94: batch_profile_run (noise-profile.c:562)
==857317==    by 0x485FF94: run (noise-profile.c:874)
==857317==    by 0x1EE259: main_plugin (vcfplugin.c:671)
==857317==    by 0x4ED2D8F: (below main) (libc_start_call_main.h:58)

Scenario 3) when I try to call hts_idx_destroy() before hts_close(), the index cleans. However, the multi-iterator does not:

==879091== 200 bytes in 1 blocks are still reachable in loss record 1 of 2
==879091==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==879091==    by 0x2A78C6: htscodecs_tls_alloc (utils.c:126)
==879091==    by 0x2A791F: htscodecs_tls_calloc (utils.c:171)
==879091==    by 0x2A11AF: rans_uncompress_O1 (rANS_static.c:607)
==879091==    by 0x2747B0: cram_uncompress_block (cram_io.c:1663)
==879091==    by 0x25BB51: cram_dependent_data_series (cram_decode.c:624)
==879091==    by 0x25C521: cram_decode_slice (cram_decode.c:2288)
==879091==    by 0x25F1AB: cram_next_slice (cram_decode.c:3360)
==879091==    by 0x25F93A: cram_get_seq (cram_decode.c:3457)
==879091==    by 0x25F984: cram_get_bam_seq (cram_decode.c:3519)
==879091==    by 0x227F4B: cram_readrec (sam.c:1552)
==879091==    by 0x21134B: hts_itr_multi_next (hts.c:4486)
==879091==
==879091== 1,572,864 bytes in 1 blocks are still reachable in loss record 2 of 2
==879091==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==879091==    by 0x2A786F: htscodecs_tls_alloc (utils.c:156)
==879091==    by 0x2A791F: htscodecs_tls_calloc (utils.c:171)
==879091==    by 0x2A11AF: rans_uncompress_O1 (rANS_static.c:607)
==879091==    by 0x2747B0: cram_uncompress_block (cram_io.c:1663)
==879091==    by 0x25BB51: cram_dependent_data_series (cram_decode.c:624)
==879091==    by 0x25C521: cram_decode_slice (cram_decode.c:2288)
==879091==    by 0x25F1AB: cram_next_slice (cram_decode.c:3360)
==879091==    by 0x25F93A: cram_get_seq (cram_decode.c:3457)
==879091==    by 0x25F984: cram_get_bam_seq (cram_decode.c:3519)
==879091==    by 0x227F4B: cram_readrec (sam.c:1552)
==879091==    by 0x21134B: hts_itr_multi_next (hts.c:4486)

This was observed using the following code
https://github.com/samtools/bcftools/blob/35e76121ef9e48fb05ab9d094d95afa6f229226a/plugins/noise-profile.c#L196
https://github.com/samtools/bcftools/blob/35e76121ef9e48fb05ab9d094d95afa6f229226a/mpileup2/mpileup.c#L404

@daviesrob daviesrob self-assigned this Aug 22, 2024
@daviesrob
Copy link
Member

You currently need to use scenario 3:

        if ( mplp->bam[i].idx ) hts_idx_destroy(mplp->bam[i].idx);
        sam_close(mplp->bam[i].fp);

The noise you're seeing from valgrind is just htscodecs' thread-local working space. You generally don't need to worry about "still reachable" blocks reported by valgrind - they're not normally counted as an error. In this case, if you were using threads they would be cleaned up on thread exit, but for a single-threaded program I think they get left until quite late, and evidently after valgrind has noticed them.

Scenario 2 (hts_idx_destroy() after sam_close()) probably ought to work, and does if you're reading BAM files. Currently it crashes for CRAM because the hts_cram_idx_t is left with a dangling pointer to the CRAM structure after calling sam_close(). Sorting this out may be a bit tricky, so currently I'd recommend using the code above to avoid the problem.

pd3 added a commit to samtools/bcftools that referenced this issue Sep 6, 2024
As suggested in
    samtools/htslib#1823 (comment)
first comes hts_idx_destroy() followed by sam_close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants