Skip to content

Commit

Permalink
Merge pull request #689 from macs3-project/feat/macs3/fragsupport
Browse files Browse the repository at this point in the history
Change the way to exclude regions in `hmmratac` and fix the incorrect `keep-duplicate` option
  • Loading branch information
taoliu authored Feb 13, 2025
2 parents f01cb68 + 5b08668 commit 4f451fb
Show file tree
Hide file tree
Showing 10 changed files with 445 additions and 273 deletions.
31 changes: 23 additions & 8 deletions ChangeLog
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
2024-11-30 Tao Liu <[email protected]>
MACS 3.0.3b1
2025-02-12 Tao Liu <[email protected]>
MACS 3.0.3

* Features added

Expand All @@ -19,20 +19,35 @@
source codes can be more compatible to Python programming tools
such as `flake8`. During rewritting, we cleaned the source codes
even more, and removed unnecessary dependencies during
compilation.
compilation. We will continue to do more code cleaning in the
future.

3) We changed the behavior on the usage of 'blacklist' regions in
`hmmratac`. We will remove the aligned fragments located in the
'blacklist' regions before the EM step to estimate fragment
lengths distributions and HMM step to learn and predict nucleosome
states. The reason is discussed in #680. To implement this
feature, we added the `exclude` functions to PETrackI and
PETrackII.

* Bug fixed

1) Fix issues in big-endian system in `Parser.py` codes. Enable
big-endian support in `BAM.py` codes for accessig certain
alignment records that overlap with givin genomic
coordinates using BAM/BAI files.
1) `hmmratac` option `--keep-duplicate` had opposite effect
previously as indicated by the name and description. It has been
renamed as `--remove-dup` to reflect the actual
behavior. `hmmratac` will not remove duplicated fragments unless
this option is set.

2) `hmmratac`: wrong class name used while saving digested signals
in BedGraph files. Multiple other issues related to output
filenames. #682

3) `predictd` and `filterdup`: wrong variable name used while
3) Fix issues in big-endian system in `Parser.py` codes. Enable
big-endian support in `BAM.py` codes for accessig certain
alignment records that overlap with givin genomic
coordinates using BAM/BAI files.

4) `predictd` and `filterdup`: wrong variable name used while
reading multiple pe/frag files.

* Doc
Expand Down
29 changes: 18 additions & 11 deletions MACS3/Commands/hmmratac_cmd.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Time-stamp: <2025-02-05 12:38:55 Tao Liu>
# Time-stamp: <2025-02-12 14:30:02 Tao Liu>

"""Description: Main HMMR command
Expand Down Expand Up @@ -118,9 +118,17 @@ def run(args):
# remember to finalize the petrack
petrack.finalize()

options.info(f"# Read {petrack.total} fragments.")

# filter duplicates if needed
if options.misc_keep_duplicates:
if options.misc_remove_duplicates:
options.info("# Removing duplicated fragments...")
# by default, we keep all duplicates
before_total = petrack.total
petrack.filter_dup(maxnum=1)
removed_n = before_total - petrack.total
options.info(f"# We removed {removed_n} duplicated fragments.")
options.info(f"# There are {petrack.total} fragments left.")

# read in blacklisted if option entered
if options.blacklist:
Expand All @@ -139,6 +147,14 @@ def run(args):
blacklist_regions = Regions()
blacklist_regions.init_from_PeakIO(blacklist)

# remove fragments aligned in the blacklisted regions

before_total = petrack.total
petrack.exclude(blacklist_regions)
removed_n = before_total - petrack.total
options.info(f"# We removed {removed_n} fragments overlapping with blacklisted regions.")
options.info(f"# There are {petrack.total} fragments left.")

#############################################
# 2. EM
#############################################
Expand Down Expand Up @@ -287,10 +303,6 @@ def run(args):
training_regions.expand(options.hmm_training_flanking)
training_regions.merge_overlap()

# remove peaks overlapping with blacklisted regions
if options.blacklist:
training_regions.exclude(blacklist_regions)
options.info(f"# after removing those overlapping with provided blacklisted regions, we have {training_regions.total} left")
if options.save_train:
fhd = open(training_region_bedfile, "w")
training_regions.write_to_bed(fhd)
Expand Down Expand Up @@ -409,11 +421,6 @@ def run(args):
candidate_regions.merge_overlap()
options.info(f"# after expanding and merging, we have {candidate_regions.total} candidate regions")

# remove peaks overlapping with blacklisted regions
if options.blacklist:
candidate_regions.exclude(blacklist_regions)
options.info(f"# after removing those overlapping with provided blacklisted regions, we have {candidate_regions.total} left")

# extract signals
options.info("# Extract signals in candidate regions and decode with HMM")
# we will do the extraction and prediction in a step of 10000 regions by default
Expand Down
Loading

0 comments on commit 4f451fb

Please sign in to comment.