Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FRAG format support and bdg filename type fixed #685

Merged
merged 15 commits into from
Feb 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/build-and-test-MACS3-macos.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ jobs:
strategy:
matrix:
os: [macos-13]
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
runs-on: ${{ matrix.os }}
name: Build on ${{ matrix.os }} with Python ${{ matrix.python-version }}

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/build-and-test-MACS3-x64.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-22.04
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
arch: ['x64']
name: Build on x64 with Python ${{ matrix.python-version }}
steps:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-anaconda.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.11']
python-version: ['3.12']
arch: ['x64']
name: Build conda package in x64 under Python ${{ matrix.python-version }}
steps:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-gh-with-sphinx.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11' # Set up the Python version you need
python-version: '3.12' # Set up the Python version you need

- name: Install dependencies
run: |
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-pypi.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: '3.12'

# Here we build sdist in this job. If we plan to build wheels
# and upload to PyPI, we need another job to build wheels, and
Expand Down
14 changes: 11 additions & 3 deletions ChangeLog
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
2024-10-08 Tao Liu <[email protected]>
MACS 3.0.3b
2024-11-30 Tao Liu <[email protected]>
MACS 3.0.3b1

* Features added

Expand All @@ -10,7 +10,8 @@
including the barcodes and counts information. In the `PETrackII`
class, we are able to extract a subset using a list of barcodes,
which enables us to call peaks only on a pool (pseudo-bulk) of
cells.
cells. Functions such as `callpeak`, `pileup` and `hmmratac` will
support this FRAG format in the future.

2) We extensively rewrote the `pyx` codes into `py` codes. In
another words, we now apply the 'pure python style' with PEP-484
Expand All @@ -27,6 +28,13 @@
alignment records that overlap with givin genomic
coordinates using BAM/BAI files.

2) `hmmratac`: wrong class name used while saving digested signals
in BedGraph files. Multiple other issues related to output
filenames. #682

3) `predictd` and `filterdup`: wrong variable name used while
reading multiple pe/frag files.

* Doc

1) Explanation on the filtering criteria on SAM/BAM/BAMPE files.
Expand Down
4 changes: 2 additions & 2 deletions MACS3/Commands/callpeak_cmd.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Time-stamp: <2024-10-02 15:58:36 Tao Liu>
# Time-stamp: <2024-11-29 21:43:31 Tao Liu>

"""Description: MACS 3 call peak main executable

Expand Down Expand Up @@ -57,7 +57,7 @@ def run(args):

# 0 output arguments
info("\n"+options.argtxt)
options.PE_MODE = options.format in ('BAMPE', 'BEDPE')
options.PE_MODE = options.format in ('BAMPE', 'BEDPE', 'FRAG')
if options.PE_MODE:
tag = 'fragment' # call things fragments not tags
else:
Expand Down
18 changes: 7 additions & 11 deletions MACS3/Commands/hmmratac_cmd.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
# Time-stamp: <2024-10-08 10:53:48 Tao Liu>

# Time-stamp: <2025-02-05 12:38:55 Tao Liu>

"""Description: Main HMMR command

Expand All @@ -24,7 +23,7 @@
# from MACS3.Utilities.Constants import *
from MACS3.Utilities.OptValidator import opt_validate_hmmratac
from MACS3.IO.PeakIO import PeakIO
from MACS3.IO.Parser import BAMPEParser, BEDPEParser # BAMaccessor
from MACS3.IO.Parser import BAMPEParser, BEDPEParser, FragParser # BAMaccessor
from MACS3.Signal.HMMR_EM import HMMR_EM
from MACS3.Signal.HMMR_Signal_Processing import (generate_weight_mapping,
generate_digested_signals,
Expand Down Expand Up @@ -103,6 +102,9 @@ def run(args):
elif options.format == "BEDPE":
options.info("#1 Read fragments from BEDPE file...")
parser = BEDPEParser
elif options.format == "FRAG":
options.info("#1 Read fragments from FRAG file...")
parser = FragParser
else:
raise Exception("wrong format")

Expand Down Expand Up @@ -204,7 +206,7 @@ def run(args):
fc_bdg = digested_atac_signals[0]
else:
options.info("# Pile up all fragments")
fc_bdg = petrack.pileup_bdg([1.0, ], baseline_value=0)
fc_bdg = petrack.pileup_bdg(baseline_value=0)
(sum_v, n_v, max_v, min_v, mean_v, std_v) = fc_bdg.summary()
options.info("# Convert pileup to fold-change over average signal")
fc_bdg.apply_func(lambda x: x/mean_v)
Expand Down Expand Up @@ -464,14 +466,8 @@ def run(args):
# First, the likelihoods for each of the three states in a bedGraph
if options.save_likelihoods:
options.info(f"# Write the likelihoods for each states into three bedGraph files {options.name}_open.bdg, {options.name}_nuc.bdg, and {options.name}_bg.bdg")
open_state_bdg_fhd = open(open_state_bdgfile, "w")
nuc_state_bdg_fhd = open(nuc_state_bdgfile, "w")
bg_state_bdg_fhd = open(bg_state_bdgfile, "w")
save_proba_to_bedGraph(predicted_proba_file, options.hmm_binsize, open_state_bdg_fhd, nuc_state_bdg_fhd, bg_state_bdg_fhd, i_open_region, i_nucleosomal_region, i_background_region)
save_proba_to_bedGraph(predicted_proba_file, options.hmm_binsize, open_state_bdgfile, nuc_state_bdgfile, bg_state_bdgfile, i_open_region, i_nucleosomal_region, i_background_region)
predicted_proba_file.seek(0) # reset
open_state_bdg_fhd.close()
nuc_state_bdg_fhd.close()
bg_state_bdg_fhd.close()
options.info("# finished writing proba_to_bedgraph")

# # Generate states path:
Expand Down
19 changes: 11 additions & 8 deletions MACS3/Commands/pileup_cmd.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@
under the terms of the BSD License (see the file LICENSE included with
the distribution).
"""
# Time-stamp: <2024-10-02 16:51:15 Tao Liu>
# Time-stamp: <2025-02-05 12:39:14 Tao Liu>

# ------------------------------------
# python modules
# ------------------------------------
Expand All @@ -16,7 +17,8 @@
# ------------------------------------
# from MACS3.Utilities.Constants import *
from MACS3.Utilities.OptValidator import opt_validate_pileup
from MACS3.Signal.Pileup import pileup_and_write_se, pileup_and_write_pe
from MACS3.Signal.Pileup import pileup_and_write_se
from MACS3.IO.BedGraphIO import bedGraphIO
# ------------------------------------
# Main function
# ------------------------------------
Expand All @@ -35,10 +37,10 @@ def run(o_options):
# error = options.error

# 0 output arguments
options.PE_MODE = options.format in ('BAMPE', 'BEDPE')
options.PE_MODE = options.format in ('BAMPE', 'BEDPE', 'FRAG')

# 0 prepare output file
outfile = os.path.join(options.outdir, options.outputfile).encode()
outfile = os.path.join(options.outdir, options.outputfile)
if os.path.isfile(outfile):
info("# Existing file %s will be replaced!" % outfile)
os.unlink(outfile)
Expand All @@ -51,8 +53,9 @@ def run(o_options):
t0 = treat.total # total fragments
info("# total fragments/pairs in alignment file: %d" % (t0))
info("# Pileup paired-end alignment file.")
pileup_and_write_pe(treat, outfile)

bdg = treat.pileup_bdg()
bdgio = bedGraphIO(outfile, data=bdg)
bdgio.write_bedGraph(trackline=False)
else:
(tsize, treat) = load_tag_files_options(options)

Expand All @@ -63,10 +66,10 @@ def run(o_options):

if options.bothdirection:
info("# Pileup alignment file, extend each read towards up/downstream direction with %d bps" % options.extsize)
pileup_and_write_se(treat, outfile, options.extsize * 2, 1, directional=False, halfextension=False)
pileup_and_write_se(treat, outfile.encode(), options.extsize * 2, 1, directional=False, halfextension=False)
else:
info("# Pileup alignment file, extend each read towards downstream direction with %d bps" % options.extsize)
pileup_and_write_se(treat, outfile, options.extsize, 1, directional=True, halfextension=False)
pileup_and_write_se(treat, outfile.encode(), options.extsize, 1, directional=True, halfextension=False)

info("# Done! Check %s" % options.outputfile)

Expand Down
2 changes: 1 addition & 1 deletion MACS3/IO/BedGraphIO.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# cython: language_level=3
# cython: profile=True
# Time-stamp: <2024-10-08 10:07:47 Tao Liu>
# Time-stamp: <2025-02-05 12:38:24 Tao Liu>

"""Module Description: IO Module for bedGraph file

Expand Down
6 changes: 5 additions & 1 deletion MACS3/IO/Parser.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# cython: language_level=3
# cython: profile=True
# cython: linetrace=True
# Time-stamp: <2024-10-22 10:25:23 Tao Liu>
# Time-stamp: <2024-11-29 23:30:03 Tao Liu>

"""Module for all MACS Parser classes for input. Please note that the
parsers are for reading the alignment files ONLY.
Expand Down Expand Up @@ -1528,6 +1528,10 @@ def pe_parse_line(self, thisline: bytes):
raise Exception("Less than 5 columns found at this line: %s\n" %
thisline)

@cython.ccall
def build_petrack(self):
return self.build_petrack2()

@cython.ccall
def build_petrack2(self):
"""Build PETrackII from all lines.
Expand Down
2 changes: 1 addition & 1 deletion MACS3/IO/PeakIO.py
Original file line number Diff line number Diff line change
Expand Up @@ -1046,7 +1046,7 @@ class BroadPeakIO:
"""IO for broad peak information.

"""
peaks: dict
peaks = cython.declare(dict, visibility="public")

def __init__(self):
self.peaks = {}
Expand Down
36 changes: 19 additions & 17 deletions MACS3/Signal/CallPeakUnit.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# cython: language_level=3
# cython: profile=True
# cython: linetrace=True
# Time-stamp: <2024-10-22 11:42:37 Tao Liu>
# Time-stamp: <2025-02-05 12:40:36 Tao Liu>

"""Module for Calculate Scores.

Expand Down Expand Up @@ -40,7 +40,7 @@
from MACS3.Signal.SignalProcessing import maxima, enforce_peakyness
from MACS3.IO.PeakIO import PeakIO, BroadPeakIO
from MACS3.Signal.FixWidthTrack import FWTrack
from MACS3.Signal.PairedEndTrack import PETrackI
from MACS3.Signal.PairedEndTrack import PETrackI, PETrackII
from MACS3.Signal.Prob import poisson_cdf
from MACS3.Utilities.Logger import logging

Expand Down Expand Up @@ -346,8 +346,8 @@ class CallerFromAlignments:
It will compute for each chromosome separately in order to save
memory usage.
"""
treat: object # FWTrack or PETrackI object for ChIP
ctrl: object # FWTrack or PETrackI object for Control
treat: object # FWTrack or PETrackI/II object for ChIP
ctrl: object # FWTrack or PETrackI/II object for Control

d: cython.int # extension size for ChIP
# extension sizes for Control. Can be multiple values
Expand Down Expand Up @@ -458,8 +458,10 @@ def __init__(self,
self.PE_mode = False
elif isinstance(treat, PETrackI):
self.PE_mode = True
elif isinstance(treat, PETrackII):
self.PE_mode = True
else:
raise Exception("Should be FWTrack or PETrackI object!")
raise Exception("Should be FWTrack or PETrackI/II object!")
# decide if there is control
self.treat = treat
if ctrl:
Expand Down Expand Up @@ -565,12 +567,12 @@ def pileup_treat_ctrl_a_chromosome(self, chrom: bytes):

if self.PE_mode:
treat_pv = self.treat.pileup_a_chromosome(chrom,
[self.treat_scaling_factor,],
self.treat_scaling_factor,
baseline_value=0.0)
else:
treat_pv = self.treat.pileup_a_chromosome(chrom,
[self.d,],
[self.treat_scaling_factor,],
self.d,
self.treat_scaling_factor,
baseline_value=0.0,
directional=True,
end_shift=self.end_shift)
Expand All @@ -585,11 +587,11 @@ def pileup_treat_ctrl_a_chromosome(self, chrom: bytes):
self.ctrl_scaling_factor_s,
baseline_value=self.lambda_bg)
else:
ctrl_pv = self.ctrl.pileup_a_chromosome(chrom,
self.ctrl_d_s,
self.ctrl_scaling_factor_s,
baseline_value=self.lambda_bg,
directional=False)
ctrl_pv = self.ctrl.pileup_a_chromosome_c(chrom,
self.ctrl_d_s,
self.ctrl_scaling_factor_s,
baseline_value=self.lambda_bg,
directional=False)
else:
# a: set global lambda
ctrl_pv = [treat_pv[0][-1:], np.array([self.lambda_bg,],
Expand Down Expand Up @@ -814,7 +816,7 @@ def __cal_pvalue_qvalue_table(self):
if q <= 0:
q = 0
break
#q = max(0,min(pre_q,q)) # make q-score monotonic
# q = max(0,min(pre_q,q)) # make q-score monotonic
self.pqtable[v] = q
pre_q = q
k += l
Expand All @@ -828,8 +830,9 @@ def __cal_pvalue_qvalue_table(self):
def __pre_computes(self,
max_gap: cython.int = 50,
min_length: cython.int = 200):
"""After this function is called, self.pqtable and self.pvalue_length is built. All
chromosomes will be iterated. So it will take some time.
"""After this function is called, self.pqtable and
self.pvalue_length is built. All chromosomes will be
iterated. So it will take some time.

"""
chrom: bytes
Expand Down Expand Up @@ -1049,7 +1052,6 @@ def call_peaks(self,
else:
self.__cal_pvalue_qvalue_table()


# prepare bedGraph file
if self.save_bedGraph:
self.bedGraph_treat_f = fopen(self.bedGraph_treat_filename, "w")
Expand Down
Loading