0.1.2
Hendricks27 committed Nov 12, 2024
1 parent 93aac06 commit 3832237
Showing 17 changed files with 1,798 additions and 28 deletions.
2 changes: 1 addition & 1 deletion Docker/build.sh
@@ -1,7 +1,7 @@
#!/bin/bash


-tag="V0.1.1"
+tag="V0.1.2"

docker build --platform linux/amd64 -t wenjin27/methylgrapher:latest -t wenjin27/methylgrapher:$tag ./
# docker push wenjin27/methylgrapher:latest
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/filetype.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/process.doctree
Binary file not shown.
6 changes: 3 additions & 3 deletions docs/build/html/_sources/filetype.rst.txt
@@ -5,17 +5,17 @@ File Types


GFA: Graphical Fragment Assembly format
-------------------------------
+------------------------------------------------------------
For detail explanation, refer to https://gfa-spec.github.io/GFA-spec/GFA1.html


GAF: Graphical mApping Format
-------------------------------
+------------------------------------------------------------
For detail explanation, refer to https://github.com/lh3/gfatools/blob/master/doc/rGFA.md


methyl: Graph methylation file
-------------------------------
+------------------------------------------------------------
The methylGrapher extraction output provides a list of each cytosine mapped to the genome graph, specified by its coordinates (segment ID, 0-based position on the segment, and strand relative to the segment).
For each cytosine, the output includes its context, the number of read-pairs supporting whether the cytosine is methylated or unmethylated, and the coverage, which is the sum of methylated and unmethylated read-pairs.
The methylation percentage is calculated as the ratio of methylated read-pairs to the total coverage.
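
As an illustration of the coverage and percentage arithmetic described above, here is a minimal Python sketch. The function name `methylation_summary`, the 0-100 scale, and the `None` result for uncovered cytosines are assumptions made for this example; they are not taken from methylGrapher's code or output specification.

```python
from typing import Optional, Tuple


def methylation_summary(methylated: int, unmethylated: int) -> Tuple[int, Optional[float]]:
    """Return (coverage, methylation percentage) for one cytosine.

    Coverage is the sum of methylated and unmethylated read-pairs.
    The percentage is methylated / coverage, scaled to 0-100 here
    (the scale is an assumption of this sketch).
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("read-pair counts must be non-negative")
    coverage = methylated + unmethylated
    if coverage == 0:
        # No supporting read-pairs: the ratio is undefined.
        return coverage, None
    return coverage, 100.0 * methylated / coverage
```

For example, a cytosine with 3 methylated and 1 unmethylated read-pairs has a coverage of 4 and a methylation percentage of 75.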
3 changes: 0 additions & 3 deletions docs/build/html/_sources/process.rst.txt
@@ -11,9 +11,6 @@ Description
~~~~~~~~~~~~~~~~~~~~~~
This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.
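
The conversion described above can be pictured with a short Python sketch that rewrites the sequence field of GFA `S` lines. It assumes a plain per-base substitution (C to T for one graph, G to A for the other) and leaves every other record untouched; the helper name `convert_gfa_line` is hypothetical, and methylGrapher's real indexing step may do more than this.

```python
# Translation tables for the two fully converted graphs (assumed per-base substitution).
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_gfa_line(line: str, table: dict) -> str:
    """Convert the sequence of a GFA segment line; return other lines unchanged."""
    fields = line.rstrip("\n").split("\t")
    # Only S (segment) records carry a sequence, in the third column.
    if fields[0] == "S" and len(fields) >= 3:
        fields[2] = fields[2].translate(table)
    return "\t".join(fields)
```

Applying `convert_gfa_line` with `C_TO_T` and then separately with `G_TO_A` to every line of the input graph would yield the two converted graphs in this simplified picture.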

-.. note::
-It's important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.

Example Usage
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: shell
4 changes: 0 additions & 4 deletions docs/build/html/process.html
@@ -118,10 +118,6 @@ <h2>Genome Indexing<a class="headerlink" href="#genome-indexing" title="Link to
<section id="description">
<h3>Description<a class="headerlink" href="#description" title="Link to this heading"></a></h3>
<p>This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>It’s important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.</p>
-</div>
</section>
<section id="example-usage">
<h3>Example Usage<a class="headerlink" href="#example-usage" title="Link to this heading"></a></h3>
2 changes: 1 addition & 1 deletion docs/build/html/searchindex.js

Some generated files are not rendered by default.

3 changes: 0 additions & 3 deletions docs/source/process.rst
@@ -11,9 +11,6 @@ Description
~~~~~~~~~~~~~~~~~~~~~~
This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.

-.. note::
-It's important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.

Example Usage
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: shell
1 change: 1 addition & 0 deletions src/config.ini
@@ -1,6 +1,7 @@
[default]
# Provide the path to the executable
vg_path = vg
ga_path = GraphAligner
fastuniq_path = fastuniq
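
For readers unfamiliar with this file layout, the following is a minimal sketch of reading the `[default]` section with Python's standard `configparser`. The helper name `load_tool_paths` and the inline `SAMPLE` text are assumptions for illustration; how methylGrapher itself loads `config.ini` is not shown in this commit.

```python
import configparser

# Inline copy of the settings shown above, so the sketch runs without a file.
SAMPLE = """\
[default]
# Provide the path to the executable
vg_path = vg
ga_path = GraphAligner
fastuniq_path = fastuniq
"""


def load_tool_paths(text: str) -> dict:
    """Parse INI text and return the [default] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Lowercase "default" is an ordinary section; only "DEFAULT" is special.
    return dict(parser["default"])


if __name__ == "__main__":
    for name, path in load_tool_paths(SAMPLE).items():
        print(f"{name} -> {path}")
```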


26 changes: 26 additions & 0 deletions src/gfa.py
@@ -345,6 +345,32 @@ def get_sequences_by_segment_ID(self, segment_IDs):
return res


+# Store segment length in memory
+class GraphicalFragmentAssemblySegmentLengthMemory(object):
+
+    def __init__(self):
+        self.clear()
+
+    def clear(self):
+        self._segment_length = {}
+
+    def parse(self, gfa_file):
+        with open(gfa_file) as gfa_fh:
+            for l in gfa_fh:
+                if l[0] not in "S":
+                    continue
+
+                l = l.strip().split("\t")
+
+                if l[0] == "S":
+                    rt, segID, seq, *tags = l
+                    # sequence, tag, links, parent_links_count
+                    self._segment_length[segID] = len(seq)
+
+
+    def get_sequence_length_by_segment_ID(self, segment_ID):
+        return self._segment_length[segment_ID]
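
The idea behind the class added above (keep only segment lengths in memory, not sequences) can be sketched as a standalone function. This is not the code from `src/gfa.py`: the names `read_segment_lengths` and `demo` are hypothetical, and the fallback to the GFA1 `LN:i:` tag when the sequence is `*` is an addition of this sketch that the committed class does not perform.

```python
import os
import tempfile


def read_segment_lengths(gfa_path: str) -> dict:
    """Map each segment ID in a GFA file to its sequence length."""
    lengths = {}
    with open(gfa_path) as gfa_fh:
        for line in gfa_fh:
            fields = line.rstrip("\n").split("\t")
            # Skip everything except well-formed S (segment) records.
            if fields[0] != "S" or len(fields) < 3:
                continue
            _, seg_id, seq, *tags = fields
            if seq == "*":
                # Sequence omitted: use the optional LN:i:<length> tag if present.
                for tag in tags:
                    if tag.startswith("LN:i:"):
                        lengths[seg_id] = int(tag[len("LN:i:"):])
                        break
                continue
            lengths[seg_id] = len(seq)
    return lengths


def demo() -> dict:
    """Write a tiny GFA file to a temporary location and read it back."""
    gfa_text = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\t*\tLN:i:7\nL\ts1\t+\ts2\t+\t0M\n"
    with tempfile.NamedTemporaryFile("w", suffix=".gfa", delete=False) as fh:
        fh.write(gfa_text)
        path = fh.name
    try:
        return read_segment_lengths(path)
    finally:
        os.remove(path)
```

Storing only integers keeps memory use small on large graphs, at the cost of a separate lookup whenever the sequence itself is needed.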



