0.1.2

twlab · Nov 12, 2024 · 3832237 · 3832237
1 parent 93aac06
commit 3832237

Showing 17 changed files with 1,798 additions and 28 deletions.
diff --git a/Docker/build.sh b/Docker/build.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 
 
-tag="V0.1.1"
+tag="V0.1.2"
 
 docker build --platform linux/amd64 -t wenjin27/methylgrapher:latest -t wenjin27/methylgrapher:$tag ./
 # docker push wenjin27/methylgrapher:latest

diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
diff --git a/docs/build/doctrees/filetype.doctree b/docs/build/doctrees/filetype.doctree
diff --git a/docs/build/doctrees/process.doctree b/docs/build/doctrees/process.doctree
diff --git a/docs/build/html/_sources/filetype.rst.txt b/docs/build/html/_sources/filetype.rst.txt
@@ -5,17 +5,17 @@ File Types
 
 
 GFA: Graphical Fragment Assembly format
-------------------------------
+------------------------------------------------------------
 For a detailed explanation, refer to https://gfa-spec.github.io/GFA-spec/GFA1.html
 
 
 GAF: Graphical mApping Format
-------------------------------
+------------------------------------------------------------
 For a detailed explanation, refer to https://github.com/lh3/gfatools/blob/master/doc/rGFA.md
 
 
 methyl: Graph methylation file
-------------------------------
+------------------------------------------------------------
 The methylGrapher extraction output provides a list of each cytosine mapped to the genome graph, specified by its coordinates (segment ID, 0-based position on the segment, and strand relative to the segment).
 For each cytosine, the output includes its context, the number of read-pairs supporting whether the cytosine is methylated or unmethylated, and the coverage, which is the sum of methylated and unmethylated read-pairs.
 The methylation percentage is calculated as the ratio of methylated read-pairs to the total coverage.
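
The coverage and percentage arithmetic described in this hunk can be sketched as follows. This is only an illustration: the function name and arguments are hypothetical and not part of methylGrapher, and the 0-100 scale is an assumption, since the text only says "ratio".

```python
def methylation_percentage(methylated: int, unmethylated: int) -> float:
    """Return the share of methylated read-pairs among all covering read-pairs."""
    # Coverage is the sum of methylated and unmethylated read-pairs.
    coverage = methylated + unmethylated
    if coverage == 0:
        # With no covering read-pairs the ratio is undefined.
        raise ValueError("coverage is zero; the percentage is undefined")
    # Assumption: report on a 0-100 scale.
    return 100.0 * methylated / coverage


# Hypothetical cytosine supported by 3 methylated and 1 unmethylated read-pairs.
example_percentage = methylation_percentage(3, 1)
```

Raising on zero coverage is a design choice for this sketch; a real extractor might simply omit cytosines without coverage.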

diff --git a/docs/build/html/_sources/process.rst.txt b/docs/build/html/_sources/process.rst.txt
@@ -11,9 +11,6 @@ Description
 ~~~~~~~~~~~~~~~~~~~~~~
 This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.
 
-.. note::
-    It's important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.
-
 Example Usage
 ~~~~~~~~~~~~~~~~~~~~~~
 .. code-block:: shell

diff --git a/docs/build/html/process.html b/docs/build/html/process.html
@@ -118,10 +118,6 @@ <h2>Genome Indexing<a class="headerlink" href="#genome-indexing" title="Link to
 <section id="description">
 <h3>Description<a class="headerlink" href="#description" title="Link to this heading"></a></h3>
 <p>This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.</p>
-<div class="admonition note">
-<p class="admonition-title">Note</p>
-<p>It’s important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.</p>
-</div>
 </section>
 <section id="example-usage">
 <h3>Example Usage<a class="headerlink" href="#example-usage" title="Link to this heading"></a></h3>

diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js
diff --git a/docs/source/process.rst b/docs/source/process.rst
@@ -11,9 +11,6 @@ Description
 ~~~~~~~~~~~~~~~~~~~~~~
 This process involves transforming the genome graph from GFA format into two fully converted genome graphs: one depleted of C bases and another depleted of G bases. Additionally, if desired, you may include a spike-in genome in FASTA format to estimate the conversion rate in a single step further.
 
-.. note::
-    It's important to note that in the C-to-T genome graph, if both C and T segments are positioned identically—meaning they share the same parent and child segments, and each has only one parent and one child—the T segments are removed. Additionally, any associated links and paths are redirected to the corresponding C segment. This principle also applies in the G-to-A genome graph.
-
 Example Usage
 ~~~~~~~~~~~~~~~~~~~~~~
 .. code-block:: shell

diff --git a/src/config.ini b/src/config.ini
@@ -1,6 +1,7 @@
 [default]
 # Provide the path to the executable
 vg_path = vg
+ga_path = GraphAligner
 fastuniq_path = fastuniq
 
 

diff --git a/src/gfa.py b/src/gfa.py
@@ -345,6 +345,32 @@ def get_sequences_by_segment_ID(self, segment_IDs):
         return res
 
 
+# Store segment length in memory
+class GraphicalFragmentAssemblySegmentLengthMemory(object):
+
+    def __init__(self):
+        self.clear()
+
+    def clear(self):
+        self._segment_length = {}
+
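+    def parse(self, gfa_file):
+        # Record the sequence length of every segment ("S") record in the GFA file.
+        with open(gfa_file) as gfa_fh:
+            for line in gfa_fh:
+                # Skip blank lines and all non-segment records.
+                if not line.startswith("S\t"):
+                    continue
+
+                # Segment record: record type, segment ID, sequence, optional tags.
+                rt, segID, seq, *tags = line.strip().split("\t")
+                self._segment_length[segID] = len(seq)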
+
+
+    def get_sequence_length_by_segment_ID(self, segment_ID):
+        return self._segment_length[segment_ID]
+
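
A minimal, standalone sketch of what the new segment-length lookup does, run against a made-up GFA fragment. The helper `read_segment_lengths` and the example data are hypothetical stand-ins written for this illustration; they mirror the parsing logic in the hunk above but are not part of `src/gfa.py`.

```python
import os
import tempfile


def read_segment_lengths(gfa_path):
    """Map each segment ID to the length of its sequence."""
    lengths = {}
    with open(gfa_path) as gfa_fh:
        for line in gfa_fh:
            fields = line.strip().split("\t")
            # Only segment records carry a sequence: record type, ID, sequence.
            if len(fields) < 3 or fields[0] != "S":
                continue
            lengths[fields[1]] = len(fields[2])
    return lengths


# Made-up GFA fragment: two segments (one with a tag) and one link.
example_gfa = "S\ts1\tACGT\nS\ts2\tGG\tLN:i:2\nL\ts1\t+\ts2\t+\t0M\n"

with tempfile.NamedTemporaryFile("w", suffix=".gfa", delete=False) as fh:
    fh.write(example_gfa)
    path = fh.name

try:
    lengths = read_segment_lengths(path)
finally:
    os.remove(path)

print(lengths)  # {'s1': 4, 's2': 2}
```

Keeping only the lengths rather than the sequences themselves keeps memory use proportional to the number of segments, which appears to be the motivation for the class added in this hunk.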