BioMarkovChains

A Julia package to represent biological sequences as Markov chains

Installation

BioMarkovChains is a Julia package. To install it, open Julia's interactive session (the REPL), press the ] key to enter package mode, and type the following command:

pkg> add BioMarkovChains

Creating Markov chains out of DNA sequences

An important step before developing many gene-finding algorithms is having a Markov chain representation of the DNA. To that end, we implemented the BioMarkovChain method, which captures the initial and transition probabilities of a DNA sequence (LongSequence) and creates a dedicated object storing the relevant information of a DNA Markov chain. Here is an example:

Let's find one ORF in a random LongDNA:

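using BioSequences, GeneFinder, BioMarkovChains

# Random 1,000 bp DNA sequence; the outputs below will differ on every run
sequence = randdnaseq(10^3)
# Keep the first ORF returned by GeneFinder (minimum length 75)
orfdna = getorfdna(sequence, min_len=75)[1]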

If we translate it, we get a 69aa sequence:

translate(orfdna)

69aa Amino Acid Sequence:
MSCGETTVSPILSRRTAFIRTLLGYRFRSNLPTKAERSRFGFSLPQFISTPNDRQNGNGGCGCGLENR*

Now, suppose we want to see how transitions occur in this ORF sequence. We can use the BioMarkovChain method and set it to a 2nd-order Markov chain:

BioMarkovChain(orfdna, 2)

BioMarkovChain with DNA Alphabet:
  - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
   0.2123  0.2731  0.278   0.2366
   0.2017  0.3072  0.2687  0.2224
   0.1978  0.2651  0.2893  0.2478
   0.2013  0.3436  0.2431  0.212
  - Initial Probabilities -> Vector{Float64}(4 × 1):
   0.2027
   0.2973
   0.2703
   0.2297
  - Markov Chain Order -> Int64:
   2
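To make "initial and transition probabilities" concrete, here is a minimal counting sketch in plain Julia. It is not the BioMarkovChains implementation: `estimate_chain`, `NUCLEOTIDES` and `NUC_INDEX` are hypothetical names, it works on a plain `String` rather than a `LongSequence`, and it assumes the initial probabilities are overall nucleotide frequencies, which is one common choice but is not confirmed here.

```julia
# Hypothetical sketch, not the BioMarkovChains implementation:
# estimate first-order initial and transition probabilities by counting.
const NUCLEOTIDES = ('A', 'C', 'G', 'T')
const NUC_INDEX = Dict(n => i for (i, n) in enumerate(NUCLEOTIDES))

# Returns (initials, tpm):
#   initials[i] = frequency of nucleotide i in the sequence (assumed definition)
#   tpm[i, j]   = fraction of occurrences of i immediately followed by j
function estimate_chain(seq::AbstractString)
    counts = zeros(Float64, 4)      # single-nucleotide counts
    pairs = zeros(Float64, 4, 4)    # dinucleotide (transition) counts
    prev = 0                        # 0 means "no previous nucleotide yet"
    for c in uppercase(seq)
        i = NUC_INDEX[c]            # throws KeyError for non-ACGT characters
        counts[i] += 1
        prev != 0 && (pairs[prev, i] += 1)
        prev = i
    end
    initials = counts ./ sum(counts)
    # Normalise each row; rows with no outgoing transitions stay all-zero
    tpm = pairs ./ max.(sum(pairs; dims = 2), 1)
    return initials, tpm
end

initials, tpm = estimate_chain("ACGTTGCAACGT")
```

A textbook second-order chain conditions on the previous two bases (a 16×4 matrix), whereas the order-2 output above is 4×4, so this first-order counting sketch should not be read as a reproduction of that output.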

This is useful for later creating HMMs and calculating the probability of a sequence under a given model. For instance, E. coli CDS and non-CDS transition models (Markov chains) are now implemented; here is the CDS model:

ECOLICDS

BioMarkovChain with DNA Alphabet:
  - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
   0.31    0.224   0.199   0.268
   0.251   0.215   0.313   0.221
   0.236   0.308   0.249   0.207
   0.178   0.217   0.338   0.267
  - Initial Probabilities -> Vector{Float64}(4 × 1):
   0.245
   0.243
   0.273
   0.239
  - Markov Chain Order -> Int64:
   1
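The order-2 chain earlier is still reported with a 4×4 matrix. A standard Markov chain identity (Chapman–Kolmogorov) shows one way a single 4×4 matrix can describe transitions over several steps: the n-step transition probabilities of a first-order chain are the entries of T^n. The sketch below applies this to the ECOLICDS values printed above; whether BioMarkovChain handles its order argument this way is an assumption we have not verified.

```julia
using LinearAlgebra  # standard library: matrix products and powers

# First-order ECOLICDS transition matrix, copied from the printout above
T = [0.310 0.224 0.199 0.268;
     0.251 0.215 0.313 0.221;
     0.236 0.308 0.249 0.207;
     0.178 0.217 0.338 0.267]

# (T^n)[i, j] is the probability of being at base j exactly n steps after base i
T2 = T^2

# Rows of T (and hence of T^2) sum to one, up to the rounding of the printed values
println(round.(sum(T; dims = 2); digits = 3))
println(round.(sum(T2; dims = 2); digits = 3))
```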

What, then, is the probability of the previous random ORF sequence given this model?

dnaseqprobability(orfdna, ECOLICDS)

7.466531836596359e-45
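The tiny value reflects how a first-order chain scores a sequence: the probability of the first nucleotide times one transition probability per subsequent position, so it shrinks roughly exponentially with length. The sketch below computes that product in log space to avoid floating-point underflow. `markov_logprob` is a hypothetical helper, not the `dnaseqprobability` implementation, and the A, C, G, T ordering of the ECOLICDS rows and columns is assumed.

```julia
# Hypothetical helper, not the dnaseqprobability implementation.
# log P(x) = log p0(x_1) + sum over k of log T[x_(k-1), x_k]
const BASE_INDEX = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

function markov_logprob(seq::AbstractString, initials::AbstractVector, T::AbstractMatrix)
    idx = [BASE_INDEX[c] for c in uppercase(seq)]
    isempty(idx) && return 0.0                 # empty sequence: log(1)
    lp = log(initials[idx[1]])                 # starting nucleotide
    for k in 2:length(idx)
        lp += log(T[idx[k - 1], idx[k]])       # one transition per step
    end
    return lp
end

# ECOLICDS values from the printout above (A, C, G, T order assumed)
initials = [0.245, 0.243, 0.273, 0.239]
T = [0.310 0.224 0.199 0.268;
     0.251 0.215 0.313 0.221;
     0.236 0.308 0.249 0.207;
     0.178 0.217 0.338 0.267]

lp = markov_logprob("ATGGCTAAA", initials, T)
p = exp(lp)   # back to a plain probability; underflows to 0.0 for very long sequences
```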

This is, of course, not very informative on its own, but we can later use different criteria to classify new ORFs. For a more detailed explanation, see the docs.
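One such criterion, sketched here as an assumption rather than the package's method, is a log-odds score: the log-probability of a sequence under a coding model minus its log-probability under a background model. The README mentions a non-CDS model but does not print its values, so the sketch uses a uniform chain as a hypothetical stand-in; `logprob`, `log_odds` and `is_coding_candidate` are illustrative names.

```julia
# Hypothetical sketch: log-odds of a sequence under a coding model versus a background.
# The uniform background is a stand-in, not the package's non-CDS model.
const BASE_INDEX = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

# log P(seq) under a first-order chain; assumes a non-empty A/C/G/T sequence
function logprob(seq::AbstractString, p0::AbstractVector, T::AbstractMatrix)
    idx = [BASE_INDEX[c] for c in uppercase(seq)]
    return log(p0[idx[1]]) + sum((log(T[idx[k - 1], idx[k]]) for k in 2:length(idx)); init = 0.0)
end

cds_p0 = [0.245, 0.243, 0.273, 0.239]   # ECOLICDS initials from above (A, C, G, T assumed)
cds_T = [0.310 0.224 0.199 0.268;
         0.251 0.215 0.313 0.221;
         0.236 0.308 0.249 0.207;
         0.178 0.217 0.338 0.267]
bg_p0 = fill(0.25, 4)                    # uniform background (assumption)
bg_T = fill(0.25, 4, 4)

# Positive: more likely under the coding model; negative: more likely under the background
log_odds(seq) = logprob(seq, cds_p0, cds_T) - logprob(seq, bg_p0, bg_T)

# Hypothetical decision rule; the threshold would need calibrating on labelled ORFs
is_coding_candidate(seq; threshold = 0.0) = log_odds(seq) > threshold

log_odds("ATGGCTAAA")
```

Because the score grows with sequence length, dividing it by the number of nucleotides is a simple way to compare ORFs of different lengths.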

License

MIT License
