Scalable Maximal Overlap Discrete Wavelet Transform (MODWT)

Shane Neph and Scott Kuehn

Overview

An efficient implementation of the Maximal Overlap Discrete Wavelet Transform (MODWT). See D. B. Percival and A. T. Walden (2000), Wavelet Methods for Time Series Analysis. Cambridge, England: Cambridge University Press. This is not the usual discrete wavelet transform found in, for example, the GNU Scientific Library (gsl), but an extended set of algorithms designed to overcome some problems with the usual discrete wavelet transform.

See http://faculty.washington.edu/dbp/PDFFILES/4-Lund-A4.pdf for an overview and comparison to the regular discrete transform.
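
For orientation, here is a minimal Python sketch of the MODWT pyramid algorithm for the Haar filter with a periodic boundary, following the formulation in Percival and Walden. It is an illustration only and not this library's implementation; the function names `haar_modwt` and `energy` are hypothetical. Unlike the usual discrete wavelet transform, nothing is downsampled, so every level keeps as many coefficients as the input has samples, and the input length does not need to be a power of two.

```python
def haar_modwt(x, levels):
    """Haar MODWT with a periodic boundary (illustrative sketch).

    Returns (wavelet, scaling): wavelet[j - 1] holds the level-j wavelet
    coefficients and scaling holds the scaling coefficients of the last level.
    """
    n = len(x)
    v = list(x)  # the level-0 scaling coefficients are the data themselves
    wavelet = []
    for j in range(1, levels + 1):
        # The two filter taps are spread 2**(j-1) samples apart at level j;
        # indices wrap around, which is the periodic boundary condition.
        shift = 2 ** (j - 1)
        w_next = [0.5 * (v[t] - v[(t - shift) % n]) for t in range(n)]
        v_next = [0.5 * (v[t] + v[(t - shift) % n]) for t in range(n)]
        wavelet.append(w_next)
        v = v_next
    return wavelet, v


def energy(values):
    """Sum of squares of a sequence of coefficients."""
    return sum(c * c for c in values)
```

A useful sanity check on any MODWT output is that the transform preserves energy: the sum of squares of the input equals the sum of squares of all wavelet coefficients plus those of the final scaling coefficients.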

Build

make -C src/
bin/modwt --help

Documentation

doc/ contains an HTML document to open in your browser (it has the same information as shown below)
bin/modwt --help lists all option arguments

The Maximal Overlap Discrete Wavelet Transform (MODWT) library is written to be as efficient as possible in both RAM and time, with particular emphasis on RAM. The modwt application uses the library in the most efficient way, which lets us scale to whole-genome data sets.

Design Intentions

Library

  • Make it fast and memory efficient, with particular emphasis on RAM requirements.
  • Build as a generic library API that can work with any number of different data types, such as simple numeric, BED, WIG, etc. A generic API may be used in any number of ways in any number of applications. The application discussed here does NOT utilize the full features of the library API, and is only a single example of how an application may be built from the library components.
  • Make the RAM required to compute any type of MODWT value independent of the level/scale requested.

Application: modwt

  • Build a wrapper around the most useful features of the library and expose them as a command-line tool.
  • Use the library in the most efficient ways possible, even if the application itself becomes slightly cumbersome (see Output)



General Usage

NOTE: modwt --help lists all available filters, boundary conditions and more.

modwt
[--boundary <string = periodic>]
[--filter <string = LA8>]
[--help]
[--level <integer = 4>]
[--operation <string = smooth>]
[--prefix <string = "">]
[--to-stdout]
<file-name>

Where

--boundary may be

  • periodic [default]
  • reflected

--filter may be

  • haar
  • d4, d6, d8, d10, d12, d14, d16, d18, d20 (Daubechies)
  • la8, la10, la12, la14, la16, la18, la20 (least asymmetric) [la8 by default]
  • bl14, bl18, bl20 (best localized)
  • c6, c12, c18, c24, c30 (Coiflet)

--level

  • is the number of levels the program will sweep through [4 by default]

--operation may be

  • all
  • details
  • mra
  • scale (coefficients)
  • smooth [default]
  • wave (coefficients)
  • wave-scale (coefficients)

--prefix

  • may be anything you want as a prefix to all output files generated. This may not be used with --to-stdout.

--to-stdout

  • is only available when --operation is set to smooth or scale
  • may not be used with --prefix


Option names are NOT case sensitive
Values passed to --boundary, --filter or --operation are NOT case sensitive
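
The option rules above can be sketched as a small Python helper that assembles a modwt command line and rejects the combinations the text rules out. This is a hedged illustration, not part of the project: the function name `build_modwt_command`, the input file name `signal.txt`, and the assumption that each option is followed by its value as a separate argument are all hypothetical.

```python
def build_modwt_command(file_name, boundary="periodic", filter_name="LA8",
                        level=4, operation="smooth", prefix="",
                        to_stdout=False):
    """Return an argument list for bin/modwt using the documented options."""
    # Values are not case sensitive, so normalise before checking them.
    boundary = boundary.lower()
    operation = operation.lower()
    if boundary not in ("periodic", "reflected"):
        raise ValueError("unknown --boundary value: " + boundary)
    if operation not in ("all", "details", "mra", "scale", "smooth",
                         "wave", "wave-scale"):
        raise ValueError("unknown --operation value: " + operation)
    if to_stdout and prefix:
        raise ValueError("--prefix may not be used with --to-stdout")
    if to_stdout and operation not in ("smooth", "scale"):
        raise ValueError("--to-stdout requires --operation smooth or scale")

    # Assumption: each option takes its value as the next argument.
    cmd = ["bin/modwt",
           "--boundary", boundary,
           "--filter", filter_name,
           "--level", str(level),
           "--operation", operation]
    if prefix:
        cmd += ["--prefix", prefix]
    if to_stdout:
        cmd.append("--to-stdout")
    cmd.append(file_name)
    return cmd
```

The returned list can be passed to `subprocess.run` once bin/modwt has been built; with no keyword arguments it reproduces the defaults shown in the usage summary.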



Output

File names produced from the application (not the library) are of the form:

  • details.i : i = 1..level
  • scaling-coefficients.level
  • smoothing.level
  • wavelet-coefficients.i : i = 1..level

Any --prefix specified by the end user precedes each name shown above.
All of these files are produced only when --operation is set to all.
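
As a worked example of the naming scheme above, this Python sketch lists the file names to expect when --operation is set to all. It rests on two assumptions that the text does not spell out: that `level` in a file name stands for the numeric value of --level, and that the prefix is joined to each name with no separator. The function name `output_file_names` is hypothetical.

```python
def output_file_names(level, prefix=""):
    """List the output file names for --operation all (illustrative sketch)."""
    names = ["details.%d" % i for i in range(1, level + 1)]
    names.append("scaling-coefficients.%d" % level)
    names.append("smoothing.%d" % level)
    names += ["wavelet-coefficients.%d" % i for i in range(1, level + 1)]
    # Assumption: the prefix is prepended as-is, with no separator added.
    return [prefix + name for name in names]
```

With the default level of 4 this gives ten names: four details files, four wavelet-coefficients files, and one each for the scaling coefficients and the smooth.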


Open Issues, Notes and Related Items

  • Only MODWT and related items are available from the library right now. See D. B. Percival and A. T. Walden (2000), Wavelet Methods for Time Series Analysis. Cambridge, England: Cambridge University Press.
  • We did not expose the capability to feed files back into the program to recalculate the original series. The library does have this capability.
  • Files are written to the current working directory (cwd) when neither --to-stdout nor --prefix is used.