Skip to content

Latest commit

 

History

History
188 lines (138 loc) · 8.04 KB

DEV_NOTES.md

File metadata and controls

188 lines (138 loc) · 8.04 KB

Developer notes

External dependencies

  • swig (Simple Wrapper Interface Generator)
  • gsl (GNU Scientific Library)
  • Numeric (Python Numeric)
  • libxml2/libxslt (Python bindings)
  • pytest (Python test framework)

Installing swig on recent Ubuntu releases

There is a bug in versions swig 3.0.6 to 3.0.10 that prevents swig on xenial (which is version 3.0.8 of swig) working. You will need to install the lastest version from source.

  1. Get swig dependency:

sudo apt install libpcre3-dev

  1. Visit swig.org to get download link

  2. Do the installation:

    tar zxvf ~/swig-3.0.12.tar.gz
    cd swig-3.0.12
    ./configure
    make
    sudo make install
    

More details on pytest

If for some reason ./setup.py test doesn't work, you can use pip to manually install pytest:

  pip install --user pytest

(Ensure that the local user path is in PATH, you may need to modify ~/.bash_profile accordingly. On MacOS e.g. export PATH=$HOME/Library/Python/2.7/bin:$PATH, on Linux export PATH=$HOME/.local/bin:$PATH.) Verify that py.test is in your PATH by running which py.test.

Design notes

  • 'Main' is the primary interface to the PyPop modules. Given a ConfigParser instance, which can be (1) created from a filename passed from command-line argument or (2) from values populated by the GUI (currently selected from an .ini file, but can ultimately be set directly from the GUI or values based from a form to a web server or the) it then runs the specified modules (outlined below).

  • 'GUIApp' is the graphical front-end to PyPop which uses the "wxPython":http://www.wxpython.org GUI toolkit. wxPython is a set of Python bindings to "wxWindows":http://www.wxwindows.org, which is an open-source cross-platform GUI widget toolkit which has a native look under GNU/Linux (GTK), Windows (MFC) and MacOS X (Aqua). [as of 2017, this is deprecated]

  • 'ParseFile' is a base class which has most of the common functionality for reading files.

  • 'ParseGenotypeFile' is a subclass of 'ParseFile' that deals with files that consist specifically of data with individual genotyped for one or more loci.

  • 'ParseAlleleCount' is another subclass of 'ParseFile' that deals with files consisting of allele counts across a whole population.

  • 'HardyWeinberg' is a class that calculates Hardy-Weinberg statistics given genotype data for a single locus.

  • 'HardyWeinbergGuoThompson' a subclass of 'HardyWeinberg' that uses the Guo & Thompson algorithm for calculating statistics.

  • 'HardyWeinbergGuoThompsonArlequin' a subclass of 'HardyWeinberg' that uses the Arlequin implementation of the Guo & Thompson algorithm for calculating statistics.

  • 'Haplo' is an abstract base class for estimating haplotypes given genotype data.

  • 'HaploArlequin' is a subclass of 'Haplo' that uses Arlequin for estimation of haplotypes (obsolete).
  • 'Emhaplofreq' is a subclass of 'Haplo' that uses 'emhaplofreq' (Rich Single's program) for the estimation of haplotypes and linkage disequilibrium values.

  • 'ArlequinWrapper' the underlying class that `wraps' the functionality of the "Arlequin":http://lgb.unige.ch/arlequin/ program (obsolete: this class, in turn, supplies 'HaploArlequin' with required information).

  • 'Homozygosity' Calculates homozygosity statistics for a given locus, calculates the observed homozygosity and returns the approximate expected homozygosity statistics taken from previous simulation runs.

Both file formats are assumed to have a population header information with, consisting of a line of column headers (population metadata) followed by a line with the actual data, followed by the column headers for the samples (sample metadata) followed by the sample data itself (either individuals in the genotyped case, or alleles in the allele count case).

macports.org

  • To install macports via the command-line you can run the following (substituting the current link):
curl -L 'https://github.com/macports/macports-base/releases/download/v2.4.1/MacPorts-2.4.1-10.12-Sierra.pkg' > MacPorts-2.4.1-10.12-Sierra.pkg
sudo installer -pkg MacPorts-2.4.1-10.12-Sierra.pkg  -target /

Containerizing

To make pypop more portable (given that some of its dependencies are currently obsolete), it is possible to build a Singularity container which contains a minimal Fedora 25 installation (minus the Kernel), pypop, pypop's dependencies, and some extra tools (yum, rpm, less, and vim) in case you need to do work inside the container.

Singularity containers bind-mount many external directories by default (for example, /home and /tmp), with the container image kept read-only. When run inside the container, pypop will work on your files, even though they live outside the container.

Singularity 2.3 or later is required in order to bootstrap this container. The container also must be bootstrapped & run on a Linux system, running the x86_64 architecture, because that's the OS & architecture the container uses.

To build pypop as a singularity container, once you have Singularity installed, perform these three steps:

  1. cd path/to/pypop/source
  2. singularity create -s 2048 image.img
  3. sudo singularity bootstrap image.img Singularity

The above commands will give you a 2 GiB executable file named image.img. That is the container.

The first command ensures that you are in the pypop source directory. This is required because part of the bootstrap process copies the source into the container.

The second command creates a 2 GiB (a 2048 MiB) container image. This should be large enough, but you can increase or decrease it as you wish. Note that if you make it too small, the bootstrap might not have enough room to complete!

The final command performs the bootstrap. The bootstrap needs to be run as root, so you either need to use sudo (as shown in the example above) or you need to run the command in a root shell. The bootstrap does a number of things:

  • Mount the container image read/write.
  • Download and install the Fedora 25 GPG key.
  • Create a temporary Yum repo file, pointing to the Fedora 24 package archive.
  • Install the basesystem package; GCC, SWIG, and GSL; Python (both the
  • executable and development packages); and the Python modules for Numeric, libxml2, and libxslt.
  • Copy the entire pypop source directory into the container.
  • Build pypop (again, inside the container).

Once you have the container image, running it is as simple as executing image.img. For example:

akkornel@blargh-yakkety-typical:~/pypop$ ./image.img -V
pypop 0.8.0
Copyright (C) 2003-2005 Regents of the University of California
This is free software.  There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

akkornel@blargh-yakkety-typical:~/pypop$ ./image.img -h
Usage: pypop [OPTION]... [INPUTFILE]...
Process and run population genetics statistics on one or more INPUTFILEs.
Expects to find a configuration file called 'config.ini' in the
current directory or in /usr/share/pypop/config.ini.

  -l, --use-libxslt    filter XML via XSLT using libxslt (default)
  -s, --use-4suite     filter XML via XSLT using 4Suite
  -x, --xsl=FILE       use XSLT translation file FILE
  -h, --help           show this message
  -c, --config=FILE    select alternative config file
  -d, --debug          enable debugging output (overrides config file setting)
  -i, --interactive    run in interactive mode, prompting user for file names
  -g, --gui            run GUI (currently disabled)
  -o, --outputdir=DIR  put output in directory DIR
  -f, --filelist=FILE  file containing list of files (one per line) to process
                        (mutually exclusive with supplying INPUTFILEs)
      --generate-tsv   generate TSV output files (aka run 'popmeta')
  -V, --version        print version of PyPop
  
  INPUTFILE   input text file

Once built, the container image can be transferred to any other system which is running Linux x86_64, and which has the same version of Singularity (or newer).