
Fast methods for non-negative matrix tri-factorization


acopar/fast-nmtf


fast-nmtf

Fast optimization of non-negative matrix tri-factorization.

Installation

This project relies on the numpy and scipy libraries. For best results, we recommend installing it inside an Anaconda environment. Anaconda simplifies the setup by providing optimized libraries for matrix operations (such as Intel MKL).

   git clone https://github.com/acopar/fast-nmtf
   cd fast-nmtf
   conda env create -f environment.yml
   conda activate fast-nmtf
   pip install -e .

Data

To download preprocessed benchmark datasets, use the provided get_datasets.sh script.

    scripts/get_datasets.sh

This script downloads datasets that have already been preprocessed and converted into the npz (NumPy compressed) format.
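The README does not document what each archive contains. A minimal way to find out, assuming only that the files are ordinary NumPy .npz archives, is to list the stored array names, shapes and dtypes. The helper name describe_npz is hypothetical. If the names turn out to be data, indices, indptr, format and shape, the archive follows the scipy.sparse.save_npz layout and scipy.sparse.load_npz can read it directly.

```python
import numpy as np

def describe_npz(path):
    """Return {array_name: (shape, dtype_name)} for each array in an .npz archive.

    Hypothetical inspection helper; it is not part of fnmtf.
    """
    # allow_pickle=False refuses object arrays, which is safer for downloaded
    # files; it raises ValueError if the archive does contain object arrays.
    with np.load(path, allow_pickle=False) as archive:
        return {name: (archive[name].shape, archive[name].dtype.name)
                for name in archive.files}
```

For example, describe_npz("data/aldigs.npz") returns a dictionary keyed by the array names stored in that file.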

Example

    python fnmtf/factorize.py -t cod -k 20 data/aldigs.npz

The optimization technique is selected with the -t option:

  • mu: multiplicative updates
  • als: alternating least squares
  • pg: projected gradient
  • cod: coordinate descent
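The README does not spell out the update rules behind these techniques. As a point of reference only, the sketch below shows the textbook multiplicative-update (mu) rules for the Frobenius-norm tri-factorization X ≈ U S V^T, with U of size n×k1, S of size k1×k2 and V of size m×k2. It is an illustrative assumption, not the fnmtf implementation, and the function name nmtf_mu and its parameters are hypothetical.

```python
import numpy as np

def nmtf_mu(X, k1, k2, n_iter=200, seed=0, eps=1e-10):
    """Textbook multiplicative updates for X ≈ U @ S @ V.T (Frobenius loss).

    Hypothetical illustration only; this is not the fnmtf code.
    Returns U, S, V and the relative error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k1))
    S = rng.random((k1, k2))
    V = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # U <- U * (X V S^T) / (U S V^T V S^T); eps guards against division by zero
        VSt = V @ S.T
        U *= (X @ VSt) / (U @ (VSt.T @ VSt) + eps)
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = U @ S
        V *= (X.T @ US) / (V @ (US.T @ US) + eps)
        # S <- S * (U^T X V) / (U^T U S V^T V)
        S *= (U.T @ X @ V) / ((U.T @ U) @ S @ (V.T @ V) + eps)
        errors.append(np.linalg.norm(X - U @ S @ V.T) / np.linalg.norm(X))
    return U, S, V, errors
```

Each update multiplies a factor by a ratio of non-negative terms, so the factors stay non-negative whenever X and the random initialization are non-negative. The als, pg and cod techniques replace these element-wise ratios with different update steps.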

Reproduce results

To exactly reproduce the experiments, in which each dataset is factorized ten times with each of the optimization techniques, run the following command. Depending on your configuration, this may take days.

    bash scripts/full.sh
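The contents of full.sh are not reproduced here. A hypothetical Python equivalent of "each dataset ten times with each technique" could loop over the four documented -t values and ten seeds passed through -S. The seed values 0 to 9, the rank of 20 and the helper names build_runs and run_all are assumptions, not taken from the script.

```python
import subprocess

# The four techniques documented for the -t option.
TECHNIQUES = ["mu", "als", "pg", "cod"]

def build_runs(dataset, runs=10, rank=20):
    """Hypothetical: one factorize.py call per (technique, seed) pair."""
    return [
        ["python", "fnmtf/factorize.py", "-t", t, "-k", str(rank), "-S", str(seed), dataset]
        for t in TECHNIQUES
        for seed in range(runs)
    ]

def run_all(dataset, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in build_runs(dataset):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_all("data/aldigs.npz") with the default dry_run=True only prints the 40 command lines, which is a cheap way to check the plan before committing days of compute.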

The long test evaluates convergence using the same factorization rank (rank=20). It takes hours to complete: faster than the full test, though by a factor of less than 10.

    bash scripts/long.sh

A shorter version of the experiments uses a less strict convergence threshold (epsilon=10^-5) and a maximum of 2000 iterations. It completes in a few hours.

    bash scripts/short.sh

After the experiments are done, you can visualize the output with the following command:

    python fnmtf/visualize.py

Command line arguments

  • -t [arg]: optimization technique [mu, als, pg, cod]
  • -s: use sparse matrices
  • -k [arg]: factorization rank (positive integer)
  • -p [arg]: number of parallel workers
  • -S [arg]: random seed
  • -e [arg]: stopping criterion threshold (higher means more iterations), default=6
  • -m [arg]: minimum number of iterations
  • data: path to the dataset, given as the last argument (required)
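The README does not define the stopping criterion behind -e and -m. One common rule consistent with "higher means more iterations" is to stop once the relative change in the objective falls below 10^-e, so a larger e demands a smaller change. The sketch below assumes exactly that rule; the function should_stop and its max_iter cap are hypothetical and may differ from fnmtf.

```python
def should_stop(errors, e=6, min_iter=1, max_iter=2000):
    """Hypothetical stopping rule driven by -e and -m.

    errors: objective values recorded so far, one per iteration.
    Stops when the relative improvement drops below 10**-e, but never
    before min_iter iterations and always at max_iter.
    """
    it = len(errors)
    if it >= max_iter:
        return True
    # Need at least two values to measure a change.
    if it < max(min_iter, 2):
        return False
    prev, curr = errors[-2], errors[-1]
    rel_change = abs(prev - curr) / max(abs(prev), 1e-300)
    return rel_change < 10.0 ** (-e)
```

Under this assumption, e=5 corresponds to the epsilon=10^-5 of the short test, while the default e=6 keeps iterating until the objective changes by less than one part in a million.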

Retrieve factors

After the factorization finishes, the U, S, and V factors are stored in results/<dataset>/<technique>/<factor>.csv, where U is the left factor, S is the middle factor, and V is the right factor. For example, if you ran the cod technique on the aldigs dataset, you can view the results with the following commands:

    cat results/aldigs/cod/U.csv
    cat results/aldigs/cod/S.csv
    cat results/aldigs/cod/V.csv
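To work with the factors numerically rather than print them, a minimal sketch is to load the three CSV files and rebuild the approximation U S V^T. This assumes the files hold plain comma-separated numbers without a header, and that V has one row per column of the input matrix; both are assumptions, as are the helper names load_factors, reconstruct and relative_error.

```python
import os
import numpy as np

def load_factors(result_dir):
    """Load U.csv, S.csv and V.csv from a directory such as results/aldigs/cod.

    Assumes plain comma-separated numbers with no header row.
    """
    return tuple(
        np.loadtxt(os.path.join(result_dir, name + ".csv"), delimiter=",", ndmin=2)
        for name in ("U", "S", "V")
    )

def reconstruct(U, S, V):
    # Tri-factorization model: X ≈ U S V^T
    return U @ S @ V.T

def relative_error(X, U, S, V):
    """Frobenius-norm error of the approximation, relative to ||X||."""
    return np.linalg.norm(X - reconstruct(U, S, V)) / np.linalg.norm(X)
```

With the original data matrix X loaded separately, relative_error(X, *load_factors("results/aldigs/cod")) gives a quick check of how well the factors fit.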

For convenience, all three factors are also saved in results/aldigs/cod.pkl as a tuple of numpy matrices, which can be loaded with the load_file function provided in loader.py.
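load_file in loader.py is the documented way to read this file. If you only need the arrays outside the package, and assuming cod.pkl is a plain, uncompressed pickle of the (U, S, V) tuple (the README does not say which serializer is used), the standard pickle module may be enough. The helper name below is hypothetical. Only unpickle files you trust, because unpickling can execute arbitrary code.

```python
import pickle
import numpy as np

def load_pickled_factors(path):
    """Hypothetical reader; assumes a plain pickle of the tuple (U, S, V)."""
    with open(path, "rb") as f:
        U, S, V = pickle.load(f)
    # np.asarray turns np.matrix objects into plain ndarrays,
    # so @ and * behave the same way for all three factors.
    return np.asarray(U), np.asarray(S), np.asarray(V)
```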
