Negative Or Mean Affinity Discrimination (NOMAD) Projection is a massively scalable method for nonlinear dimensionality reduction. It is the fastest and easiest way to compute t-SNE or UMAP style visualizations of multi-million point datasets.
Please contact [email protected] with inquiries!
You can install NOMAD Projection with pip:
pip install nomad-projection
from nomad_projection import NomadProjection
p = NomadProjection()
#Required Parameters
lowd = p.fit_transform(X=x,
epochs=100,
batch_size=80000)
#All Parameters
lowd = p.fit_transform(X=x,
epochs=100,
batch_size=80000,
n_neighbors=8,
n_noise=10000,
n_cells=5,
cluster_subset_size=5000000,
momentum=0.8,
lr_scale=0.1,
learning_rate_decay_start_time=0.3,
late_exaggeration_time=1.7,
late_exaggeration_scale=1.2,
late_exaggeration_n_noise=10000,
)
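When planning runs on multi-million point datasets, it can help to estimate how many optimizer steps a given epochs/batch_size setting implies. A minimal sketch; the per-epoch batch count uses the usual ceil(N / batch_size) convention, which is an assumption rather than something documented here:

```python
import math

def estimated_steps(n_points: int, epochs: int, batch_size: int) -> int:
    """Rough number of gradient steps: batches per epoch times epochs."""
    batches_per_epoch = math.ceil(n_points / batch_size)
    return batches_per_epoch * epochs

# A 10-million-point dataset with the settings shown above:
print(estimated_steps(10_000_000, epochs=100, batch_size=80_000))  # 12500
```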
Due to the heterogeneous nature of Python package management and CUDA configuration, replicating the paper requires managing three different environments.
The NOMAD Projection environment is managed with venv. Run the following commands from the root of the repository to create it:
python3 -m venv nomad_projection_env
source nomad_projection_env/bin/activate
pip install .
The t-SNE-CUDA environment requires Miniconda to be installed. First, follow the instructions here to install Miniconda. Then, follow the setup instructions in the t-SNE-CUDA repository. Finally, run the following commands from the root of the repository in your conda environment:
conda install pytorch click scikit-learn pandas
pip install -e .
The RAPIDS UMAP environment requires a custom conda environment, which is generated from the RAPIDS installation selector. For the paper, the following command was used:
conda create -n rapids-24.10 -c rapidsai -c conda-forge -c nvidia \
rapids=24.10 python=3.12 'cuda-version>=12.0,<=12.5'
Once this command completes, run the following commands from the root of the repository in your rapids conda environment:
conda install pytorch click scikit-learn pandas
pip install -e .
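Because the replication steps span three environments, it is easy to accidentally run a command in the wrong one. A quick, generic sanity check (this snippet is an illustration, not part of the repository):

```python
import sys

# The interpreter path reveals which environment is active: it should
# contain "nomad_projection_env" for the venv, or the conda environment
# name (e.g. "rapids-24.10") for the conda environments.
print(sys.executable)
print(sys.prefix)
```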
Input data can be downloaded from R2. To gain access, run aws configure with the following credentials:
Access Key ID: 94bef7d178281190c5ca48f483b6504b
Secret Access Key: ac885a46694e8e1a073375b8da1961be42371a9f4434bd58ffd3e5c46a3be67b
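If you prefer not to enter the credentials interactively, the AWS CLI also accepts them via aws configure set, which is equivalent to answering the prompts above:

```shell
aws configure set aws_access_key_id 94bef7d178281190c5ca48f483b6504b
aws configure set aws_secret_access_key ac885a46694e8e1a073375b8da1961be42371a9f4434bd58ffd3e5c46a3be67b
```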
Then run the following command from the root of the repository to download the data (please note that this will download nearly a terabyte of data):
aws s3 sync --endpoint-url=https://9fa58365a1a3d032127970d0bd9a1290.r2.cloudflarestorage.com/ s3://nomad-projection-input-data ./data
NOMAD Projection uses the figures submodule to manage generation of the figures in the paper.
Reproducing the arXiv and ImageNet figures requires two steps:
- Run commands to generate results from each algorithm for each dataset.
- Assemble the results into the final plot.
From the nomad projection environment, run the following commands:
python nomad_project/figures/arxiv.py --nomad
python nomad_project/figures/imagenet.py --nomad
From the t-SNE-CUDA environment, run the following commands:
python nomad_project/figures/arxiv.py --tsnecuda
python nomad_project/figures/imagenet.py --tsnecuda
From the RAPIDS UMAP environment, run the following commands:
python nomad_project/figures/arxiv.py --rapids-umap
python nomad_project/figures/imagenet.py --rapids-umap
Finally, from the nomad projection environment, assemble the results into the final plot:
python nomad_project/figures/arxiv.py --plot
python nomad_project/figures/imagenet.py --plot
The output will be stored in the results directory in the root of the repository.
To reproduce the PubMed figure, run the following command from the nomad projection environment:
python nomad_project/figures/pubmed.py --nomad
The output will be stored in the results directory in the root of the repository.
To reproduce the wiki figure, run the following command from the nomad projection environment:
python nomad_project/figures/wiki.py --nomad
The output will be stored in the results directory in the root of the repository.