Skip to content

nomic-ai/nomad_projection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NOMAD Projection

NOMAD Projection

Negative Or Mean Affinity Discrimination (NOMAD) Projection is a massively scalable method for nonlinear dimensionality reduction. It is the fastest and easiest way to compute t-SNE or UMAP style visualizations of multi-million point datasets.

Please contact [email protected] with inquiries!

Installation

You can install NOMAD Projection with pip:

pip install nomad-projection

Usage

from nomad_projection import NomadProjection

p = NomadProjection()

#Required Parameters
lowd = p.fit_transform(X=x,
                       epochs=100,
                       batch_size=80000)

#All Parameters
lowd = p.fit_transform(X=x,
                       epochs=100,
                       batch_size=80000,
                       n_neighbors=8,
                       n_noise=10000,
                       n_cells=5,
                       cluster_subset_size=5000000,
                       momentum=0.8,
                       lr_scale=0.1,
                       learning_rate_decay_start_time=0.3,
                       late_exaggeration_time=1.7,
                       late_exaggeration_scale=1.2,
                       late_exaggeration_n_noise=10000,
                       )

Paper Replication

Environment Setup

Due to the heterogeneous nature of python package management and cuda configuration, replicating the paper requires managing 3 different environments.

Nomad Projection Environment:

The nomad projection environment is the managed with venv. Simply run the following commands from the root of the repository to create it:

python3 -m venv nomad_projection_env
source nomad_projection_env/bin/activate
pip install .

t-SNE-CUDA Environment:

The t-SNE-CUDA environment requires miniconda to be installed. First, follow the instructions here to install miniconda. Then, follow the setup on the t-SNE-CUDA repository. Finally, run the following commands from the root of the repository in your conda environment:

conda install pytorch click scikit-learn pandas
pip install -e .

RAPIDS UMAP Environment:

The RAPIDS UMAP environment reuires a custom conda environment which is generated from the RAPIDS installation selector. For the paper, the following command was used:

conda create -n rapids-24.10 -c rapidsai -c conda-forge -c nvidia  \
    rapids=24.10 python=3.12 'cuda-version>=12.0,<=12.5'

Once this command executes, run the following commands from the root of the repository in your rapisd conda environment:

conda install pytorch click scikit-learn pandas
pip install -e .

Input Data

Input data can be downloaded from R2. To gain access, run aws configure with the following credentials:

Access Key ID: 94bef7d178281190c5ca48f483b6504b

Secret Access Key: ac885a46694e8e1a073375b8da1961be42371a9f4434bd58ffd3e5c46a3be67b

Then run the following command from the root of the repository to download the data (please note that this will download nearly a terabyte of data):

aws s3 sync --endpoint-url=https://9fa58365a1a3d032127970d0bd9a1290.r2.cloudflarestorage.com/ s3://nomad-projection-input-data ./data

Figures

NOMAD Projection uses the figures submodule to manage generation of the figures in the paper.

ArXiv and Imagenet

Reproducing the arXiv and Imagenet figures requires two steps:

  1. Run commands to generate results from each algorithm for each dataset.
  2. Assemble the results into the final plot.

From the nomad projection environment, run the following command:

python nomad_project/figures/arxiv.py --nomad
python nomad_project/figures/imagenet.py --nomad

From the t-SNE-CUDA environment, run the following commands:

python nomad_project/figures/arxiv.py --tsnecuda
python nomad_project/figures/imagenet.py --tsnecuda

From the RAPIDS UMAP environment, run the following commands:

python nomad_project/figures/arxiv.py --rapids-umap
python nomad_project/figures/imagenet.py --rapids-umap

Finally, assemble the results into the final plot in the nomad projection environment

python nomad_project/figures/arxiv.py --plot
python nomad_project/figures/imagenet.py --plot

The output will be stored in the results directory in the root of the repository.

PubMed

From the nomad projection environment, run the following command:

python nomad_project/figures/pubmed.py --nomad

The output will be stored in the results directory in the root of the repository.

Multilingual Wikipedia

python nomad_project/figures/wiki.py --nomad

The output will be stored in the results directory in the root of the repository.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages