First Commit: Minimal Valid Product of APP #1

Closed
wants to merge 6 commits into from
12 changes: 11 additions & 1 deletion .gitignore
@@ -2,4 +2,14 @@
job*
*.out
__pycache__/
data/
*/__pycache__/
# data/
*/tmp*
generated_schemas
*/job.json
*/workplace
*whl
*/*cif
*/*zip
*egg-info
*/*egg-info
147 changes: 9 additions & 138 deletions README.md
@@ -8,26 +8,16 @@
_CrystalFormer_ is a transformer-based autoregressive model specifically designed for space group-controlled generation of crystalline materials. The space group symmetry significantly simplifies the
crystal space, which is crucial for data and compute efficient generative modeling of crystalline materials.

<div align="center">
<img align="middle" src="imgs/output.gif" width="400">
<h3> Generating Cs<sub>2</sub>ZnFe(CN)<sub>6</sub> Crystal (<a href=https://next-gen.materialsproject.org/materials/mp-570545>mp-570545</a>) </h3>
</div>

## Contents

- [Contents](#contents)
- [Model card](#model-card)
- [Get Started](#get-started)
- [Installation](#installation)
- [CPU installation](#cpu-installation)
- [CUDA (GPU) installation](#cuda-gpu-installation)
- [install required packages](#install-required-packages)
- [Available Weights](#available-weights)
- [How to run](#how-to-run)
- [train](#train)
- [sample](#sample)
- [evaluate](#evaluate)
- [How to cite](#how-to-cite)
## Installation


```bash
pip install .
```

After installation, run the `crystalgpu-app` command to launch the app and create a Gradio link.


## Model card

@@ -44,125 +34,7 @@ The model is an autoregressive transformer for the space group conditioned cryst

We only consider symmetry-inequivalent atoms. The remaining atoms are restored from the space group and Wyckoff letter information. Note that the Wyckoff letters have a natural alphabetical ordering, starting with 'a' for a position with the site-symmetry group of maximal order and ending with the highest letter for the general position. The sampling procedure starts from higher-symmetry sites (with smaller multiplicities) and then proceeds to lower-symmetry ones (with larger multiplicities). Fractional coordinates need to be considered in the loss or sampling only when the discrete Wyckoff letters cannot fully determine the structure.
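
The ordering described in this paragraph can be illustrated with a small sketch. It is not taken from the CrystalFormer code: the table hard-codes the Wyckoff positions of space group 225 (Fm-3m) as listed in the International Tables, and `sampling_order` is a hypothetical helper name.

```python
# Illustrative only: Wyckoff positions of space group 225 (Fm-3m),
# written as letter -> multiplicity. Not part of the CrystalFormer code base.
WYCKOFF_225 = {
    "a": 4, "b": 4, "c": 8, "d": 24, "e": 24, "f": 32,
    "g": 48, "h": 48, "i": 48, "j": 96, "k": 96, "l": 192,
}

def sampling_order(wyckoff_table):
    """Return (letter, multiplicity) pairs in alphabetical letter order."""
    return sorted(wyckoff_table.items(), key=lambda item: item[0])

order = sampling_order(WYCKOFF_225)
multiplicities = [mult for _, mult in order]

# For this space group, alphabetical order visits the high-symmetry sites
# first, so the multiplicities never decrease along the sequence.
assert multiplicities == sorted(multiplicities)
print(order[0], order[-1])  # ('a', 4) ('l', 192)
```

The last letter, `l`, is the general position with the largest multiplicity, which matches the description of sampling from high-symmetry sites toward low-symmetry ones.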

## Get Started

**Notebooks**: The quickest way to get started with _CrystalFormer_ is our notebooks in the Google Colab and Bohrium (Chinese version) platforms:

- CrystalFormer Quickstart [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IMQV6OQgIGORE8FmSTmZuC5KgQwGCnDx?usp=sharing) [![Open In Bohrium](https://cdn.dp.tech/bohrium/web/static/images/open-in-bohrium.svg)](https://nb.bohrium.dp.tech/detail/68177247598): GUI notebook demonstrating the conditional generation of crystalline materials with _CrystalFormer_;
- CrystalFormer Application [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QdkELaQXAHR1zEu2fcdfgabuoP61_wbU?usp=sharing): Generating stable crystals with a given structure prototype. This workflow can be applied to tasks that are dominated by element substitution.

## Installation

Create a new environment and install the required packages. We recommend using Python `3.10.*` and conda to create the environment:

```bash
conda create -n crystalgpt python=3.10
conda activate crystalgpt
```

Before installing the required packages, you need to install `jax` and `jaxlib` first.

### CPU installation

```bash
pip install -U "jax[cpu]"
```

### CUDA (GPU) installation

If you intend to use CUDA (GPU) to speed up the training, it is important to install the appropriate version of `jax` and `jaxlib`. It is recommended to check the [jax docs](https://github.com/google/jax?tab=readme-ov-file#installation) for the installation guide. The basic installation command is given below:

```bash
pip install --upgrade pip

# CUDA 12 installation
# Note: wheels only available on linux.
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# CUDA 11 installation
# Note: wheels only available on linux.
pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

### install required packages

```bash
pip install -r requirements.txt
```

## Available Weights

We release the weights of the model trained on the MP-20 dataset. More details can be seen in the [model](./model/README.md) folder.

## How to run

### train

```bash
python ./src/main.py --folder ./data/ --train_path YOUR_PATH/mp_20/train.csv --valid_path YOUR_PATH/mp_20/val.csv
```

- `folder`: the folder to save the model and logs
- `train_path`: the path to the training dataset
- `valid_path`: the path to the validation dataset
- `test_path`: the path to the test dataset

### sample

```bash
python ./src/main.py --optimizer none --test_path YOUR_PATH/mp_20/test.csv --restore_path YOUR_MODEL_PATH --spacegroup 160 --num_samples 1000 --batchsize 1000 --temperature 1.0 --use_foriloop
```

- `optimizer`: the optimizer to use, `none` means no training, only sampling
- `restore_path`: the path to the model weights
- `spacegroup`: the space group number to sample
- `num_samples`: the number of samples to generate
- `batchsize`: the batch size for sampling
- `temperature`: the temperature for sampling
- `use_foriloop`: use `lax.fori_loop` to speed up the sampling

You can also use the `--element` flag to restrict sampling to specific elements. For example, `--element La Ni O` will sample structures containing La, Ni, and O atoms. The sampling results are saved in the `output_LABEL.csv` file, where `LABEL` is the space group number `g` passed to `--spacegroup`.
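
As a rough sketch of how the sampling flags fit together, the snippet below assembles the command shown in this section as an argument list and derives the expected output file name. `build_sample_command` and `expected_output_file` are hypothetical helpers introduced here for illustration; only the flags and the `output_LABEL.csv` naming come from the README.

```python
import shlex

def build_sample_command(restore_path, test_path, spacegroup,
                         elements=None, num_samples=1000,
                         batchsize=1000, temperature=1.0):
    """Assemble the README's sampling command as an argument list."""
    cmd = [
        "python", "./src/main.py",
        "--optimizer", "none",          # no training, only sampling
        "--test_path", test_path,
        "--restore_path", restore_path,
        "--spacegroup", str(spacegroup),
        "--num_samples", str(num_samples),
        "--batchsize", str(batchsize),
        "--temperature", str(temperature),
        "--use_foriloop",
    ]
    if elements:
        # Optional element restriction, e.g. ["La", "Ni", "O"]
        cmd += ["--element", *elements]
    return cmd

def expected_output_file(spacegroup):
    """File name the README says the samples are written to."""
    return f"output_{spacegroup}.csv"

cmd = build_sample_command("YOUR_MODEL_PATH", "YOUR_PATH/mp_20/test.csv",
                           160, elements=["La", "Ni", "O"])
print(shlex.join(cmd))
print(expected_output_file(160))
```

The list form can be passed to `subprocess.run` directly, which avoids shell-quoting problems when paths contain spaces.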

### evaluate

Before evaluating the generated structures, you need to convert the generated `g, W, A, X, L` to the `cif` format. The following command converts the generated structures to the `cif` format and saves them as a `csv` file:

```bash
python ./scripts/awl2struct.py --output_path YOUR_PATH --label SPACE_GROUP --num_io_process 40
```

- `output_path`: the path to read the generated `L, W, A, X` and save the `cif` files
- `label`: the label to save the `cif` files, which is the space group number `g`
- `num_io_process`: the number of processes

Calculate the structure and composition validity of the generated structures:

```bash
python ./scripts/compute_metrics.py --root_path YOUR_PATH --filename YOUR_FILE --num_io_process 40
```

- `root_path`: the path to the dataset
- `filename`: the filename of the generated structures
- `num_io_process`: the number of processes

Calculate the novelty and uniqueness of the generated structures:

```bash
python ./scripts/compute_metrics_matbench.py --train_path TRAIN_PATH --test_path TEST_PATH --gen_path GEN_PATH --output_path OUTPUT_PATH --label SPACE_GROUP --num_io_process 40
```

- `train_path`: the path to the training dataset
- `test_path`: the path to the test dataset
- `gen_path`: the path to the generated dataset
- `output_path`: the path to save the metrics results
- `label`: the label to save the metrics results, which is the space group number `g`
- `num_io_process`: the number of processes

Note that the training, test, and generated datasets should contain the structures within the **same** space group `g` which is specified in the command `--label`.

More details about the post-processing can be seen in the [scripts](./scripts/README.md) folder.

## How to cite

```bibtex
@misc{cao2024space,
@@ -175,4 +47,3 @@ More details about the post-processing can be seen in the [scripts](./scripts/RE
}
```

**Note**: This project is unrelated to https://github.com/omron-sinicx/crystalformer with the same name.
Empty file.
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
137 changes: 137 additions & 0 deletions build/lib/crystalformerapp/gr_frontend.py
@@ -0,0 +1,137 @@
import gradio as gr
from gradio_materialviewer import MaterialViewer
import tempfile
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from simple_op import run_crystalformer
from op import run_op, run_op_gpu

def main():
    with tempfile.TemporaryDirectory(dir=".") as tempdir:
        with gr.Blocks() as app:
            with gr.Tab(label="Quick Start Mode"):
                with gr.Row():
                    with gr.Column():
                        spacegroup = gr.Slider(label="Spacegroup", minimum=1, maximum=230, value=225, step=1)
                        elements = gr.Textbox(label="Elements", value="C")
                    with gr.Column():
                        temperature = gr.Slider(label="Temperature", minimum=0.1, maximum=2.0, value=1.0, step=0.1)
                        seed = gr.Number(label="Seed", value=42)

                with gr.Row():
                    generate_btn = gr.Button("Generate Structure")
                    clear_btn = gr.Button("Clear Inputs")

                output_file = gr.File(label="CIF File")
                material_viewer = MaterialViewer(height=480, materialFile="", format='cif')

                def generate_and_display_structure(sp, el, temp, sd):
                    cif_file_path = run_crystalformer(sp, el, temp, sd, tempdir)
                    with open(cif_file_path, 'r') as ff:
                        cif_content = "".join(ff.readlines())
                    return cif_file_path, MaterialViewer(materialFile=cif_content, format='cif', height=480)

                generate_btn.click(
                    fn=generate_and_display_structure,
                    inputs=[spacegroup, elements, temperature, seed],
                    outputs=[output_file, material_viewer]
                )

                clear_btn.click(
                    fn=lambda: (225, "C", 1.0, 42),
                    inputs=None,
                    outputs=[spacegroup, elements, temperature, seed]
                )

                gr.Markdown("""
                # Quick Start Mode

                Generate crystal structures with Quick Start Mode.

                ## Instructions:
                1. Enter the spacegroup number
                2. Specify the elements (comma-separated)
                3. Adjust the temperature
                4. Set a random seed (optional)
                5. Click 'Generate Structure' to create the CIF file
                """)

            with gr.Tab(label="Research Mode"):
                with gr.Row():
                    with gr.Column():
                        spacegroup = gr.Slider(label="Spacegroup", minimum=1, maximum=230, value=225, step=1)
                        elements = gr.Textbox(label="Elements", value="C")
                        wyckoff = gr.Textbox(label="Wyckoff", value="a")
                    with gr.Column():
                        seed = gr.Number(label="Seed", value=42)
                        temperature = gr.Slider(label="Temperature", minimum=0.5, maximum=1.5, value=1.0, step=0.1)
                        T1 = gr.Slider(label="T1", minimum=100, maximum=100000000, value=100, step=100)
                        nsweeps = gr.Slider(label="nsweeps", minimum=0, maximum=20, value=10, step=1)
                with gr.Row():
                    access_key = gr.Textbox(label="Access Key")
                    project_id = gr.Textbox(label="Project ID")
                    machine_type = gr.Dropdown(label="Machine Type", choices=[
                        "1 * NVIDIA T4_16g",
                        "1 * NVIDIA V100_32g",
                        "c12_m64_1 * NVIDIA L4",
                    ])

                with gr.Row():
                    generateWeb_btn = gr.Button("Generate Structure")
                    generateGPU_btn = gr.Button("Generate Structure on GPU machines")
                    clear_btn = gr.Button("Clear Inputs")

                output_file = gr.File(label="CIF File")
                material_viewer = MaterialViewer(height=480, materialFile="", format='cif')

                def generate_and_display_structure_web(sp, el, wy, temp, sd, T1, ns):
                    cif_file_path = run_op(sp, el, wy, temp, sd, T1, ns, tempdir)
                    with open(cif_file_path, 'r') as ff:
                        cif_content = "".join(ff.readlines())
                    return cif_file_path, MaterialViewer(materialFile=cif_content, format='cif', height=480)

                generateWeb_btn.click(
                    fn=generate_and_display_structure_web,
                    inputs=[spacegroup, elements, wyckoff, temperature, seed, T1, nsweeps],
                    outputs=[output_file, material_viewer]
                )

                def generate_and_display_structure_gpu(sp, el, wy, temp, sd, T1, ns, ak, pid, mt):
                    cif_file_path = run_op_gpu(sp, el, wy, temp, sd, T1, ns, ak, pid, mt, tempdir)
                    with open(cif_file_path, 'r') as ff:
                        cif_content = "".join(ff.readlines())
                    return cif_file_path, MaterialViewer(materialFile=cif_content, format='cif', height=480)

                generateGPU_btn.click(
                    fn=generate_and_display_structure_gpu,
                    inputs=[spacegroup, elements, wyckoff, temperature, seed, T1, nsweeps, access_key, project_id, machine_type],
                    outputs=[output_file, material_viewer]
                )

                clear_btn.click(
                    fn=lambda: (225, "C", 1.0, 42),
                    inputs=None,
                    outputs=[spacegroup, elements, temperature, seed]
                )

                gr.Markdown("""
                # Research Mode

                Generate crystal structures with Research Mode.

                ## Instructions:
                - **seed**: random seed to sample the crystal structure
                - **spacegroup**: control the space group of generated crystals
                - **temperature**: modifies the probability distribution
                - **T1**: the temperature of sampling the first atom type
                - **elements**: control the elements in the generating process. Note that you need to enter the elements separated by spaces, i.e., Ba Ti O, if the elements string is none, the model will not limit the elements
                - **wyckoff**: control the Wyckoff letters in the generation. Note that you need to enter the Wyckoff letters separated by spaces, i.e., a c, if the Wyckoff is none, the model will not limit the wyckoff letter.
                - **nsweeps**: control the steps of mcmc to refine the generated structures
                """)

        app.launch(share=True)

if __name__ == "__main__":
    main()
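
The Research Mode instructions ask for elements and Wyckoff letters as space-separated strings, with an empty value meaning no restriction. A minimal sketch of that parsing follows. `parse_space_separated` is a hypothetical helper, and treating the literal word "none" as empty is an assumption; how `run_op` actually interprets these strings is not shown in this diff.

```python
from typing import List, Optional

def parse_space_separated(text: Optional[str]) -> Optional[List[str]]:
    """Split 'Ba Ti O' into ['Ba', 'Ti', 'O']; return None for no constraint."""
    if text is None:
        return None
    tokens = text.split()
    # Assumption: an empty box or the literal word "none" means "no restriction".
    if not tokens or (len(tokens) == 1 and tokens[0].lower() == "none"):
        return None
    return tokens

print(parse_space_separated("Ba Ti O"))
print(parse_space_separated("a c"))
print(parse_space_separated("   "))
```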
48 changes: 48 additions & 0 deletions build/lib/crystalformerapp/launching.py
@@ -0,0 +1,48 @@
import json
import shutil
from pathlib import Path
import os
import dp.launching.typing.addon.ui as ui
import traceback
from pprint import pprint

from dp.launching.cli import (SubParser, default_exception_handler,
                              run_sp_and_exit, to_runner)
from dp.launching.report import (AutoReportElement, MetricsChartReportElement,
                                 Report, ReportSection)
from dp.launching.typing import (BaseModel, BohriumProjectId,
                                 BohriumUsername, Boolean, Enum, Field, Float,
                                 InputFilePath, Int, List, Literal,
                                 Optional, OutputDirectory, String, DataSet, Union)
from dp.launching.typing.addon.sysmbol import Equal


class CrystalformerOptions(BaseModel):
    spacegroup: Int = Field(ge=1, le=230, description="Space group number")
    elements: String = Field(description="Elements to include, separated by spaces")
    temperature: Float = Field(ge=0.5, le=1.5, default=1.0, description="Temperature for generation")
    seed: Int = Field(default=42, description="Random seed")


def crystalformer_runner(opts: CrystalformerOptions) -> int:
    try:
        run_crystalformer(
            spacegroup=opts.spacegroup,
            elements=opts.elements,
            temperature=opts.temperature,
            seed=opts.seed
        )
        return 0
    except Exception as exc:
        print(str(exc))
        traceback.print_exc()
        return 1

def to_parser():
    return {
        "1": SubParser(CrystalformerOptions, crystalformer_runner, "Run Crystalformer")
    }


if __name__ == '__main__':
    run_sp_and_exit(to_parser(), description="Crystal Former", version="0.1.0", exception_handler=default_exception_handler)
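
The `Field(ge=..., le=...)` constraints on `CrystalformerOptions` bound the space group to 1..230 and the temperature to 0.5..1.5. The sketch below mirrors those bounds with a standard-library dataclass so the behaviour can be tried without the `dp.launching` package. `CrystalformerOptionsSketch` is a hypothetical stand-in, not the class from this PR, and it makes no claim about how `dp.launching` reports validation errors.

```python
from dataclasses import dataclass

@dataclass
class CrystalformerOptionsSketch:
    """Stand-in for CrystalformerOptions using only the standard library."""
    spacegroup: int
    elements: str
    temperature: float = 1.0
    seed: int = 42

    def __post_init__(self):
        # Same bounds as the ge/le arguments in the diff above.
        if not 1 <= self.spacegroup <= 230:
            raise ValueError("spacegroup must be between 1 and 230")
        if not 0.5 <= self.temperature <= 1.5:
            raise ValueError("temperature must be between 0.5 and 1.5")

opts = CrystalformerOptionsSketch(spacegroup=225, elements="C")
print(opts)

try:
    CrystalformerOptionsSketch(spacegroup=231, elements="C")
except ValueError as exc:
    print("rejected:", exc)
```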
File renamed without changes.
File renamed without changes.
Binary file added build/lib/crystalformerapp/model/epoch_009800.pkl
Binary file not shown.