Commit

feat: 1. adds new command to read the data from desktop. 2. formats the console output. 3. updates the command info in docs.
SanyHe committed Dec 27, 2024
1 parent 43e0ff4 commit 9b04cad
Showing 15 changed files with 332 additions and 191 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -40,7 +40,6 @@ output.txt

# Web Dependency
node_modules
package-lock.json

# yarn v2
yarn.lock
89 changes: 67 additions & 22 deletions README.md
@@ -75,7 +75,7 @@ pip install geochemistrypi

Download the latest version to avoid some old version issues, such as dependency downloading.
```
pip install "geochemistrypi==0.6.1"
pip install "geochemistrypi==0.7.0"
```

One instruction to download on **Jupyter Notebook** or **Google Colab**.
@@ -85,7 +85,7 @@ One instruction to download on **Jupyter Notebook** or **Google Colab**.
```
Download the latest version to avoid some old version issues, such as dependency downloading.
```
!pip install "geochemistrypi==0.6.1"
!pip install "geochemistrypi==0.7.0"
```
Check the downloaded version of our software:

@@ -95,6 +95,14 @@ geochemistrypi --version

**Note**: For more details on installation, please refer to the **Installation Manual** in our online documentation, under the **FOR USER** section. There, we highly recommend using a virtual environment (Conda) to avoid dependency version problems.


The following screenshot shows the downloads and launching of our software on macOS:

<p align="center">
<img src="https://github.com/user-attachments/assets/4fa0e2e7-20ad-4548-ab6c-ca5f26ba0106" alt="Downloads and Launching on macOS" width="450" />
</p>


## Quick Update

One instruction to update the software to the latest version on the **command line**, such as Terminal on macOS or PowerShell on Windows.
@@ -156,9 +164,15 @@ https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT)

## Running Example

**How to run:** After successfully downloading, run this instruction on **command line / Jupyter Notebook / Google Colab** whatever directory it is.
**How to run:** After a successful download, run the commands shown in the following examples on **command line / Jupyter Notebook / Google Colab**.

Once the software starts, two folders, `geopi_output` and `geopi_tracking`, are generated automatically to store the results.

### Case 1: Run with built-in data set for testing
`geopi_tracking`: MLflow uses this folder as storage for the operations visualized in its web interface. Users cannot modify it directly.

`geopi_output`: A regular folder that mirrors MLflow's storage structure. Users can work with its contents directly.

### Case 1: Run with built-in data set for model training and model inference

On command line:

@@ -172,9 +186,34 @@ On Jupyter Notebook / Google Colab:
!geochemistrypi data-mining
```

**Note**: There are four built-in data sets corresponding to four kinds of model pattern.
**Note**:

+ There are five built-in data sets corresponding to five kinds of model patterns.

+ The generated output directories `geopi_output` and `geopi_tracking` will be placed on the desktop by default.


### Case 2: Run with your own data set on desktop for model training and model inference

On command line:

```
geochemistrypi data-mining --desktop
```

On Jupyter Notebook / Google Colab:

```
!geochemistrypi data-mining --desktop
```

**Note**:

### Case 2: Run with your own data set without model inference
+ You need to create a directory named `geopi_input` on the desktop and put your data sets in it.

+ The generated output directories `geopi_output` and `geopi_tracking` will be placed on the desktop by default.
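
The `geopi_input` setup described in the note above could be scripted roughly as follows. This is only a minimal sketch: the helper name `prepare_geopi_input` is hypothetical and not part of geochemistrypi, and the desktop location `~/Desktop` is an assumption that may not hold on every system (for example with a localized or redirected desktop folder).

```python
from pathlib import Path
import shutil


def prepare_geopi_input(data_files, desktop=None):
    """Create `geopi_input` under the desktop folder and copy data files into it.

    Hypothetical helper for illustration; not part of geochemistrypi.
    """
    # Assumption: the desktop lives at ~/Desktop unless a path is passed in.
    desktop = Path(desktop) if desktop is not None else Path.home() / "Desktop"
    input_dir = desktop / "geopi_input"
    # Create the folder (and any missing parents); do nothing if it exists.
    input_dir.mkdir(parents=True, exist_ok=True)
    for data_file in data_files:
        # Keep the original file name inside geopi_input.
        shutil.copy2(data_file, input_dir / Path(data_file).name)
    return input_dir


# Example call (commented out so that importing this sketch has no side effects):
# prepare_geopi_input(["your_own_data_set.xlsx"])
```

After the files are in place, `geochemistrypi data-mining --desktop` can be run as shown above.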

### Case 3: Run with your own data set without model inference

On command line:

@@ -188,9 +227,13 @@ On Jupyter Notebook / Google Colab:
!geochemistrypi data-mining --data your_own_data_set.xlsx
```

**Note**: Currently, `.xlsx` and `.csv` files are supported. Please specify the path your data file exists. For Google Colab, don't forget to upload your dataset first.
**Note**:

+ Currently, `.xlsx` and `.csv` files are supported. Please specify the path where your data file is located. For Google Colab, don't forget to upload your data set first.

### Case 3: Implement model inference on application data
+ The generated output directories `geopi_output` and `geopi_tracking` will be created in the directory where you run this command.

### Case 4: Implement model inference on application data

On command line:

@@ -204,11 +247,15 @@ On Jupyter Notebook / Google Colab:
!geochemistrypi data-mining --training your_own_training_data.xlsx --application your_own_application_data.xlsx
```

**Note**: Please make sure the column names (data schema) in both training data file and application data file are the same. Because the operations you perform via our software on the training data will be recorded automatically and subsequently applied to the application data in the same order.
**Note**:

+ Please make sure the column names (data schema) are the same in the training data file and the application data file, because the operations you perform on the training data via our software are recorded automatically and subsequently applied to the application data in the same order.

+ The training data in our pipeline will be divided into the train set and test set used for training the ML model and evaluating the model's performance. The score includes two types. The first type is the scores from the prediction on the test set while the second type is cv scores from the cross validation on the train set.

The training data in our pipeline will be divided into the train set and test set used for training the ML model and evaluating the model's performance. The score includes two types. The first type is the scores from the prediction on the test set while the second type is cv scores from the cross validation on the train set.
+ The generated output directories `geopi_output` and `geopi_tracking` will be created in the directory where you run this command.
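
Before running the command, the column-name requirement from the note above can be checked with a short script. The following is a minimal sketch for `.csv` files only, using the Python standard library; the function names `read_header` and `schema_mismatch` are hypothetical and not part of geochemistrypi. An `.xlsx` file would need a spreadsheet library instead.

```python
import csv
from pathlib import Path


def read_header(csv_path):
    """Return the column names found in the first row of a CSV file."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header instead of raising StopIteration.
        return next(csv.reader(handle), [])


def schema_mismatch(training_csv, application_csv):
    """Compare column names of the two files.

    Returns (missing, unexpected): columns present only in the training
    file, and columns present only in the application file.
    """
    train_cols = read_header(training_csv)
    app_cols = read_header(application_csv)
    missing = [col for col in train_cols if col not in app_cols]
    unexpected = [col for col in app_cols if col not in train_cols]
    return missing, unexpected


# Example call (file names are placeholders):
# missing, unexpected = schema_mismatch("training.csv", "application.csv")
```

Two empty lists mean the column names match. This sketch compares names only, not column order or data types.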

### Case 4: Activate MLflow web interface
### Case 5: Activate MLflow web interface

On command line:

@@ -222,21 +269,18 @@ On Jupyter Notebook / Google Colab:
!geochemistrypi data-mining --mlflow
```

**Note**: Once you run our software, there are two folders (`geopi_output` and `geopi_tracking`) generated automatically. Make sure the directory where you execute the above command contains the generated folder `geopi_tracking`.
**Note**:

Copy the URL shown on the console into any browser to open the MLflow web interface. The URL is normally like this http://127.0.0.1:5000. Search MLflow online to see more operations and usages.
+ Once the command is executed, our software looks for the `geopi_tracking` directory in the current working directory. If it is not found there, the software looks for it on the desktop.

+ Copy the URL shown on the console into any browser to open the MLflow web interface. The URL normally looks like http://127.0.0.1:5000. Search for MLflow online to learn more about its operations and usage.
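
The lookup order described in the note above (current working directory first, then the desktop) can be pictured with the following sketch. It only illustrates the described behavior and is not the software's actual implementation; the function name `find_tracking_dir` is hypothetical, and `~/Desktop` as the desktop location is an assumption.

```python
from pathlib import Path


def find_tracking_dir(cwd=None, desktop=None):
    """Return the first existing `geopi_tracking` directory, or None.

    Illustrative only: checks the working directory first, then the desktop.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    # Assumption: the desktop lives at ~/Desktop unless a path is passed in.
    desktop = Path(desktop) if desktop is not None else Path.home() / "Desktop"
    for base in (cwd, desktop):
        candidate = base / "geopi_tracking"
        if candidate.is_dir():
            return candidate
    # Neither location contains the tracking directory.
    return None
```

If the function returns `None`, neither location holds a `geopi_tracking` folder.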

For more details, please refer to:

- Geochemistry π - Download and Run the Beta Version [[Bilibili]](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140) | [[YouTube]](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9)

- MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY)

The following screenshot shows the downloads and launching of our software on macOS:

<p align="center">
<img src="https://github.com/user-attachments/assets/4fa0e2e7-20ad-4548-ab6c-ca5f26ba0106" alt="Downloads and Launching on macOS" width="450" />
</p>

## Roadmap

@@ -315,21 +359,20 @@ The whole package is under construction and the documentation is progressively e
+ Jianhao Sun (Jin, Nanjing University, China)
+ Mengying Ye (Mary, Jilin University, China)
+ Chengtu Li (Trenki, Henan Polytechnic University, Beijing, China)
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Panyan Weng (The University of Sydney, Australia)
+ Haibin Lai (Michael, Southern University of Science and Technology, China)
+ Siqi Yao (Clara, Dongguan University of Technology, China)

**Product Group**:

+ Siqi Yao (Clara, Dongguan University of Technology, China)
+ Zhelan Lin (Lan, Fuzhou University, China)
+ ShuYi Li (Communication University Of China, Beijing, China)
+ Junbo Wang (China University of Geosciences, Beijing, China)
+ Haibin Wang (Watson, University of Sydney, Australia)
+ Guoqiang Qiu (Elsen, Fuzhou University, China)
+ Yating Dong (Yetta, Dongguan University of Technology, China)
+ Haibin Lai (Michael, Southern University of Science and Technology, China)
+ Bailun Jiang (EPSI / Lille University, France)
+ Chufan Zhou (Yoko, Institute of Geochemistry, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)

## Join Us :)

@@ -398,6 +441,8 @@ More Videos will be recorded soon.
+ Wenyu Zhao (Molly, Zhejiang University, China)
+ Qiuhao Zhao (Brad, Zhejiang University, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Anzhou Li (Andrian, Zhejiang University, China)
+ Keran Li (Kirk, Chengdu University of Technology, China)
+ Dan Hu (Notre Dame University, United States)