perf: add the built-in dataset for abnormal detection and update the docs. #326

Merged
merged 1 commit into from
Mar 29, 2024
68 changes: 63 additions & 5 deletions README.md
@@ -61,19 +61,29 @@ Eos Website: https://eos.org/editor-highlights/machine-learning-for-geochemists

## Quick Installation

Our software is well tested on **macOS** and **Windows** with **Python 3.9**. Other systems and Python versions are not guaranteed.

One instruction to download on the **command line**, such as Terminal on macOS or PowerShell on Windows.

```
pip install geochemistrypi
```

Download the latest version to avoid issues found in older versions, such as problems with dependency downloading.
```
pip install "geochemistrypi==0.5.0"
```

One instruction to download on **Jupyter Notebook** or **Google Colab**.

```
!pip install geochemistrypi
```

Check the latest version of our software:
Download the latest version to avoid issues found in older versions, such as problems with dependency downloading.
```
!pip install "geochemistrypi==0.5.0"
```
Check the downloaded version of our software:

```
geochemistrypi --version
@@ -95,13 +105,52 @@ One instruction to download on **Jupyter Notebook** or **Google Colab**.
!pip install --upgrade geochemistrypi
```

Check the latest version of our software:
Check the updated version of our software:

```
geochemistrypi --version
```
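The same check can also be made from inside Python, which may be convenient in a notebook. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and not part of geochemistrypi, and the sketch assumes the distribution name `geochemistrypi` used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "geochemistrypi" is the name passed to pip in the installation commands.
    found = installed_version("geochemistrypi")
    if found is None:
        print("geochemistrypi is not installed in this environment")
    else:
        print(f"geochemistrypi {found} is installed")
```

Returning `None` instead of raising keeps the check easy to use in a notebook cell before deciding whether to run the install or upgrade command.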

## Example
## Data Preparation

In order to use the functions provided by our software, your own data set should meet the following requirements:

- It has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel.
- It may optionally include location information in two columns, **LATITUDE** and **LONGITUDE**.

If you want to run a **classification** algorithm, your data set should also have:

- a label column. You can name it as you wish, such as **Label**.

Column name specification:

- There is no restriction on column names. You can name them as you want, except for the two special and optional columns **LATITUDE** and **LONGITUDE**.

- Every column can have only one column name. Multi-level column names are not allowed.

- A completely empty column may exist between two columns that contain values.
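The requirements above can be sketched as a small pre-flight check on a file name and its header row. This is only an illustration using the Python standard library, not part of geochemistrypi: the function names `check_header` and `read_csv_header` are hypothetical, and the rule that **LATITUDE** and **LONGITUDE** should appear together or not at all is an assumption of this sketch.

```python
import csv
from pathlib import Path
from typing import List, Optional

SUPPORTED_SUFFIXES = {".xlsx", ".csv"}
LOCATION_COLUMNS = ("LATITUDE", "LONGITUDE")


def check_header(file_name: str, header: List[str],
                 label_column: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    suffix = Path(file_name).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported suffix: {suffix!r}")
    # Completely empty columns are allowed, so blank header cells are ignored.
    names = [name.strip() for name in header if name and name.strip()]
    # Location columns are optional; this sketch assumes both or neither.
    present = [col for col in LOCATION_COLUMNS if col in names]
    if len(present) == 1:
        problems.append(f"only {present[0]} found; expected both location columns")
    # Classification needs a label column, whose name is up to the user.
    if label_column is not None and label_column not in names:
        problems.append(f"label column {label_column!r} not found")
    return problems


def read_csv_header(path: str) -> List[str]:
    """Read the first row of a .csv file as the list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

For example, `check_header("data.csv", ["SiO2", "LATITUDE", "LONGITUDE"])` returns an empty list, while passing `label_column="Label"` with a header that lacks that column reports one problem. The column name `SiO2` is only illustrative.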

The following seven built-in data sets in our software are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding data set.

+ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN)

+ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB)

+ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a)

+ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a)

+ Data_AbnormalDetectioon.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT)

**Note**: For more detail on data preparation, please refer to our online documentation in **Model Example** under the section of **FOR USER**.

## Running Example

**How to run:** After a successful download, run this instruction on the **command line / Jupyter Notebook / Google Colab** from any directory.

@@ -181,6 +230,12 @@ For more details: Please refer to:

- MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY)

The following screenshot shows the download and launch of our software on macOS:

<p align="center">
<img src="https://github.com/ZJUEarthData/geochemistrypi/assets/47497750/70728795-59b7-4741-ab5b-9e63d284ad37" alt="Downloads and Launching on macOS" width="450" />
</p>

## Roadmap

### First Phase
@@ -247,7 +302,6 @@ The whole package is under construction and the documentation is progressively e

+ Jianming Zhao (Jamie, Zhejiang University, China)
+ Jianhao Sun (Jin, China University of Geosciences, Wuhan, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Yongkang Chan (Kill-virus, Lanzhou University, China)
+ Mengying Ye (Mary, Jilin University, China)
+ Mengqi Gao (China University of Geosciences, Beijing, China)
@@ -261,6 +315,9 @@ The whole package is under construction and the documentation is progressively e
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Junchi Liao (Roceda, University of Electronic Science and Technology of China, China)
+ Panyan Weng (The University of Sydney, Australia)
+ Siqi Yao (Clara, Dongguan University of Technology, China)
+ Zhelan Lin (Lan, Fuzhou University, China)

## Join Us :)

@@ -327,6 +384,7 @@ More Videos will be recorded soon.
+ Shengxin Wang (Samson, Lanzhou University, China)
+ Wenyu Zhao (Molly, Zhejiang University, China)
+ Qiuhao Zhao (Brad, Zhejiang University, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Anzhou Li (Andrian, Zhejiang University, China)
+ Dan Hu (Notre Dame University, United States)
+ Xunxin Liu (Tante, China University of Geosciences, Wuhan, China)
@@ -9,26 +9,40 @@ Firstly you need to start the geochemistrypi programm via command line instrucit

In order to utilize the functions provided by our software, your own data set should satisfy:

- be with the suffix **.xlsx**, which is supported by Microsoft Excel.
- be comprise of location information **LATITUDE** and **LONGITUDE**, two columns respectively.
- It has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel.
- It may optionally include location information in two columns, **LATITUDE** and **LONGITUDE**.

If you want to run a **classification** algorithm, your data set should also have:

- Tag column **LABEL** to differentiate the data.
- a label column. You can name it as you wish, such as **Label**.

The following are four built-in data set in our software stored on Google Drive, have a look on them. For the algorithm you intend to run, you can refer to the data format of the corresponding dataset.
Column name specification:

+ [Data_Regression.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
+ [Data_Regression.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a)
- There is no restriction on column names. You can name them as you want, except for the two special and optional columns **LATITUDE** and **LONGITUDE**.

+ [Data_Classification.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
+ [Data_Classification.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a)
- Every column can have only one column name. Multi-level column names are not allowed.

- A completely empty column may exist between two columns that contain values.

The following seven built-in data sets in our software are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding data set.

+ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN)

+ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB)

+ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a)

+ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a)

+ Data_AbnormalDetectioon.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT)

+ [Data_Clustering.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
+ [Data_Clustering.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a)

+ [Data_Decomposition.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true)
+ [Data_Decomposition.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a)
#### Loading Data

By running the start command, there will be a prompt if your dataset is successfully loaded:
@@ -43,6 +57,7 @@ By running the start command, there will be a prompt if your dataset is successf
47 - U(PPM)
--------------------
(Press Enter key to move forward.)

#### World Map Projection

After successfully loading your data, you will be asked if you would like to plot a world map projection for a specific element:
78 changes: 68 additions & 10 deletions docs/source/Home/Introduction.md
@@ -62,19 +62,29 @@ Eos Website: https://eos.org/editor-highlights/machine-learning-for-geochemists

## Quick Installation

Our software is well tested on **macOS** and **Windows** with **Python 3.9**. Other systems and Python versions are not guaranteed.

One instruction to download on the **command line**, such as Terminal on macOS or PowerShell on Windows.

```
pip install geochemistrypi
```

Download the latest version to avoid issues found in older versions, such as problems with dependency downloading.
```
pip install "geochemistrypi==0.5.0"
```

One instruction to download on **Jupyter Notebook** or **Google Colab**.

```
!pip install geochemistrypi
```

Check the latest version of our software:
Download the latest version to avoid issues found in older versions, such as problems with dependency downloading.
```
!pip install "geochemistrypi==0.5.0"
```
Check the downloaded version of our software:

```
geochemistrypi --version
@@ -96,13 +106,52 @@ One instruction to download on **Jupyter Notebook** or **Google Colab**.
!pip install --upgrade geochemistrypi
```

Check the latest version of our software:
Check the updated version of our software:

```
geochemistrypi --version
```

## Example
## Data Preparation

In order to use the functions provided by our software, your own data set should meet the following requirements:

- It has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel.
- It may optionally include location information in two columns, **LATITUDE** and **LONGITUDE**.

If you want to run a **classification** algorithm, your data set should also have:

- a label column. You can name it as you wish, such as **Label**.

Column name specification:

- There is no restriction on column names. You can name them as you want, except for the two special and optional columns **LATITUDE** and **LONGITUDE**.

- Every column can have only one column name. Multi-level column names are not allowed.

- A completely empty column may exist between two columns that contain values.

The following seven built-in data sets in our software are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding data set.

+ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN)

+ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a)

+ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB)

+ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a)

+ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a)

+ Data_AbnormalDetectioon.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT)

**Note**: For more detail on data preparation, please refer to our online documentation in **Model Example** under the section of **FOR USER**.

## Running Example

**How to run:** After a successful download, run this instruction on the **command line / Jupyter Notebook / Google Colab** from any directory.

@@ -176,10 +225,17 @@ Copy the URL shown on the console into any browser to open the MLflow web interf

For more details: Please refer to:

+ [Manual v1.1.0 for Geochemistry π - Beta (International - Google drive)](https://drive.google.com/file/d/1yryykCyWKM-Sj88fOYbOba6QkB_fu2ws/view?usp=sharing)
+ [Manual v1.1.0 for Geochemistry π - Beta (China - Tencent Docs)](https://docs.qq.com/pdf/DQ0l5d2xVd2VwcnVW?&u=6868f96d4a384b309036e04e637e367a)
+ [Geochemistry π - Download and Run the Beta Version (International - Youtube)](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9)
+ [Geochemistry π - Download and Run the Beta Version (China - Bilibili)](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140)
- Manual v1.1.0 for Geochemistry π - Beta [[Tencent Docs]](https://docs.qq.com/pdf/DQ0l5d2xVd2VwcnVW?&u=6868f96d4a384b309036e04e637e367a) | [[Google drive]](https://drive.google.com/file/d/1yryykCyWKM-Sj88fOYbOba6QkB_fu2ws/view?usp=sharing)

- Geochemistry π - Download and Run the Beta Version [[Bilibili]](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140) | [[YouTube]](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9)

- MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY)

The following screenshot shows the download and launch of our software on macOS:

<p align="center">
<img src="https://github.com/ZJUEarthData/geochemistrypi/assets/47497750/70728795-59b7-4741-ab5b-9e63d284ad37" alt="Downloads and Launching on macOS" width="450" />
</p>

## Roadmap

@@ -236,7 +292,6 @@ The whole package is under construction and the documentation is progressively e

![Geochemistry π.png](https://github.com/ZJUEarthData/geochemistrypi/assets/97781484/e77b1f11-41ab-4354-9064-6d62cc1bf1e4)


## Team Info

**Leader:**
@@ -248,7 +303,6 @@ The whole package is under construction and the documentation is progressively e

+ Jianming Zhao (Jamie, Zhejiang University, China)
+ Jianhao Sun (Jin, China University of Geosciences, Wuhan, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Yongkang Chan (Kill-virus, Lanzhou University, China)
+ Mengying Ye (Mary, Jilin University, China)
+ Mengqi Gao (China University of Geosciences, Beijing, China)
@@ -262,6 +316,9 @@ The whole package is under construction and the documentation is progressively e
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Junchi Liao (Roceda, University of Electronic Science and Technology of China, China)
+ Panyan Weng (The University of Sydney, Australia)
+ Siqi Yao (Clara, Dongguan University of Technology, China)
+ Zhelan Lin (Lan, Fuzhou University, China)

## Join Us :)

@@ -328,6 +385,7 @@ More Videos will be recorded soon.
+ Shengxin Wang (Samson, Lanzhou University, China)
+ Wenyu Zhao (Molly, Zhejiang University, China)
+ Qiuhao Zhao (Brad, Zhejiang University, China)
+ Kaixin Zheng (Hayne, Sun Yat-sen University, China)
+ Anzhou Li (Andrian, Zhejiang University, China)
+ Dan Hu (Notre Dame University, United States)
+ Xunxin Liu (Tante, China University of Geosciences, Wuhan, China)
Binary file not shown.