diff --git a/README.md b/README.md index 1601b845..80fcafd9 100644 --- a/README.md +++ b/README.md @@ -61,19 +61,29 @@ Eos Website: https://eos.org/editor-highlights/machine-learning-for-geochemists ## Quick Installation +Our software is well tested on **macOS** and **Windows** systems with **Python 3.9**. Other systems and Python versions are not guaranteed to work. + One instruction to download on **command line**, such as Terminal on macOS, Power Shell on Windows. ``` pip install geochemistrypi ``` +Download the latest version to avoid issues in older versions, such as dependency download problems. +``` +pip install "geochemistrypi==0.5.0" +``` + One instruction to download on **Jupyter Notebook** or **Google Colab**. ``` !pip install geochemistrypi ``` - -Check the latest version of our software: +Download the latest version to avoid issues in older versions, such as dependency download problems. +``` +!pip install "geochemistrypi==0.5.0" +``` +Check the downloaded version of our software: ``` geochemistrypi --version @@ -95,13 +105,52 @@ One instruction to download on **Jupyter Notebook** or **Google Colab**. !pip install --upgrade geochemistrypi ``` -Check the latest version of our software: +Check the updated version of our software: ``` geochemistrypi --version ``` -## Example +## Data Preparation + +To use the functions provided by our software, your own data set should satisfy the following: + +- The file has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel. +- Location information, if present, is stored in two columns named **LATITUDE** and **LONGITUDE**. These columns are optional. + +If you want to run a **classification** algorithm, your data set should also contain: + +- a label column. You can name it as you wish, such as **Label**. + +Column name specification: + +- There is no restriction on column names. You can name them as you wish, except for the two special, optional columns **LATITUDE** and **LONGITUDE**. + +- Every column can have only one column name. 
Multi-level column names are not allowed. + +- A completely empty column can exist between two columns that contain values. + +The following seven built-in data sets are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding dataset. + ++ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN) + ++ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB) + ++ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_AbnormalDetection.xlsx [[Google 
Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT) + +**Note**: For more detail on data preparation, please refer to our online documentation in **Model Example** under the section of **FOR USER**. + +## Running Example **How to run:** After successfully downloading, run this instruction on **command line / Jupyter Notebook / Google Colab** whatever directory it is. @@ -181,6 +230,12 @@ For more details: Please refer to: - MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY) +The following screenshot shows the downloads and launching of our software on macOS: + +

+ [Screenshot: Downloads and Launching on macOS]

+ ## Roadmap ### First Phase @@ -247,7 +302,6 @@ The whole package is under construction and the documentation is progressively e + Jianming Zhao (Jamie, Zhejiang University, China) + Jianhao Sun (Jin, China University of Geosciences, Wuhan, China) -+ Kaixin Zheng (Hayne, Sun Yat-sen University, China) + Yongkang Chan (Kill-virus, Lanzhou University, China) + Mengying Ye (Mary, Jilin University, China) + Mengqi Gao (China University of Geosciences, Beijing, China) @@ -261,6 +315,9 @@ The whole package is under construction and the documentation is progressively e + Yucheng Yan (Andy, University of Sydney, Australia) + Ruitao Chang (China University of Geosciences Beijing, China) + Junchi Liao(Roceda, University of Electronic Science and Technology of China, China) ++ Panyan Weng (The University of Sydney, Australia) ++ Siqi Yao (Clara, Dongguan University of Technology, China) ++ Zhelan Lin (Lan, Fuzhou University, China) ## Join Us :) @@ -327,6 +384,7 @@ More Videos will be recorded soon. 
+ Shengxin Wang (Samson, Lanzhou University, China) + Wenyu Zhao (Molly, Zhejiang University, China) + Qiuhao Zhao (Brad, Zhejiang University, China) ++ Kaixin Zheng (Hayne, Sun Yat-sen University, China) + Anzhou Li (Andrian, Zhejiang University, China) + Dan Hu (Notre Dame University, United States) + Xunxin Liu (Tante, China University of Geosciences, Wuhan, China) diff --git a/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md b/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md index ef7be7bc..ea905770 100644 --- a/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md +++ b/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md @@ -9,26 +9,40 @@ Firstly you need to start the geochemistrypi programm via command line instrucit In order to utilize the functions provided by our software, your own data set should satisfy: -- be with the suffix **.xlsx**, which is supported by Microsoft Excel. -- be comprise of location information **LATITUDE** and **LONGITUDE**, two columns respectively. +- The file has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel. +- Location information, if present, is stored in two columns named **LATITUDE** and **LONGITUDE**. These columns are optional. If you want to run **classification** algorithm, you data set should satisfy: -- Tag column **LABEL** to differentiate the data. +- a label column. You can name it as you wish, such as **Label**. -The following are four built-in data set in our software stored on Google Drive, have a look on them. For the algorithm you intend to run, you can refer to the data format of the corresponding dataset. 
+Column name specification: -+ [Data_Regression.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) -+ [Data_Regression.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a) +- There is no restriction on column names. You can name them as you wish, except for the two special, optional columns **LATITUDE** and **LONGITUDE**. -+ [Data_Classification.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) -+ [Data_Classification.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a) +- Every column can have only one column name. Multi-level column names are not allowed. + +- A completely empty column can exist between two columns that contain values. + +The following seven built-in data sets are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding dataset. 
+ ++ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN) + ++ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB) + ++ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_AbnormalDetection.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT) -+ [Data_Clustering.xlsx (International - Google 
drive)](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) -+ [Data_Clustering.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a) -+ [Data_Decomposition.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) -+ [Data_Decomposition.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a) #### Loading Data By running the start command, there will be a prompt if your dataset is successfully loaded: @@ -43,6 +57,7 @@ By running the start command, there will be a prompt if your dataset is successf 47 - U(PPM) -------------------- (Press Enter key to move forward.) + #### World Map Projection After successfully loading your data, you will be asked if you would like to plot a world map projection for a specific element: diff --git a/docs/source/Home/Introduction.md b/docs/source/Home/Introduction.md index a72f6395..62199014 100644 --- a/docs/source/Home/Introduction.md +++ b/docs/source/Home/Introduction.md @@ -62,19 +62,29 @@ Eos Website: https://eos.org/editor-highlights/machine-learning-for-geochemists ## Quick Installation +Our software is well tested on **macOS** and **Windows** systems with **Python 3.9**. Other systems and Python versions are not guaranteed to work. + One instruction to download on **command line**, such as Terminal on macOS, Power Shell on Windows. ``` pip install geochemistrypi ``` +Download the latest version to avoid issues in older versions, such as dependency download problems. +``` +pip install "geochemistrypi==0.5.0" +``` + One instruction to download on **Jupyter Notebook** or **Google Colab**. 
``` !pip install geochemistrypi ``` - -Check the latest version of our software: +Download the latest version to avoid issues in older versions, such as dependency download problems. +``` +!pip install "geochemistrypi==0.5.0" +``` +Check the downloaded version of our software: ``` geochemistrypi --version @@ -96,13 +106,52 @@ One instruction to download on **Jupyter Notebook** or **Google Colab**. !pip install --upgrade geochemistrypi ``` -Check the latest version of our software: +Check the updated version of our software: ``` geochemistrypi --version ``` -## Example +## Data Preparation + +To use the functions provided by our software, your own data set should satisfy the following: + +- The file has the suffix **.xlsx** or **.csv**, both of which are supported by Microsoft Excel. +- Location information, if present, is stored in two columns named **LATITUDE** and **LONGITUDE**. These columns are optional. + +If you want to run a **classification** algorithm, your data set should also contain: + +- a label column. You can name it as you wish, such as **Label**. + +Column name specification: + +- There is no restriction on column names. You can name them as you wish, except for the two special, optional columns **LATITUDE** and **LONGITUDE**. + +- Every column can have only one column name. Multi-level column names are not allowed. + +- A completely empty column can exist between two columns that contain values. + +The following seven built-in data sets are stored on Google Drive and Tencent Docs; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding dataset. 
+ ++ Data_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Regression.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1FCek2OOYQD887jfQz21g0ovqVuUJIjVoNI77D-Ufr9Y/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ3BDeHhxRGNzSXZN) + ++ Data_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a) + ++ ApplicationData_Classification.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1J7QvdvbbHJMlKtiumBgKDW7ALghfQQZyKGEoOqhKQjw/edit?usp=sharing) | [[Tencent Docs]](https://docs.qq.com/document/DQ2dnQWtubHRBTGtB) + ++ Data_Clustering.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_Decomposition.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) | [[Tencent Docs]](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a) + ++ Data_AbnormalDetection.xlsx [[Google Drive]](https://docs.google.com/spreadsheets/d/1NqTQZCkv74Sn_iOJOKRc-QnJzpaWmnzC_lET_0ZreiQ/edit?usp=sharing) | [[Tencent Docs]]( +https://docs.qq.com/document/DQ2hqQ2N2ZGlOUWlT) + +**Note**: For more detail on data preparation, please refer to our online documentation in **Model Example** under the section of **FOR USER**. 
+ +## Running Example **How to run:** After successfully downloading, run this instruction on **command line / Jupyter Notebook / Google Colab** whatever directory it is. @@ -176,10 +225,17 @@ Copy the URL shown on the console into any browser to open the MLflow web interf For more details: Please refer to: -+ [Manual v1.1.0 for Geochemistry π - Beta (International - Google drive)](https://drive.google.com/file/d/1yryykCyWKM-Sj88fOYbOba6QkB_fu2ws/view?usp=sharing) -+ [Manual v1.1.0 for Geochemistry π - Beta (China - Tencent Docs)](https://docs.qq.com/pdf/DQ0l5d2xVd2VwcnVW?&u=6868f96d4a384b309036e04e637e367a) -+ [Geochemistry π - Download and Run the Beta Version (International - Youtube)](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9) -+ [Geochemistry π - Download and Run the Beta Version (China - Bilibili)](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140) +- Manual v1.1.0 for Geochemistry π - Beta [[Tencent Docs]](https://docs.qq.com/pdf/DQ0l5d2xVd2VwcnVW?&u=6868f96d4a384b309036e04e637e367a) | [[Google drive]](https://drive.google.com/file/d/1yryykCyWKM-Sj88fOYbOba6QkB_fu2ws/view?usp=sharing) + +- Geochemistry π - Download and Run the Beta Version [[Bilibili]](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140) | [[YouTube]](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9) + +- MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY) + +The following screenshot shows the downloads and launching of our software on macOS: + +

+ [Screenshot: Downloads and Launching on macOS]

## Roadmap @@ -236,7 +292,6 @@ The whole package is under construction and the documentation is progressively e ![Geochemistry π.png](https://github.com/ZJUEarthData/geochemistrypi/assets/97781484/e77b1f11-41ab-4354-9064-6d62cc1bf1e4) - ## Team Info **Leader:** @@ -248,7 +303,6 @@ The whole package is under construction and the documentation is progressively e + Jianming Zhao (Jamie, Zhejiang University, China) + Jianhao Sun (Jin, China University of Geosciences, Wuhan, China) -+ Kaixin Zheng (Hayne, Sun Yat-sen University, China) + Yongkang Chan (Kill-virus, Lanzhou University, China) + Mengying Ye (Mary, Jilin University, China) + Mengqi Gao (China University of Geosciences, Beijing, China) @@ -262,6 +316,9 @@ The whole package is under construction and the documentation is progressively e + Yucheng Yan (Andy, University of Sydney, Australia) + Ruitao Chang (China University of Geosciences Beijing, China) + Junchi Liao(Roceda, University of Electronic Science and Technology of China, China) ++ Panyan Weng (The University of Sydney, Australia) ++ Siqi Yao (Clara, Dongguan University of Technology, China) ++ Zhelan Lin (Lan, Fuzhou University, China) ## Join Us :) @@ -328,6 +385,7 @@ More Videos will be recorded soon. + Shengxin Wang (Samson, Lanzhou University, China) + Wenyu Zhao (Molly, Zhejiang University, China) + Qiuhao Zhao (Brad, Zhejiang University, China) ++ Kaixin Zheng (Hayne, Sun Yat-sen University, China) + Anzhou Li (Andrian, Zhejiang University, China) + Dan Hu (Notre Dame University, United States) + Xunxin Liu (Tante, China University of Geosciences, Wuhan, China) diff --git a/geochemistrypi/data_mining/data/dataset/Data_AbnormalDetection.xlsx b/geochemistrypi/data_mining/data/dataset/Data_AbnormalDetection.xlsx new file mode 100644 index 00000000..d14ffac7 Binary files /dev/null and b/geochemistrypi/data_mining/data/dataset/Data_AbnormalDetection.xlsx differ
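
The data-preparation rules this diff adds to the README could be checked before launching the software with a small pre-flight helper. The sketch below is only an illustration: `check_dataset_layout` is a hypothetical function, not part of geochemistrypi, and it assumes that **LATITUDE** and **LONGITUDE** should either both be present or both be absent, which the README text suggests but does not state outright.

```python
from pathlib import Path

# Hypothetical helper, NOT part of geochemistrypi. It only mirrors the
# data-preparation rules described in the README text of this diff.
ALLOWED_SUFFIXES = {".xlsx", ".csv"}
LOCATION_COLUMNS = ("LATITUDE", "LONGITUDE")


def check_dataset_layout(filename, column_names, label_column=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Rule: the data file must have the suffix .xlsx or .csv.
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file suffix {suffix!r}")

    # Rule: location columns are optional. Assumption of this sketch:
    # if one of the two is present, the other should be present as well.
    present = [name for name in LOCATION_COLUMNS if name in column_names]
    if len(present) == 1:
        problems.append(f"{present[0]} is present without its partner column")

    # Rule: classification needs a label column, whose name is free to choose.
    if label_column is not None and label_column not in column_names:
        problems.append(f"label column {label_column!r} not found")

    return problems


# Example with made-up column names in the style of the built-in data sets.
print(check_dataset_layout(
    "my_data.csv",
    ["SIO2(WT%)", "U(PPM)", "LATITUDE", "LONGITUDE", "Label"],
    label_column="Label",
))
```

The helper takes plain column names rather than reading the file, so it stays independent of whichever spreadsheet reader is used to obtain the header row.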