diff --git a/README.md b/README.md index f01a76ff..c6523824 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ **Documentation**: https://geochemistrypi.readthedocs.io - **Source Code**: https://github.com/ZJUEarthData/geochemistrypi +**Source Code**: https://github.com/ZJUEarthData/geochemistrypi --- @@ -45,6 +45,14 @@ The following figure is the frontend-backend separation architecture of Geochemi
+ +**Cite the work as:** + +ZhangZhou J\*, He Can\*, Sun Jianhao, Zhao Jianming, Lyu Yang, Wang Shengxin, Zhao Wenyu, Li Anzhou, Ji Xiaohui. Geochemistry π: Automated machine learning python framework for tabular data (2024). Geochemistry, Geophysics, +Geosystems, 25, e2023GC011324 + +Download link: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023GC011324 + ## Quick Installation One instruction to download on **command line**, such as Terminal on macOS, Power Shell on Windows. @@ -227,7 +235,6 @@ The whole package is under construction and the documentation is progressively e + Jianming Zhao (Jamie, Zhejiang University, China) + Jianhao Sun (Jin, China University of Geosciences, Wuhan, China) + Kaixin Zheng (Hayne, Sun Yat-sen University, China) -+ Jianing Wang (National University of Singapore, Singapore) + Yongkang Chan (Kill-virus, Lanzhou University, China) + Mengying Ye (Mary, Jilin University, China) + Mengqi Gao (China University of Geosciences, Beijing, China) @@ -236,13 +243,10 @@ The whole package is under construction and the documentation is progressively e **Product Group**: + Yang Lyu (Daisy, Zhejiang University, China) -+ Wenyu Zhao (Molly, Zhejiang University, China) + Keran Li (Kirk, Chengdu University of Technology, China) -+ Aixiwake·Janganuer (Ayshuak, Sun Yat-sen University, China) + Bailun Jiang (EPSI / Lille University, France) + Yucheng Yan (Andy, University of Sydney, Australia) + Ruitao Chang (China University of Geosciences Beijing, China) -+ Zhenglin Xu (Garry, Jilin University, China) + Junchi Liao(Roceda, University of Electronic Science and Technology of China, China) ## Join Us :) @@ -308,6 +312,7 @@ More Videos will be recorded soon. 
## Contributors + Shengxin Wang (Samson, Lanzhou University, China) ++ Wenyu Zhao (Molly, Zhejiang University, China) + Qiuhao Zhao (Brad, Zhejiang University, China) + Anzhou Li (Andrian, Zhejiang University, China) + Dan Hu (Notre Dame University, United States) @@ -316,3 +321,6 @@ More Videos will be recorded soon. + Xin Li (The University of Manchester, United Kingdom) + Ting Liu (Kira, Sun Yat-sen University, China) + Xirui Zhu (Rae, University of York, United Kingdom) ++ Aixiwake·Janganuer (Ayshuak, Sun Yat-sen University, China) ++ Zhenglin Xu (Garry, Jilin University, China) ++ Jianing Wang (National University of Singapore, Singapore) diff --git a/docs/source/For Developer/Docker Deployment.md b/docs/source/For Developer/Deployment/Docker Deployment.md similarity index 100% rename from docs/source/For Developer/Docker Deployment.md rename to docs/source/For Developer/Deployment/Docker Deployment.md diff --git a/docs/source/For Developer/Local Deployment.md b/docs/source/For Developer/Deployment/Local Deployment.md similarity index 100% rename from docs/source/For Developer/Local Deployment.md rename to docs/source/For Developer/Deployment/Local Deployment.md diff --git a/docs/source/For User/Contact us/Q&A.md b/docs/source/For User/Contact us/Q&A.md index 168c3409..118d3826 100644 --- a/docs/source/For User/Contact us/Q&A.md +++ b/docs/source/For User/Contact us/Q&A.md @@ -1,3 +1,4 @@ + # Frequently Asked Questions **For your reference, we have summarized some problems encountered and solved in the process of development and testing** @@ -33,6 +34,7 @@ The absolute path of any disk is fine, but the path cannot contain spaces, and i No, but the current process is a common data mining process, and we will write an abbreviated introduction afterwards. **Q6. I'm having trouble installing our software because the download speed for Ray/Fiona is too slow or failing. 
How should I resolve this issue?** + To resolve the issue of slow or failed downloads for Ray/Fiona during installation, you can use the pip command with the Tsinghua mirror source, which may improve download speeds. This applies to both Mac and Windows systems. Here's the command: ```bash @@ -52,4 +54,5 @@ Gitee Link: https://gitee.com/zju-earth-data/geochemistrypi This approach is also suitable for developers who want to test the latest updates. For more information, refer to the "Local Deployment" section under "For Developers" in the online documentation. + Reference video: [The Fastest Currently Feasible Installation Method in China—Installing from GitHub Using requirements.mp4](https://www.bilibili.com/video/BV1pM411V7iR/?spm_id_from=333.999.0.0&vd_source=350db2ec0e0c3ee7f424928a21e82674) ++ Installation Manual +
+ + #### Contents + + 1. [Preparation](#Preparation) + 2. [Download Geochemistry π](#Download-Geichemistry—π) + 3. [Solutions and Suggestions for Installation Failure](#Solutions) + + + ## 1. Preparation + + ### 1.1 Install Python Interpreter -A Python interpreter is a program that reads and executes Python code. When you write Python code in a text file with a `.py` extension, you can run that file using the Python interpreter. For example, use `python main.py` to executes the codes inside main.py file in command line or terminal. + + +A Python interpreter is a program that reads and executes Python code. When you write Python code in a text file with a `.py` extension, you can run that file using the Python interpreter. For example, use `python main.py` to execute the code inside the `main.py` file from the command line or terminal. + + The normal ways to install a Python interpreter: + + (1) If you are a Windows user, you can use the Microsoft Store app to download it directly by searching for Python. + + (2) Refer to the download section in [Python official documentation](https://www.python.org). + + (3) If you are a Chinese user, you can also refer to this blog [Python Download - RUNOOB](https://www.runoob.com/python/python-install.html). -### 1.2 Install Conda -Conda allows you to easily install, update, and manage Python packages and dependencies. Usually, Conda is included the software Anaconda. Hence, by downloading Anaconda, you can install Conda too. + +### 1.2 Install Conda + + + +Conda allows you to easily install, update, and manage Python packages and dependencies. Usually, Conda is included in the Anaconda software. Hence, by downloading Anaconda, you can install Conda too. + + The normal ways to install Anaconda: + + (1) Refer to the download section in [Anaconda website](https://www.anaconda.com).
+ + (2) If you are a Chinese user, you can refer to [Anaconda Download - Zhihu](https://zhuanlan.zhihu.com/p/459601766) to download Anaconda using the Tsinghua mirror source. Also, if you are not familiar with Command Prompt (CMD) in Windows, you can refer to [Frequently Used Commands on Windows - Zhihu](https://zhuanlan.zhihu.com/p/67513308). + + + ## 2. Download Geochemistry π in Virtual Environment + + ### 2.1 Create A Virtual Environment + + Use Conda to manage virtual environments (recommended): + + (1) Create a virtual environment with its own Python interpreter. The example below installs a version 3.9 Python interpreter, where `vir_env_name` is the name of the created environment. To avoid version problems, it is better to use Python 3.9. + + On Mac Terminal: + + ``` + conda create -n vir_env_name python=3.9 + ``` + + On Windows Command Prompt: + + ``` + conda create -n vir_env_name python=3.9 + ``` +**Note:** + +1. In the command, 'vir_env_name' can be replaced by any name you want, like 'Geochemical_project'. It is better not to use Chinese characters in the name. + +2. Please remember that our project is stable with 'python=3.9'. Other versions may cause unknown errors. + +3. If you run into problems during installation, please check the Python version first. + + + For the prompting information, input `y` to continue until the configuration is done. + + (2) Activate the created virtual environment. + + On Mac Terminal: + + ``` + conda activate vir_env_name + ``` + + On Windows Command Prompt: + + ``` + conda activate vir_env_name + ``` -For more useful Conda commands, please search online. + + +For more useful Conda commands, please search online.
+ + ### 2.2 Use pip to Download + + After the virtual environment is activated on your computer, you can follow the steps below to download our software: + + (1) Clear the pip cache: + + On Mac Terminal: + + ``` + pip cache purge + ``` + + On Windows Command Prompt: + + ``` + pip cache purge + ``` + + (2) Download our software: + + On Mac Terminal: + + ``` + pip install geochemistrypi + ``` + + On Windows Command Prompt: + + ``` + pip install geochemistrypi + ``` + + (3) Check the installed version of our software: + + On Mac Terminal: + + ``` + geochemistrypi --version + ``` + + On Windows Command Prompt: + + ``` + geochemistrypi --version + ``` -**Note**: Domestic direct installation may stop because of network speed problems in ray or Fiona package installation failure. You can reference the following video to resolve the problem. + + +**Note**: Direct installation within China may stall or fail because of slow network speeds while downloading the ray or Fiona package. You can refer to the following video to resolve the problem. + + + [Possible Scenarios When Installing via pip Directly in China.mp4](https://www.bilibili.com/video/BV1Gs4y1d7Cm/?spm_id_from=333.999.0.0&vd_source=350db2ec0e0c3ee7f424928a21e82674) + + + ## 3. Solutions and Suggestions for Installation Failure -If you encounter errors while Installing the software, please refer to the **Q&A** section under **Contact Us** in the **FOR USER** of our online documentation. -If you are still unable to resolve the issue after consulting, you can visit the **Contact Us** section in our online documentation under **FOR USER**. There, you can report the error to our team. + +### 3.1 Use Tsinghua Mirror Source + + + +If you cannot download our software because Ray/Fiona downloads are too slow or fail, you can use pip with the Tsinghua mirror source to re-download the package named in the specific error.
+ +``` + +pip install ray -i https://pypi.tuna.tsinghua.edu.cn/simple + +``` + ++ Reference video: [Solutions to Failures in Direct pip Installation in China.mp4](https://www.bilibili.com/video/BV1zg4y1j7bx/?spm_id_from=333.999.0.0&vd_source=350db2ec0e0c3ee7f424928a21e82674). + + + +### 3.2 Use 'pip install -r requirements/production.txt' + + + +Another way to download the related dependencies is to first clone the source code from the GitHub or Gitee repository: + + + +GitHub Link: https://github.com/ZJUEarthData/geochemistrypi + + + +Gitee Link: https://gitee.com/zju-earth-data/geochemistrypi + + + +After that, unpack the source code file. Open Terminal on Mac or Command Prompt on Windows, navigate to the directory of the source code, and use the following command to download the dependencies: + + + +``` + +pip install -r requirements/production.txt + +``` + + + +Or use the Tsinghua mirror source to download: + + + +``` + +pip install -r requirements/production.txt -i https://pypi.tuna.tsinghua.edu.cn/simple + +``` + + + ++ Reference video: [The Fastest Currently Feasible Installation Method in China—Installing from GitHub Using requirements.mp4](https://www.bilibili.com/video/BV1pM411V7iR/?spm_id_from=333.999.0.0&vd_source=350db2ec0e0c3ee7f424928a21e82674) + + + +**Note**: This method can also be used by developers to test our latest updates. For more information, please refer to our online documentation in **Local Deployment** under the section of **FOR DEVELOPER**. + + + +### 3.3 Report An Error to Our Team + + + +You can refer to our online documentation in **Contact Us** under the section of **FOR USER**.
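When reporting an error, it usually helps to include basic environment details. The following is a minimal sketch and not part of Geochemistry π: it assumes only the Python standard library, and the names `environment_report` and `RECOMMENDED_PYTHON` are hypothetical, chosen for illustration. It compares the running interpreter with the Python 3.9 version recommended in section 2.1 and collects platform information that could be pasted into a report.

```python
import platform
import sys

# Python version recommended in section 2.1 of this manual (taken from the text, not from the package).
RECOMMENDED_PYTHON = (3, 9)


def environment_report(version_info=sys.version_info):
    """Summarize interpreter and platform details for an error report."""
    major, minor = version_info[0], version_info[1]
    return {
        "python": f"{major}.{minor}",
        "matches_recommended": (major, minor) == RECOMMENDED_PYTHON,
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Print one "key: value" line per detail, ready to copy into a report.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this inside the activated Conda environment shows whether the interpreter in use is the one the environment was created with.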
diff --git a/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md b/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md index 35cda903..f7fddc34 100644 --- a/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md +++ b/docs/source/For User/Model Example/Data_Preprocessing/Data Preprocessing.md @@ -3,6 +3,30 @@ When we are working on data-mining or machine learning projects, the quality of your results highly depends on the quality of input data. As a result, data cleaning and preprocessing becomes an important step to make sure your input data is neat and balanced. Normally, data scientists will spend a large portion of their working time on data cleaning. However, Geochemistrypi can conduct this process automatically for you, and you just need to follow some simple steps. Firstly you need to start the geochemistrypi programm via command line instrucitons. Please refer to **Quick Installation** and **Example** to know how to start geochemistrypi. And now we use a classification data file as a sample. +#### Data Schema + +In order to utilize the functions provided by our software, your own data set should: + +- have the suffix **.xlsx**, which is supported by Microsoft Excel. +- include location information in two columns, **LATITUDE** and **LONGITUDE**. + +If you want to run a **classification** algorithm (only binary classification is currently supported), your data set should also: + +- include a tag column **LABEL** to differentiate the data. + +The following are the four built-in data sets in our software, stored on Google Drive; have a look at them. For the algorithm you intend to run, you can refer to the data format of the corresponding data set.
+ ++ [Data_Regression.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/13MB4t_2PiZ90tTMJKw7HcBUi2sb3tXej/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) ++ [Data_Regression.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3VmdWZCTGV3bmpM?&u=6868f96d4a384b309036e04e637e367a) + ++ [Data_Classification.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1xFBCYVmtZfuEAbeBljUlzqBjxVuLAt8x/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) ++ [Data_Classification.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ0JUaUFsZnRaZkNG?&u=6868f96d4a384b309036e04e637e367a) + ++ [Data_Clustering.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1sbuJdOzGNQ2Pk-bVURfPYg1rltyBbn5J/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) ++ [Data_Clustering.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ3dKdGtlWkhZS2xR?&u=6868f96d4a384b309036e04e637e367a) + ++ [Data_Decomposition.xlsx (International - Google drive)](https://docs.google.com/spreadsheets/d/1kix82qj5--vhnm8-KhuUBH9dqYH6zcY8/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) ++ [Data_Decomposition.xlsx (China - Tencent Docs)](https://docs.qq.com/document/DQ29oZ0lhUGtZUmdN?&u=6868f96d4a384b309036e04e637e367a) #### Loading Data By running the start command, there will be a prompt if your dataset is successfully loaded: @@ -140,7 +164,28 @@ Geochemistrypi will generate null value report for the selected dataset: dtype: float64 Note: you don't need to deal with the missing values, we'll just pass this step! (Press Enter key to move forward.) -Note that if there is missing value in the dataset, you have to choose a strategy to deal with missing values. +At this point, you can choose whether to deal with the missing values. + +``` +-*-*- Missing Values Process -*-*- +Do you want to deal with the missing values? 
+1 - Yes +2 - No +(Data) ➜ @Number: +``` + +When you choose to deal with the missing values, Geochemistrypi will provide two methods for processing. + +``` +-*-*- Strategy for Missing Values -*-*- +1 - Drop Rows with Missing Values +2 - Impute Missing Values +Notice: Drop the rows with missing values may lead to a significant loss of data if too many features are chosen. +Which strategy do you want to apply? +(Data) ➜ @Number: +``` + +If you choose Impute Missing Values, you have to select a strategy to deal with the missing values. ``` @@ -155,7 +200,7 @@ Successfully fill the missing values with the mean value of each feature column (Press Enter key to move forward.) ``` -####Feature Engineering +#### Feature Engineering You can also genereate new features from the selected dataset. In order to do this, you should state the name of generated column. Here we name our new column "new feature", and then you have to identify some operations to generate the new feature. we simply use `b * c + d` (each column corresponds to an alphbetical letter for convinience) the output is as follows: diff --git a/docs/source/Home/Introduction.md b/docs/source/Home/Introduction.md index 58edcca7..09375d70 100644 --- a/docs/source/Home/Introduction.md +++ b/docs/source/Home/Introduction.md @@ -6,42 +6,284 @@ -# Introduction -## What it is -Geochemistry π is **a Python framework** for data-driven geochemistry discovery. It provides an extendable tool and one-stop shop for **geochemical data analysis** on tabular data. The goal of the Geochemistry π is to create a series of user-friendly and extensible products of high automation for the full cycle of geochemistry research. 
+--- + +**Documentation**: https://geochemistrypi.readthedocs.io + +**Source Code**: https://github.com/ZJUEarthData/geochemistrypi + +--- + +Geochemistry π is an **open-source, highly automated machine learning Python framework** dedicated to building an MLOps level 1 software product for data-driven geochemistry discovery on tabular data. + +Core capabilities are: + ++ **Continuous Training** ++ **Machine Learning Lifecycle Management** ++ **Model Inference** Key features are: + + **Easy to use:** The automation of data mining process provides the users with simple number options to choose. -+ **Extensible:** It allows appending new algorithms through Scikit-learn with AutoML function by FLAML and Ray. ++ **Extensible:** It allows appending new algorithms through Scikit-learn with automatic hyperparameter searching by FLAML and Ray. ++ **Traceable**: It integrates MLflow to build a special storage mechanism that streamlines the end-to-end machine learning lifecycle. + +Latest Update: follow up by clicking `Starred` and `Watch` on our [GitHub repository](https://github.com/ZJUEarthData/geochemistrypi), then get email notifications of the newest features automatically. + +The following figure is the simplified overview of Geochemistry π:+ +
+ +The following figure is the frontend-backend separation architecture of Geochemistry:+ +
+ + +**Cite the work as:** + +ZhangZhou J\*, He Can\*, Sun Jianhao, Zhao Jianming, Lyu Yang, Wang Shengxin, Zhao Wenyu, Li Anzhou, Ji Xiaohui. Geochemistry π: Automated machine learning python framework for tabular data (2024). Geochemistry, Geophysics, +Geosystems, 25, e2023GC011324 + +Download link: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023GC011324 + +## Quick Installation + +One instruction to download on **command line**, such as Terminal on macOS, PowerShell on Windows. + +``` +pip install geochemistrypi +``` + +One instruction to download on **Jupyter Notebook** or **Google Colab**. + +``` +!pip install geochemistrypi +``` + +Check the installed version of our software: + +``` +geochemistrypi --version +``` + +**Note**: For more detail on installation, please refer to our online documentation in **Installation Manual** under the section of **FOR USER**. There, we highly recommend using a virtual environment (Conda) to avoid dependency version problems. + +## Quick Update + +One instruction to update the software to the latest version on **command line**, such as Terminal on macOS, PowerShell on Windows. + +``` +pip install --upgrade geochemistrypi +``` + +One instruction to update on **Jupyter Notebook** or **Google Colab**. + +``` +!pip install --upgrade geochemistrypi +``` + +Check the installed version of our software: + +``` +geochemistrypi --version +``` + +## Example + +**How to run:** After successfully downloading, run this instruction on **command line / Jupyter Notebook / Google Colab** in any directory. + +### Case 1: Run with built-in data set for testing + +On command line: + +``` +geochemistrypi data-mining +``` + +On Jupyter Notebook / Google Colab: + +``` +!geochemistrypi data-mining +``` + +**Note**: There are four built-in data sets corresponding to four kinds of model patterns.
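The version check from Quick Installation can also be done from inside Python, which may be convenient in Jupyter Notebook or Google Colab before running Case 1. The following is a minimal sketch, assuming only the standard library (`importlib.metadata` and `shutil`). The helper name `installation_status` is hypothetical and not part of Geochemistry π, and the sketch assumes the installed distribution and the console command share the name `geochemistrypi`, as the `pip install geochemistrypi` and `geochemistrypi --version` commands suggest.

```python
import shutil
from importlib import metadata


def installation_status(package="geochemistrypi"):
    """Return the installed version and CLI location, with None for whatever is missing."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        version = None
    # shutil.which looks the command up on PATH, much like a shell would.
    return {"version": version, "cli_path": shutil.which(package)}


if __name__ == "__main__":
    print(installation_status())
```

If `version` is `None`, the package is not installed in the interpreter that ran the check; if only `cli_path` is `None`, the package may be installed in an environment whose scripts directory is not on `PATH`.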
+ +### Case 2: Run with your own data set without model inference + +On command line: + +``` +geochemistrypi data-mining --data your_own_data_set.xlsx +``` + +On Jupyter Notebook / Google Colab: + +``` +!geochemistrypi data-mining --data your_own_data_set.xlsx +``` + +**Note**: Currently, `.xlsx` and `.csv` files are supported. Please specify the path where your data file is located. For Google Colab, don't forget to upload your dataset first. + +### Case 3: Implement model inference on application data + +On command line: + +``` +geochemistrypi data-mining --training your_own_training_data.xlsx --application your_own_application_data.xlsx +``` +On Jupyter Notebook / Google Colab: + +``` +!geochemistrypi data-mining --training your_own_training_data.xlsx --application your_own_application_data.xlsx +``` + +**Note**: Please make sure the column names (data schema) in both the training data file and the application data file are the same, because the operations you perform via our software on the training data will be recorded automatically and subsequently applied to the application data in the same order. + +The training data in our pipeline will be divided into the train set and the test set, used for training the ML model and evaluating the model's performance respectively. Two types of score are reported. The first type is the score from the prediction on the test set, while the second type is the CV score from the cross validation on the train set. + +### Case 4: Activate MLflow web interface + +On command line: + +``` +geochemistrypi data-mining --mlflow +``` + +On Jupyter Notebook / Google Colab: + +``` +!geochemistrypi data-mining --mlflow +``` + +**Note**: Once you run our software, there are two folders (`geopi_output` and `geopi_tracking`) generated automatically. Make sure the directory where you execute the above command contains the generated folder `geopi_tracking`. + +Copy the URL shown on the console into any browser to open the MLflow web interface.
The URL is normally like this http://127.0.0.1:5000. Search MLflow online to see more operations and usages. + +For more details: Please refer to: + ++ [Manual v1.1.0 for Geochemistry π - Beta (International - Google drive)](https://drive.google.com/file/d/1yryykCyWKM-Sj88fOYbOba6QkB_fu2ws/view?usp=sharing) ++ [Manual v1.1.0 for Geochemistry π - Beta (China - Tencent Docs)](https://docs.qq.com/pdf/DQ0l5d2xVd2VwcnVW?&u=6868f96d4a384b309036e04e637e367a) ++ [Geochemistry π - Download and Run the Beta Version (International - Youtube)](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9) ++ [Geochemistry π - Download and Run the Beta Version (China - Bilibili)](https://www.bilibili.com/video/BV1UM4y1Q7Ju/?spm_id_from=333.999.0.0&vd_source=27944ab3b73a78970c1a52a5dcbb9140) + +## Roadmap + +### First Phase -## First Phase It works as a **software application** with a command-line interface (CLI) to automate **data mining** process with frequently-used **machine learning algorithms** and **statistical analysis methods**, which would further lower the threshold for the geochemists. -The highlight is that through choosing **simple number options**, the users are able to implement a completed cycle of data mining **without knowledge of** SciPy, NumPy, Pandas, Scikit-learn, FLAML, Ray packages. +The highlight is that through choosing **simple number options**, the users are able to implement a full cycle of data mining **without knowledge of** SciPy, NumPy, Pandas, Scikit-learn, FLAML, Ray packages. The following figure is the activity diagram of automated ML pipeline in Geochemistry π: -![Geochemistryπ-Activity Diagram_v1.png](https://github.com/ZJUEarthData/geochemistrypi/assets/66779478/012223e2-8c90-401d-b972-2d4ecd180d83) + Its data section provides feature engineering based on **arithmatic operation**. 
It allows the users to have a statistic analysis on the data set as well as on the imputation result, which is supported by the combination of **Monte Carlo simulation** and **hypothesis testing**. Its models section provides both **supervised learning** and **unsupervised learning** methods from **Scikit-learn** framework, including four types of algorithms, regression, classification, clustering, and dimensional reduction. Integrated with **FLAML** and **Ray** framework, it allows the users to run AutoML easily, fastly and cost-effectively on the built-in supervised learning algorithms in our framework. +The following figure is the hierarchical architecture of Geochemistry π: + ++ +
-## Second Phase +### Second Phase -Currently, we are building three access ways to provide more user-friendly service, including **web portal**, **CLI package** and **API**. It allows the user to perform **continuous training** of the model by automating the ML pipeline in different layers. +Currently, we are building three ways of access to provide a more user-friendly service, including a **web portal**, a **CLI package** and an **API**. It allows the user to perform **continuous training** and **model inference** by automating the ML pipeline, and **machine learning lifecycle management** through a unique storage mechanism in different access layers. -The following figure is the system architecture diagram of Geochemistry π:+ +
+ +The following figure is the design pattern hierarchical architecture:+ +
The whole package is under construction and the documentation is progressively evolving. +## Team Info + +**Leader:** + ++ Can He (Sany, National University of Singapore, Singapore) + Email: sanyhew1097618435@163.com + +**Technical Group:** + ++ Jianming Zhao (Jamie, Zhejiang University, China) ++ Jianhao Sun (Jin, China University of Geosciences, Wuhan, China) ++ Kaixin Zheng (Hayne, Sun Yat-sen University, China) ++ Yongkang Chan (Kill-virus, Lanzhou University, China) ++ Mengying Ye (Mary, Jilin University, China) ++ Mengqi Gao (China University of Geosciences, Beijing, China) ++ Chengtu Li (Trenki, Henan Polytechnic University, Beijing, China) + +**Product Group**: + ++ Yang Lyu (Daisy, Zhejiang University, China) ++ Keran Li (Kirk, Chengdu University of Technology, China) ++ Bailun Jiang (EPSI / Lille University, France) ++ Yucheng Yan (Andy, University of Sydney, Australia) ++ Ruitao Chang (China University of Geosciences Beijing, China) ++ Junchi Liao (Roceda, University of Electronic Science and Technology of China, China) + +## Join Us :) + +**The recruitment of research interns is ongoing !!!** + +**Key Point: All things are done online, remote work (\*^▽^\*)** + +**What can you learn?** + ++ Learning the full cycle of data mining (Scikit-learn, Ray, MLflow) on tabular data, including the algorithms in regression, classification, clustering, and decomposition. ++ Learning to be a qualified Python developer, including Python programming topics related to data mining, basic software engineering techniques like frontend (React, TypeScript, Ant Design scaffold) and backend (SQL & NoSQL database, RESTful API, FastAPI) development, and cooperation tools like Git. + +**What can you get?** + ++ Research internship proof and reference letter after working for >> 100 hours. ++ Chance to pay a visit to Hangzhou, China, sponsored by ZJU Earth Data. ++ Chance to be guided by the experts from IT companies in Silicon Valley and Hangzhou.
++ Bonus depending on your performance. + +**Current Working Pattern:** + ++ Online working and cooperation ++ Three weeks per working cycle -> One online meeting per working cycle ++ One cycle report (see below) per cycle - 5 mins to finish + +Even if you are not familiar with topics above, but if you are interested in and have plenty of time to do it. That's enough. We have a full-developed training system to help you, as a newbie of data mining or Python developer, learn steps by steps with seniors until you can make a significant contribution to our project. + +**More details about the project?** +Please refer to: +English Page: https://person.zju.edu.cn/en/zhangzhou +Chinese Page: https://person.zju.edu.cn/zhangzhou#0 + +**Do you want to contribute to this open-source program?** +Contact with your CV: sanyhew1097618435@163.com ## In-house Materials + Materials are in both Chinese and English. Others unshown below are internal materials. + 1. [Guideline Manual – Geochemistry π (International - Google drive)](https://docs.google.com/document/d/1LjwB5Lazk33E5vbtnFPJio_MyjYQxjEu/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) 2. [Guideline Manual – Geochemistry π (China - Tencent Docs)](https://docs.qq.com/doc/DQ21IZUdVQktqRWpm?&u=6868f96d4a384b309036e04e637e367a) 3. [Learning Steps for Newbies – Geochemistry π (International - Google drive)](https://docs.google.com/document/d/1GQO-SXwEx_8midr362pqfxNZtfUf-nA6/edit?usp=sharing&ouid=110717816678586054594&rtpof=true&sd=true) @@ -52,12 +294,30 @@ Materials are in both Chinese and English. Others unshown below are internal mat 8. [Cycle Report - Geochemistry π (China - Tencent Docs)](https://docs.qq.com/pdf/DQ25VSGNlbGx4UkFZ?&u=6868f96d4a384b309036e04e637e367a) ## In-house Videos + Technical record videos are on Bilibili and Youtube synchronously while other meeting videos are internal materials. More Videos will be recorded soon. + 1. 
[ZJU_Earth_Data Introduction (Geochemical Data, Python, Geochemistry π) - Prof. Zhang](https://www.bilibili.com/video/BV1Lf4y1w7EK?spm_id_from=333.999.0.0) 2. [How to Collaborate and Provide Bug Report on Geochemistry π Through GitHub - Can He (Sany)](https://www.youtube.com/watch?v=1DWoEsqsfvQ&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=3) 3. [Geochemistry π - Download and Run the Beta Version](https://www.youtube.com/watch?v=EeVaJ3H7_AU&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=9) 4. [How to Create and Use Virtual Environment on Geochemistry π - Can He (Sany)](https://www.youtube.com/watch?v=4KFi7OXxD-c&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=4) 5. [How to use Github-Desktop in conflict resolution - Qiuhao Zhao (Brad)](https://www.youtube.com/watch?v=KT1g5JpuUVI&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM) -6. [Virtual Environment & Packages On Windows - Jianming Zhao (Jamie)](https://www.youtube.com/watch?v=e4VqSBuNp_o&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=2) -7. [Git Workflow & Coordinating Synchronization - Jianming Zhao (Jamie)](https://www.bilibili.com/video/BV1Sa4y1f74k?spm_id_from=333.999.0.0&vd_source=9adcf2c5fdeffe1d11c89d441ef598ba) +6. [Virtual Environment & Packages On Windows - Jianming Zhao (Jamie)](https://www.youtube.com/watch?v=e4VqSBuNp_o&list=PLy8hNsI55lvh1UHjhVhqNUj3xPdV9sEiM&index=2) +7. 
[Git Workflow & Coordinating Synchronization - Jianming Zhao (Jamie)](https://www.bilibili.com/video/BV1Sa4y1f74k?spm_id_from=333.999.0.0&vd_source=9adcf2c5fdeffe1d11c89d441ef598ba) + +## Contributors + ++ Shengxin Wang (Samson, Lanzhou University, China) ++ Wenyu Zhao (Molly, Zhejiang University, China) ++ Qiuhao Zhao (Brad, Zhejiang University, China) ++ Anzhou Li (Andrian, Zhejiang University, China) ++ Dan Hu (Notre Dame University, United States) ++ Xunxin Liu (Tante, China University of Geosciences, Wuhan, China) ++ Fang Li (liv, Shenzhen University, China) ++ Xin Li (The University of Manchester, United Kingdom) ++ Ting Liu (Kira, Sun Yat-sen University, China) ++ Xirui Zhu (Rae, University of York, United Kingdom) ++ Aixiwake·Janganuer (Ayshuak, Sun Yat-sen University, China) ++ Zhenglin Xu (Garry, Jilin University, China) ++ Jianing Wang (National University of Singapore, Singapore) diff --git a/docs/source/deployment.rst b/docs/source/deployment.rst new file mode 100644 index 00000000..7a2eca06 --- /dev/null +++ b/docs/source/deployment.rst @@ -0,0 +1,8 @@ +Deployment +============= + +.. toctree:: + :maxdepth: 3 + + Local Deployment