perf: improve meanshift-related code and the common functions 'cluster center' and 'cluster label'. #378

Merged: 2 commits, Aug 31, 2024
9 changes: 6 additions & 3 deletions README.md
@@ -307,18 +307,20 @@ The whole package is under construction and the documentation is progressively e
+ Mengqi Gao (China University of Geosciences, Beijing, China)
+ Chengtu Li(Trenki, Henan Polytechnic University, Beijing, China)
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Panyan Weng (The University of Sydney, Australia)

**Product Group**:

+ Yang Lyu (Daisy, Zhejiang University, China)
+ Bailun Jiang (EPSI / Lille University, France)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Panyan Weng (The University of Sydney, Australia)
+ Siqi Yao (Clara, Dongguan University of Technology, China)
+ Zhelan Lin(Lan, Fuzhou University, China)
+ ShuYi Li (Communication University Of China, Beijing, China)
+ Junbo Wang (China University Of Geosciences, Beijing, China)
+ Haibin Wang(Watson, University of Sydney, Australia)
+ Guoqiang Qiu(Elsen, Fuzhou University, China)
+ Yating Dong (Yetta,Dongguan University of Technology,China)
+ Haibin Lai (Michael, Southern University of Science and Technology, China)

## Join Us :)

@@ -398,3 +400,4 @@ More Videos will be recorded soon.
+ Zhenglin Xu (Garry, Jilin University, China)
+ Jianing Wang (National University of Singapore, Singapore)
+ Junchi Liao(Roceda, University of Electronic Science and Technology of China, China)
+ Bailun Jiang (EPSI / Lille University, France)
40 changes: 39 additions & 1 deletion docs/source/For Developer/Add New Model To Framework.md
@@ -865,8 +865,46 @@ Only for those algorithms, they belong to either regression or classification an

## 5. Test Model Workflow Class

After the model workflow class is added, you can test it through running the command `python start_cli_pipeline.py` on the terminal. If the test reports an error, you need to debug and fix it. If there is no error, it can be submitted.

After the model workflow class is added, you can test it by running the command `python start_cli_pipeline.py` in the terminal.

If the pipeline runs successfully, verify the correctness of your modification in three ways:

(1) Check whether the output info in the console is what you expect.

<img width="1347" alt="image" src="https://github.com/user-attachments/assets/6530bd18-d196-4829-997d-08222194a34f">

(2) Check whether the artifacts produced (e.g., dataset, images) are saved properly in the `geopi_output` folder and whether their content is what you expect. The path printed in the console tells you where the `geopi_output` folder is.

<img width="1400" alt="image" src="https://github.com/user-attachments/assets/773e5b61-c45e-4c18-8747-cd2753831f6b">

(3) Check whether the same artifacts (e.g., dataset, images) are saved properly in MLflow. Use the command `mlflow ui --backend-store-uri file:/path/to/geopi_tracking --port PORT_NUMBER` to launch the web interface provided by MLflow. Copy the link `http://127.0.0.1:PORT_NUMBER` into the browser. Click the corresponding experiment and the run you created, then check the artifacts accordingly.

<img width="1353" alt="image" src="https://github.com/user-attachments/assets/3ddda308-00e1-4a40-a392-91e0440a5d26">

<img width="1394" alt="image" src="https://github.com/user-attachments/assets/56c4d1b6-2458-4a93-9956-0993d3ffa058">

<img width="1288" alt="image" src="https://github.com/user-attachments/assets/e3ebebdb-2910-4826-a4dd-19a079be0b0d">

For more details on how to use MLflow, watch the video below:

MLflow UI user guide - Geochemistry π v0.5.0 [[Bilibili]](https://b23.tv/CW5Rjmo) | [[YouTube]](https://www.youtube.com/watch?v=Yu1nzNeLfRY)
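
Check (2) above can also be done from a script rather than by browsing the folder. The following is a minimal sketch, not part of Geochemistry π: the helper names `summarize_artifacts` and `empty_artifacts` are hypothetical, and it assumes only that the pipeline writes regular files somewhere under the output folder whose path is printed in the console.

```python
from pathlib import Path
from typing import Dict, List


def summarize_artifacts(output_dir: str) -> Dict[str, int]:
    """Map each file under output_dir (relative POSIX path) to its size in bytes."""
    root = Path(output_dir)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def empty_artifacts(summary: Dict[str, int]) -> List[str]:
    """Return the paths of zero-byte files, which may indicate a failed save."""
    return [name for name, size in summary.items() if size == 0]
```

For example, `empty_artifacts(summarize_artifacts("/path/to/geopi_output"))` returns an empty list when every saved file has some content. This only confirms that files exist and are non-empty; whether their content is what you expect still needs a manual look.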

If the pipeline fails to run, you need to debug and fix it. The recommended way is **breakpoint debugging**. In VSCode, open the file `start_cli_pipeline.py` and click the debug button VSCode provides.

<img width="1396" alt="image" src="https://github.com/user-attachments/assets/3eb2082b-1dca-48cd-9897-089355ff566a">

You can search online for the benefits of **breakpoint debugging**. Two major ones are:

(1) Look up the values of variables in the current stack frame directly.

<img width="1396" alt="image" src="https://github.com/user-attachments/assets/91b45e99-1123-40bc-8190-0f982be695a8">

(2) Create a temporary watch expression (code to debug) that is evaluated in the current stack frame.

<img width="1397" alt="image" src="https://github.com/user-attachments/assets/232f59bd-e48d-40e9-9174-7ebe4e8d2fb2">

After fixing the problem, don't forget to verify the produced artifacts in the three ways described above.

## 6. Completed Pull Request

10 changes: 6 additions & 4 deletions docs/source/Home/Introduction.md
@@ -308,18 +308,20 @@ The whole package is under construction and the documentation is progressively e
+ Mengqi Gao (China University of Geosciences, Beijing, China)
+ Chengtu Li(Trenki, Henan Polytechnic University, Beijing, China)
+ Yucheng Yan (Andy, University of Sydney, Australia)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Panyan Weng (The University of Sydney, Australia)

**Product Group**:

+ Yang Lyu (Daisy, Zhejiang University, China)
+ Bailun Jiang (EPSI / Lille University, France)
+ Ruitao Chang (China University of Geosciences Beijing, China)
+ Junchi Liao(Roceda, University of Electronic Science and Technology of China, China)
+ Panyan Weng (The University of Sydney, Australia)
+ Siqi Yao (Clara, Dongguan University of Technology, China)
+ Zhelan Lin(Lan, Fuzhou University, China)
+ ShuYi Li (Communication University Of China, Beijing, China)
+ Junbo Wang (China University Of Geosciences, Beijing, China)
+ Haibin Wang(Watson, University of Sydney, Australia)
+ Guoqiang Qiu(Elsen, Fuzhou University, China)
+ Yating Dong (Yetta,Dongguan University of Technology,China)
+ Haibin Lai (Michael, Southern University of Science and Technology, China)

## Join Us :)

@@ -1,5 +1,5 @@
geochemistrypi.data\_mining.model.func.algo\_anomalydetection package
======================================================================
=====================================================================

Module contents
---------------
7 changes: 3 additions & 4 deletions geochemistrypi/data_mining/model/_base.py
@@ -377,11 +377,10 @@ class ClusteringMetricsMixin:
"""Mixin class for clustering metrics."""

@staticmethod
def _get_num_clusters(func_name: str, algorithm_name: str, trained_model: object, store_path: str) -> None:
"""Get and log the number of clusters."""
labels = trained_model.labels_
num_clusters = len(np.unique(labels))
def _get_num_clusters(labels: pd.Series, func_name: str, algorithm_name: str, store_path: str) -> None:
"""Get and log the number of clusters. It is only used for algorithms that don't allow setting the number of clusters in advance."""
print(f"-----* {func_name} *-----")
num_clusters = len(np.unique(labels.to_numpy()))
print(f"{func_name}: {num_clusters}")
num_clusters_dict = {f"{func_name}": num_clusters}
mlflow.log_metrics(num_clusters_dict)
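
The change above makes `_get_num_clusters` take the labels directly instead of reading `labels_` from the trained model. The counting step can be sketched in isolation as follows. This is an illustration only: `count_clusters` is a hypothetical helper, the label values are made up, and the MLflow logging from the real method is left out.

```python
import numpy as np
import pandas as pd


def count_clusters(labels: pd.Series) -> int:
    """Return the number of distinct cluster labels (hypothetical helper)."""
    # np.unique returns the sorted distinct values, so its length is the cluster count.
    return len(np.unique(labels.to_numpy()))


# Made-up labels standing in for the output of a fitted clustering model.
labels = pd.Series([0, 2, 1, 0, 2, 2], name="clustering result")
num_clusters = count_clusters(labels)
print(f"Num of Clusters: {num_clusters}")
```

Because the function now depends only on a `pd.Series`, it can be exercised without fitting a model. Note that every distinct value counts, so a noise marker such as `-1`, which some clustering algorithms emit, would be counted as one more cluster.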