This is a custom scheduler for Kubernetes that uses a machine learning model to make scheduling decisions. It is purpose-built and, without modification, can only schedule the test application. More about the test app can be found here. The neural network model is built with TensorFlow, and its repository can be found here. In addition to using the NN model for decisions, the scheduler has a simple grading system for choosing the most cost-efficient solution.
The machine learning scheduler predicts how long, in seconds, the web application will take to return a response. The scheduler waits until all three pods of the web application have been created. Once it has collected all three, it looks for the most cost-effective combination of nodes. First, it makes a prediction for every possible combination of nodes based on their resource usage. Once all predictions are made, it builds a new list containing every combination whose predicted time is within 10 percent of the best one. It then grades those combinations based on:
- Lower resource usage on a node
- Type of storage

After finding the most cost-effective combination, it starts binding pods to nodes.
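The selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: `select_placement`, `grade`, `STORAGE_POINTS`, the node dictionary layout, and `fake_predict` (a stand-in for the TensorFlow model) are all hypothetical names. It also assumes that pods may share a node and that cheaper storage earns the higher grade.

```python
from itertools import product

# Hypothetical grading weights; the real values are not given in this README.
STORAGE_POINTS = {"HDD": 1, "SSD": 0}  # assumption: cheaper storage scores higher


def grade(combo, nodes):
    """Score a combination: less-loaded nodes and cheaper storage earn more points."""
    score = 0.0
    for name in combo:
        node = nodes[name]
        score += 1.0 - node["cpu_usage"]          # lower resource usage is better
        score += STORAGE_POINTS[node["storage"]]  # type of storage
    return score


def select_placement(nodes, predict, n_pods=3, tolerance=0.10):
    """Return the node combination chosen for the n_pods pods."""
    # 1. Predict the response time for every assignment of pods to nodes.
    predictions = {c: predict(c) for c in product(nodes, repeat=n_pods)}
    # 2. Keep the combinations within `tolerance` of the best prediction.
    best = min(predictions.values())
    shortlist = [c for c, t in predictions.items() if t <= best * (1 + tolerance)]
    # 3. Grade the shortlist and pick the highest score.
    return max(shortlist, key=lambda c: grade(c, nodes))


# Made-up node data, only for demonstration.
nodes = {
    "node1": {"cpu_usage": 0.20, "storage": "HDD"},
    "node2": {"cpu_usage": 0.50, "storage": "SSD"},
}


def fake_predict(combo):
    # Stand-in for the NN model: each HDD node adds 0.1 s to the response time.
    return 6.0 + sum(0.1 for n in combo if nodes[n]["storage"] == "HDD")


print(select_placement(nodes, fake_predict))
```

With this made-up data every combination falls inside the 10 percent window, so the grading step alone decides the placement. If the predictions differed more, the window would filter out the slow combinations before any grading takes place.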
The lab environment consisted of five nodes: one control node and four worker nodes. The nodes had different amounts of resources.
Node | vCPU | Memory | Storage type |
---|---|---|---|
Control node | 2 | 4 GB | HDD |
Node 1 | 4 | 4 GB | HDD |
Node 2 | 4 | 8 GB | SSD |
Node 3 | 2 | 4 GB | HDD |
Node 4 | 4 | 8 GB | SSD |
I ran all Python scripts with Python 3.11.
After running tests with and without workload on the worker nodes, I found that the machine learning scheduler gave lower response times than the default scheduler in every scenario. The results shown in the table below are averages of ten samples, in seconds.
Scenario | Predicted time | ML scheduler | MAE | Default scheduler |
---|---|---|---|---|
No workload | 6.45 | 5.89 | 0.56 | 5.94 |
Workload on 1 node | 6.4 | 6.09 | 0.31 | 6.15 |
Workload on 2 nodes | 6.53 | 6.4 | 0.13 | 6.95 |
Workload on 3 nodes | 6.6 | 6.69 | 0.09 | 7.88 |
Workload on 4 nodes | 7.58 | 10.76 | 3.18 | 12.72 |
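
To make the table easier to check, the short sketch below recomputes two figures from the averaged values: the absolute difference between the predicted and measured ML scheduler times, which matches the MAE column, and the relative improvement over the default scheduler. The `summarize` helper and the `RESULTS` layout are illustrative names only and are not part of the scheduler.

```python
# Averaged values copied from the table above:
# (predicted time, ML scheduler, default scheduler)
RESULTS = {
    "No workload": (6.45, 5.89, 5.94),
    "Workload on 1 node": (6.4, 6.09, 6.15),
    "Workload on 2 nodes": (6.53, 6.4, 6.95),
    "Workload on 3 nodes": (6.6, 6.69, 7.88),
    "Workload on 4 nodes": (7.58, 10.76, 12.72),
}


def summarize(results):
    """Return {scenario: (absolute error, % improvement over default)}."""
    summary = {}
    for scenario, (predicted, ml, default) in results.items():
        abs_error = abs(predicted - ml)           # prediction error of the model
        gain = (default - ml) / default * 100.0   # how much faster than default
        summary[scenario] = (round(abs_error, 2), round(gain, 1))
    return summary


for scenario, (abs_error, gain) in summarize(RESULTS).items():
    print(f"{scenario}: error {abs_error:.2f}, {gain:.1f}% faster than default")
```

The improvement over the default scheduler is under 1 percent with no workload or with workload on one node, and grows to roughly 15 percent when three or four nodes are loaded. The prediction error is also largest in the four-node case.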