This course covers the foundations of machine learning and some of its common applications to chemical engineering systems. Machine learning can be broadly classified into supervised learning, unsupervised learning and reinforcement learning. The course also covers hybrid modeling, a very important topic that deals with combining mechanistic knowledge with data-driven tools.
Machine learning (ML) is the field of study that gives computers the ability to learn without being explicitly programmed {cite}`samuel1959some`.
This is in contrast to "traditional" computer science, in which exact instructions must be specified in order to perform a specific task.
ML relies heavily on linear algebra, statistics and optimization, so expect to encounter these topics throughout this course.
As mentioned before, ML can be broadly classified into 3 areas: supervised, unsupervised and reinforcement learning.
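Supervised learning refers to obtaining an input-output mapping: the learning agent is fed with labeled examples so that it can generalize to new instances.
For instance, assume there exists an unknown function $f$ that maps inputs $\textbf{x}$ to outputs $\textbf{y} = f(\textbf{x})$, and that we observe a finite set of input-output pairs. The task is to find, from these examples, a hypothesis function that approximates $f$.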
Depending on the form of the output $\textbf{y}$, supervised learning problems are divided into:

- Classification: when $\textbf{y}$ is categorical (e.g., a molecule is toxic or not).
- Regression: when $\textbf{y}$ is continuous (e.g., the temperature profile of a reactor).
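To make the two problem types concrete, here is a minimal pure-Python sketch on hypothetical toy data (none of the numbers come from a real process): an ordinary least-squares line for a continuous output (regression) and a nearest-centroid rule for a categorical output (classification). The function names and the two molecular "descriptors" are illustrative assumptions; in practice one would typically use a library such as scikit-learn.

```python
# Regression: continuous output, fitted with ordinary least squares y ≈ a*x + b.
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_mean - a * x_mean


# Classification: categorical output, predicted with a nearest-centroid rule.
def fit_centroids(points, labels):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        s = sums.setdefault(label, [0.0] * len(p))
        for i, value in enumerate(p):
            s[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def predict_class(centroids, p):
    """Assign p to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], p)))


# Hypothetical toy data, for illustration only.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b, a * 4.0 + b)  # 2.0 1.0 9.0

points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]  # two made-up descriptors per molecule
labels = ["non-toxic", "non-toxic", "toxic", "toxic"]
centroids = fit_centroids(points, labels)
print(predict_class(centroids, [0.7, 0.75]))  # toxic
```

In both cases the model is built only from examples, and its value lies in the prediction for an input it has not seen (here `x = 4.0` and the point `[0.7, 0.75]`).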
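In supervised learning, we are not primarily interested in fitting the observed data as closely as possible, but in generalizing well to unseen data! Therefore, the concepts of **overfitting** (the model captures the noise in the training data and performs poorly on new data) and **underfitting** (the model is too simple to capture the underlying trend) become very important.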
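The trade-off can be sketched with k-nearest-neighbour regression, chosen here only because it needs no libraries; the sine "ground truth", the noise level and the values of k are hypothetical choices. With k = 1 the model reproduces every training point exactly (zero training error) but also follows the noise, a typical case of overfitting; with k equal to the number of training points it predicts the same average everywhere and underfits.

```python
import math
import random


def f(x):
    """Hypothetical 'unknown' function that generates the data."""
    return math.sin(x)


def knn_predict(x_train, y_train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training inputs."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / k


def mse(x_train, y_train, xs, ys, k):
    """Mean squared error of the k-NN model (trained on x_train, y_train) on the points (xs, ys)."""
    return sum((knn_predict(x_train, y_train, x, k) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
# Noisy training and test samples drawn from the same hypothetical process.
x_train = [random.uniform(0, 6) for _ in range(30)]
y_train = [f(x) + random.gauss(0, 0.2) for x in x_train]
x_test = [random.uniform(0, 6) for _ in range(200)]
y_test = [f(x) + random.gauss(0, 0.2) for x in x_test]

for k in (1, 5, 30):
    train_err = mse(x_train, y_train, x_train, y_train, k)
    test_err = mse(x_train, y_train, x_test, y_test, k)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Comparing training and test error, rather than training error alone, is what reveals the problem: a training error of zero (k = 1) says little about how well the model generalizes.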
*Figure: supervised learning.* A collection of hypothesis functions (in blue) that could be fitted to the observed data (in red). Which hypothesis is the best? How can we determine the best hypothesis function?
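In unsupervised learning, the output values are not available; only the input data $\textbf{x}$ are. The goal is then to discover structure in the data, for example groups of similar samples (clustering).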
*Figure: unsupervised learning.* Clustering of data. How can we detect groups of data that are similar to each other? Why is this useful?
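A common clustering algorithm is k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a minimal pure-Python version on a hypothetical, unlabeled 2-D data set; note that the algorithm receives no output values, only the inputs. The function names and data are illustrative assumptions.

```python
import random


def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every point."""
    return [
        min(range(len(centroids)),
            key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)  # assignment step
        new_centroids = []
        for c in range(k):  # update step: mean of the points assigned to cluster c
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical unlabeled measurements that happen to form two groups.
points = [(0.0, 0.2), (0.3, 0.1), (0.1, 0.4), (0.2, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The number of clusters `k` must be chosen by the user; the cluster indices themselves are arbitrary, only the grouping matters.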
The name *reinforcement learning* comes from animal psychology, where animals are trained by reinforcing good behaviour and discouraging bad behaviour. Here, the agent has to learn how to interact with its environment in order to maximize the reward or minimize the punishment.
*Figure: reinforcement learning.* Incentivize actions that maximize reward and/or discourage actions that lead to punishment.
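As a minimal illustration, the sketch below applies tabular Q-learning, one standard reinforcement learning algorithm (not necessarily the one used later in the course), to a hypothetical toy environment: a corridor of five states in which the agent can move left or right and receives a reward of 1 only when it reaches the last state. The learning rate, discount factor, exploration rate and number of episodes are illustrative assumptions.

```python
import random

N_STATES = 5        # states 0..4 of a hypothetical corridor; state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Toy environment: walls at both ends, reward 1 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:  # explore: random action
                a = rng.choice(ACTIONS)
            else:  # exploit: best known action, ties broken at random
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s_next, r, done = step(s, a)
            # Bootstrapped target: reward plus discounted value of the best next action.
            target = r if done else r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-terminal state; 1 means "move right"
```

Nobody tells the agent that moving right is correct; it discovers this purely from the rewards it collects while interacting with the environment.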
The central questions about hybrid modeling are: should we discard all the physical knowledge acquired over centuries and replace it with data-driven models? Is there a way to combine both? Is it beneficial to do so?
In general, the term hybrid modeling refers to the combination of mechanistic and data-driven models; it is also called "grey-box modeling". For example, mass and energy balances, thermodynamic laws and kinetics should be respected by our models. Introducing this physical knowledge reduces the amount of data needed for the ML part and improves the ability of the models to generalize to unseen conditions.
*Figure: hybrid modeling.* Hybrid modeling is also referred to in the literature as grey-box modeling.
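One simple way to combine the two, often called a parallel or residual hybrid structure, keeps the mechanistic model and trains a data-driven model only on the part of the data it fails to explain. The sketch below uses hypothetical functions: `mechanistic` stands in for a first-principles model, `plant` for the real process (used here only to generate synthetic data), and a straight line fitted by least squares plays the role of the data-driven component.

```python
def mechanistic(x):
    """Hypothetical first-principles model (e.g., from a simplified balance)."""
    return 2.0 * x


def plant(x):
    """Hypothetical 'true' process, unknown to the modeller; used only to create data."""
    return 2.0 * x + 0.3 * x ** 2


def fit_line(xs, ys):
    """Least-squares slope a and intercept b of y ≈ a*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    a = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return a, y_mean - a * x_mean


# "Measurements" from the hypothetical plant.
x_train = [0.0, 0.5, 1.0, 1.5, 2.0]
y_train = [plant(x) for x in x_train]

# Data-driven part: learn only what the mechanistic model misses (the residual).
residuals = [y - mechanistic(x) for x, y in zip(x_train, y_train)]
a, b = fit_line(x_train, residuals)


def hybrid(x):
    """Grey-box prediction = mechanistic model + learned residual correction."""
    return mechanistic(x) + a * x + b


x_test = [0.25, 0.75, 1.25, 1.75]
err_mech = max(abs(mechanistic(x) - plant(x)) for x in x_test)
err_hybrid = max(abs(hybrid(x) - plant(x)) for x in x_test)
print(f"max error, mechanistic only: {err_mech:.3f}; hybrid: {err_hybrid:.3f}")
```

Other arrangements exist, for example serial structures in which a data-driven model supplies an unknown parameter or rate term inside the mechanistic equations.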