Area of AI based on algorithms capable of learning, i.e. extracting knowledge from data. They extract knowledge; they cannot create it. The idea is to build software that can make decisions on new, unseen data. We use ML mainly for problems where it's too difficult to define explicit rules for the agents.
data $\rightarrow$ experience
Desired outputs are provided, and the agent tries to reach them.
The agent just tries to find patterns/regularities in the data.
The agent receives rewards based on how it performs. The only goal of the agent is to maximise the long-term reward. To do so, it needs to balance exploitation (choosing actions it already knows to be rewarding) and exploration (trying other actions that might turn out better): this is the 'exploration-exploitation dilemma'.
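One common way to balance exploration and exploitation is an ε-greedy rule, sketched here only as an illustration. The function name `epsilon_greedy` and the parameter `epsilon` are assumptions; these notes do not name a specific strategy.

```python
import random

def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability `epsilon` the agent explores (uniform random action);
    otherwise it exploits (action with the highest current estimate).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # Exploit: index of the largest estimate (the first one wins ties).
    return max(range(len(action_values)), key=lambda a: action_values[a])
```

With `epsilon = 0` this reduces to a purely greedy choice, and with `epsilon = 1` the agent acts at random. Values in between trade one off against the other.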
There are many possible policies, e.g. a greedy policy, where the agent picks the most rewarding action in each state.
Also, the environment must satisfy the Markov Property:
History led me here, but the next state and reward depend only on the current state and action. It's also important to design the right Reward Function.
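Written out, the Markov property says the following, with $s_t$, $a_t$ and $r_t$ as assumed notation (not used elsewhere in these notes) for the state, action and reward at step $t$:

$$P(s_{t+1}, r_{t+1} \mid s_t, a_t) = P(s_{t+1}, r_{t+1} \mid s_0, a_0, r_1, \dots, r_t, s_t, a_t)$$

Conditioning on the whole history gives the same distribution as conditioning on the current state and action alone.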
Q-learning is a well-known algorithm used by this class of learning agents. It's based on a Q-table, where the Q-value of each state-action pair represents the expected long-term 'reward' of taking that action in that state.
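A minimal sketch of how a Q-table can be stored and updated, assuming a hypothetical 4-state chain environment. The names `step` and `train` and the hyperparameters `alpha` (learning rate) and `gamma` (discount factor) are illustrative assumptions, not taken from these notes. The sketch uses the standard textbook Q-learning update, a purely random behaviour policy, and a table initialised to zero for simplicity.

```python
import random

# Hypothetical toy environment: a chain of 4 states (0..3).
# Action 0 moves left, action 1 moves right; reaching state 3 pays 1 and ends.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to 0.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(N_ACTIONS)  # purely random exploration
            next_state, reward, done = step(state, action)
            # A terminal state has no future value.
            best_next = 0.0 if done else max(q[next_state])
            # Move Q(s, a) toward reward + discounted best value of the next state.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy read off the table: best action for each non-terminal state.
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, the greedy policy read from the table moves right in every non-terminal state, since that is the only way to reach the rewarding state.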
The Q-table is composed of all the possible states (rows) and, for each state, all the possible actions (columns). We first fill each cell with the immediate reward of that action in that state. We then refine these values iteratively so that they also account for the long-term reward, using this formula: