One of the fundamental features of animal behavior is keeping the internal state of the body stable. This feature, called "homeostasis" (a feedback mechanism) or "allostasis" (a model-predictive mechanism), has received attention since the early days of artificial intelligence. One origin of this insight in the artificial intelligence context is Ashby's Homeostat and his concept of the "ultrastable system". Ashby rigorously treated survival in a general environment as a problem of regulating the internal (interoceptive) state through behavior selection. This is a concept from classical cybernetics.
(From "Design for a Brain: The Origin of Adaptive Behavior" by Ashby)
Because the problem is universal and general, optimal control of the agent's internal state offers a theoretically grounded treatment of the survival of "natural agents" (animals). Conveniently, in The Selfish Gene, Dawkins argues that the survival of individuals can be seen as an approximation of the objective of animals. Recent theoretical neuroscience has begun to discuss a "regulator" perspective on animal behavior that integrates a Bayesian view of the control problem.
(From "The Cybernetic Bayesian Brain" by Seth)
Animals regulate multiple resources through behavior control. This is studied in research on food selection and nutrient selection, and the behavior can be observed in insects too. Researchers in theoretical animal behavior proposed the "two-resource problem" as the simplest concrete form of the nutrient selection problem (image below). The agent has sensors for nutrient detection, interoceptive nutrient level sensors, and manually implemented high-level foraging behaviors for two nutrient resources.
(Two-resource problem overview, from "Basic Cycles, Utility and Opportunism in Self-Sufficient Robots" by McFarland & Spier)
In this project, I treat homeostasis and survival as a stochastic optimal control problem. The agent receives interoceptive signals from its body (red and blue resource levels) and two-step RGB vision inputs. The agent has only primitive actions such as "go forward", "turn left", or "eat". Optimizing behavior from the motor control level is left for future research.
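One common way to cast homeostasis as a control problem is to define a "drive" as the distance of the internal state from a set point and to reward reductions in that drive. The sketch below illustrates this idea only. The set point, the quadratic drive, and the function names are assumptions for illustration and are not taken from this project.

```python
import numpy as np

# Hypothetical set point for the (red, blue) resource levels.
# This value is an assumption, not the project's actual setting.
SET_POINT = np.array([0.0, 0.0])


def drive(levels):
    """Squared distance of the interoceptive state from the set point."""
    diff = np.asarray(levels, dtype=float) - SET_POINT
    return float(np.dot(diff, diff))


def homeostatic_reward(levels_before, levels_after):
    """Reward one step by how much it reduced the drive."""
    return drive(levels_before) - drive(levels_after)
```

With this definition, moving a resource level toward the set point gives a positive reward and drifting away gives a negative one.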
For simplicity, I used a "vanilla" Deep Q-Network (DQN) for the optimization. More recent and advanced algorithms would likely optimize faster and better than my implementation. This experiment replicates my previous research on a general homeostatic agent.
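For readers unfamiliar with vanilla DQN, its core is the one-step temporal-difference target that the Q-network is regressed toward. The following is a minimal NumPy sketch of that target computation. The function name, the discount factor default, and the batch layout are assumptions for illustration, not this project's code.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a').

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), from the target network
    dones:         shape (batch,), 1 where the episode ended, else 0
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Greedy value of the next state; masked out for terminal transitions.
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q
```

The network is then trained to minimize the squared difference between these targets and its own Q-value for the action actually taken.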
https://www.youtube.com/watch?v=_xhMq272wbE
- The agent has five behaviors: NONE, LEFT TURN, RIGHT TURN, FORWARD, and EAT.
- To speed up learning, I used reward shaping to add an initial bias to the value function.
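A standard form of reward shaping is potential-based shaping, which adds the discounted change in a potential function to the environment reward and acts much like an initial bias on the value function. The sketch below assumes that form. The potential values and the function name are hypothetical, and the shaping actually used in this project may differ.

```python
def shaped_reward(reward, potential_s, potential_next, gamma=0.99, done=False):
    """Return r + gamma * Phi(s') - Phi(s).

    potential_s and potential_next are values of a hypothetical
    potential function Phi, e.g. higher when food is closer.
    The potential of a terminal state is treated as zero.
    """
    next_term = 0.0 if done else gamma * potential_next
    return reward + next_term - potential_s
```

For example, `shaped_reward(0.0, potential_s=1.0, potential_next=2.0)` is positive, so a step that raises the potential is encouraged even when the environment reward is zero.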
Windows 10 + Anaconda + Python 3.6+ + Unity ML-Agents 1.0+