AISB Quarterly, 119:5, 2005.

Entropy and Information in Models of Learning Behaviour

Roman V. Belavkin
School of Computing Science
Middlesex University, London NW4 4BT, UK

Learning is an important process that allows us to reduce the uncertainty of the outcomes of our decisions, or, in other words, the uncertainty about the utilities of decisions. Thus, through learning we can make decisions that are most beneficial to us (or at least seem to be so). Information Theory provides a convenient apparatus for measuring information transfer as a change of entropy (a measure of uncertainty). However, the notion of information cannot easily be applied in experimental psychology, where learning is judged by external observations of subjects' performance in certain tasks. Modern cognitive modelling tools have brought information-theoretic concepts much closer to cognitive psychology.
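For reference, these are the standard Shannon quantities behind this idea (textbook definitions, not specific to the models discussed here): the entropy of an outcome X, and the information that an observation Y carries about X, expressed as the expected reduction of that entropy.

```latex
H(X) = -\sum_{x} P(x)\,\log_2 P(x),
\qquad
I(X;Y) = H(X) - H(X \mid Y)
```

For a binary success/failure outcome, learning that moves P(success) from 0.5 to 0.9 lowers the entropy from 1 bit to -0.9 log2 0.9 - 0.1 log2 0.1, which is about 0.469 bits, so roughly 0.531 bits of uncertainty have been removed.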

One such tool is the ACT-R cognitive architecture [1], which employs both symbolic and subsymbolic computations. The symbolic production system encodes the knowledge of a model, and subsymbolic mechanisms account for some neural-like and probabilistic effects. For example, decisions, which are represented by rules in a model, are selected not only by logical conditions (i.e. the left-hand side of a rule must be satisfied), but also by the underlying Bayesian learning mechanism, which favours rules with higher probabilities of success (or higher utilities). These probabilities can also be used to calculate the entropy of success in the model, and the speed at which this entropy decays is an excellent indicator of the speed of learning [3].
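As a minimal sketch of how such probabilities yield an entropy, assume each rule's probability of success is estimated from its counts of successes and failures with one prior success and one prior failure (these prior values are an assumption of the sketch; ACT-R's own defaults may differ), and take the entropy of the binary success/failure outcome. For a rule that succeeds on every trial, the estimate approaches 1 and the entropy decays towards 0.

```python
import math

def success_probability(successes: int, failures: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Bayesian (Laplace-style) estimate of P(success) for one rule.

    alpha and beta are prior pseudo-counts of successes and failures;
    the value 1.0 for both is an assumption made for this sketch.
    """
    return (alpha + successes) / (alpha + beta + successes + failures)

def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a success/failure outcome with P(success) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# A rule that succeeds on every trial: its estimated P(success) climbs
# towards 1, and the entropy of success decays towards 0.
entropies = []
for n in range(0, 21, 5):
    p = success_probability(successes=n, failures=0)
    entropies.append(binary_entropy(p))
    print(f"after {n:2d} successes: P = {p:.3f}, H = {entropies[-1]:.3f} bits")
```

The speed of this decay, rather than any single value of H, is what serves as the learning indicator.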

One application of this approach was a study of the effect of motivation and emotion on decision-making strategies and on the speed of learning [3]. In this study, ACT-R was used to model a classical experiment on animal learning, the Yerkes and Dodson 'dancing mouse' experiment [5], in which mice were trained in a two-choice task using different levels of reinforcement. The speed of entropy decay in this model was studied under different settings of the architectural parameters (see Figure 1). One such parameter is the variance of the noise that corrupts the estimated utilities of rules in the model. The entropy decay showed that high noise values, which result in more random and often non-optimal decisions, facilitate information acquisition. Therefore, such a 'noisy' decision-making strategy can be beneficial when exploration is needed, that is, when little is known about the task yet or when previous knowledge proves ineffective.
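The exploration effect of noise can be illustrated with a small simulation. This is a hypothetical sketch, not the published model: it assumes a two-choice task in which rule 0 always succeeds and rule 1 always fails, ACT-R-style expected gains E = PG - C with G = 20 and C = 1, logistic noise of scale s added to each rule's gain, and one prior success and one prior failure per rule. As a stand-in for the model's entropy, it sums the entropies of the two rules' estimated success probabilities after 50 trials, averaged over 200 runs.

```python
import math
import random

def binary_entropy(p):
    """Entropy (bits) of a success/failure outcome with P(success) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def logistic_noise(rng, s):
    """Logistic noise with scale s, sampled by inverting the CDF."""
    u = rng.random()
    while u == 0.0:              # avoid log(0)
        u = rng.random()
    return s * math.log(u / (1.0 - u))

def remaining_entropy(noise_s, trials=50, seed=0, gain=20.0, cost=1.0):
    """Simulate one run; return the summed entropy of both rules' estimates."""
    rng = random.Random(seed)
    true_p = [1.0, 0.0]          # hypothetical task: rule 0 always succeeds
    successes, failures = [0, 0], [0, 0]

    def estimates():
        # Bayesian estimate with one prior success and one prior failure.
        return [(1 + successes[i]) / (2 + successes[i] + failures[i])
                for i in range(2)]

    for _ in range(trials):
        p_hat = estimates()
        # Noisy expected gain E = P*G - C + noise; the rule with highest E fires.
        utilities = [p_hat[i] * gain - cost + logistic_noise(rng, noise_s)
                     for i in range(2)]
        choice = max(range(2), key=lambda i: utilities[i])
        if rng.random() < true_p[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return sum(binary_entropy(p) for p in estimates())

def mean_remaining_entropy(noise_s, runs=200):
    """Average remaining entropy over runs with different random seeds."""
    return sum(remaining_entropy(noise_s, seed=r) for r in range(runs)) / runs

low = mean_remaining_entropy(noise_s=0.1)
high = mean_remaining_entropy(noise_s=5.0)
print(f"low noise  (s=0.1): {low:.3f} bits left")
print(f"high noise (s=5.0): {high:.3f} bits left")
```

With low noise the model settles on whichever rule first looks better and never re-tests the other, so that rule's estimate stays uncertain; with high noise both rules keep being sampled and more entropy is removed, at the cost of more failed trials.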

Figure 1: Decay of entropy in a model as a result of learning

This result led to the idea of using entropy to control the noise variance dynamically, thereby achieving more adaptive behaviour in which decision-making shifts from an exploration to an exploitation strategy as learning progresses [3]. Moreover, models that use such dynamic control fit the data better than models with static noise. This suggests that animals and humans may also adjust their decision-making strategy according to their estimate of the uncertainty of the outcome. Such a heuristic would enable them to learn and adapt their behaviour faster in dynamic environments.
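Dynamic noise control can be sketched by recomputing the noise scale from the current entropy on every trial. Everything specific here is an assumption for illustration, not the control law of the published model: a hypothetical two-choice task in which rule 0 always succeeds and rule 1 always fails, ACT-R-style expected gains E = PG - C with logistic noise, and a linear law s = s_max * H / H_max with s_max = 5, where H is the summed entropy of the two rules' success estimates.

```python
import math
import random

def binary_entropy(p):
    """Entropy (bits) of a success/failure outcome with P(success) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def logistic_noise(rng, s):
    """Logistic noise with scale s, sampled by inverting the CDF."""
    u = rng.random()
    while u == 0.0:              # avoid log(0)
        u = rng.random()
    return s * math.log(u / (1.0 - u))

def run_dynamic(s_max=5.0, trials=50, seed=0, gain=20.0, cost=1.0):
    """Return the noise scale used on each trial (hypothetical control law)."""
    rng = random.Random(seed)
    true_p = [1.0, 0.0]          # hypothetical task: rule 0 always succeeds
    successes, failures = [0, 0], [0, 0]
    h_max = 2.0                  # two binary outcomes: at most 1 bit each
    noise_history = []
    for _ in range(trials):
        p_hat = [(1 + successes[i]) / (2 + successes[i] + failures[i])
                 for i in range(2)]
        h = sum(binary_entropy(p) for p in p_hat)
        s = s_max * h / h_max    # noise shrinks as uncertainty is removed
        noise_history.append(s)
        utilities = [p_hat[i] * gain - cost + logistic_noise(rng, s)
                     for i in range(2)]
        choice = max(range(2), key=lambda i: utilities[i])
        if rng.random() < true_p[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return noise_history

noise_history = run_dynamic()
print(f"noise scale: first trial {noise_history[0]:.2f}, "
      f"last trial {noise_history[-1]:.2f}")
```

The model starts with maximal noise (exploration) while both rules are uncertain, and the noise falls as evidence accumulates, pushing selection towards the rule with the best estimate (exploitation).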

The subsymbolic learning mechanism of ACT-R employs Bayesian estimation of the expected values of utilities, which are corrupted by noise of constant variance. However, the experiments with noise controlled dynamically by entropy feedback suggested that higher-order statistics of utilities, such as their variance, may also play an important role. To test and demonstrate this idea, a new learning algorithm was created for the ACT-R architecture. This algorithm, called OPTIMIST, uses a Gamma distribution of the time intervals between events, and it uses estimates of both the expected values and the variances of utilities in the decision-making process. Although this is an ongoing project, early results demonstrate behaviour similar to that of the model with dynamic noise controlled by the entropy of success [4].
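The role of variance in selection can be illustrated with a hypothetical sketch; it is not the OPTIMIST algorithm of [4], only a standard construction in the same spirit. It assumes that intervals between reward events after firing a rule are exponential with an unknown rate, so a Gamma(shape, rate) prior on that rate (here Gamma(1, 1), an assumption) has the conjugate posterior Gamma(1 + n, 1 + total time). Treating the reward rate as the rule's utility, an "optimistic" score of posterior mean plus k standard deviations can favour a rarely tried rule over a well-known one with a slightly higher mean.

```python
import math

def gamma_posterior(intervals, alpha0=1.0, beta0=1.0):
    """Posterior Gamma(shape, rate) over an event rate.

    Assumes exponential intervals between events and a Gamma(alpha0, beta0)
    prior (shape, rate); the prior values are assumptions for this sketch.
    """
    return alpha0 + len(intervals), beta0 + sum(intervals)

def rate_mean_and_variance(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    return shape / rate, shape / rate ** 2

def optimistic_score(intervals, k=1.0):
    """Hypothetical score: posterior mean plus k posterior standard deviations."""
    mean, var = rate_mean_and_variance(*gamma_posterior(intervals))
    return mean + k * math.sqrt(var)

def choose_rule(history, k=1.0):
    """Index of the rule with the highest optimistic score."""
    return max(range(len(history)), key=lambda i: optimistic_score(history[i], k))

# Rule 0: observed often, slightly higher mean reward rate.
# Rule 1: observed rarely, lower mean rate but much more uncertain.
history = [[0.5] * 19, [0.5] * 3]
print("mean only (k=0):  ", choose_rule(history, k=0.0))
print("mean + 1 sd (k=1):", choose_rule(history, k=1.0))
```

Here rule 0 has posterior Gamma(20, 10.5) (mean about 1.90, sd about 0.43) and rule 1 has Gamma(4, 2.5) (mean 1.6, sd 0.8), so using the mean alone selects rule 0, while adding one standard deviation selects rule 1. The variance thus acts as a built-in incentive to explore, which is qualitatively similar to the effect of high noise.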

Interestingly, dramatic changes of entropy occur when a model succeeds or fails to achieve a certain goal, and these moments coincide with subjects' experiences of emotions such as joy or frustration [2]. It may well be that the expression of these emotions is part, or a side-effect, of some mechanism in the brain responsible for estimating uncertainty and adapting behaviour.


  1. J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological Review, 111(4):1036-1060, 2004.
  2. R. V. Belavkin. The role of emotion in problem solving. In C. Johnson, editor, Proceedings of the AISB'01 Symposium on Emotion, Cognition and Affective Computing, pages 49-57, Heslington, York, England, March 2001.
  3. R. V. Belavkin and F. E. Ritter. The use of entropy for analysis and control of cognitive models. In F. Detje, D. Dörner, and H. Schaub, editors, Proceedings of the Fifth International Conference on Cognitive Modelling, pages 21-26, Universitäts-Verlag Bamberg, Germany, April 2003.
  4. R. V. Belavkin and F. E. Ritter. OPTIMIST: A new conflict resolution algorithm for ACT-R. In Proceedings of the Sixth International Conference on Cognitive Modelling, pages 40-45, Lawrence Erlbaum, Mahwah, NJ, 2004.
  5. R. M. Yerkes and J. D. Dodson. The relation of strength of stimulus to rapidity of habit formation. Journal of Comparative Neurology and Psychology, 18:459-482, 1908.
Author: Roman Belavkin
Last Modified: 01/31/2006 10:13:16