Reusing thoughts and experience
Substantial work over the last decade has emphasised the distinction between habits and goal-directed decisions. Theoretically, goal-directed decisions have been characterised as involving computationally costly model inversion, for instance a tree search. Habits, by contrast, have been thought to result from experience accumulated in lookup tables that summarise past experience about state-action pairs for efficient future re-deployment without further computational cost. While the former structure allows fast adaptation, the latter must sample the new consequences of behaviours in the world before it can change. However, theoretical work has long suggested a softer distinction, in a number of ways. First, efficient game-playing algorithms replace subtrees with lookup-table values to cut computational costs. Second, direct transfer of internal samples might shape habits without relying on costly experiential samples from the world. Finally, habits are typically characterised as one-step state-action pairs, but there is no strong theoretical reason to limit lookup tables to such simple structures. This opens up the possibility of directly transferring entire solutions to actor-like habits. Here, we re-analyse data from an explicitly goal-directed tree-search task and find evidence for generalisation and re-use of complex action sequences. We use this to analyse the process by which complex behaviours are stored for future re-use, whether arising through experience or as memorised solutions. This reveals richer habits and a less sharp distinction between habits and goal-directed choices.
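The idea of replacing subtrees with stored values can be illustrated with a minimal sketch. This is not the authors' model or task: the tree, state names, and rewards below are hypothetical, and the cache stands in for a habit-like lookup table that lets a previously solved subtree be reused without further search.

```python
# Minimal sketch (hypothetical states/rewards): a depth-limited tree search
# in which already-evaluated subtrees are replaced by cached lookup-table
# values, much like transposition tables in game-playing algorithms.

TREE = {                     # state -> list of (action, next_state) pairs
    "s0": [("a", "s1"), ("b", "s2")],
    "s1": [("a", "s3"), ("b", "s4")],
    "s2": [("a", "s4"), ("b", "s5")],
}
REWARD = {"s3": 1.0, "s4": 0.5, "s5": 2.0}  # rewards at terminal states

cache = {}        # "habit-like" lookup table of solved subtree values
expansions = []   # records which states were actually expanded by search

def search(state):
    """Best achievable reward from `state`; cached subtrees are not re-expanded."""
    if state in cache:
        return cache[state]              # re-use stored solution, no search cost
    if state not in TREE:
        return REWARD.get(state, 0.0)    # terminal leaf
    expansions.append(state)             # this subtree is searched in full
    value = max(search(nxt) for _, nxt in TREE[state])
    cache[state] = value                 # store for future re-deployment
    return value

first = search("s0")     # full, costly model-based search
n_first = len(expansions)
second = search("s0")    # answered entirely from the lookup table
```

After the first call, every internal state has been expanded once; the second call returns the same value without expanding anything, which is the computational saving the distinction turns on.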
Joint work with Anthony Cruickshanck, Neir Eshel, Peter Dayan, Paul Falkner, Sam Gershman, Niall Lally, Peggy Series and Jon Roiser.