I will focus on the way of thinking involved in applying Bayesian models to study cognition; the mathematical underpinnings of Bayesian methods are instead covered in Math: Probability. Important phrases are in italics, while terms are in bold.
The Book (open access): https://mitpress.ublish.com/ebook/bayesian-models-of-cognition-reverse-engineering-the-mind-preview/12799/Cover
Junsong Lu’s note: https://docs.google.com/document/d/1JQ-XgvHgS6VwTIS9cYRX2-eZBkiocZTYXk7ohwOVXbo/edit?usp=sharing
Ch1 Overview
1.1 Generalization and Induction
- “Some more abstract background knowledge must generate and delimit the hypotheses that learners consider, or meaningful generalization would be impossible … [called] constraints … inductive bias … priors …” (p. 7, talking about the problem of induction).
- Comment: the “curse of compositional minds” is related. If a mind has access to a full compositional language of thought, there are exponentially many hypotheses it can entertain. How can it possibly entertain them all and pick the right one? From Elizabeth Spelke’s https://youtu.be/UiINpPTrtzE?t=703 [an argument from lack of imagination]
- Just like the “neural network” paradigm, the “Bayesian cognitive modeling” paradigm is a “body of mutually reinforcing and supporting concepts, principles, and tools … Bayes provides a starting point, but only the start.” (p. 8, clarifying the paradigm)
- The book’s focus will be on only one kind of abstract knowledge—“probabilistic generative world models”—minds’ models of the unobserved variables and causal processes at work in the world, represented with uncertainty. (p. 8, clarifying the book’s focus)
1.2 From One Question to Many
- One question: “how our minds get so much from so little” (p. 10)
- Many questions (p. 10)
- How does abstract knowledge of the world guide learning and inference from sparse data?
- Probabilistic generative world models parsimoniously describe a broader class of situations beyond the specific one at hand, over which learning should generalize. (p. 13)
- What forms does our abstract world knowledge take across different domains and tasks?
- Historically, this has meant defining probabilities over symbolic forms of knowledge representation that are more structured than large numerical vectors, such as graphs (trees, chains, rings…; see ch4), grammars (see ch16), predicate logic (see ch17), relational schemas, and functional programs. (p. 15) “Each makes different kinds of prior distributions natural … therefore imposes different constraints on induction.” (p. 17) (A minimal grammar-based-prior sketch follows this list.)
- Often hierarchical, in the sense of simultaneously learning a framework theory that specifies the structure of the downstream structured representations (e.g., a rule for how a causal network should grow). (p. 16)
- The hypothesis space at the highest level often still needs to be specified, which raises the question of where it comes from.
- “It is no coincidence, then, that our best accounts of people’s mental representations often resemble simpler versions of how scientists represent the same domains.” (p. 18, talking about the parsimony trade-off)
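Comment: to make the idea of a grammar-based prior concrete, here is a minimal sketch of my own (not from the book): a tiny probabilistic grammar over boolean concepts in which longer derivations are sampled less often, so the representation itself imposes a simplicity bias on induction. All production probabilities and feature names are made-up illustrative values.

```python
import random

# A hypothetical toy grammar over boolean concepts; the production
# probabilities are illustrative and chosen to be subcritical, so
# sampling terminates with probability 1 and short formulas dominate.
GRAMMAR = {
    "EXPR": [
        (0.5, ["FEATURE"]),                        # stop: a single feature
        (0.2, ["not", "EXPR"]),
        (0.15, ["(", "EXPR", "and", "EXPR", ")"]),
        (0.15, ["(", "EXPR", "or", "EXPR", ")"]),
    ],
    "FEATURE": [(0.5, ["red"]), (0.5, ["round"])],
}

def sample(symbol="EXPR"):
    """Sample one derivation; deeper (more complex) formulas are rarer,
    so the grammar implicitly encodes a simplicity prior."""
    if symbol not in GRAMMAR:
        return [symbol]                             # terminal token
    probs, options = zip(*GRAMMAR[symbol])
    rhs = random.choices(options, weights=probs)[0]
    out = []
    for s in rhs:
        out.extend(sample(s))
    return out

random.seed(1)
for _ in range(5):
    print(" ".join(sample()))                       # e.g., "not round"
```

A different representation (say, a prior over graphs) would make a different family of hypotheses cheap to express, which is exactly how the choice of representation constrains induction.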
- How are (abstract) world models themselves acquired or constructed?
- Top-down route to the origins of knowledge: Hierarchical Bayesian Models (HBMs), and Noah D. Goodman’s Bayesian “blessing of abstraction”: even when learning purely from experience, it is possible to learn abstract ideas early, and these then guide the subsequent learning of details. We can learn fast because we learn abstract things first, and that is possible because far more data bear on an abstract regularity (it recurs across many specific cases) than on any one specific thing. (p. 23) (See the first sketch after this list.)
- To be flexible: Nonparametric Bayesian Models (or infinite models), such as models using a Chinese restaurant process (CRP) to be flexible in the number of latent classes, or an Indian buffet process (IBP) to grow new perceptual features dynamically. These pose hypotheses with an unbounded amount of structure, but only finitely many degrees of freedom are actively engaged for a given data set. (p. 21) (See the second sketch below.)
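Comment: the blessing of abstraction can be made concrete with a toy hierarchical model (the setup and all numbers are mine, for illustration): marbles drawn from many bags, where each bag’s color proportion is drawn from Beta(alpha, alpha), and a small abstract parameter alpha means bags tend to be uniform in color. Two marbles from each of twenty bags already pin down alpha, even though no individual bag’s proportion is known precisely—and the learned abstraction then guides prediction for a brand-new bag.

```python
import numpy as np
from scipy.stats import betabinom

# Toy setup (mine, not the book's): each bag i has black-marble
# proportion theta_i ~ Beta(alpha, alpha). A small alpha is the abstract
# knowledge "bags tend to be uniform in color."
rng = np.random.default_rng(0)
alphas = np.logspace(-1, 1, 50)            # grid over the abstract parameter
true_alpha = 0.1                           # bags are nearly single-colored

# Observe just 2 marbles from each of 20 bags.
thetas = rng.beta(true_alpha, true_alpha, size=20)
counts = rng.binomial(2, thetas)           # black marbles out of 2, per bag

# Posterior over alpha (uniform prior on the grid); each theta_i is
# integrated out analytically via the beta-binomial likelihood.
log_post = np.array([
    sum(betabinom.logpmf(k, 2, a, a) for k in counts) for a in alphas
])
post = np.exp(log_post - log_post.max())
post /= post.sum()
a_hat = (alphas * post).sum()
print("posterior mean of alpha:", a_hat)   # small: abstraction learned

# The abstraction guides specifics: after ONE black marble from a brand-
# new bag, the predicted probability that its next marble is also black
# is (1 + a_hat) / (1 + 2 * a_hat), close to 1 when a_hat is small.
print("P(next black | one black):", (1 + a_hat) / (1 + 2 * a_hat))
```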
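And a minimal simulation of the Chinese restaurant process itself (a sketch; the concentration value is illustrative): the number of available classes is unbounded, yet any finite data set occupies only finitely many of them.

```python
import random

# Sketch of a Chinese restaurant process: customers (observations) join
# existing tables (latent classes) in proportion to their size, or start
# a new table with weight `concentration` ("rich get richer" dynamics).
def crp(n_customers, concentration=1.0, seed=0):
    rng = random.Random(seed)
    tables = []                      # tables[k] = customers at table k
    assignments = []
    for _ in range(n_customers):
        weights = tables + [concentration]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):         # chose the (always available) new table
            tables.append(0)
        tables[k] += 1
        assignments.append(k)
    return assignments, tables

_, tables = crp(100)
print("occupied classes:", len(tables), "| sizes:", tables)
```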
- How do we use our world models to make decisions and act in the world successfully?
- Sequential decision problems, described as Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). (A minimal value-iteration sketch follows this list.)
- Reinforcement learning problem: how to learn how actions modify states and produce costs and rewards.
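Comment: as a concrete (if toy) instance of such a sequential decision problem, here is a minimal value-iteration sketch for a four-state chain MDP; the states, rewards, and discount factor are all made up for illustration.

```python
import numpy as np

# Toy chain MDP (all numbers mine): 4 states, actions move left/right,
# reward only on reaching the rightmost state. Value iteration applies
# the Bellman optimality backup V(s) <- max_a [R(s') + gamma * V(s')].
n_states, gamma = 4, 0.9
actions = [-1, +1]                                 # left, right

def step(s, a):
    return min(max(s + a, 0), n_states - 1)       # deterministic transition

reward = np.array([0.0, 0.0, 0.0, 1.0])           # goal at the last state

V = np.zeros(n_states)
for _ in range(100):                               # iterate to convergence
    V = np.array([
        max(reward[step(s, a)] + gamma * V[step(s, a)] for a in actions)
        for s in range(n_states)
    ])

policy = [max(actions, key=lambda a: reward[step(s, a)] + gamma * V[step(s, a)])
          for s in range(n_states)]
print("V:", np.round(V, 2), "| policy (+1 = right):", policy)
```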
- How can learning and inference with complex world models be implemented efficiently in minds with bounded computational resources?
- Historically, this question centers on “Roger Shepard’s idea that we should be able to identify universal laws of cognition by thinking about the ideal solutions to the abstract problems that all intelligent agents need to solve.” But how is the solution implemented? (p. 27)
- Amortized inference: substitute probability distributions that are easier to work with (e.g., ones parameterized by neural nets), so that the system pays a high cost up front but the cost of inference for each new observation is lowered. See ch12, and the sketch after this list.
- Ch13 talks about “how well we can explain human behavior in terms of the rational use of cognitive resources”. (p. 28)
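Comment: here is a deliberately tiny sketch of the amortization idea (my construction; real amortized inference systems use neural networks rather than linear regression): fit a cheap “recognition” function on simulations from the generative model up front, then answer new inference queries with a single function evaluation. The model, a Gaussian mean with known noise, is chosen so the exact posterior is available for comparison.

```python
import numpy as np

# Generative model (chosen so the exact posterior is known): mu ~ N(0,1),
# then n_obs noisy observations x_j ~ N(mu, 1).
rng = np.random.default_rng(0)
n_sims, n_obs = 10_000, 5
mu = rng.normal(0.0, 1.0, n_sims)
x = rng.normal(mu[:, None], 1.0, (n_sims, n_obs))

# Up-front cost: regress mu on a data summary (the sample mean) across
# simulated (mu, data) pairs. Squared-error regression approximates
# E[mu | data], i.e., the posterior mean, as a reusable function.
xbar = x.mean(axis=1)
A = np.column_stack([xbar, np.ones(n_sims)])
w, b = np.linalg.lstsq(A, mu, rcond=None)[0]

# Cheap inference for any new data set: one function evaluation.
x_new = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
print("amortized posterior mean:", w * x_new.mean() + b)
print("exact posterior mean:    ", n_obs * x_new.mean() / (n_obs + 1))
```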
- How are complex world models implemented in a physical machine, brain, or computer?
- What are the origins of our world models in evolution and development—what is built into a baby’s mind, and how do children learn within and beyond that starting point?
- What is needed to scale up learning to all the knowledge that a human being acquires over their lifetime and human cultures have built over generations?
- How do we learn from “data generated by people”? (p. 31)
- “One distinctive aspect of Bayesian models of cognition, compared with other learning paradigms such as artificial neural networks, is that they require precisely specifying assumptions about how the data are generated.” This motivates a body of literature on the principles of pedagogy and communication, framed in terms of what people assume about how data are sampled by other agents in different social contexts. (p. 31) From a philosophy-of-science standpoint, this burden of specifying data-generation hypotheses is good for scientific advancement, precisely because the resulting theories are more open to empirical investigation. (See the sampling sketch below.)
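Comment: a minimal sketch of why sampling assumptions matter (number-game style; the hypotheses, priors, and data are illustrative): under “strong sampling,” where an informant draws examples from inside the concept, the likelihood is (1/|h|)^n and small consistent hypotheses win (the size principle); under “weak sampling,” the same data leave the prior untouched.

```python
# Number-game-style hypotheses; sets, priors, and data are illustrative.
h_even = set(range(2, 101, 2))       # "even numbers up to 100": 50 members
h_pow2 = {2, 4, 8, 16, 32, 64}       # "powers of two up to 100": 6 members
data = [2, 8, 64]                    # consistent with BOTH hypotheses

def strong(h):
    # Strong sampling: an informant draws examples uniformly from inside
    # the concept, so the likelihood is (1/|h|)^n -- the size principle.
    return (1 / len(h)) ** len(data) if all(d in h for d in data) else 0.0

def weak(h):
    # Weak sampling: examples arrive independently of the concept, so the
    # likelihood is flat in |h| for any consistent hypothesis.
    return 1.0 if all(d in h for d in data) else 0.0

def posterior(likelihood):
    scores = {"even": 0.5 * likelihood(h_even),
              "powers of 2": 0.5 * likelihood(h_pow2)}
    z = sum(scores.values())
    return {h: round(s / z, 4) for h, s in scores.items()}

print("strong sampling:", posterior(strong))  # powers of 2: ~0.998
print("weak sampling:  ", posterior(weak))    # stays at the 50/50 prior
```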
- “Many developmental researchers rejected [the choice between nativism and empiricism] altogether and pursued less formal approaches to describing the growing minds of children, under the headings of ‘constructivism’ or the ‘theory theory.’” (p. 13)
- The Bayesian paradigm may not, by itself, be a falsifiable theory. (p. 34) It is better treated as a methodological framework that suggests ways to generate falsifiable theories: if a Bayesian model captures human data well, there can be many falsifiable theories that each specify a particular approximation algorithm for the Bayesian model, potentially with radically different representations and data structures.
Ch2 History
- Around the eighties, “the philosophy of cognitive science began to shift from the core idea that cognition is symbol manipulation to the more specific notion that cognition is mechanized logical inference over a logical language of thought […].” (p. 43)
- One limitation of symbolism at the time was the lack of probability. In the context of parsing a sentence into a structured representation, symbolism had a hard time explaining “how the probabilities of each local parse are modified by the probabilities of the others as the puzzle is pieced together.” (p. 45)
- The application of Bayesianism to computer vision “can be seen as continuous with Helmholtz’s statement of the likelihood principle—namely, that the perceptual system seeks the most probable interpretation given its input data.”
- Comment: Another school of thought posits that “perceptions reflect biological utility based on past experience rather than objective features of the environment.” (Building on James Gibson’s ecological analysis of vision)
- Another way to articulate Marr’s computational level: “describing what counts as a solution to that problem”. (p. 56)
Ch3 Bayesian Inference
- Subjective probability
- “Being a Bayesian requires a leap of faith. You need to be willing to assume that … people’s degrees of belief can be represented by probabilities.”
- Arguments for it: Cox’s theorem (the inevitability argument) and the Dutch book argument. (A minimal worked example of Bayes’ rule follows.)
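Comment: a minimal worked example of treating degrees of belief as probabilities and updating them by Bayes’ rule (the hypotheses and numbers are made up for illustration):

```python
# Degrees of belief as probabilities, updated by Bayes' rule:
# P(h | d) = P(d | h) * P(h) / P(d). Hypotheses and numbers are made up.
prior = {"fair coin": 0.9, "trick coin (always heads)": 0.1}
likelihood = {                      # P(5 heads in a row | hypothesis)
    "fair coin": 0.5 ** 5,
    "trick coin (always heads)": 1.0,
}

evidence = sum(prior[h] * likelihood[h] for h in prior)          # P(d)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)  # belief in the trick coin jumps from 0.1 to ~0.78
```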