Tuesday, May 12, 2009

Discovering Natural Predicates

This is a project page for a June 2009 project course being offered at the University of Amsterdam.

This project attempts to explore how a system with a capability of modeling "similarities" might discover predicates (or types) through its interactions with the world, leading to models of reasoning and language. The objective is to consider the process by which an infant might be discovering categories, their labels, their semantics, and the ways of combining them, and try to see how these ideas affect our design for similar systems.


For more than half a century, despite extended efforts by many bright people, we have failed to arrive at a viable grammar accounting for the syntax of a single human language. For far longer than that, philosophers have been trying to come up with a system of reasoning that models human thought. Both of these are therefore hard problems by today's standards. At the same time, we must keep in mind that both tasks appear to be achieved quite effortlessly by human infants.

This project aims to explore how infants might be achieving this.

One aspect that is fundamental to both tasks is a structuring of the world into a hierarchy of categories, which we shall refer to as predicates; the key aspect is that these categories are somehow perceived to be natural. What constitutes a "natural" category depends on its relevance - how often do our decisions depend on discriminating it? Thus, the infant learning a language quickly learns to distinguish the sounds that make semantic differences in her auditory universe; these constitute the phonemes of a language. Similarly, those categories that provide meaningful inputs to everyday decisions may be thought to constitute "natural predicates". The word "natural" here is in contrast to "conventional", which harks back to a debate originating in Plato's Cratylus. It also adopts a naive epistemology - i.e. there exist things like categories out there, which we discern with our senses. These are very highly debated topics in philosophy - but we shall explore them computationally.

An early attempt to discover natural predicates can be found in Carnap's Aufbau (1928), which attempts to discover how nature is organized into "constituting systems" using the basic relation of "recollected similarity" or R, where R(x,y) holds if the present stimulus x is found to be similar to the memory of y. Carnap uses this relation to define what he calls "similarity circles" - the largest class of experiences each of which bears R to every other one. This "autopsychological" or individual-specific model eventually has to be mapped onto the conventional system, including language, followed by the group. This transition results in predicate definitions being modified based on the presence of others - e.g. "red" may mean a narrower band in languages that also specify "crimson" and "orange", vs. other languages that may have only four colour words. This point was noted by de Saussure, but how the transition is to be managed is often ignored in model-theoretic and other definitions of semantics, and it remains a significant challenge in computational models.
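As a rough illustration of the "similarity circle" idea, one may treat R as a symmetric relation over a finite set of stimuli and enumerate every maximal set in which all pairs are similar (the maximal cliques of the similarity graph). The sketch below is only an assumption-laden reading of Carnap's definition, not his construction: the function name similarity_circles, the hue values, and the 30-degree threshold are all hypothetical, and the stimuli are assumed to be distinct and hashable.

```python
from itertools import combinations


def similarity_circles(items, similar):
    """Return every maximal set in which each pair of items is similar.

    Assumes `similar` is symmetric and `items` are distinct and hashable.
    Uses the basic Bron-Kerbosch maximal-clique enumeration.
    """
    # Build a symmetric neighbour map from the pairwise relation.
    neighbours = {x: set() for x in items}
    for a, b in combinations(items, 2):
        if similar(a, b):
            neighbours[a].add(b)
            neighbours[b].add(a)

    circles = []

    def expand(current, candidates, excluded):
        # No item can extend `current`, and it is not part of a larger
        # set already reported: it is a maximal similarity circle.
        if not candidates and not excluded:
            circles.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v},
                   candidates & neighbours[v],
                   excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return circles


# Hypothetical stimuli: hue angles in degrees, "similar" if within 30 degrees.
hues = [0, 10, 25, 40, 200, 215]
circles = similarity_circles(hues, lambda a, b: abs(a - b) <= 30)
for circle in sorted(circles, key=min):
    print(sorted(circle))
```

Note that circles may overlap (here the hues 10 and 25 belong to two circles), which is one reason why such raw circles still need further organization before they behave like predicates.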

A modern philosophical view on natural predicates can be found in the work of Quine (the above is largely based on his "From Stimulus to Science", 1995).

One may re-frame this problem in today's computational terms using the terminology of unsupervised clustering, where the "similarity circles" are now called "clusters", and these are defined as the largest groups which have highest similarity within the class and least similarity across classes. This also addresses the Saussurean point that a word is defined not by itself but in relation to other words defining similar notions. Given a set of such clusters, then, membership in one, which is usually graded (non-binary), may constitute a predicate.
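A minimal sketch of this reframing, assuming points in a small numeric feature space: a plain k-means pass finds cluster centres, and a normalised exponential of the squared distance gives a graded membership in each cluster. The choice of k-means, the farthest-first initialisation, the sharpness parameter, and the sample points are all illustrative assumptions; any other clustering method could be substituted.

```python
import math


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if empty).
        centres = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres


def membership(p, centres, sharpness=1.0):
    """Graded membership of p in each cluster; the values sum to 1."""
    weights = [math.exp(-sharpness * dist2(p, c)) for c in centres]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical feature vectors forming two loose groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
centres = kmeans(points, k=2)
near = membership((0.1, 0.2), centres)     # close to the first group
between = membership((1.5, 1.5), centres)  # roughly between the groups
print(near, between)
```

Because each membership is normalised over all the centres, adding a further cluster nearby lowers a point's membership in the existing ones, which loosely mirrors the Saussurean point about words being defined in relation to their neighbours.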

Recent psychological data seems to indicate that infants learn many such natural predicates well before they learn language. When language comes around (by the age of nine months or so), they are already aware of many types of objects, and also of some abstract notions, e.g. those relating to containment (in, out, small, big), path (go to, left, right, towards, come closer, move away, chase, go past), etc. Clearly, many of these structures are learned, whereas others (e.g. complex emotional states such as guilt) may be innate. A key aspect of acquiring such predicates appears to be an awareness of others, sometimes called a "Theory of Mind". Infants are also aware of how some of these predicates "compose" - e.g. that big objects can't go into small holes. They may also be aware of valences - the number of arguments predicates may take and their types - e.g. that "go-to" usually involves one agent and one destination, whereas "chase" needs two agents.

Knowing these natural predicates means that when infants hear linguistic noises, the problem of learning language becomes one of associating these prior states with the sounds that they hear. Further, knowledge of composition may determine how units come together to form phrases, and may underlie the discovery of syntax. Many properties traditionally assumed to be syntactic also follow, e.g. 2-valence verbs are (generally) transitive, 3-valence verbs ditransitive, etc.

However, once language arrives, predicates are increasingly defined in terms of other predicates (almost our entire adult vocabulary, learned at 10-20 words per day over 15+ years, is learned from discussions and reading). Thus, language in turn influences these natural predicates, and alters their interpretation considerably.

In this project, I would like to work with you in exploring the feasibility of discovering natural predicates in a simulation world. You will need to construct your own simulations (these may be perceptual or schematic) of phenomena in a particular domain (e.g. spatial predicates, actions), and attempt to discover these predicates based on nothing more than the notion of similarity as a metric in some feature space.

Background and tasks

This is essentially a multi-disciplinary area, and there are several possibilities in how you might approach this project.

If you have a good background in mathematics, particularly linear algebra, you can define information-theoretic measures (e.g. coding) that present possibilities for learning the more frequent codes earlier. If you have a background in computer science, you can perhaps try to code up some machine learning techniques to automate the clustering process. If you have a strong grounding in philosophy or mathematical logic, you may wish to study a key problem here - how does one predicate influence another? E.g. the existence of the concept "ellipse" makes the notion of "circle" tighter than it might have been. Alternatively, you could study the term "predicate" as it is used by linguists (perhaps involving a graded membership) and see how it can be modeled more formally and lead to a linguistic type theory. Those with an interest in linguistics may look at the developmental questions.
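To make the coding suggestion a little more concrete, here is a minimal sketch of one possible measure: the ideal (Shannon) code length of each observed category, -log2 of its relative frequency, so that frequent categories get short codes and could be ranked as candidates for earlier learning. The event stream and the idea of sorting by code length are illustrative assumptions, not a prescribed method.

```python
import math
from collections import Counter


def code_lengths(observations):
    """Ideal code length in bits, -log2(relative frequency), per symbol."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {sym: -math.log2(n / total) for sym, n in counts.items()}


# Hypothetical stream of events observed in a simulated world.
events = ["in"] * 8 + ["out"] * 4 + ["chase"] * 2 + ["go-past"] * 2
lengths = code_lengths(events)

# Shorter codes (more frequent events) come first in the ranking.
learning_order = sorted(lengths, key=lengths.get)
print(lengths)
print(learning_order)
```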

You will need to decide how you wish to proceed within the first few days. In every case, you should motivate your work based on an animation that you are required to create. You may use paper art, Flash animation, or other tools to create the animation.


There will be three early meetings where I will present some of the basics; in the following meetings, you will be presenting your readings and defining your approach. At the end, you will present some results, which will mostly be theoretical analyses, though some of you may wish to program some parts.

Suggested reading:

1. Jean M. Mandler: Foundations of Mind, OUP 2004, Chapter 1: How to build a baby
[Pierson Revesz Bibliotheek 323: 77.31 2004 8]

2. Paul Fletcher and Brian MacWhinney, Handbook of Child Language, Blackwell, 1995, (chapters of your choice)
[Bungehuis 110: ATW IV FLETCH 4]

3. P. Gärdenfors, Conceptual Spaces: The Geometry of Thought, MIT Press, Cambridge, MA, 2000.
[surprisingly, not available in the UvA library]

4. Michael Tomasello, Origins of Human Communication, MIT Press, 2008
[Bushuis Bibliotheek 313: 17.20 132]

Amitabha Mukerjee [mukerjee AT gmail.com]
IIT Kanpur, Department of Computer Science
Visiting Professor, ILLC, University of Amsterdam