Linguistics
I am part of a collaboration, with people from SFI, UNM, and elsewhere, that is developing statistical methods for studying the process of language change. Roughly speaking, our goal is to do statistically what traditional historical linguistics currently does by expert judgment. That is, given a set of words from a set of related languages, we want to identify cognate words (which derive from a common word in a shared ancestral language), align the sounds in those words (figure out which sounds in word 1 correspond to which sounds in word 2), infer the historical relationship among the languages (possibly a language phylogeny), infer regular and irregular sound changes occurring along specific lines of descent, and estimate the relative probabilities of different types of sound change across the language family as a whole. The trick is that we want to do all of this within a single, internally consistent maximum-likelihood framework.
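To make the alignment step concrete, here is a minimal sketch of pairwise sound alignment using standard Needleman-Wunsch dynamic programming. It is a simple stand-in for illustration, not the maximum-likelihood framework described above: the function name `align_sounds`, the flat match/mismatch/gap scores, and the toy segment strings are all assumptions. A statistical model would replace the fixed scores with estimated log-probabilities of sound correspondences.

```python
def align_sounds(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sound sequences (hypothetical helper).

    Returns (score, pairs), where pairs lists aligned segments and
    "-" marks a gap. The scores are arbitrary placeholders.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two segments
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy segment sequences, chosen only to exercise the code.
best_score, alignment = align_sounds(list("pater"), list("fader"))
for left, right in alignment:
    print(left, right)
```

With flat scores like these, the alignment simply pairs identical segments where it can. The interesting statistical problem is learning which non-identical pairs (such as the p/f and t/d pairs in the toy example) recur across many words.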
In a related project, we are developing quantitative methods for characterizing patterns of meaning shift from patterns of polysemy (where one word conveys more than one meaning) in a sample of 80 languages from around the world. For example, if the word for "fire" in a language is replaced by another word, where is that word likely to come from? Is it more likely to come from the word for "anger" or the word for "light"? Can we produce a quantitative estimate of how much more likely, based on patterns observed across the world's languages?
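One very simple way to turn the "how much more likely" question into a number is a smoothed ratio of polysemy counts. The sketch below assumes a table counting, for each pair of meanings, how many sampled languages express both with a single word. The function name `relative_likelihood`, the add-one pseudocount, and the counts in the usage lines are all hypothetical placeholders, not results from the project, and a real analysis would need to account for relatedness among the sampled languages.

```python
def relative_likelihood(counts, target, source_a, source_b, pseudocount=1.0):
    """Ratio of smoothed polysemy counts for two candidate source meanings.

    counts maps frozenset({meaning1, meaning2}) to the number of sampled
    languages in which one word covers both meanings (hypothetical layout).
    A value above 1 means source_a is polysemous with target more often
    than source_b is.
    """
    n_a = counts.get(frozenset((target, source_a)), 0) + pseudocount
    n_b = counts.get(frozenset((target, source_b)), 0) + pseudocount
    return n_a / n_b


# Made-up placeholder counts, used only to show the call.
toy_counts = {
    frozenset(("fire", "light")): 7,
    frozenset(("fire", "anger")): 3,
}
print(relative_likelihood(toy_counts, "fire", "light", "anger"))
```

The pseudocount keeps the ratio finite when a pair of meanings is never observed sharing a word in the sample.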