Predicting the future of the past tense
Mathematicians apply evolutionary models to language
October 15, 2007
Verbs evolve and homogenize at a rate inversely proportional to their prevalence in the English language, according to a formula developed by MIT and Harvard University mathematicians who've invoked evolutionary principles to study our language over the past 1,200 years.
The team, which reported their findings in the Oct. 11 issue of Nature, conceives of linguistic development as an essentially evolutionary scheme. Just as genes and organisms undergo natural selection, words--specifically, irregular verbs that do not take an "-ed" ending in the past tense--are subject to powerful pressure to "regularize" as the language develops.
"Mathematical analysis of this linguistic evolution reveals that irregular verb conjugations behave in an extremely regular way - one that can yield predictions and insights into the future stages of a verb's evolutionary trajectory," says Erez Lieberman, a graduate student in the Harvard-MIT Division of Health Sciences and Technology and in Harvard's School of Engineering and Applied Sciences. "We measured something no one really thought could be measured, and got a striking and beautiful result."
"We're really on the front lines of developing the mathematical tools to study evolutionary dynamics," says Jean-Baptiste Michel, a graduate student at Harvard Medical School. "Before, language was considered too messy and difficult a system for mathematical study, but now we're able to successfully quantify an aspect of how language changes and develops."
Lieberman, Michel, and colleagues built upon previous study of seven competing rules for verb conjugation in Old English, six of which have gradually faded from use over time. They found that the one surviving rule, which adds an "-ed" suffix to simple past and past-participle forms, contributes to the evolutionary decay of irregular English verbs according to a specific mathematical function: It regularizes them at a rate that is inversely proportional to the square root of their usage frequency.
In other words, a verb used 100 times less frequently will evolve 10 times as fast.
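The arithmetic behind that statement can be checked with a short sketch. It assumes only the proportionality stated above (rate proportional to one over the square root of usage frequency); the function name and the idea of comparing a verb against a reference verb are illustrative choices, not something taken from the Nature paper.

```python
import math


def relative_regularization_rate(frequency_ratio: float) -> float:
    """Return how many times faster a verb regularizes than a reference verb.

    `frequency_ratio` is the verb's usage frequency divided by the reference
    verb's frequency. Assumes rate is proportional to 1 / sqrt(frequency).
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


if __name__ == "__main__":
    # A verb used 100 times less often has a frequency ratio of 1/100.
    for ratio in (1.0, 0.25, 0.01):
        rate = relative_regularization_rate(ratio)
        print(f"{ratio:g}x as frequent -> {rate:.1f}x the regularization rate")
```

Because only the ratio of frequencies enters, the unknown constant of proportionality cancels out, which is why no absolute rate is needed to reproduce the "100 times less frequent, 10 times as fast" figure.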
To develop this formula, the researchers tracked the status of 177 irregular verbs in Old English through linguistic changes in Middle English and then modern English. Of these 177 verbs that were irregular 1,200 years ago, 145 stayed irregular in Middle English and just 98 remain irregular today, following the regularization over the centuries of such verbs as help, laugh, reach, walk, and work.
The group computed the "half-lives" of the surviving irregular verbs to predict how long they will take to regularize. The most common ones, such as "be" and "think," have such long half-lives (38,800 years and 14,400 years, respectively) that they will effectively never become regular. Irregular verbs with lower frequencies of use, such as "shrive" and "smite" (with half-lives of 300 and 700 years, respectively), are much more likely to succumb to regularization.
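To get a feel for what those half-lives imply, here is a rough sketch. It assumes the usual meaning of "half-life", that is, simple exponential decay in which the chance of a verb still being irregular halves every half-life. The article does not spell out the researchers' exact model, so treat this as an illustration only. The half-life values are the ones quoted above; the function name and the 500-year horizon are arbitrary choices for the example.

```python
# Half-lives in years, as quoted in the article.
HALF_LIVES = {
    "be": 38_800,
    "think": 14_400,
    "smite": 700,
    "shrive": 300,
}


def prob_still_irregular(half_life_years: float, elapsed_years: float) -> float:
    """Probability a verb is still irregular after `elapsed_years`.

    Assumes plain exponential decay: the probability halves every half-life.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    if elapsed_years < 0:
        raise ValueError("elapsed_years must be non-negative")
    return 0.5 ** (elapsed_years / half_life_years)


if __name__ == "__main__":
    horizon = 500  # years from now; chosen only for illustration
    for verb, half_life in HALF_LIVES.items():
        p = prob_still_irregular(half_life, horizon)
        print(f"{verb:>6}: {p:.1%} chance of still being irregular in {horizon} years")
```

Under this assumption, a verb like "be" is almost certain to stay irregular over any horizon that matters, while "shrive" is more likely than not to have regularized within a few centuries.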
They project that the next word to regularize will likely be "wed."
"Now may be your last chance to be a 'newly wed'," they quip in the Nature paper. "The married couples of the future can only hope for 'wedded' bliss."
Extant irregular verbs represent the vestiges of long-abandoned rules of conjugation; new verbs entering English, such as "google," are universally regular. Although fewer than 3 percent of modern English verbs are irregular, this number includes the 10 most common verbs: be, have, do, go, say, can, will, see, take, and get. The researchers expect that some 15 of the 98 modern irregular verbs they studied--although likely none of these top 10--will regularize in the next 500 years.
Their Nature paper offers a quantitative, astonishingly precise description of something linguists have suspected for a long time: The most frequently used irregular verbs are repeated so often that they are unlikely ever to go extinct.
"Irregular verbs are fossils that reveal how linguistic rules, and perhaps social rules, are born and die," Michel says.
"If you apply the right mathematical structure to your data, you find that the math also organizes your thinking about the entire process," says Lieberman, whose unorthodox projects as a graduate student have ranged from genomics to bioastronautics. "The data hasn't changed, but suddenly you're able to make powerful predictions about the future."
Lieberman and Michel's co-authors on the Nature paper are from Harvard. The work was sponsored by the John Templeton Foundation, the National Science Foundation, and the National Institutes of Health.
----
It's interesting to me that the selection that occurs here is for simplicity. In nature, natural selection in higher organisms tends towards more complex beings. Those that have extra genes and an efficient way of controlling them do pretty well. The energy expended in copying the genes when cells divide is the only wastage, but when the gene is needed it turns on and saves the day. In simpler organisms, like E. coli, the process of replicating that DNA is the most energy-intensive thing the organism will ever do, and there's a tendency towards brevity of genomic information.
Language is one of modern humans' most fundamental, energy-intensive endeavors. We talk all of the time, and write and read when we're not talking. We tend towards simplicity of language because it's easier; it takes less work. Txt msg spk makes sense for someone with more information to convey and process than time or intelligence to do so. So as our language evolves it cuts out the extra steps, the extra rules, the extra genes, tending towards homogeneity rather than diversity. In nature, this only works in quickly multiplying, highly mutable life forms: organisms that die quickly in the best of situations and whose offspring may be very different from themselves. If you're a species like a mammal and you're non-diverse, then you're extremely vulnerable to shocks.
I'm not sure whether a language can be vulnerable to environmental changes; there's no such thing as a temperature spike or a food shortage in literature. Maybe a simple language is a language that is more quickly taken up by others, and a virus analogy would be better. A successful virus, or parasite of any kind, often doesn't harm its host in an evident way. It piggybacks on the organism's processes, but might not drain enough energy to do real harm. These viruses multiply easily and spread from host to host without burning their homes down. Maybe a successful language is one that infects a host without putting undue demands on its system.