Ecological Correlation

An ecological correlation is a correlation based on group means rather than on measurements from individuals. For example, consider the correlation between body weight and income. A study at the individual level might sample 1000 randomly chosen individuals, record each person's current body weight and household income, and use those values to calculate the correlation between the two variables. By contrast, another study might use 100 counties, measuring the mean body weight and the mean household income in each county. A correlation between these 100 pairs of group means would be an example of an ecological correlation.
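To make the contrast concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration: the 100 hypothetical counties, 10 people per county (1000 individuals in total), and the made-up rule that a shared county-level factor raises both weight and income while each person adds independent noise. It computes the individual-level correlation from all 1000 people and the ecological correlation from the 100 pairs of county means.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_counties=100, per_county=10, seed=1):
    """Hypothetical (county, weight_kg, income_k) records, one per person."""
    rng = random.Random(seed)
    records = []
    for county in range(n_counties):
        shared = rng.gauss(0, 1)  # assumed county-level factor affecting both variables
        for _ in range(per_county):
            weight = 80 + 5 * shared + rng.gauss(0, 10)  # individual noise, sd 10
            income = 50 + 8 * shared + rng.gauss(0, 20)  # individual noise, sd 20
            records.append((county, weight, income))
    return records


records = simulate()

# Individual-level correlation: one point per person (1000 points).
r_individual = pearson([w for _, w, _ in records], [i for _, _, i in records])

# Ecological correlation: one point per county, built from county means (100 points).
by_county = {}
for county, weight, income in records:
    by_county.setdefault(county, []).append((weight, income))
mean_w = [mean(w for w, _ in rows) for rows in by_county.values()]
mean_i = [mean(i for _, i in rows) for rows in by_county.values()]
r_ecological = pearson(mean_w, mean_i)

print(f"individual r = {r_individual:.2f}, ecological r = {r_ecological:.2f}")
```

Under these made-up parameters the population values are roughly 0.17 at the individual level and 0.66 for county means; a single simulated sample will differ somewhat. Averaging within counties cancels much of the person-to-person noise, so the two correlations answer different questions even when they share a sign.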

Assuming equivalence between correlations derived from group means and correlations derived from individual data leads to the ecological fallacy.

From Wikipedia:

The term comes from a paper by Robinson (1950). For each of the 48 states in the US as of the 1930 census, he computed the literacy rate and the proportion of the population born outside the US. He showed that these two figures had a positive correlation of 0.53: the greater the proportion of immigrants in a state, the higher its average literacy. When individuals are considered, however, the correlation was −0.11: immigrants were on average less literate than native citizens. Robinson showed that the positive correlation at the level of state populations arose because immigrants tended to settle in states where the native population was more literate. He cautioned against drawing conclusions about individuals on the basis of population-level, or "ecological," data.
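The mechanism Robinson described can be reproduced with a small simulation. This Python sketch does not use Robinson's census data. The 48 hypothetical states, the native literacy range, the 15-point literacy gap between immigrants and natives within the same state, and the rule that immigrant share rises with native literacy are all invented assumptions chosen to mimic his explanation. It tallies people into a pooled 2x2 table (foreign-born by literate) for the individual-level correlation, and correlates each state's immigrant share with its overall literacy rate for the ecological one.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def phi(n11, n10, n01, n00):
    """Phi coefficient (Pearson r for two 0/1 variables) from a 2x2 table.
    n11: foreign-born & literate, n10: foreign-born & illiterate,
    n01: native & literate,       n00: native & illiterate."""
    row1, row0 = n11 + n10, n01 + n00  # foreign-born and native totals
    col1, col0 = n11 + n01, n10 + n00  # literate and illiterate totals
    return (n11 * n00 - n10 * n01) / (row1 * row0 * col1 * col0) ** 0.5


rng = random.Random(1950)
POP = 10_000  # hypothetical population per state
GAP = 0.15    # assumed: immigrants are less literate than natives in the SAME state
shares, lit_rates = [], []
n11 = n10 = n01 = n00 = 0
for _ in range(48):
    p_native = rng.uniform(0.70, 0.95)
    # Assumed settlement rule: immigrant share rises with native literacy.
    share = min(max(0.02 + 0.6 * (p_native - 0.70) + rng.gauss(0, 0.03), 0.0), 0.4)
    foreign = round(POP * share)
    native = POP - foreign
    lit_f = round(foreign * (p_native - GAP))
    lit_n = round(native * p_native)
    n11 += lit_f
    n10 += foreign - lit_f
    n01 += lit_n
    n00 += native - lit_n
    shares.append(foreign / POP)
    lit_rates.append((lit_f + lit_n) / POP)

r_ecological = pearson(shares, lit_rates)  # one point per state
r_individual = phi(n11, n10, n01, n00)     # one point per person, pooled
print(f"ecological r = {r_ecological:+.2f}, individual r = {r_individual:+.2f}")
```

With these assumptions the state-level correlation should come out positive while the individual-level correlation is negative. That matches the direction of Robinson's results, though not his magnitudes of 0.53 and −0.11.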

Wikipedia also notes that, according to a book by Gelman, Park, Shor, Bafumi, & Cortina (2008), in recent elections wealthier states were more likely to vote Democratic and poorer states more likely to vote Republican. At the individual level, however, wealthier voters are more likely to vote Republican, and poorer voters more likely to vote Democratic.

There are times, however, when an ecological correlation is the proper way to look at the relationship between two variables. As Lubinski & Humphreys (1996) have advised, "When predicting the behavior or status of groups, correlate means."


