Sample Excerpt: Aggregation Bias and Ecological Fallacy

Sample Paper Excerpt

In analyzing population survey data for 1999, two research collaborators disagreed on the method for estimating the correlation between income and education for men aged 25-54 in the United States. One researcher took the average educational level and average income for each U.S. state and Washington, D.C., and then computed the correlation between income and education from those 51 averages. The correlation estimated from the averages for the 50 states and D.C. is 0.64.

The colleague argued that using the state-level correlation to estimate the correlation for individuals would give a badly distorted estimate. To estimate the correlation (association) between income and education for men aged 25-54 in the U.S., the colleague wanted to use individual-level values of income and education from Census Bureau samples instead of the state-wide averages. The colleague's rationale was that within each state, individual values are spread around the state averages. Replacing the individual data with their state averages eliminates that spread and gives a misleading impression of the association between the two variables.
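The colleague's argument can be illustrated with a small simulation. The sketch below is only an illustration on synthetic data, not the 1999 survey or Census samples: the function names `pearson` and `simulate`, the group size, and every numeric parameter (means, slopes, standard deviations) are assumptions chosen for the example. It computes the correlation twice, once from all simulated individuals and once from the 51 group averages.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_groups=51, n_per_group=200, seed=0):
    """Return (individual-level r, group-average r) for synthetic data."""
    rng = random.Random(seed)
    edu_all, inc_all = [], []  # one entry per simulated person
    edu_avg, inc_avg = [], []  # one entry per simulated group
    for _ in range(n_groups):
        # Assumed group centers: average income rises with average education.
        g_edu = rng.gauss(13.0, 0.5)  # years of schooling (hypothetical)
        g_inc = 30.0 + 4.0 * (g_edu - 13.0) + rng.gauss(0.0, 1.0)  # $1000s (hypothetical)
        # Individuals scatter widely around their group's center.
        edu = [g_edu + rng.gauss(0.0, 3.0) for _ in range(n_per_group)]
        inc = [g_inc + 2.0 * (e - g_edu) + rng.gauss(0.0, 15.0) for e in edu]
        edu_all.extend(edu)
        inc_all.extend(inc)
        edu_avg.append(sum(edu) / n_per_group)
        inc_avg.append(sum(inc) / n_per_group)
    return pearson(edu_all, inc_all), pearson(edu_avg, inc_avg)


if __name__ == "__main__":
    r_individual, r_group = simulate()
    print(f"individual-level r = {r_individual:.2f}")
    print(f"group-average r    = {r_group:.2f}")
```

Under these assumed parameters, the correlation computed from group averages should come out noticeably higher than the one computed from individuals, because averaging removes the within-group spread while keeping the between-group trend. The size of the gap depends entirely on the made-up parameters and says nothing about the actual 0.64 figure.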

Which is the more appropriate unit of analysis?

Individual data to estimate the correlation between income and education.

Averages from each state to estimate the correlation between income and education.

