Archive for the ‘Statistics’ Category

Be careful of reports showing the “average”

January 28, 2008

The mean, commonly called the average, is the statistic most often used to measure the center, or middle, of a numerical data set. It is the sum of all the values divided by the number of values. The mean may not be a fair representation of the data, however, because it is easily influenced by outliers (very large or very small values that are not typical of the data set).

Example: suppose we want to calculate the average salary of a group of five people, one of whom is a billionaire with a huge salary. Here’s the average:

average = ($50K + $45K + $54K + $61K + $10,000K)/5 = ($10,210K)/5 = $2,042K

The average salary in the group is $2,042,000! Hardly representative! The billionaire has distorted the average. For the other four people in the group, the typical salary is around $50,000.
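The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's `statistics` module; the variable names are made up for illustration, and salaries are written in thousands of dollars.

```python
from statistics import mean

# Salaries from the example, in thousands of dollars.
# The last entry is the billionaire's salary.
salaries_k = [50, 45, 54, 61, 10_000]

# Mean of the whole group: sum of the values divided by how many there are.
avg_all = mean(salaries_k)

# Mean of the same group with the outlier left out.
avg_without_outlier = mean(salaries_k[:-1])

print(f"Average with the billionaire:    ${avg_all:,.1f}K")
print(f"Average without the billionaire: ${avg_without_outlier:,.1f}K")
```

Leaving out the single outlier moves the average from $2,042K down to $52.5K, which shows how strongly one extreme value can pull the mean.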

The median is another way to measure the center of a numerical data set. A statistical median is much like the median of an interstate highway. On a highway, the median is the middle of the road, and an equal number of lanes lie on either side of it. In a numerical data set, the median is the point at which an equal number of data points lie above and below it. Thus, the median is truly the middle of the data set.

Example: in the above example, the median salary is $54K because there are two values below it ($45K and $50K) and two values above it ($61K and $10,000K).
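As a rough sketch, the median of the same salaries can be found by sorting the values and picking the middle one, and the result can be compared with `statistics.median` from Python's standard library. The helper name `middle_value` is hypothetical, and this simple version assumes an odd number of values.

```python
from statistics import median

# Salaries from the example, in thousands of dollars.
salaries_k = [50, 45, 54, 61, 10_000]

def middle_value(values):
    """Return the middle value of an odd-length list (illustrative helper)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

median_by_hand = middle_value(salaries_k)
median_from_library = median(salaries_k)

print(f"Median salary: ${median_by_hand}K")
```

Unlike the mean, the median stays at $54K no matter how large the billionaire's salary is, because only the ordering of the values matters.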

The next time you hear an average reported, look to see whether the median is also reported. The average and the median are two different representations of the middle of a data set and can often tell two very different stories about the data.

Statistics for Dummies by Deborah Rumsey

Synergy between the Wisdom of Crowds and Sample Size in Statistics

January 19, 2008

The core idea behind the “wisdom of crowds” is that by aggregating information from a large, diverse group of individuals you can obtain a better solution and make better decisions.

Today I was reading Statistics for Dummies by Deborah Rumsey and realized that the motivation for the wisdom of crowds is quite analogous to the motivation for having a large sample size in statistics, as can be seen in these snippets from the book:

Fewer participants in a study means less information overall, so studies with small numbers of participants in general are less accurate than similar studies with larger sample sizes … Most researchers try to include the largest sample size they can afford, and they balance the cost of the sample size with the need for accuracy … Check the sample size to be sure you have enough information on which to base your results.
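The link between sample size and accuracy can be illustrated with a small simulation. The sketch below is not from the book: it assumes a hypothetical population of values spread uniformly between 0 and 100, draws many samples of a given size, and measures how much the sample averages vary from one sample to the next. The function name, the population, and the trial count are all illustrative choices.

```python
import random
from statistics import pstdev

def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Standard deviation of the sample mean across repeated samples.

    Assumes a hypothetical population: values uniform between 0 and 100.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.uniform(0, 100) for _ in range(sample_size)]
        means.append(sum(sample) / len(sample))
    return pstdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)

print(f"Spread of averages with 10 participants:   {spread_small:.2f}")
print(f"Spread of averages with 1000 participants: {spread_large:.2f}")
```

Under these assumptions, averages based on 1,000 participants cluster much more tightly around the true value of 50 than averages based on 10, which is the same intuition behind aggregating many opinions in a crowd.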