Statistics

Central Tendency

The term central tendency refers to the "middle" value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation.

Mean

The mean is the most commonly used measure of central tendency. When we talk about an "average", we usually are referring to the mean. The mean is simply the sum of the values divided by the total number of items in the set. The result is referred to as the arithmetic mean. Sometimes it is useful to give more weight to certain data points, in which case the result is called the weighted arithmetic mean.


The notation used to express the mean depends on whether we are talking about the population mean or the sample mean:

  • Population mean: μ, computed over all N values in the population
  • Sample mean: x̄ (read "x-bar"), computed over the n values in a sample
The population mean then is defined as:

    μ = ( x1 + x2 + ... + xN ) / N = ( Σ xi ) / N

The sample mean x̄ is calculated in the same way over the n values of the sample.
The mean is valid only for interval data or ratio data. Since it uses the values of all of the data points in the population or sample, the mean is influenced by outliers that may be at the extremes of the data set.
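
As a concrete illustration, here is a minimal Python sketch of both calculations; the data values and weights are made up for illustration.

    # Arithmetic mean: the sum of the values divided by the number of values.
    def mean(values):
        return sum(values) / len(values)

    # Weighted arithmetic mean: each value contributes in proportion to its weight.
    def weighted_mean(values, weights):
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    data = [1, 2, 3, 4, 10]                        # made-up values
    print(mean(data))                              # 4.0
    print(weighted_mean(data, [1, 1, 1, 1, 5]))    # ~6.67; the heavier weight on 10 pulls the mean up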

Median

The median is determined by sorting the data set from lowest to highest values and taking the data point in the middle of the sequence, so that an equal number of points lies above and below it. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points greater than this value and two data points less than it. In this case, the median is equal to the mean. But consider the data set {1,2,3,4,10}. Here the median is still 3, but the mean is 4. If there is an even number of data points in the set, there is no single middle point, and the median is calculated by taking the mean of the two middle points.

The median can be determined for ordinal data as well as interval and ratio data. Unlike the mean, the median is not influenced by outliers at the extremes of the data set. For this reason, the median often is used when a few extreme values could greatly influence the mean and distort what might be considered typical. This often is the case with home prices and with income data, which tend to be highly skewed; for such data, the median often is reported instead of the mean. For example, if the salary of one person in a group is 10 times the mean, the mean salary of the group will be pulled upward by that unusually large value, and the median may better represent the typical salary level of the group.
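
A quick sketch of this effect, using Python's built-in statistics module with made-up salary figures:

    import statistics

    # Four typical salaries plus one extreme outlier (made-up figures).
    salaries = [40_000, 45_000, 50_000, 55_000, 500_000]

    print(statistics.mean(salaries))    # 138000.0 -- pulled up by the outlier
    print(statistics.median(salaries))  # 50000    -- unaffected by the outlier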

Mode

The mode is the most frequently occurring value in the data set. For example, in the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich. The mode also can be used with ordinal, interval, and ratio data. However, in interval and ratio scales, the data may be spread thinly with no data points having the same value. In such cases, the mode may not exist or may not be very meaningful.
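
Finding the mode (or modes) is straightforward with collections.Counter; the sandwich names below are invented for illustration.

    from collections import Counter

    def modes(values):
        """Return every most-frequent value, so multimodal data is handled."""
        counts = Counter(values)
        top = max(counts.values())
        return [value for value, count in counts.items() if count == top]

    print(modes([1, 2, 3, 4, 4]))                     # [4]
    print(modes([1, 1, 2, 3, 3]))                     # [1, 3] -- multimodal
    print(modes(["ham", "turkey", "ham", "veggie"]))  # ['ham'] -- categorical data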

When to use Mean, Median, and Mode

The following table summarizes the appropriate methods of determining the middle or typical value of a data set based on the measurement scale of the data.

Measurement Scale        Best Measure of the "Middle"
-----------------------  -------------------------------------------
Nominal (Categorical)    Mode
Ordinal                  Median
Interval                 Symmetrical data: Mean; Skewed data: Median
Ratio                    Symmetrical data: Mean; Skewed data: Median

Dispersion

Without knowing something about how data is dispersed, measures of central tendency may be misleading. For example, a residential street with 20 homes on it having a mean value of $200,000 with little variation from the mean would be very different from a street with the same mean home value but with 3 homes having a value of $1 million and the other 17 clustered around $60,000. Measures of dispersion provide a more complete picture. Dispersion measures include the range, average deviation, variance, and standard deviation.

Range

The simplest measure of dispersion is the range. The range is calculated by simply taking the difference between the maximum and minimum values in the data set. However, the range only provides information about the maximum and minimum values and does not say anything about the values in between.
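
In code the range is a one-liner, e.g. in Python:

    data = [1, 2, 3, 4, 10]          # made-up values
    print(max(data) - min(data))     # range = 9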

Average Deviation

Another method is to calculate the average deviation (mean deviation): the average difference between each data point and the mean. However, computing this directly always yields zero, since the deviations above the mean cancel those below it. To avoid this, the absolute value of each difference is taken so that only positive values are summed, and the result sometimes is called the mean absolute deviation. The average deviation is not difficult to calculate, and it is intuitively appealing. However, the mathematics become complex when it is used in subsequent statistical analysis. Because of this complexity, the average deviation is not a very commonly used measure of dispersion.
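
For reference, a minimal sketch of the mean absolute deviation:

    def mean_absolute_deviation(values):
        m = sum(values) / len(values)                        # the mean
        return sum(abs(x - m) for x in values) / len(values)

    print(mean_absolute_deviation([1, 2, 3, 4, 10]))  # 2.4 (mean is 4; |differences| sum to 12)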

Variance and Standard Deviation

A better way to measure dispersion is to square the differences between each value and the mean before averaging them. This measure of dispersion is known as the variance, and the square root of the variance is known as the standard deviation; both are widely used. Squaring the differences has at least three advantages:

1. Squaring makes each term positive so that values above the mean do not cancel values below the mean.

2. Squaring adds more weighting to the larger differences, and in many cases this extra weighting is appropriate since points further from the mean may be more significant.

3. The mathematics are relatively manageable when using this measure in subsequent statistical calculations.

Because the differences are squared, the units of variance are not the same as the units of the data. Therefore, the standard deviation is reported as the square root of the variance and the units then correspond to those of the data set.

The calculation and notation of the variance and standard deviation depend on whether we are considering the entire population or a sample set. Following the general convention of using Greek characters to express population parameters and Latin characters to express sample statistics, the notation for standard deviation and variance is as follows:

  • Population: variance σ², standard deviation σ
  • Sample: variance s², standard deviation s
The population variance is defined as:

    σ² = Σ ( xi - μ )² / N

where the sum runs over all N values in the population.
The population standard deviation is the square root of this value.

The variance of a sampled subset of observations is calculated in a similar manner, using the appropriate notation for sample mean and number of observations. However, while the sample mean is an unbiased estimator of the population mean, the same is not true for the sample variance if it is calculated in the same manner as the population variance. If one took all possible samples of n members and calculated the sample variance of each combination using n in the denominator and averaged the results, the value would not be equal to the true value of the population variance; that is, it would be biased. This bias can be corrected by using ( n - 1 ) in the denominator instead of just n, in which case the sample variance becomes an unbiased estimator of the population variance.

This corrected sample variance is defined as:

    s² = Σ ( xi - x̄ )² / ( n - 1 )

where the sum runs over the n values of the sample.
The sample standard deviation is the square root of this value.

Standard deviation and variance are commonly used measures of dispersion. Additional measures include the range and average deviation.
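
Python's statistics module implements both conventions: pvariance and pstdev divide by N (population), while variance and stdev divide by n - 1 (sample). A quick sketch with made-up data:

    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up values; mean is 5

    # Treating the data as the entire population (divide by N):
    print(statistics.pvariance(data))   # 4
    print(statistics.pstdev(data))      # 2.0

    # Treating the data as a sample (divide by n - 1 for an unbiased estimate):
    print(statistics.variance(data))    # about 4.571
    print(statistics.stdev(data))       # about 2.138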

Probability



Three Different Concepts of Probability

The classical interpretation of probability is a theoretical probability based on the physics of the experiment, but does not require the experiment to be performed. For example, we know that the probability of a balanced coin turning up heads is equal to 0.5 without ever performing trials of the experiment. Under the classical interpretation, the probability of an event is defined as the ratio of the number of outcomes favorable to the event divided by the total number of possible outcomes.

Sometimes a situation may be too complex to understand the physical nature of it well enough to calculate probabilities. However, by running a large number of trials and observing the outcomes, we can estimate the probability. This is the empirical probability based on long-run relative frequencies and is defined as the ratio of the number of observed outcomes favorable to the event divided by the total number of observed outcomes. The larger the number of trials, the more accurate the estimate of probability. If the system can be modeled by computer, then simulations can be performed in place of physical trials.
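
As a sketch of the simulation idea, the snippet below estimates the (already known) probability of heads for a fair coin from long-run relative frequency; the trial count is arbitrary.

    import random

    trials = 100_000
    heads = sum(1 for _ in range(trials) if random.random() < 0.5)

    # The relative frequency approaches the true probability of 0.5 as trials grow.
    print(heads / trials)  # e.g. 0.4996 -- varies from run to run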

A manager frequently faces situations in which neither classical nor empirical probabilities are useful. For example, in a one-shot situation such as the launch of a unique product, the probability of success can neither be calculated nor estimated from repeated trials. However, the manager may make an educated guess of the probability. This subjective probability can be thought of as a person's degree of confidence that the event will occur. In absence of better information upon which to rely, subjective probability may be used to make logically consistent decisions, but the quality of those decisions depends on the accuracy of the subjective estimate.

Outcomes and Events

An event is a subset of all of the possible outcomes of an experiment. For example, if an experiment consists of flipping a coin two times, the possible outcomes are:

  • heads, heads
  • heads, tails
  • tails, heads
  • tails, tails

One can define the showing of heads at least one time to be an event, and this event would consist of three of the four possible outcomes.

Given that the probability of each outcome is known, the probability of an event can be determined by summing the probabilities of the individual outcomes associated with the event.

A composite event is an event defined by the union or intersection of two events. The union of two events is expressed by the "or" function. For example, the probability that either Event A or Event B (or both) will occur is expressed by P(A or B). The intersection of two events is the probability that both events will occur and is expressed by the "and" function. For example, the probability that both Event A and Event B will occur is expressed by P(A and B).


Law of Addition

Consider the following Venn diagram, in which each of the 25 dots represents an outcome and each of the two circles represents an event.

[Venn diagram: two overlapping circles over 25 dots; Event A encloses 5 dots, Event B encloses 6 dots, and 2 dots fall in the overlap.]
In the above diagram, Event A is considered to have occurred if an experiment's outcome, represented by one of the dots, falls within the bounds of the left circle. Similarly, Event B is considered to have occurred if an experiment's outcome falls within the bounds of the right circle. If the outcome falls within the overlapping region of the two circles, then both Event A and Event B are considered to have occurred.

There are 5 outcomes that fall within the definition of Event A and 6 outcomes that fall within the definition of Event B. Assuming that each outcome represented by a dot occurs with equal probability, the probability of Event A is 5/25 or 1/5, and the probability of Event B is 6/25. The probability of Event A or Event B is the total number of outcomes covered by the two circles divided by the total number of possible outcomes; since the circles together cover 9 distinct outcomes, the probability of Event A or Event B is 9/25.

Note that this result is not simply the sum of the probabilities of each event, which would be equal to 11/25. Since there are two outcomes in the overlapping area, these outcomes are counted twice if we simply sum the probabilities of the two events. To prevent this double counting of the outcomes common to both events, we need to subtract the probability of those two outcomes so that they are counted only once. The result is the law of addition, which states that the probability of Event A or Event B (or both) occurring is given by:

P(A or B) = P(A) + P(B) - P(A and B)

This addition rule is useful for determining the probability that at least one event will occur. Note that for mutually exclusive events there is no overlap of the two events so:

P(A and B) = 0

and the law of addition reduces to:

P(A or B) = P(A) + P(B)
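
The law can be verified by brute-force counting. The sketch below rebuilds the 25-outcome sample space of the Venn diagram; the outcome labels are arbitrary.

    # 25 equally likely outcomes, labelled 0..24 (labels chosen arbitrarily).
    outcomes = set(range(25))
    A = {0, 1, 2, 3, 4}        # Event A: 5 outcomes
    B = {3, 4, 5, 6, 7, 8}     # Event B: 6 outcomes; {3, 4} is the overlap

    def p(event):
        return len(event) / len(outcomes)

    print(p(A | B))                    # direct count of the union: 9/25 = 0.36
    print(p(A) + p(B) - p(A & B))      # law of addition gives the same 0.36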


Conditional Probability

Sometimes it is useful to know the probability that an event will occur given that another event occurred. Given two possible events, if we know that one event occurred, we can apply this information when calculating the other event's probability. Consider the Venn diagram of the previous section with the two overlapping circles. If we know that Event B occurred, then the effective sample space is reduced to those outcomes associated with Event B, and the Venn diagram can be simplified as shown:

[Venn diagram: only Event B's circle remains, containing 6 outcomes, 2 of which also belong to Event A.]
The probability that Event A also has occurred is the probability of Events A and B relative to the probability of Event B. Assuming equal probability outcomes, given two outcomes in the overlapping area and six outcomes in B, the probability that Event A occurred would be 2/6. More generally,

P(A given B) = P(A and B) / P(B)
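
Using the same counts as the diagram (2 outcomes in the overlap, 6 in Event B, 25 in all), a minimal sketch:

    A = {0, 1, 2, 3, 4}        # Event A: 5 of the 25 outcomes
    B = {3, 4, 5, 6, 7, 8}     # Event B: 6 of the 25 outcomes
    total = 25                 # equally likely outcomes in the sample space

    p_b = len(B) / total             # P(B) = 6/25
    p_a_and_b = len(A & B) / total   # P(A and B) = 2/25

    print(p_a_and_b / p_b)           # P(A given B) = 2/6 = 0.333...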


Law of Multiplication

The probability of both events occurring can be calculated by rearranging the terms in the expression of conditional probability. Solving for P(A and B), we get:

P(A and B) = P(A given B) x P(B)

For independent events, the probability of Event A is not affected by the occurrence of Event B, so P(A given B) = P(A), and

P(A and B) = P(A) x P(B)
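
A short sketch with two coin flips (Event A: first flip is heads; Event B: second flip is heads), which are independent by construction:

    from itertools import product

    # All four equally likely outcomes of flipping a coin twice.
    outcomes = list(product("HT", repeat=2))

    A = [o for o in outcomes if o[0] == "H"]   # first flip heads
    B = [o for o in outcomes if o[1] == "H"]   # second flip heads
    AB = [o for o in A if o in B]              # both events occur

    def p(event):
        return len(event) / len(outcomes)

    print(p(AB))        # 0.25
    print(p(A) * p(B))  # 0.25 -- matches, since the flips are independent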

Permutations and Combinations

Certain types of probability calculations involve dividing the number of outcomes associated with an event by the total number of possible outcomes. For simple problems it is easy to count the outcomes, but in more complex situations manual counting can become laborious or impossible.

Fortunately, there are formulas for determining the number of ways in which members of a set can be arranged. Such arrangements are referred to as permutations or combinations, depending on whether the order in which the members are arranged is a distinguishing factor.

The number of different orders in which members of a group can be arranged for a group of r members taken r at a time is:

(r)(r-1)(r-2)...(1)

This is more easily expressed as simply r!.

When order is a distinguishing factor, a group of n members taken r at a time results in a number of permutations equal to the first r terms of the following multiplication:

(n)(n-1)(n-2)...

This can be expressed as:

nPr = n! / (n - r)!

In combinations, order is not a distinguishing factor:

nCr = nPr / r! = n! / [ (n - r)! r! ]
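
Python 3.8 and later provide these counts directly as math.perm and math.comb; a quick sketch:

    import math

    n, r = 5, 3

    print(math.factorial(r))  # r! = 6 orderings of 3 members taken 3 at a time
    print(math.perm(n, r))    # nPr = 5!/(5-3)! = 60 (order matters)
    print(math.comb(n, r))    # nCr = nPr/3!    = 10 (order does not matter)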

For the special case of possible pairs in a group of n members, assuming order in a pair is not important, then:

r = 2


and the number of possible pairs is:

n(n - 1) / 2.


Example: How many two-element subsets of {1,2,3,4} are there that do not contain the pair of elements 2 and 4 ?

Solution: 4! / [ (2!)(2!) ] = 6, but the subset {2,4} is not to be counted, so the answer is 5.

Given n items taken r at a time, the number of combinations in which x particular items do not appear at all is found by reducing n by x and solving as one would a normal combination problem: (n - x)Cr. Note that this is a stronger condition than merely excluding one particular subset, as in the example above.
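
Both the worked example and the reduction rule can be checked by direct enumeration (a sketch using itertools.combinations):

    from itertools import combinations
    import math

    subsets = list(combinations([1, 2, 3, 4], 2))
    print(len(subsets))                              # 6 = 4C2
    print(len([s for s in subsets if s != (2, 4)]))  # 5 -- excluding the subset {2, 4}

    # Reduction rule: size-2 subsets containing neither 2 nor 4.
    print(math.comb(4 - 2, 2))                       # 1 -- only {1, 3} qualifies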


Combinations of Groups

If Group A has x members, Group B has y members, and Group C has z members, there are (x)(y)(z) possible combinations assuming that one member from each of the three groups is used in each combination, and assuming that the order is not a distinguishing factor. In general, if more than one member is taken at a time from each group, the number of combinations is the product of nCr (or nPr if appropriate) associated with each particular group.
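
A sketch of the product rule with three invented group sizes:

    import math

    x, y, z = 4, 3, 5    # made-up sizes of Groups A, B, and C
    print(x * y * z)     # 60 combinations taking one member from each group

    # Taking two members from Group A and one from each of the others:
    print(math.comb(4, 2) * math.comb(3, 1) * math.comb(5, 1))  # 6 * 3 * 5 = 90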
