What is Statistics?

GOALS
When you have completed this Part, you will be able to:
ONE Understand why we study statistics.
TWO Explain what is meant by descriptive statistics and inferential statistics.
THREE Distinguish between a qualitative variable and a quantitative variable.
FOUR Distinguish between a discrete variable and a continuous variable.
FIVE Distinguish among the nominal, ordinal, interval, and ratio levels of measurement.
SIX Define the terms mutually exclusive and exhaustive.

What is Meant by Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

Who Uses Statistics?
Statistical techniques are used extensively in marketing, accounting, and quality control, and by consumers, professional sports people, hospital administrators, educators, politicians, physicians, and many others.

Types of Statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
EXAMPLE 1: A Gallup poll found that 49% of the people in a survey knew the name of the first book of the Bible. The statistic 49 describes the number out of every 100 persons who knew the answer.
EXAMPLE 2: According to Consumer Reports, General Electric washing machine owners reported 9 problems per 100 machines during 2001. The statistic 9 describes the number of problems out of every 100 machines.

Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.

A Population is a collection of all possible individuals, objects, or measurements of interest.
A Sample is a portion, or part, of the population of interest.

Types of Statistics (examples of inferential statistics)
Example 1: TV networks constantly monitor the popularity of their programs by hiring Nielsen and other organizations to sample the preferences of TV viewers.
Example 2: Wine tasters sip a few drops of wine to make a decision with respect to all the wine waiting to be released for sale.
Example 3: The accounting department of a large firm will select a sample of the invoices to check for accuracy for all the invoices of the company.

Types of Variables
For a Qualitative or Attribute Variable the characteristic being studied is nonnumeric. Examples: gender, type of car, eye color, state of birth.
In a Quantitative Variable information is reported numerically. Examples: balance in your checking account, minutes remaining in class, number of children in a family.

Quantitative variables can be classified as either Discrete or Continuous.
Discrete Variables can only assume certain values, and there are usually gaps between values. Examples: the number of bedrooms in a house, or the number of hammers sold at the local Home Depot (1, 2, 3, etc.).
A Continuous Variable can assume any value within a specified range. Examples: the pressure in a tire, the weight of a pork chop, the height of students in a class.

Summary of Types of Variables
DATA
  Qualitative or attribute (type of car owned)
  Quantitative or numerical
    discrete (number of children)
    continuous (time taken for an exam)

Levels of Measurement
There are four levels of data: Nominal, Ordinal, Interval, and Ratio.

Nominal level: Data that is classified into categories and cannot be arranged in any particular order. Examples: gender, eye color.
Nominal level variables must be:
Mutually exclusive: An individual, object, or measurement is included in only one category.
Exhaustive: Each individual, object, or measurement must appear in one of the categories.

Ordinal level: Involves data arranged in some order, but the differences between data values cannot be determined or are meaningless. Example: During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Dr. Pepper number 2, Pepsi number 3, and Root Beer number 4.

Interval level: Similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point. Example: temperature on the Fahrenheit scale.

Ratio level: The interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement. Examples: miles traveled by a sales representative in a month, monthly income of surgeons.

Describing Data: Frequency Distributions and Graphic Presentation

GOALS
When you have completed this Part, you will be able to:
ONE Organize data into a frequency distribution.
TWO Portray a frequency distribution in a histogram, frequency polygon, and cumulative frequency polygon.
THREE Present data using such graphic techniques as line charts, bar charts, and pie charts.

Frequency Distribution
A Frequency Distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.

Constructing a frequency distribution
Constructing a frequency distribution involves:
1. Determining the question to be addressed
2. Collecting raw data
3. Organizing data (frequency distribution)
4. Presenting data (graph)
5. Drawing conclusions

Definitions
Class Midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
Class Frequency: The number of observations in each class.
Class Interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class. The class intervals should be equal.

EXAMPLE 1
Dr. Tillman is Dean of the School of Business at Socastee University. He wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
Organize the data into a frequency distribution.

Example 1 continued
Step One: Decide on the number of classes using the formula
\[ 2^k > n \]
where k = number of classes and n = number of observations.
There are 30 observations, so n = 30.
Two raised to the fifth power is 32.
Therefore, we should have at least 5 classes, i.e., k = 5.

Step Two: Determine the class interval or width using the formula
\[ i \ge \frac{H - L}{k} = \frac{33.8 - 10.3}{5} = 4.7 \]
where H = highest value and L = lowest value.
Round up for an interval of 5 hours. Set the lower limit of the first class at 7.5 hours, giving a total of 6 classes.

Step Three: Set the individual class limits.
Steps Four and Five: Tally and count the number of items in each class.

Hours studying     Frequency, f
7.5 up to 12.5     1
12.5 up to 17.5    12
17.5 up to 22.5    10
22.5 up to 27.5    5
27.5 up to 32.5    1
32.5 up to 37.5    1

Class Midpoint: To find the midpoint of each interval, use the following formula:
\[ \text{Midpoint} = \frac{\text{Upper limit} + \text{Lower limit}}{2} \]

Hours studying     Midpoint                  f
7.5 up to 12.5     (12.5+7.5)/2 = 10.0       1
12.5 up to 17.5    (17.5+12.5)/2 = 15.0      12
17.5 up to 22.5    (22.5+17.5)/2 = 20.0      10
22.5 up to 27.5    (27.5+22.5)/2 = 25.0      5
27.5 up to 32.5    (32.5+27.5)/2 = 30.0      1
32.5 up to 37.5    (37.5+32.5)/2 = 35.0      1
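
The five steps of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are hypothetical, "round up" is taken to mean rounding the width up to the next whole hour, and the first lower limit of 7.5 is copied from the example rather than derived.

```python
import math

# Hours studied last week by the 30 sampled students (Example 1).
hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]
n = len(hours)

# Step One: smallest k such that 2**k > n.
k = 1
while 2 ** k <= n:
    k += 1

# Step Two: minimum width (H - L) / k, rounded up to a whole hour
# (assumption: "round up" means the next integer).
min_width = (max(hours) - min(hours)) / k
width = math.ceil(min_width)

# Step Three: class limits, starting at 7.5 as chosen in the example.
first_lower = 7.5
classes = []
lower = first_lower
while lower <= max(hours):
    classes.append((lower, lower + width))
    lower += width

# Steps Four and Five: count values with lower <= value < upper ("up to").
frequencies = [sum(1 for h in hours if lo <= h < hi) for lo, hi in classes]

# Class midpoints: (upper limit + lower limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), mid, f in zip(classes, midpoints, frequencies):
    print(f"{lo:>4} up to {hi:<4}  midpoint {mid:>4}  f = {f}")
```

Starting the classes at 7.5 rather than at the lowest value is why six classes result even though the rule in Step One suggests five.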

A Relative Frequency Distribution shows the percent of observations in each class.

Hours              f     Relative Frequency
7.5 up to 12.5     1     1/30 = .0333
12.5 up to 17.5    12    12/30 = .4000
17.5 up to 22.5    10    10/30 = .3333
22.5 up to 27.5    5     5/30 = .1667
27.5 up to 32.5    1     1/30 = .0333
32.5 up to 37.5    1     1/30 = .0333
TOTAL              30    30/30 = 1
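
As a quick check on the relative frequency table, the sketch below divides each class frequency by the total. The dictionary layout and names are illustrative assumptions; the counts are the ones from Example 1.

```python
# Class frequencies from Example 1 (hours spent studying).
frequencies = {
    "7.5 up to 12.5": 1,
    "12.5 up to 17.5": 12,
    "17.5 up to 22.5": 10,
    "22.5 up to 27.5": 5,
    "27.5 up to 32.5": 1,
    "32.5 up to 37.5": 1,
}

n = sum(frequencies.values())

# Relative frequency of a class = class frequency / total observations.
relative = {label: f / n for label, f in frequencies.items()}

for label, f in frequencies.items():
    print(f"{label:<16} {f:>3}  {relative[label]:.4f}")
```

The relative frequencies always sum to 1, which is a convenient sanity check on any hand-built table.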

Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are Histograms, Frequency Polygons, and Cumulative Frequency Distributions.
A Histogram is a graph in which the class midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

[Figure: Histogram for Hours Spent Studying. Horizontal axis: hours spent studying (class midpoints 10 to 35); vertical axis: frequency.]

A Frequency Polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.

[Figure: Frequency Polygon for Hours Spent Studying. Horizontal axis: hours spent studying; vertical axis: frequency.]

Cumulative Frequency Distribution
A Cumulative Frequency Distribution is used to determine how many or what proportion of the data values are below or above a certain value.
To create a cumulative frequency polygon, scale the upper limit of each class along the X-axis and the corresponding cumulative frequencies along the Y-axis.

Cumulative Frequency Table for Hours Spent Studying
Hours Studying     Upper Limit   f     Cumulative Frequency
7.5 up to 12.5     12.5          1     1
12.5 up to 17.5    17.5          12    13 (1+12)
17.5 up to 22.5    22.5          10    23 (13+10)
22.5 up to 27.5    27.5          5     28 (23+5)
27.5 up to 32.5    32.5          1     29 (28+1)
32.5 up to 37.5    37.5          1     30 (29+1)

[Figure: Cumulative Frequency Distribution for Hours Studying. Horizontal axis: hours spent studying; vertical axis: frequency.]
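
The running totals in the cumulative frequency table can be reproduced with a short sketch. It assumes the class frequencies from Example 1; the names `points` and `share_below_22_5` are illustrative only.

```python
from itertools import accumulate

# Upper class limits and class frequencies from Example 1.
upper_limits = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5]
frequencies = [1, 12, 10, 5, 1, 1]

# Each cumulative frequency is the previous total plus the class frequency.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Points for a cumulative frequency polygon: (upper limit, cumulative f).
points = list(zip(upper_limits, cumulative))

# Proportion of students who studied less than 22.5 hours.
share_below_22_5 = cumulative[upper_limits.index(22.5)] / n

print(points)
print(f"Share below 22.5 hours: {share_below_22_5:.3f}")
```

Reading a proportion off the cumulative column, as in the last step, is the "how many are below a certain value" use described above.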

Line Graphs
Line graphs are typically used to show the change or trend in a variable over time.

Year    Males   Females
1992    30.5    32.9
1993    30.8    33.2
1994    31.1    33.5
1995    31.4    33.8
1996    31.6    34.0
1997    31.9    34.3
1998    32.2    34.6
1999    32.5    34.9
2000    32.8    35.2
2001    33.2    35.5
2002    33.5    35.8

[Figure: U.S. median age by gender. Line graph with one series each for Males and Females; vertical axis: median age.]

Bar Chart
A Bar Chart can be used to depict any of the levels of measurement (nominal, ordinal, interval, or ratio).
Construct a bar chart for the number of unemployed per 100,000 population for selected cities during 2001.

City                Number of unemployed per 100,000 population
Atlanta, GA         7300
Boston, MA          5400
Chicago, IL         6700
Los Angeles, CA     8900
New York, NY        8200
Washington, D.C.    8900

[Figure: Bar Chart for the Unemployment Data. Horizontal axis: cities (Atlanta, Boston, Chicago, Los Angeles, New York, Washington); vertical axis: number of unemployed per 100,000.]

Pie Chart
A Pie Chart is useful for displaying a relative frequency distribution. A circle is divided proportionally to the relative frequency, and portions of the circle are allocated for the different groups.
A sample of 200 runners was asked to indicate their favorite type of running shoe. Draw a pie chart based on the following information.

Type of shoe   # of runners   % of total
Nike           92             46.0
Adidas         49             24.5
Reebok         37             18.5
Asics          13             6.5
Other          9              4.5

[Figure: Pie Chart for Running Shoes. Nike 46%, Adidas 24.5%, Reebok 18.5%, Asics 6.5%, Other 4.5%.]

Describing Data: Numerical Measures

GOALS
When you have completed this Part, you will be able to:
ONE Calculate the arithmetic mean, median, mode, weighted mean, and the geometric mean.
TWO Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
THREE Identify the position of the arithmetic mean, median, and mode for both a symmetrical and a skewed distribution.
FOUR Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.
FIVE Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
SIX Understand Chebyshev's theorem and the Empirical Rule as they relate to a set of observations.

Characteristics of the Mean
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data. It is calculated by summing the values and dividing by the number of values.
The major characteristics of the mean are:
It requires at least the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.

Population Mean
For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:
\[ \mu = \frac{\sum X}{N} \]
where \( \mu \) is the population mean, N is the total number of observations, X is a particular value, and \( \Sigma \) indicates the operation of adding.

A Parameter is a measurable characteristic of a population.

Example 1
The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000, 42,000, 23,000, 73,000. Find the mean mileage for the cars.
\[ \mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500 \]

Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
\[ \bar{X} = \frac{\sum X}{n} \]
where n is the total number of values in the sample.

A statistic is a measurable characteristic of a sample.

Example 2
A sample of five executives received the following bonus last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0
\[ \bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4 \]
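
The population mean of the Kiers family mileage and the sample mean of the executive bonuses can be checked with Python's standard `statistics` module. The sketch below is illustrative; the variable names are assumptions, and the last lines check the property that deviations from the mean sum to zero.

```python
from statistics import mean

# Population: current mileage on the four Kiers family cars.
mileage = [56_000, 42_000, 23_000, 73_000]
mu = mean(mileage)            # sum of X divided by N

# Sample: bonuses of five executives, in $000.
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]
x_bar = mean(bonuses)         # sum of X divided by n

# The deviations from the mean sum to zero (up to floating-point error).
deviation_sum = sum(x - x_bar for x in bonuses)

print(mu, x_bar, round(deviation_sum, 10))
```

The arithmetic is identical for a parameter and a statistic; only the interpretation (whole population versus sample) differs.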

Properties of the Arithmetic Mean
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.

Example 3
Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:
\[ \sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0 \]

Weighted Mean
The Weighted Mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n} \]

Example 4
During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
\[ \bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89 \]

The Median
The Median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array. The median is found at the (n+1)/2 ranked observation; for an even set of values, this position falls between the two middle numbers, and the median is their arithmetic average.

The median (continued)
The ages for a sample of five college students are: 21, 25, 19, 20, 22. Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.

Example 5
The heights of four basketball players, in inches, are: 76, 73, 80, 75. Arranging the data in ascending order gives: 73, 75, 76, 80. The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point, halfway between 75 and 76. Thus the median is 75.5.

Properties of the Median
There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.
It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

The Mode
The Mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and the like.

The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: A distribution having the same shape on either side of the center.
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.

Skewness is the measurement of the lack of symmetry of the distribution. The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} \]
A value of 0 indicates a symmetric distribution. Some software packages use a different formula, which results in a wider range for the coefficient.

Symmetric Distribution
Zero skewness: Mean = Median = Mode.

Right Skewed Distribution
Positively skewed: Mean and median are to the right of the mode. Mean > Median > Mode.

Left Skewed Distribution
Negatively skewed: Mean and median are to the left of the mode. Mean < Median < Mode.

Geometric Mean
The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)} \]
The geometric mean is used to average percents, indexes, and relatives.
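
The weighted mean, median, and mode examples above (Examples 4, 5, and 6) can be verified with a short sketch. It uses the standard `statistics` module; the variable names are illustrative, and the top drink price is taken as $1.15, the value used in the worked formula that yields $44.50.

```python
from statistics import median, multimode

# Example 4: weighted mean of drink prices, weighted by drinks sold.
prices = [0.50, 0.75, 0.90, 1.15]
drinks_sold = [5, 15, 15, 15]
weighted_mean = (sum(w * x for w, x in zip(drinks_sold, prices))
                 / sum(drinks_sold))

# Median for an odd and an even number of values.
ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]
median_age = median(ages)          # middle value after sorting
median_height = median(heights)    # average of the two middle values

# Example 6: mode(s) of the exam scores; multimode returns every mode.
scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
modes = multimode(scores)

print(round(weighted_mean, 2), median_age, median_height, modes)
```

`multimode` is used instead of `mode` because it returns a list, which makes a bimodal or trimodal data set visible rather than silently picking one value.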

Example 7
The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5+21+4)/3 = 10.0.
The geometric mean is
\[ GM = \sqrt[3]{(5)(21)(4)} = 7.49 \]
The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.

Geometric Mean continued
Another use of the geometric mean is to determine the percent increase in sales, production, or other business or economic series from one time period to another:
\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \]

[Figure: Growth in Sales 1999-2004. Horizontal axis: year; vertical axis: sales in millions ($).]

Example 8
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
\[ GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127 \]
That is, the geometric mean rate of increase is 1.27% per year over the 8 years.

Variance and Standard Deviation
Variance: the arithmetic mean of the squared deviations from the mean.
Standard deviation: the square root of the variance.
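
Returning to the geometric mean in Examples 7 and 8, both formulas are easy to check numerically. The sketch below is illustrative only: the function names `geometric_mean` and `average_rate_of_increase` are hypothetical helpers written for this example, not part of any library used in the course.

```python
import math

def geometric_mean(values):
    """nth root of the product of the n values."""
    return math.prod(values) ** (1 / len(values))

def average_rate_of_increase(start, end, periods):
    """nth root of (end / start), minus 1."""
    return (end / start) ** (1 / periods) - 1

# Example 7: interest rates on three bonds, in percent.
rates = [5, 21, 4]
gm = geometric_mean(rates)
am = sum(rates) / len(rates)

# Example 8: female college enrollment, 1992 to 2000 (8 periods).
growth = average_rate_of_increase(755_000, 835_000, 8)

print(round(am, 2), round(gm, 2), round(growth, 4))
```

The geometric mean comes out below the arithmetic mean here, which is the "more conservative" behavior noted in Example 7.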

Population Variance
The major characteristics of the Population Variance are:
It is not unduly influenced by extreme values in the way the range is, although every value, including an extreme one, contributes to it.
The units are awkward, the square of the original units.
All values are used in the calculation.

Population Variance formula:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
where X is the value of an observation in the population, \( \mu \) is the arithmetic mean of the population, and N is the number of observations in the population.

Population Standard Deviation formula:
\[ \sigma = \sqrt{\sigma^2} \]

Example 9 continued
In Example 9, the variance and standard deviation are:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \dots + (22.1 - 6.62)^2}{25} = 42.227 \]
\[ \sigma = \sqrt{42.227} = 6.498 \]

Sample variance and standard deviation
Sample variance (\( s^2 \)):
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
Sample standard deviation (s):
\[ s = \sqrt{s^2} \]

Example 11
The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6. Find the sample variance and standard deviation.
\[ \bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40 \]
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30 \]
\[ s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \]

Example 2 revisited
Using the twelve stock prices, we find the mean to be 84.42, the standard deviation 7.18, and the median 84.5.
Coefficient of variation:
\[ CV = \frac{s}{\bar{X}}(100\%) = 8.5\% \]
Coefficient of skewness:
\[ sk = \frac{3(\bar{X} - \text{Median})}{s} = -.035 \]
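
The sample variance and standard deviation of the hourly wages in Example 11, and the coefficient of variation for the stock prices, can be checked as follows. This is a sketch using Python's standard `statistics` module; the variable names are illustrative, and the stock figures are the rounded summary values quoted above, since the twelve prices themselves are not listed here.

```python
from statistics import mean, pvariance, stdev, variance

# Example 11: hourly wages of a sample of five students.
wages = [7, 5, 11, 8, 6]

x_bar = mean(wages)        # 37 / 5
s2 = variance(wages)       # sample variance, divides by n - 1
s = stdev(wages)           # square root of the sample variance

# For contrast, the population formula divides by N instead of n - 1.
sigma2 = pvariance(wages)

# Coefficient of variation from the quoted stock price summary values.
stock_mean, stock_sd = 84.42, 7.18
cv_percent = stock_sd / stock_mean * 100

print(x_bar, round(s2, 2), round(s, 2), round(sigma2, 2), round(cv_percent, 1))
```

Dividing by n - 1 rather than N makes the sample variance slightly larger than the population formula would give for the same five values.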

Chebyshev's theorem
Chebyshev's theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least
\[ 1 - \frac{1}{k^2} \]
where k is any constant greater than 1.
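
To get a feel for the bound in Chebyshev's theorem, the sketch below evaluates it for a few values of k. The function name `chebyshev_lower_bound` is a hypothetical helper introduced for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")
```

For k = 2 the bound is 75%, and for k = 3 it is about 88.9%; these hold for any set of observations, whatever the shape of the distribution.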

Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within 1s of the mean.
About 95% of the observations will lie within 2s of the mean.
Virtually all the observations will be within 3s of the mean.

[Figure: Bell-shaped curve showing the relationship between \( \mu \) and \( \sigma \).]

The Mean of Grouped Data
The Mean of a sample of data organized in a frequency distribution is computed by the following formula:
\[ \bar{X} = \frac{\sum Xf}{n} \]

Example 12
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing   frequency f   class midpoint X   (f)(X)
1 up to 3        1             2                  2
3 up to 5        2             4                  8
5 up to 7        3             6                  18
7 up to 9        1             8                  8
9 up to 11       3             10                 30
Total            10                               66

\[ \bar{X} = \frac{\sum Xf}{n} = \frac{66}{10} = 6.6 \]

The Median of Grouped Data
The Median of a sample of data organized in a frequency distribution is computed by:
\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) \]
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.

Finding the Median Class
To determine the median class for grouped data:
Construct a cumulative frequency distribution.
Divide the total number of data values by 2.
Determine which class will contain this value. For example, if n=50, 50/2 = 25, then determine which class will contain the 25th value.

Example 12 continued
Movies showing   Frequency   Cumulative Frequency
1 up to 3        1           1
3 up to 5        2           3
5 up to 7        3           6
7 up to 9        1           7
9 up to 11       3           10

From the table, L=5, n=10, f=3, i=2, CF=3.
\[ \text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33 \]

The Mode of Grouped Data
The Mode for grouped data is approximated by the midpoint of the class with the largest class frequency. In Example 12 two classes share the largest frequency, so the modes are 6 and 10 and the distribution is bimodal.
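
The grouped-data calculations for Example 12 can be sketched as below. The code follows the formulas stated above (class midpoints for the mean, interpolation within the median class for the median); the variable names are illustrative assumptions.

```python
# Example 12: movies showing at ten theaters, as (lower, upper) classes.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11)]
freqs = [1, 2, 3, 1, 3]
n = sum(freqs)

# Grouped mean: sum of (midpoint * frequency) divided by n.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
grouped_mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Grouped median: find the class containing the (n / 2)th value, then
# interpolate: L + ((n/2 - CF) / f) * i.
half = n / 2
cf = 0  # cumulative frequency preceding the current class
for (lo, hi), f in zip(classes, freqs):
    if cf + f >= half:
        grouped_median = lo + (half - cf) / f * (hi - lo)
        break
    cf += f

# Grouped mode(s): midpoint(s) of the class(es) with the largest frequency.
largest = max(freqs)
modal_midpoints = [m for m, f in zip(midpoints, freqs) if f == largest]

print(grouped_mean, round(grouped_median, 2), modal_midpoints)
```

Collecting every class that ties for the largest frequency is what surfaces the two modes, 6 and 10, in this example.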