Correlation and Covariance

Goals
Introduce the concepts of covariance and correlation
Develop computational formulas

Covariance
Variables may change in relation to each other.
Covariance measures how much the movement in one variable predicts the movement in a corresponding variable.

Smoking and Lung Capacity
Example: investigate the relationship between cigarette smoking and lung capacity.
Data: sample group responses on smoking habits (X) and measured lung capacities (Y).

Smoking v Lung Capacity Data

N   Cigarettes (X)   Lung Capacity (Y)
1    0               45
2    5               42
3   10               33
4   15               31
5   20               29

Smoking and Lung Capacity

Smoking v Lung Capacity
Observe that as smoking exposure goes up, the corresponding lung capacity goes down.
The variables covary inversely.
Covariance and correlation quantify this relationship.

Covariance
Variables that covary inversely, like smoking and lung capacity, tend to appear on opposite sides of their group means.
When smoking is above its group mean, lung capacity tends to be below its group mean.
The average product of deviations measures the extent to which the variables covary, that is, the degree of linkage between them.

The Sample Covariance
As with the variance, for theoretical reasons the average is typically computed using (N - 1), not N. Thus,

$$ S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) $$

Calculating Covariance

Cigs (X)        0    5   10   15   20
Lung Cap (Y)   45   42   33   31   29

$\bar{X} = 10$, $\bar{Y} = 36$

Calculating Covariance

Cigs (X)   X - \bar{X}   (X - \bar{X})(Y - \bar{Y})   Y - \bar{Y}   Cap (Y)
 0         -10           -90                            9            45
 5          -5           -30                            6            42
10           0             0                           -3            33
15           5           -25                           -5            31
20          10           -70                           -7            29
                   Sum = -215

Covariance Calculation (2)
Evaluation yields

$$ S_{xy} = \frac{1}{4}(-215) = -53.75 $$

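The deviation table and the final division can be reproduced in a few lines. This is a minimal sketch using only the five data points from the slides; the variable names (`cigs`, `cap`, `dev_x`, and so on) are illustrative choices, not part of the original material.

```python
# Slide data: cigarettes smoked (X) and lung capacity (Y).
cigs = [0, 5, 10, 15, 20]
cap = [45, 42, 33, 31, 29]

n = len(cigs)
mean_x = sum(cigs) / n   # group mean of X
mean_y = sum(cap) / n    # group mean of Y

# Deviations from the group means and their row-by-row products.
dev_x = [x - mean_x for x in cigs]
dev_y = [y - mean_y for y in cap]
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

# Sum of products, then divide by N - 1 (not N) for the sample covariance.
sum_products = sum(products)
s_xy = sum_products / (n - 1)

print(mean_x, mean_y)
print(sum_products, s_xy)
```

The negative sign of the result reflects the inverse relationship: every nonzero product pairs a deviation above one mean with a deviation below the other.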
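Covariance under Affine Transformation
Let $L_i = aX_i + b$ and $M_i = cY_i + d$. Then

$$ l_i = a x_i, \qquad m_i = c y_i, $$

where a lowercase letter denotes the deviation score of the corresponding variable, e.g. $x_i = X_i - \bar{X}$. Evaluating, in turn, gives

$$ S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i $$
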
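Covariance under Affine Transformation (2)
Evaluating further,

$$
\begin{aligned}
S_{LM} &= \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i \\
       &= \frac{1}{N-1} \sum_{i=1}^{N} (a x_i)(c y_i) \\
       &= ac \, \frac{1}{N-1} \sum_{i=1}^{N} x_i y_i \\
S_{LM} &= ac \, S_{xy}
\end{aligned}
$$
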
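The scaling result can be checked numerically. The sketch below reuses the smoking data; the helper `sample_cov` and the constants `a`, `b`, `c`, `d` are hypothetical values chosen only for illustration.

```python
from statistics import mean

def sample_cov(u, v):
    """Sample covariance of two equal-length sequences, using the N - 1 divisor."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]

# Hypothetical affine constants (assumptions, not from the slides).
a, b = 2.0, 7.0
c, d = -0.5, 100.0

# Transformed variables L_i = a*X_i + b and M_i = c*Y_i + d.
L = [a * xi + b for xi in x]
M = [c * yi + d for yi in y]

s_xy = sample_cov(x, y)
s_lm = sample_cov(L, M)

# The shifts b and d drop out; only the scale factors a and c remain.
print(s_lm, a * c * s_xy)
```

Both printed values should agree, and because `c` is negative here the covariance changes sign.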
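(Pearson) Correlation Coefficient $r_{xy}$
Like covariance, but uses z-scores instead of deviation scores.
Hence it is invariant under linear transformation of the raw data, up to sign: a negative scale factor on one variable flips the sign of $r_{xy}$.

$$ r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i} $$
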
Alternative (common) Expression

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

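The z-score form and the ratio form describe the same quantity. A short sketch on the smoking data, assuming sample standard deviations with the N - 1 divisor (which is what `statistics.stdev` computes), shows the two agreeing:

```python
from statistics import mean, stdev

x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]
n = len(x)

mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)   # sample standard deviations (N - 1 divisor)

# Form 1: sum of z-score products divided by N - 1.
zx = [(xi - mx) / sx for xi in x]
zy = [(yi - my) / sy for yi in y]
r_z = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Form 2: covariance divided by the product of the standard deviations.
s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
r_ratio = s_xy / (sx * sy)

print(round(r_z, 4), round(r_ratio, 4))
```

The two forms are algebraically identical, so any difference between them is floating-point rounding only.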
Computational Formula 1

$$ s_{xy} = \frac{1}{N-1} \left[ \sum_{i=1}^{N} X_i Y_i - \frac{\sum_{i=1}^{N} X_i \, \sum_{i=1}^{N} Y_i}{N} \right] $$

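This formula needs only running sums, not the means or the individual deviations. A minimal sketch on the smoking data (variable names are illustrative):

```python
x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]
n = len(x)

# Raw sums; no deviations from the mean are needed.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Computational Formula 1 for the sample covariance.
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

print(sum_x, sum_y, sum_xy)
print(s_xy)  # -53.75, matching the deviation-score calculation
```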
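Computational Formula 2

$$ r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} $$
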
CS5961 Comp Stat

Table for Calculating $r_{xy}$

        Cigs (X)   X^2    XY     Y^2    Cap (Y)
           0         0      0    2025    45
           5        25    210    1764    42
          10       100    330    1089    33
          15       225    465     961    31
          20       400    580     841    29
Sum =     50       750   1585    6680   180

Computing $r_{xy}$ from Table

$$ r_{xy} = \frac{5(1585) - 50(180)}{\sqrt{\left[ 5(750) - 50^2 \right] \left[ 5(6680) - 180^2 \right]}} = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}} $$

Computing Correlation

$$ r_{xy} = \frac{-1075}{\sqrt{1250 \cdot 1000}} $$

$$ r_{xy} = -0.9615 $$

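The same arithmetic can be reproduced from the column sums of the table. A minimal sketch, with variable names chosen for illustration:

```python
import math

x = [0, 5, 10, 15, 20]
y = [45, 42, 33, 31, 29]
n = len(x)

# Column sums from the table.
sum_x = sum(x)                                   # 50
sum_y = sum(y)                                   # 180
sum_x2 = sum(xi ** 2 for xi in x)                # 750
sum_y2 = sum(yi ** 2 for yi in y)                # 6680
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 1585

# Computational Formula 2 for the correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y                       # -1075
denominator = math.sqrt((n * sum_x2 - sum_x ** 2)
                        * (n * sum_y2 - sum_y ** 2))         # sqrt(1250 * 1000)
r_xy = numerator / denominator

print(round(r_xy, 4))  # -0.9615
```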
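$r_{xy} \approx -0.96$

Conclusion
$r_{xy} = -0.96$ indicates a very strong inverse linear relationship: in this sample, a heavier smoker is very likely to have diminished lung capacity.
Greater smoking exposure is associated with a greater likelihood of lung damage.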
