Remedial Procedure in SLR: transformation

Overview of remedial measures
If the simple linear regression model is not appropriate for a data set:
- Abandon the regression model and develop a more appropriate model
- Employ some transformation on the data so that the regression model is appropriate for the transformed data

Problem: remedy
- Nonlinearity of regression function: transformations
- Non-constancy of error variance: transformations and weighted least squares
- Non-independence of error terms: autocorrelation, time series analysis
- Non-normality of error terms: transformations

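- Outliers: transformations or robust regression

Transformation for a better linear model: transform Y
With simple algebra, we can transform both the predictors and the response to bring some nonlinear equations into linear form. Consider the log-linear model
$Y = \beta_0 e^{\beta_1 X} + \varepsilon$
Taking the log of the mean function gives a linear model with a new error term $\varepsilon'$:
$\log(Y) = \log(\beta_0) + \beta_1 X + \varepsilon'$, or $Y' = \beta_0' + \beta_1 X + \varepsilon'$,
where $Y' = \log(Y)$ and $\beta_0' = \log(\beta_0)$.
Note that the assumptions about the error term change with the transformation. The usual assumptions now apply to $\varepsilon'$, with residuals $e' = \log(Y) - \widehat{\log(Y)}$ on the transformed scale. The error on the original scale does not follow a normal distribution and cannot be diagnosed as usual.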
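As a rough illustration of the log transformation on Y, the sketch below fits a straight line to log(Y) and recovers the parameters of the exponential curve. The data are hypothetical and noise-free, and the helper `fit_line` and all variable names are assumptions made for this example, not part of the course material.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical noise-free data generated from Y = 2 * exp(0.5 * X)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Regress the transformed response Y' = log(Y) on X
y_prime = [math.log(yi) for yi in y]
b0_prime, b1 = fit_line(x, y_prime)

# Residuals on the transformed scale are the ones to diagnose
residuals = [yp - (b0_prime + b1 * xi) for xi, yp in zip(x, y_prime)]

# Undo beta0' = log(beta0) to get back the multiplier of the curve
beta0_hat = math.exp(b0_prime)
print(beta0_hat, b1)
```

With real data the residuals would not be zero, and the normality and constant-variance checks would be run on `residuals`, not on differences computed on the original scale.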

Transformation for a better linear model: transform X
The "linear" in linear regression means linear with respect to the parameters, not the predictors. We can transform the predictors to express many nonlinear functional forms as linear models, such as
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
$Y = \beta_0 + \beta_1 \log(X) + \varepsilon$
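A minimal sketch of the second form, assuming hypothetical noise-free data: the model is curved in X but linear in the parameters, so ordinary least squares on the transformed predictor log(X) recovers them. The helper `fit_line` and the variable names are illustrative assumptions.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical noise-free data generated from Y = 1 + 2 * log(X)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.0 + 2.0 * math.log(xi) for xi in x]

# Transform the predictor only; the response stays on its original scale
x_prime = [math.log(xi) for xi in x]
b0, b1 = fit_line(x_prime, y)

# Residuals are still Y minus fitted Y, in the units of Y
residuals = [yi - (b0 + b1 * xp) for xp, yi in zip(x_prime, y)]
print(b0, b1)
```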

The error term does not change with transformations on X, so diagnose as usual with the residuals $e = Y - \hat{Y}$.

Prototype nonlinear regression patterns with constant error variance, and transformations of X
Comment
- If some of the X data are near 0 and a reciprocal transformation is desired, shift the origin by using $X' = 1/(X + k)$, where $k$ is an appropriately chosen constant.
- Y should not be transformed, because that would affect the distribution of the error terms.

Prototype nonlinear regression patterns with unequal variance and non-normality of errors, and transformations of Y (and X, if necessary)
Comment
- Shapes and spreads of the distributions of Y need to be changed

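- Can be combined with a transformation on X
- If Y is negative, add a constant before transforming, e.g. $Y' = \log(Y + k)$
When unequal error variances are present but the regression relation is still linear, transforming Y alone may not be enough: it can stabilize the variance but turn the linear relation into a curved one, so transformations of both Y and X may be required.

Box-Cox Procedure
Transformations on Y sometimes help with error-term problems: non-normality and non-constant variance.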

Box-Cox considers a family of so-called power transformations, $Y' = Y^{\lambda}$, where $\lambda = 0$ is taken to mean $Y' = \log_e Y$.
- Suggests transformations to try.
- Works by using the method of maximum likelihood to find the value of $\lambda$ that produces the best (transformed) regression.
- Scatter and residual plots should be utilized to examine the appropriateness of the suggested transformation.
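The search over $\lambda$ can be sketched as a grid search. The version below is one common way to carry it out, not necessarily what the course software does: it standardizes the transformed response so that error sums of squares are comparable across values of $\lambda$, then picks the $\lambda$ with the smallest SSE, which corresponds to the maximum likelihood choice on that grid. The data are hypothetical and noise-free (log Y is exactly linear in X), and the function names and grid are assumptions.

```python
import math

def sse_of_line(x, y):
    """Error sum of squares from the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def box_cox_sse(x, y, lam):
    """SSE after the standardized power transformation with exponent lam.

    Requires every y to be positive.
    """
    n = len(y)
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)  # geometric mean of Y
    if abs(lam) < 1e-12:
        # lambda = 0 is defined as the log transformation
        w = [k2 * math.log(yi) for yi in y]
    else:
        k1 = 1.0 / (lam * k2 ** (lam - 1.0))
        w = [k1 * (yi ** lam - 1.0) for yi in y]
    return sse_of_line(x, w)

# Hypothetical noise-free data for which log(Y) is exactly linear in X
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(1.0 + 0.5 * xi) for xi in x]

# Assumed grid: lambda from -2.0 to 2.0 in steps of 0.1
grid = [i / 10 for i in range(-20, 21)]
best_lambda = min(grid, key=lambda lam: box_cox_sse(x, y, lam))
print(best_lambda)
```

In practice the SSE curve is often flat near its minimum, so a nearby convenient value such as 0, 0.5, or -1 is usually preferred over the exact minimizer, and the plots still need to be rechecked.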

The Plasma example
Age (X) and plasma level of a polyamine (Y) are studied for a portion of the 25 healthy children.
- The scatter plot shows greater variability for younger children than for older ones.
- Check normality and constancy of variance on the residuals.
Residual plot
Confidence interval band: $\hat{Y}_h \pm W\, s\{\hat{Y}_h\}$
where $W^2 = 2F(1-\alpha;\, 2,\, n-2)$
and $s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$
After a careful examination of the experimental procedure, no mistake was found, so we should keep this observation.

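Box-Cox procedure
[Box-Cox output: "The best $\lambda$" reported alongside the values -1.1 and 0.1]
Model on the transformed response: $Y' = \beta_0 + \beta_1 X + \varepsilon$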

Transform
[Regression output: before and after the transformation]
Why is s{residual} not comparable before and after? Because s{residual} has the same unit as the response variable, and the transformation alters that unit.
Re-check normality and constancy of variance on the residuals
Residual plot

[Panels: before and after the transformation]

Back-transformations
- Transformations can improve model performance, but make interpretation hard.
- Back-transformation lets us make inferences (and graphs!) on the original scale.
- Very helpful for communicating results to the public.

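Back-transform the end products of the analysis
In general, let $Y' = g(Y)$ and let $g^{-1}$ be the back-transformation function. Then, if $(L, U)$ is the CI for $E\{Y'_h\}$ on the transformed scale, $(g^{-1}(L), g^{-1}(U))$ is the CI for the mean response on the original scale. For example, applying the back-transformation function to both limits of the prediction interval for $Y'_{h(new)}$ gives $(g^{-1}(L), g^{-1}(U))$, which is the PI for $Y_{h(new)}$.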
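Back-transforming an interval amounts to applying the inverse function to each limit. The sketch below is an illustration under assumed inputs: the interval limits are made-up numbers, the two transformations (log base 10 and reciprocal) are chosen only as examples, and `back_transform_interval` is a hypothetical helper. It also handles one detail worth noting: when the transformation is decreasing, as with a reciprocal, the limits swap order.

```python
def back_transform_interval(lower, upper, inverse):
    """Apply the back-transformation to both limits and return them in order."""
    a, b = inverse(lower), inverse(upper)
    # A decreasing transformation reverses the limits, so sort them
    return (min(a, b), max(a, b))

# Hypothetical interval (0.5, 1.0) on the scale Y' = log10(Y)
log10_interval = back_transform_interval(0.5, 1.0, lambda v: 10.0 ** v)

# Hypothetical interval (0.25, 0.5) on the scale Y' = 1 / Y
reciprocal_interval = back_transform_interval(0.25, 0.5, lambda v: 1.0 / v)

print(log10_interval)
print(reciprocal_interval)
```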

With a transformation on X only, inverting is simpler because the response usually stays on its original scale.
Back-transforming the parameter estimates is often quite complicated, because the back-transformation applies to the fitted value $b_0 + b_1 X$ as a whole, not to $b_0$ and $b_1$ separately.

Back-transform
1. The back-transform function is $g^{-1}$.
2. The predicted value should be $\hat{Y} = g^{-1}(\hat{Y}')$.
3. The confidence interval for the prediction, either for the mean or for a single response, should also be back-transformed with $g^{-1}$.
Fitted model on the transformed scale: $\hat{Y}' = 0.268 + 0.04 X$

Summary of remedial measures
For nonlinear functional relationships with well-behaved residuals:
- Try transforming X
- May require a polynomial or piecewise fit (we will cover these later)
For non-constant variance or non-normal errors, possibly with a nonlinear functional form:
- Try transforming Y
- The Box-Cox procedure may be helpful
- If the transformation on Y doesn't fix the non-constant variance problem, weighted least squares can be used (we will cover this later).

Transformations of X and Y can be used together.
Any time you consider a transformation:
- Remember to recheck all the diagnostics.
- Consider whether you gain enough to justify losing interpretability. Reciprocal transformations make interpretation especially hard.
- Consider back-transforming the results of the final model for presentation.
For very non-normal errors, especially those arising from discrete responses, generalized linear models are often a better option, but linear regression may be good enough.

Transformation is our primary tool to improve model fit.
Always repeat the diagnostic process after a transformation.