Non-Experimental Design Webinar
Presenters: Dr. Michelle Susberry Hill, Dr. Armando Paladino, and Dr. Ruzanna Topchyan
Research Methodology Group: Center for Educational and Instructional Technology Research

Presentation Outline
Non-experimental design in a nutshell: Dr. Armando Paladino (2 min)
Causal-Comparative Designs: Dr. Michelle Hill (14 min)
Correlational Designs: Dr. Armando Paladino (14 min)
Beyond Regression: Dr. Ruzanna Topchyan (14 min)

Questions & Answers (15 min)

Introduction to Non-Experimental Designs

Experimental vs. Non-Experimental Designs

Experimental
Variables: independent variable (IV) and dependent variable (DV)
Groups: experimental group, control group, or placebo
Experiment location: laboratory or controlled environment
Research purpose: to establish causation
Variable manipulation: independent variables can be manipulated
Types of experiments: (i) controlled experiments, (ii) quasi-experiments, and (iii) field experiments

Non-Experimental
Variables: independent variable (IV) and dependent variable (DV)
Groups: if groups are used, a comparison group rather than a control group
Research location: natural setting
Research purpose: to compare a situation, people, or a phenomenon, sometimes over a period of time to observe change
Variable manipulation: none
Types of designs: surveys, case studies, correlational studies, comparative studies, descriptive studies, and longitudinal studies
Caution: no causation can be established, only a relationship

Causal-Comparative Designs

Objectives

Definition
Characteristics
Uses
Steps Involved
Statistics
Limitations

Definition
Non-experimental designs that investigate relationships. Researchers try to identify the causes of differences that already exist within individuals or groups.

Three Types
Exploration of Effects
Exploration of Causes
Exploration of Consequences

Characteristics
Ex post facto
Pre-existing differences or conditions

Pre-existing groups
No control
No manipulation
Can make reasonable inferences about causation

Uses
When variables cannot be manipulated

When experiments are not possible
Attempts to identify causes or consequences, although the causal assumption underlying this design is not always accurate

Steps Involved
Develop the research question
Identify the independent and dependent variables
Select participants

Collect data from pre-existing sources
Analyze and interpret data
Report findings

Statistics
Compare averages
Use crossbreak tables
Independent or dependent t-tests

T-tests for comparison of two groups
ANOVA for comparison of more than two groups
Chi-square for comparison of group frequencies between groups

Limitations
There must be a pre-existing independent variable, and you cannot manipulate it
There is a lack of randomization

Inappropriate interpretations can occur, making it hard to identify cause-and-effect relationships
Other variables, rather than the independent variable, often affect the dependent variable
Reverse causation may exist
Possibility of subject-selection bias
Other threats: location, instrumentation, and loss of subjects

Correlational Designs: Relational & Predictive

When Used
Types
Characteristics
Assumptions

Correlational Designs

When do we use the design?
This design is appropriate for exploring problems about the relationships between constructs, construct dimensions, and items on a scale. For example, a child's age may be related to his or her height, and an adult's occupation may be related to his or her income level (Cohen, Cohen, West, & Aiken, 2003).

Types: Relational
This design is used to identify the existence, strength, and direction of the relationship between two variables. A synonym of correlation is association, and it refers to the direction and magnitude of the relationship between two variables. The supporting analysis is correlation.
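The relational design just described can be sketched numerically. Below is a minimal pure-Python computation of Pearson's r for hypothetical paired data (a child's age and height); the numbers and variable names are illustrative stand-ins, not data from the presentation.

```python
# A minimal sketch of a relational correlational analysis on
# hypothetical data. Pearson's r captures the direction and
# magnitude of a linear association, ranging from -1 to +1.
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

age = [4, 5, 6, 7, 8, 9, 10]                   # hypothetical ages (years)
height = [102, 109, 115, 122, 128, 133, 138]   # hypothetical heights (cm)

r = pearson_r(age, height)
print(round(r, 3))  # prints: 0.998 -- a strong positive association
```

Note that, as the presentation stresses, a large r here says only that age and height vary together; it establishes no causal claim.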

Predictive
This design is used to identify predictive relationships between the predictor(s) and the outcome/criterion variable(s). The supporting analysis is regression analysis (of different types).

Correlational Design (cont.)
Type of problem appropriate for this design: Problems that beg for the identification of relationships or predictive relationships are appropriate for correlational designs.
Theoretical framework/discipline background: Correlational research is supported by relational theories that attempt to test relationships between dimensions or characteristics of individuals, groups, situations, or events. These theories explain how phenomena, or their parts, are related to one another. The theory about the relationship between constructs was first introduced by Karl Pearson, an English statistician, and later expanded by Charles Spearman, who developed a method to compute correlation for ranked data (Salkind, 2010).

Correlational Design Characteristics

"Correlation does not tell us anything about causation, which is a mistake frequently made when interpreting [the results of the analysis]. Some other variables (time available to study, relevance of the material to their job, etc.) probably explain the relationship. And in order to interpret the results of the analysis we need to know the context. Correlation only tells us that a relationship exists, not whether it is a causal relationship" (Holton & Burnett, 2005, p. 41).

Illustration
Individuals' salary level is associated with their test scores, and we found a correlation of -.50, which tells us that people with higher salaries tend to score lower on the test. Does this mean that making more money makes you less smart, or that if you do well on tests you will make less money? The answer is NO.

Linear Regression
Linear regression is used when we want to predict an outcome from one or more independent (predictor) variables. Regression is a step forward from correlation: with a solid sample and design, we can predict an outcome with the linear regression equation, whereas correlation alone only shows the strength of the relationship between two variables, measured from -1 to +1 (from a perfect negative linear relationship to a perfect positive one). This should never be confused with cause and effect, since two variables can be highly correlated without one causing the other. It is worth mentioning that in a solid, strong regression the correlation is implicit. We can have simple regression (one IV), multiple regression (many IVs),

and logistic regression, where the outcome is dichotomous. Let's look at an example. Imagine that I was interested in predicting physical and downloaded album sales (outcome) from the amount of money spent advertising that album (predictor). We could summarize this relationship using a linear model by replacing the names of our variables into equation (8.1): album sales_i = b0 + b1(advertising budget_i) + e_i. Once we have estimated the values of the bs, we would be able to make a prediction (Field, 2013, p. 295).

Linear Regression
As you can see in the equation, Y_i is the predicted value (album sales in this case), b0 is the Y intercept, b1 is the slope, and e_i is the error (residual) for case i. We just need to plug in the value of X (advertising budget) to predict album sales. This equation extends to multiple regression, with multiple values of b from b1 to bn. In linear regression we never predict with 100 percent accuracy; there is always a difference between the predicted value and the measured value, and that difference is the residual, summarized by the standard error of the estimate.

Linear Regression
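The album-sales prediction described above can be sketched in code. Below is a minimal pure-Python ordinary-least-squares estimate of b0 and b1; the advertising and sales figures are small hypothetical stand-ins, not Field's actual data.

```python
# A minimal ordinary-least-squares sketch of the album-sales example,
# using hypothetical data: advertising budget vs. album sales
# (both in thousands). The numbers are illustrative only.
from statistics import mean

def ols(x, y):
    """Return intercept b0 and slope b1 for y_i = b0 + b1*x_i + e_i."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

budget = [10, 20, 30, 40, 50]    # hypothetical advertising budgets
sales = [55, 80, 95, 130, 150]   # hypothetical album sales

b0, b1 = ols(budget, sales)
predicted = b0 + b1 * 35         # plug in X = 35 to predict sales
print(round(b0, 2), round(b1, 3), round(predicted, 1))  # prints: 30.0 2.4 114.0
```

Once b0 and b1 are estimated, any budget value can be plugged in; the gap between predicted and observed values is the residual the slide refers to.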

We can conduct linear regression with interval variables, categorical variables, or a combination of both. To use categorical variables in multiple regression, we need to create N-1 dummy variables so that the analysis can examine each variable with the other variables held constant. For example, if we have three groups that we want to use in a multiple linear regression, the table below shows the needed coding.

Linear Regression
Group 1: Abused, PTSD
Group 2: Abused, Non-PTSD
Group 3: Non-abused, Non-PTSD

Dummy coding:
Group 1: Abuse PTS = 1, NO_PTSD = 0
Group 2: Abuse PTS = 0, NO_PTSD = 1
Group 3: Abuse PTS = 0, NO_PTSD = 0

The regression regresses false memory (the dependent variable) against the dummy variables representing three groups of women: abused with posttraumatic stress disorder (PTSD), abused without PTSD, and non-abused without PTSD.

Case data (five cases per group):
Group 1 (Abuse PTS = 1, NO_PTSD = 0): cases 1-5
Group 2 (Abuse PTS = 0, NO_PTSD = 1): cases 6-10
Group 3 (Abuse PTS = 0, NO_PTSD = 0): cases 11-15

Linear Regression
The result is that in the multiple regression the first dummy variable contrasts the Abused, PTSD group with the Non-abused (and Non-PTSD) group, whereas the second dummy variable compares the Abused, Non-PTSD group with the Non-abused group (Keith, 2015). Finally, we can also combine categorical and continuous variables. When using multiple linear regression, we can use three methods to enter the predictors into the analysis: you can use the hierarchical method, entering variables one at a time or in groups based on past research and hypotheses; you can enter all the variables at once; or you can use the stepwise

model (one of the SPSS options) (Field, 2017).

Assumptions of Regression
Linear regression is very demanding in terms of assumptions, and that makes sense, since we are trying to predict an outcome using a rigorously collected sample. Let us briefly describe the seven assumptions:
1. You must have one dependent variable measured at the continuous level (e.g., hours, salary, sales, revenue).
2. You have one or more independent variables measured at a continuous level, a categorical level, or a combination (see previous examples).
3. You should have independence of observations. If you do not, then a different statistical test is required, such as repeated-measures ANOVA.
4. There needs to be a linear relationship between (a) the dependent variable and each of your independent variables, and (b) the dependent variable and the independent variables collectively.
5. Your data need to show homoscedasticity of residuals. That is, your residuals must show equal variances as you move along the line of best fit.
6. Your data must not show multicollinearity. This happens when you have two variables that are highly correlated with each other (for example, age and experience predicting the same thing; see the multicollinearity example below).
7. There should be no significant outliers, high-leverage points, or highly influential points.

Multicollinearity Example
Model coefficients:
(Constant): B = -60.890, Std. Error = 16.497
Age (Years): B = 6.234, Std. Error = 1.411, Beta = .942**
Number of Years as a Model: B = -5.561, Std. Error = 2.122, Beta = -.548*
Attractiveness (%): B = -.196, Std. Error = .152, Beta = -.083
Note: R2 = .18 (p < .001). * p < .01, ** p < .001

The significance of age (denoted by the two asterisks, p < .001) showed a positive relation between age and predicted salary. Age was also inversely related to the number of years as a model, indicating that the more years they work as a model, the less salary they would expect. An explanation for this is found in the collinearity diagnostics below, which show that age and experience carry overlapping information. This explains why experience shows negative values, since both predictors are reporting the same thing. This is multicollinearity.

Collinearity diagnostics (variance proportions by dimension):
Dimension 1: Eigenvalue = 3.925, Condition Index = 1.000; Age .00, Years as a Model .00, Attractiveness .00
Dimension 2: Eigenvalue = .070, Condition Index = 7.479; Age .00, Years as a Model .08, Attractiveness .02
Dimension 3: Eigenvalue = .004, Condition Index = 30.758; Age .02, Years as a Model .01, Attractiveness .94
Dimension 4: Eigenvalue = .001, Condition Index = 63.344; Age .98, Years as a Model .91, Attractiveness .04

Note: See the high variance proportions for Age and Years as a Model on the same small-eigenvalue dimension (.98 and .91), a sign of strong collinearity between age and experience. One possible solution would be combining both variables or removing one (I would remove age).

Different Types of Regressions
More than 20 types of regression analysis exist, ranging from simple regression, which uses one predictor and one dependent variable, to multivariate multiple regression, which uses more than one predictor and more than one outcome variable. While the list below may not be exhaustive, it provides some idea of the multiplicity of the types of analysis:
Linear Regression
Logistic Regression
Multiple Multivariate Regression

Polynomial Regression
Stepwise Regression
Multiple Regression
Simultaneous Multiple Regression
Sequential Multiple Regression
Stepwise Multiple Regression

Beyond Regression

Path Analysis
When used: When we are interested in obtaining more detailed information about different effects in the model.
Sample size: Similar to regression analysis, e.g., N = 30 + 10K, or using G*Power, N = 85 at power = .80,

effect size = .15.
Variable types: Interval/ratio level of measurement.
Specific characteristics: Allows identification of the direct, indirect, and total effects of an independent variable on the dependent variable. Allows cause-and-effect inferences because the paths are drawn without reference to correlations. Cause-and-effect inferences are made based on: (i) theory, (ii) time precedence (e.g., family background and ability precede motivation and academic coursework), (iii) previous research, and (iv) logic (or a combination of logic, observation, understanding, and common sense).
Three conditions for inferring causality: (i) there should be a relationship between the variables; (ii) the presumed cause should have time precedence; (iii) the relationship between the variables should be true rather than spurious.
Terminology: Presumed causes are exogenous variables; the presumed effect is the endogenous variable.
Statistical software: AMOS, STATA, R, MPlus.
Note: Causality does not work backwards.

Multiple Regression vs. Path Model
Multiple regression: simultaneous regression gives the direct effects of the IVs on the DV; sequential regression gives the total effect of the IVs on the DV.
Path analysis: direct, indirect, and total effects of the IVs on the DV.
Adapted from: Keith (2014)
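The direct/indirect/total distinction can be illustrated with a toy computation. In a simple recursive model, the indirect effect is the product of the path coefficients along the mediated route, and the total effect is the sum of the direct and indirect effects. The variable names and standardized coefficients below are hypothetical, not Keith's published estimates.

```python
# A sketch of effect decomposition in a simple recursive path model:
# Ability -> Motivation -> Achievement, plus a direct path
# Ability -> Achievement. All coefficients are hypothetical.
p_ability_motivation = 0.40   # hypothetical path: Ability -> Motivation
p_motivation_achieve = 0.20   # hypothetical path: Motivation -> Achievement
p_ability_achieve = 0.55      # hypothetical direct path: Ability -> Achievement

direct = p_ability_achieve
indirect = p_ability_motivation * p_motivation_achieve  # product of paths
total = direct + indirect

print(direct, round(indirect, 2), round(total, 2))  # prints: 0.55 0.08 0.63
```

This mirrors the decomposition in the results table that follows, where each variable's total effect equals its direct effect plus the sum of its indirect effects.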

Note: On a path diagram, squares are measured variables and circles are unmeasured variables (disturbances).

Path Analysis Results
Adapted from: Keith (2014)
Fit: Chi-Square = 69.609, df = 4, GFI = .974, TLI = .797, CFI = .919, RMSEA = .129, AIC = 91.609, PGFI = .260

Academic Coursework: Direct Effect = .310, Indirect Effect = none, Total Effect = .310
Motivation: Direct Effect = .013, Indirect Effect = .083, Total Effect = .096
Ability: Direct Effect = .551, Indirect Effect = .131, Total Effect = .682
Family Background: Direct Effect = .069, Indirect Effect = .348, Total Effect = .417

Assumptions of Path Analysis
Basic assumptions of multiple regression analysis:
1. Linear function between the dependent and independent variables
2. Each case is drawn independently from the population
3. Normal distribution of errors for all independent variables
Additional assumptions:
4. No reverse causation (recursive model)

5. Exogenous variables are reliable and valid measures
6. The causal process has had a chance to work
7. All common causes in the model are included

Terminology
Exogenous variable: presumed cause
Endogenous variable: presumed effect
Adapted from: Keith (2014)

Confirmatory Factor Analysis
When used: For construct validation and for revalidation of already validated instruments. Exploratory Factor Analysis can also be used for this purpose.

Sample size: On average, at least 10 cases per item; e.g., if the questionnaire has 15 (non-demographic) questions, then at least 150 cases.
Variable types: Interval/ratio level of measurement, although nominal-level variables (e.g., gender, ethnicity) can also be used to compare models.
Specific characteristics: Allows identifying whether the tested model fits the data well. Uses residuals (errors) and modification indexes to identify model fit.

Confirmatory Factor Analysis (CFA) Model
Coefficients used:
Chi-square: the classical fit index; sensitive to sample size; p-value < .05.
CMIN/DF: the ratio of the minimum discrepancy to degrees of freedom (has different acceptable levels).
AIC: a cross-validation index that tends to select models that would be selected if results were cross-validated on a new sample.
CFI & TLI: CFI evaluates the fit of a user-specified solution relative to a more restricted baseline model; TLI compensates for model complexity. A level of .90 and above indicates good fit.
PGFI: measures parsimony in structural equation modeling.
RMSEA: root mean square error of approximation; RMSEA < .05 indicates a good fit.
SRMR: standardized root mean square residual; the average discrepancy between the correlations observed in the input matrix and the correlations predicted by the model.
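Two of the indices above can be computed directly from a model's chi-square. The sketch below uses the standard single-group RMSEA formula, sqrt(max(chi2 - df, 0) / (df * (N - 1))); the chi-square, df, and sample size values are hypothetical, not results from the presentation.

```python
# A minimal sketch of two fit indices, computed from hypothetical
# values of chi-square, degrees of freedom, and sample size N.
import math

def cmin_df(chi2, df):
    """CMIN/DF: ratio of the minimum discrepancy (chi-square) to df."""
    return chi2 / df

def rmsea(chi2, df, n):
    """Root mean square error of approximation (single-group formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

chi2, df, n = 120.0, 50, 300   # hypothetical model results
print(round(cmin_df(chi2, df), 2), round(rmsea(chi2, df, n), 3))  # prints: 2.4 0.068
```

Note that when chi-square does not exceed its degrees of freedom, RMSEA is zero, which is why the formula clamps the numerator at zero.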

Adapted from: Topchyan (2013)

Exploratory Factor Analysis (EFA)
[Pattern matrix: 18 scale items (Information Seeking 1-3, Pleasant Sensation 1-3, Sustained Strong Ties 1-3, Extended Weak Ties 1-3, and Sense of Community 1-6) loading on four factors, with loadings between roughly .32 and .94 in absolute value and a few cross-loadings. Extraction method: Principal Axis Factoring. Rotation converged in 6 iterations.]
When used: For reducing dimensions.
Sample size: On average, 10 cases per item.
Construct strength: A construct with 3 or more indicators is considered strong.
Variable types: Interval/ratio level of measurement.
Specific characteristics: Allows testing the construct structure (even if the scales are validated).

Latent Variables Structural Equation Modeling (SEM)
When used: For validating models and for comparing models across multiple groups.
Sample size: At least 200 cases.
Variable types: Interval/ratio level of measurement, although nominal-level variables (e.g., gender, ethnicity) can also be used to compare models.
Specific characteristics: (i) Allows identifying whether the tested model fits the data well. Uses residuals (errors) and modification indexes to identify the model

fit, (ii) includes a confirmatory factor analysis as well as a path analysis of the effects of one latent variable on another.
Adapted from: Topchyan (2013)

Latent Variables SEM (cont.)
As with path analysis, it is possible to calculate the direct, indirect, and total effects. It is also possible to test the model with multiple groups.
Adapted from: Topchyan (2013)

References
Bevins, T. (n.d.). Research designs. Retrieved from http://ruby.fgcu.edu/courses/sbevins/50065/qtdesign.html
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. UK: Taylor & Francis.

Coolican, H. (2014). Research methods and statistics in psychology. London: Psychology Press, Taylor & Francis Group.
Field, A. (2017). Discovering statistics using IBM SPSS statistics. Sage.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2016). How to design and evaluate research in education. McGraw-Hill Education.
Holton, E. F., & Burnett, M. F. (2005). The basics of quantitative research. In Research in organizations: Foundations and methods of inquiry (pp. 29-44).
Iichaan. (2015, June 27). Weaknesses and disadvantages of causal comparative research essay. Retrieved from http://www.antiessays.com/free-essays/Weaknesses-And-Disadvantages-Of-Causal-Comparative-750679.html
Keith, T. Z. (2014). Multiple regression and beyond: An introduction to multiple regression and structural equation

modeling. Routledge.
Miles, J., & Shevlin, M. (2001). Applying regression and correlation: A guide for students and researchers. Sage.
Nayak, B., & Hazra, A. (2011). How to choose the right statistical test? Indian Journal of Ophthalmology, 59(2), 85. doi:10.4103/0301-4738.77005
Salkind, N. J. (Ed.). (2010). Encyclopedia of research design (Vol. 1). Sage.
Topchyan, R. (2013). Factors affecting knowledge sharing in virtual learning teams (VLTs) in distance education (Doctoral dissertation, Syracuse University).

Additional Information
For more detailed information on non-experimental designs, visit:

https://research.phoenix.edu/content/research-methodology-group/correlational-design