MATCHING, STRATIFICATION, REGRESSION
Heejung Bang, PhD, UC-Davis

Rimm & Bortin (1978). Clinical trials as religion.

Motivating episode
A surgeon came to me. His aim/hypothesis is clear.

The Excel dataset has a small N & few variables. I judged that multiple regression was the way to go. The surgeon honestly said: "Previous authors already used multiple regression. To publish, I need to use PS. If not, I may give up." After that, I say: We need PS-match.com, and we live in PS nation.

Let's be honest
Perfect world: If we have a ~perfect (e.g., well-designed & conducted, long-term) RCT, we don't need an observational study to compare A vs. B.

vs. Even with an RCT, we still get an average causal effect, not an individual effect or the effect on me.
- Why should I care about others or average health?
- "The Median Isn't the Message" (Gould)
In real life, everyone takes A (or B) in a different manner.
Alternatives in a more perfect world: a time machine, or an avatar/clone + patience.

Pros & Cons (as always!)
RCT | Observational study
- Causality (glorious) | Association/Correlation
- Expensive or very expensive | Expensive or cheap
- Tons of rules & checklists (+/- audit) | Less strict, more freedom
- Experimentation (no matter what they say) | More naturalistic
- A coin determines my treatment | I and/or my doctor determine my treatment
- Protocol/Registration (less flexible) | Not yet required (good & bad)
- Analysis: simpler | Less simple
- Blinding, noncompliance, dropout, Hawthorne | More freedom for patients/docs/analysts/publication
- High stakes, esp. pharma & nutrition supplements | Small or high financial gain (e.g., Costco vitamins, Starbucks)
- More loved by top journals/WHO; practice can change immediately | Slow, but possibly wider impact in daily life

Causal vs. Association
Causal: 1 population; Association: 2 populations
"The only person you should ever compete with is yourself. You can't hope for a fairer match." (Ruthman)
causal: biological mechanism
association/correlation: data relationships devoid of mechanism
causal: me

Nearly causal (n=1.5)?

Classic vs. Modern Matchmakers
- Classic: Match/Stratify/Regress; less adjustment; if the 2 groups are close ("tall-dark-handsome")
- Modern: PS/IV/MSM/SNM; more adjustment; if the 2 groups are different ("total score")

Why Match/Stratify/Regress
All about: Comparability* & Fairness (feasibility & price)
Power of Coin:
- A coin solves so many problems (at baseline) and makes statisticians' lives easier.
- Must a causal method be fancy and complicated? ... in 30 games in 2016 vs. Network-meta
Confounding vs. Confounder*
PS may be a modern broker for these methods under 1 roof.
* "I hate definitions." (Disraeli)
personal: My admiration for RCT has been dampened.

(Unbiased) effect of Veneers & Diet?

Matching
Epi 101, after Restriction. Easy to say, not easy to do; see match.com. 1:1, 1:4? Case-control (as well as cohort).
Key questions:
1. Should I match at all?
2. How/what to match? How many?
3. How to analyze?
4. Price of matching
5. age/sex/race: Default? Match because others do?
6. For Efficiency more than Validity
7. Design & Analysis should go hand in hand (if possible), e.g., M-H, conditional logistic, McNemar, paired-t.

Matching: Beyond textbook
Matching can be ignored in analysis? Only for cohort? Over/Under/Useless matching. How about Accidental matching?
- Perfect is the enemy of Good. Is partial matching better?
- When OR(confounder-Y) is weak to moderate, matching -> no benefit or even slight harm.
- PS + matching -> marginal effect
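Key question 7 pairs a matched design with a matched analysis. As a minimal sketch (not the author's code), McNemar's test for 1:1 matched pairs with a binary exposure uses only the discordant pairs: `b` counts pairs where only the case is exposed, `c` pairs where only the control is exposed, and the matched-pair (conditional) OR is b/c. The function name and the counts below are hypothetical.

```python
import math


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple[float, float, float]:
    """McNemar test for 1:1 matched pairs with a binary exposure.

    b: discordant pairs with the case exposed and the control unexposed
    c: discordant pairs with the control exposed and the case unexposed
    Concordant pairs carry no information and are not needed.
    Returns (chi-square statistic, two-sided p-value, matched-pair OR = b/c).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    odds_ratio = b / c if c > 0 else math.inf
    return stat, p_value, odds_ratio


# Hypothetical matched pairs: 30 discordant with the case exposed, 12 with the control exposed
stat, p, or_mp = mcnemar(30, 12)
print(f"chi2={stat:.3f}, p={p:.4f}, matched OR={or_mp:.2f}")
```

Analyzing these pairs as if they were two independent groups would discard the pairing the design paid for, which is the point of "design & analysis hand in hand".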

- Easier in the Registry Era: more resources, but more room for malpractice?
- Big difference in matching for cohort vs. case-control (a common epi mistake made by biostatisticians)

Example: Carpal Tunnel Syndrome
The objective of "Sex Differences in Musculoskeletal Conditions Across the Lifespan" ... in diagnosis & treatment for both men and women and lead to improvements in women's health. P50 AR063043
http://www.ucdmc.ucdavis.edu/musculoskeletalhealth/projects/scor.html
A hypothesis: The cross-sectional area of the median nerve in individuals of both sexes with and without CTS will differ, when adjusted for age, sex & BMI.

- What is known: sex, age & BMI are important.

Key objective: Sex difference (that's why it was funded).
Issues in practice:
- Is the research question important & robust?
- PI and statistician change.
- What to do after 3 yrs of screening/enrollment?
- Women's Right vs. Balance in Ns (where are the male cases?!)
- Sex = match, Age & BMI = regressor?
*When Sex ... Case vs. Control*
Sex | Group | N
F | Control | 50
M | Control | 38
F | Case | 60
M | Case | 22

"When I use a word, it means just what I choose it to mean, neither more nor less." (Humpty-Dumpty)

Stratification
Matching is a form of stratification: a matched set = a stratum, e.g., twins.

We also learn it from Stat 202: Sampling.
The idea is so intuitive: If M vs. W are so different, analyze separately (... then combine or not).
Cochran-Mantel-Haenszel
Useful for heterogeneity & effect modification [Confounding is bad; Modification is good? ---- 1 more paper]
Closely related to Standardization, another method toward valid comparison.

Beware Yule-Simpson Paradox
Famous Berkeley sex-bias case
Stratified is correct but Unstratified is misleading/useless?
OR(crude) ≠ OR(stratified): Simpson or Jensen?
I want to see all, including the 2x2 tables.
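To see all the 2x2 tables, a minimal sketch: the counts below are hypothetical, built so that treatment A went mostly to severe patients and B mostly to mild ones. Within each severity stratum A has the higher success rate (OR > 1), yet the collapsed table favours B (OR < 1): a Yule-Simpson reversal driven by confounding. The Mantel-Haenszel pooled OR, sum(a*d/n) / sum(b*c/n) over strata, summarizes the stratum-specific ORs, which is only sensible if they are roughly homogeneous. Function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for one 2x2 table: a/b = successes/failures on A, c/d = successes/failures on B."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled OR over strata, each table given as (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts (A successes, A failures, B successes, B failures) per stratum
strata = {
    "mild": (80, 10, 240, 40),
    "severe": (190, 70, 50, 30),
}
crude_table = tuple(sum(col) for col in zip(*strata.values()))  # collapse over severity

for name, (a, b, c, d) in strata.items():
    print(f"{name:>6}: success A={a / (a + b):.2f}, B={c / (c + d):.2f}, "
          f"OR={odds_ratio(a, b, c, d):.2f}")
a, b, c, d = crude_table
print(f" crude: success A={a / (a + b):.2f}, B={c / (c + d):.2f}, "
      f"OR={odds_ratio(*crude_table):.2f}")
print(f"M-H pooled OR = {mantel_haenszel_or(list(strata.values())):.2f}")
```

Here the reversal comes from confounding by severity; a crude OR can also differ from a conditional OR without any confounding (non-collapsibility, the "Jensen" side of the question), which this toy example does not illustrate.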

Almost Simpson or Paradox nothing
Canto (JAMA 2011); Yang (JAMA 2012) showed exactly the opposite direction.

Headline: Blacks are 40% less? Schwartz (NEJM 1999)

Regression
Calculus : Math = Regression : Stat [Cheaters may love regression but hate calculus.]
OLS = King of econometrics; OR/Logistic = King of biostatistics
Multiple regression: β = slope = ΔY/ΔX, holding S1 & S2 fixed.
Can we adjust for 20 Xs? Income vs. education; variance inflation? OR & p attenuated?
Evolution & expansion: LM/GLM, LMM/GLMM, GMM/GEE, quantile, nonlinear, splines, lowess, Lasso, Bayesian, Cox; stay tuned.
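"Slope while fixing S" can be checked on a tiny example. This is a sketch under stated assumptions: one confounder S strongly correlated with X, and hypothetical noise-free data generated as Y = 1 + 2X + 3S. The crude slope of Y on X absorbs part of the S effect; the two-predictor OLS slope recovers 2. With two predictors the variance inflation factor is 1 / (1 - r²), where r is corr(X, S). Helper names are illustrative.

```python
from statistics import mean


def simple_slope(y, x):
    """OLS slope from regressing y on x alone (the 'crude' slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    return sxy / sum((u - mx) ** 2 for u in x)


def ols_two(y, x1, x2):
    """OLS of y on (1, x1, x2) from centred cross-products; returns (b0, b1, b2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # slope of x1 holding x2 fixed
    b2 = (s11 * s2y - s12 * s1y) / det  # slope of x2 holding x1 fixed
    return my - b1 * m1 - b2 * m2, b1, b2


def vif_two(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a model with two predictors."""
    r2 = simple_slope(x2, x1) * simple_slope(x1, x2)  # product of the two slopes = r^2
    return 1 / (1 - r2)


# Hypothetical data: S is a confounder strongly correlated with X; Y = 1 + 2X + 3S exactly
s = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [0.5, 0.5, 2.3, 2.7, 4.1, 4.9]
y = [1 + 2 * xi + 3 * si for xi, si in zip(x, s)]

crude = simple_slope(y, x)
b0, b_x, b_s = ols_two(y, x, s)
print(f"crude slope = {crude:.3f}, adjusted slope = {b_x:.3f}, VIF = {vif_two(x, s):.1f}")
```

The same correlation that makes adjustment necessary also inflates the variance of the adjusted slope, which is the "income vs. education" worry in miniature.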

Correlation vs. Regression
What are the differences? In my QE in 1996, I got 0. I still think the solution was not correct.
Mathematically close. Correlation/Association ≠ Causation: tired?
With randomization, fingers-toes, a t-test or a 2x2 table may be sufficient. [If you need statistics, your finding is not significant!]
When Genius Errs: R. A. Fisher & the Lung Cancer Controversy (1991)
Regression (Gauss/Legendre/Galton) is older than Correlation (Galton/Edgeworth/Pearson).
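One way to make "mathematically close" concrete, as a hedged sketch on hypothetical numbers: the correlation is symmetric in X and Y, while the regression slope is not; the slope of Y on X equals r·sd(Y)/sd(X), and the product of the two slopes (Y on X, X on Y) equals r². Function names are illustrative.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation: symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Slope from regressing y on x: not symmetric in x and y."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

r = pearson_r(x, y)
b_yx = ols_slope(x, y)  # regress y on x
b_xy = ols_slope(y, x)  # regress x on y
print(f"r={r:.4f}, slope y~x={b_yx:.4f}, r*sd(y)/sd(x)={r * stdev(y) / stdev(x):.4f}")
print(f"slope y~x * slope x~y = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")
```

Neither number says anything about causation by itself; the asymmetry of the slope only reflects which variable is treated as the outcome.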

What to adjust, how to adjust
A DAG may help. age/sex/race default again? Do we adjust because they are in the dataset? Or for reviewers? Obsessed with adjusting? e.g., direct or indirect.
Variable selection:
- Stat (e.g., by p, computer) vs. Epi (e.g., 10%, p<.20, theory): taught differently?
- Independent risk factors, multicollinearity
Rule #1: Do not adjust for factors that are affected by X.
Dangerously easy? Weaker than matching? An important issue in PS as well.
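Rule #1 can be seen in a deterministic toy example. The structural model here is a hypothetical assumption: X -> M -> Y and X -> Y, with M = X + e and Y = X + 2M, so the total effect of X on Y is 1 + 2·1 = 3 and the direct effect is 1. Regressing Y on X alone recovers 3; "adjusting" for the mediator M, which is affected by X, leaves only the direct effect. The noise `e` is chosen to be exactly uncorrelated with X. Function names are illustrative.

```python
from statistics import mean


def simple_slope(y, x):
    """OLS slope of y on x alone."""
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (w - my) for u, w in zip(x, y)) / sum((u - mx) ** 2 for u in x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the OLS fit of y on (1, x, z)."""
    mx, mz, my = mean(x), mean(z), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    szz = sum((v - mz) ** 2 for v in z)
    sxz = sum((u - mx) * (v - mz) for u, v in zip(x, z))
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    szy = sum((v - mz) * (w - my) for v, w in zip(z, y))
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


# Hypothetical structural model: M = X + e, Y = 1*X + 2*M
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
e = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # mean 0 and uncorrelated with x
m = [xi + ei for xi, ei in zip(x, e)]
y = [xi + 2 * mi for xi, mi in zip(x, m)]

total = simple_slope(y, x)        # no adjustment: total effect
direct = slope_adjusted(y, x, m)  # adjusts for M, which is affected by X
print(f"unadjusted: {total:.2f}  adjusted for mediator: {direct:.2f}")
```

Adjusting for M is not wrong if the direct effect is the target; it is wrong when the question is the total effect of X.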

Regression: Beyond textbook
Adjust or not, e.g., Lord's paradox, horse-racing bias
"ANOVA is a scientific method, ANCOVA is not" (Kempthorne) vs. "Is ANOVA obsolete?" (Gelman)
- Always be wary when you co-vary, or when covariates are time-dependent.
- Economists may view this as a b-o-r-i-n-g topic.
- I want to see both! Shall we shrink?
- 1000 proc logistic in your SAS program?
- When to stop adjustment... (until meeting theory or p<0.05?)

- Pre-specify as much as possible and/or report honestly (wishing for minimal penalty).

All you need to know: logistic?
"When can odds ratios mislead? Odds ratios should be used only in case-control studies & logistic regression analyses." (Deeks, BMJ 1998)
When the outcome is rare (risk <10%), OR ≈ RR. An OR can be converted to other measures (risks, RR, RD).
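A small sketch of both points, with hypothetical risks: for a rare outcome the OR is close to the RR, for a common outcome it is not. The conversion shown is one widely cited formula, RR = OR / ((1 - p0) + p0·OR) with p0 the risk in the unexposed (Zhang & Yu, JAMA 1998); it is swapped in here as an illustration, not taken from the slides, and it can mislead when applied to covariate-adjusted ORs.

```python
def odds(p):
    return p / (1 - p)


def odds_ratio(p1, p0):
    """OR comparing risk p1 (exposed) with risk p0 (unexposed)."""
    return odds(p1) / odds(p0)


def or_to_rr(odds_ratio_value, p0):
    """Convert an OR to an RR given the risk p0 in the unexposed group.

    Zhang & Yu (1998) formula: exact for a single crude 2x2 table,
    but it can mislead when applied to covariate-adjusted ORs.
    """
    return odds_ratio_value / ((1 - p0) + p0 * odds_ratio_value)


# Hypothetical risks in exposed (p1) and unexposed (p0) groups
for label, p1, p0 in [("rare outcome", 0.02, 0.01), ("common outcome", 0.40, 0.20)]:
    rr, rd, or_ = p1 / p0, p1 - p0, odds_ratio(p1, p0)
    print(f"{label}: RR={rr:.2f}, RD={rd:.2f}, OR={or_:.2f}, OR->RR={or_to_rr(or_, p0):.2f}")
```

Both scenarios have RR = 2, but the OR is about 2.02 in the first and about 2.67 in the second, which is where "OR ≈ RR" stops being safe.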

"Guidance on their interpretation is of more use than outright rejection." (Davies et al.)

Dark side of your fav stat? Boos & Stefanski (2013)

(Clustering)
By doc, hospital, community, or CRT --- We are the world
- Longitudinal/Repeated measures/Spatial: mixed model, GEE, conditional logistic, etc.
- Within vs. Between effect: Within is more right.
- Population vs. Subject: Jensen's inequality
- Clusters ... Strata?
- Wgt/Strata/Cluster are important in complex survey design*.
*Note: a lot of what we know/use is true for the simplest design.

Propensity Score (PS)
What is PS? Why do we want or need it?
Logistic regression for PS: most popular
PS use via match/strata/reg/wgt
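A minimal sketch of the most popular route, under assumptions that are not from the slides: a single confounder `x`, simulated data with a true treatment effect of 1, a logistic PS model fitted by Newton-Raphson, and the PS then used via weighting (the "wgt" option) with normalised inverse-probability weights. All function names and simulation settings are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, iters=25):
    """Newton-Raphson fit of logit P(T=1 | x) = b0 + b1*x (intercept + one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        e = [sigmoid(b0 + b1 * xi) for xi in x]
        g0 = sum(ti - ei for ti, ei in zip(t, e))                # score for b0
        g1 = sum(xi * (ti - ei) for xi, ti, ei in zip(x, t, e))  # score for b1
        w = [ei * (1 - ei) for ei in e]
        h00, h01 = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with I the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def ipw_ate(y, t, e):
    """Normalised (Hajek) inverse-probability-weighted estimate of E[Y(1)] - E[Y(0)]."""
    w1 = [ti / ei for ti, ei in zip(t, e)]
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]
    mu1 = sum(wi * yi for wi, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0


# Simulated (hypothetical) data: x raises both the chance of treatment and the outcome.
rng = random.Random(2024)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
t = [1 if rng.random() < sigmoid(-0.5 + xi) else 0 for xi in x]
y = [1.0 * ti + 2.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]  # true effect = 1

b0, b1 = fit_logistic(x, t)
ps = [sigmoid(b0 + b1 * xi) for xi in x]
n1 = sum(t)
mean_treated = sum(yi for yi, ti in zip(y, t) if ti) / n1
mean_control = sum(yi for yi, ti in zip(y, t) if not ti) / (n - n1)
naive = mean_treated - mean_control
print(f"PS model: b0={b0:.2f}, b1={b1:.2f}; naive diff={naive:.2f}; IPW ATE={ipw_ate(y, t, ps):.2f}")
```

The naive difference mixes the treatment effect with the effect of `x`; weighting by the estimated PS moves the estimate back toward 1 only because the single confounder is measured and the PS model is correctly specified here.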

Tons of guides available.
Match.com: If 2 conditions, use traditional matching. If 10 conditions, use PS matching. If 100 conditions, membership refund?
"Do you even know what 'propensity' means?..." C. Rock (Everybody Hates Chris)

Seeing is Believing! Weintraub (NEJM 2012)

Regression & PS: Beyond textbook
- High-dimensional, e.g., claims data, adjusting for 500 Xs
- Among 500 potential confounders: mediator, IV, collider?
- Over-adjustment? Associated with Y vs. imbalance in A vs. B?
- > 2 groups, continuous treatment
- How to match 30 hospitals
- Beyond logistic: machine learning
- Parsimony vs. a model as large as an elephant
- PS for case-control/case-cohort: double beware!
- Double robustness (DR)
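For the "match" use of an estimated PS, one simple option is greedy 1:1 nearest-neighbour matching without replacement inside a caliper. The sketch below is an illustration, not a recommendation: the function name, the caliper value and the PS values are hypothetical, and greedy results depend on the order in which treated units are processed.

```python
def greedy_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the PS, without replacement.

    Treated units are processed in the given order; each takes the closest
    still-unused control within `caliper` (absolute PS difference), or
    stays unmatched. Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        best = min(available, key=lambda j: abs(ps_control[j] - p), default=None)
        if best is not None and abs(ps_control[best] - p) <= caliper:
            pairs.append((i, best))
            available.remove(best)  # without replacement
    return pairs


# Hypothetical estimated PS values
treated = [0.2, 0.5, 0.8]
controls = [0.19, 0.52, 0.30, 0.95]
print(greedy_match(treated, controls, caliper=0.05))
print(greedy_match(treated, controls, caliper=0.2))
```

With the tighter caliper the treated unit at 0.8 has no acceptable partner and is dropped, so the estimand quietly shifts toward the treated units that have comparable controls.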

DR: more harm or good?
Demystifying, e.g., 1 0 = +/-
Easily wrong twice or more.
Statistical programming/implementation:
- Cross-sectional: easier (or timeless?)
- Longitudinal: still difficult
Do we have complete data in the right structure? e.g., monotone missingness [Q: Is "monotonicity" English?]
Methods & applications: still evolving/expanding (e.g., DR estimation of ACE on censored costs, 2016).
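A hedged sketch of one common DR estimator, augmented IPW (AIPW), on hypothetical noise-free data where Y = 1 + 2T + X (true effect 2) and X is higher among the treated. With a correct outcome model, AIPW recovers 2 even when the PS is badly wrong (a constant 0.5); with both models wrong it does not ("easily wrong twice"). The third case, correct PS with a wrong outcome model, needs the true PS and is not shown in this toy. The function name is illustrative.

```python
def aipw_ate(y, t, e, m1, m0):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.

    y: outcomes; t: 0/1 treatment; e: estimated PS P(T=1 | X);
    m1, m0: outcome-model predictions of E[Y | T=1, X] and E[Y | T=0, X] for every unit.
    """
    terms = [
        (a - b) + ti * (yi - a) / ei - (1 - ti) * (yi - b) / (1 - ei)
        for yi, ti, ei, a, b in zip(y, t, e, m1, m0)
    ]
    return sum(terms) / len(terms)


# Hypothetical, noise-free data: Y = 1 + 2*T + X, so the true effect is 2,
# and X is higher among the treated (confounding).
x = [0, 1, 2, 3, 4, 5, 6, 7]
t = [0, 0, 1, 0, 1, 1, 0, 1]
y = [1 + 2 * ti + xi for ti, xi in zip(t, x)]

# (1) Correct outcome model, wrong PS (constant 0.5): still recovers 2.
right_outcome = aipw_ate(y, t, [0.5] * 8, [3 + xi for xi in x], [1 + xi for xi in x])

# (2) Both models wrong (constant PS, outcome model = overall mean): biased.
ybar = sum(y) / len(y)
both_wrong = aipw_ate(y, t, [0.5] * 8, [ybar] * 8, [ybar] * 8)
print(f"outcome model right: {right_outcome:.2f}; both wrong: {both_wrong:.2f}")
```

"Doubly robust" means one correct model is enough; it offers no protection when both are wrong, and in practice neither is known to be right.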

The War of Biases
Rule of thumb? Pros vs. Cons
Bias1 > Bias2 << Bias3?
Direct is Direct? Direct = Total - Indirect?
Do not adjust: factors affected by X; mediator; collider; highly collinear
vs. Adjust: confounders; when you want the "net" effect
Oops, oops: I only have one observation per person.
We pray all biases cancel each other out?
"Equivalence of the mediation, confounding & suppression effect" (MacKinnon 2000)

Common Complaints/Feedback
- Causal methods are less transparent.
- Traditional vs. newer methods provide similar results. Why do we need newer/advanced methods?
- Even experts don't agree (including Rubin's advice on PS).
- Some methods are too difficult to understand or implement.
- IV & DAG: our students don't know them, but the speaker assumes they do.
- Causal methods are not in standard curricula; hardcore statisticians dismiss them.
- Casual inference, Causal interference, Inverse probability (Bayes, 1837)
- What Effect Is Really Being Measured?

Lies, damn lies, or Cookbook nonsense?
Different methods, different results
p-value/R2/CC/AUC/AIC, the many options for adjusting for potential confounders, multiple testing/modeling: e.g., p & R2; R2 & AUC; AIC vs. BIC; pseudo R2; Bonferroni vs. FDR
AUC = the next king after the p-value? Higher & higher
P-value of p-values?
"46656 Varieties of Bayesians" ... Good (1971)
"How to guarantee significance" ... Mantel (1976)
"Are statistical contributions to medicine undervalued?" Breslow (2003)
"Is statistical method of any value in medical research?" ... Greenwood (1924)
"Publication as prostitution" Frey (2003)
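"Bonferroni vs. FDR" is a clear case of "different methods, different results". A minimal sketch, using ten hypothetical p-values: Bonferroni rejects when p ≤ α/m (controlling the family-wise error rate), while the Benjamini-Hochberg step-up procedure finds the largest rank k with p(k) ≤ kα/m and rejects the k smallest p-values (controlling the false discovery rate, under independence or positive dependence). Function names are illustrative.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices rejected by Bonferroni: p <= alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # keep the largest qualifying rank (step-up)
    return sorted(order[:k_max])


# Ten hypothetical p-values from one analysis
pvals = [0.039, 0.001, 0.216, 0.042, 0.008, 0.06, 0.205, 0.041, 0.074, 0.212]
print("Bonferroni rejects:", bonferroni(pvals))
print("BH rejects:        ", benjamini_hochberg(pvals))
```

Here Bonferroni keeps one finding and BH keeps two, while four of the unadjusted p-values sit just under 0.05; which list gets reported is exactly the kind of choice worth pre-specifying.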

https://datascience.nih.gov/bd2k
Not even secondary; or already collected somewhere.
N/power, Sampling, Causal: no more?
"A rose by any other name": Data science, etc. (Cook)
Data analysis vs. Statistics (Tukey)
Data cleaner (for "Dirta") = new profession
Beautiful visuals; Google over Newton? (Newton)
Examples: EMR, Google Flu, coupons & best restaurants
Do we know more about Pluto than the Indian Ocean?
p-value is still ...

(When there is no tiger, the rabbit is king.) e.g., birth month & insect bite: adjusted p=0.001.
The Unbearable Lightness of information/data?

No shortcut to good performance
- mg/dL (CDC) vs. mg/L (AHA)
- Sorry! kg vs. pound
- Oops! There is no age. I will have to enter it.
- hemoglobin vs. hematocrit
- Many paid $999 on Dec 31! Vload=20.
- "Excel Depression" & "Holy coding error, Batman" (Krugman)
- FOOLED by typo or IT?
- Missing, mismeasured, runaway, fake, repeats, frozen
- "An idiot with a computer is often more powerful than a statistician with a pencil." (Confuseus)
- New method = click-click, or http://www.automaticstatistician.com/

Yes, I am biased
"... more fundamental progress is more likely to be made by very focused, relatively small scale, intensive investigations than collecting millions of bits of information on millions of people... collect such large data now, but it depends on the quality, which may be very high or not, and if it is not, what do you do about it?..." (D. R. Cox)
My Little EXCEL

Cult of Bigness; Small is Beautiful (Kohr/Taleb)
Big will get Bigger, as we ain't getting younger.
Stat/epi are slow in the BigData game/war.
Causal Inference from big data ... Bareinboim-Pearl (2015)
(btw, do you have BigData?)

Cereal & baby boy: Fooled by Randomness?

Same basic principles; small or large
"Measure what is measurable; make measurable what is not so." (Galilei)
No free lunch, or GIGO/BIBO
Multiple testing, multiple modeling, IN-N-OUT
- We are Aimless Fishermen?
Do you remember your data?
Data sharing/transparency/look for other evidence
Can we escape from theory & math?
Can you publish "~50% of newborns are boys"? =50% (not 5%) Era: already?
"We reject important confirmations. No-one is incentivised to be right. Scientists are incentivised to be productive & innovative ..." (Horton)
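How quickly "aimless fishing" pays off can be sketched directly. Assumptions (not from the slides): m independent tests, every null hypothesis true, and continuous test statistics so that each p-value is Uniform(0, 1). The chance of at least one p ≤ 0.05 is then 1 - 0.95^m, which a small Monte Carlo reproduces; for m = 12 (say, one test per birth month) it is already close to one half. Function names and settings are hypothetical.

```python
import random


def familywise_error(m, alpha=0.05):
    """P(at least one p <= alpha) among m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def simulate_fishing(m, alpha=0.05, n_studies=20000, seed=1):
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if any(rng.random() <= alpha for _ in range(m)):
            hits += 1
    return hits / n_studies


for m in (1, 12, 20):
    print(f"m={m:>2}: exact={familywise_error(m):.3f}, simulated={simulate_fishing(m):.3f}")
```

With 20 null comparisons the chance of at least one "significant" result is about 64%, so a single small p-value found by fishing is weak evidence on its own.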

BigData vs. n-of-1
Antonym? Dear Both, You Clean. Clean Both are Personalized, Precision medicine?
How about n-of-3 or the magic number 30?
- "n = 1 is not acceptable, and n ≥ 3 is preferable." (Editor of Science Signaling)
- "What I tell you 3 times is true." (Lewis Carroll)
- Rule of Three (Darwin; Pearson/Biometrika)
- n=30 came from correlation & t-test (Student 1908a,b)

Secret, Hostage or Monopoly Science?
Data are cheap-N-fair? Rich get richer?

Kilimanjaro or Pluto?
Are we p<0.05? Publish or perish? log(100)=2