GCGCATATGCGCATG hello world

My education
2006, B.S., Peking University, Biological Sciences
2012, Ph.D., University of Michigan, Evolutionary Genetics

Course introduction
Applied biostatistics
Examples, examples, and examples
Try to make it not too heavy

Statistics
Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data.

March 21: Probability
March 23: Introduction to R
March 30: Hypothesis testing
Shaohuan Wu
April 1: Analysis of variance
April 6: Regression and correlation
April 11: Plots with R
April 19: Presentations (+ a report = final exam)

R language
Standard statistical tool in science
You will need to bring your laptop to class, with R installed.
Download R: http://www.r-project.org/
RStudio: http://www.rstudio.com/

Exam
The final exam is a report based on the use of statistics in a small project. The report should be 1000 - 2000 words.
Ten-minute oral defense of the report (including 2 min Q & A) in front of the class.

PPT
Will be uploaded to my lab website after each class: qianlab.genetics.ac.cn
Words in red: waiting for your response
Words in green: the beginning of a new example

Textbooks
Textbook: Handbook of Biological Statistics
www.biostathandbook.com/HandbookBioStatThird.pdf
An R Companion for the Handbook of Biological Statistics
rcompanion.org/documents/RCompanionBioStatistics.pdf
Other references:
Biometry, by Sokal & Rohlf
What is a p-value anyway?, by Andrew Vickers

Before next class
Handbook of Biological Statistics: any relevant materials on pages 1-28 (before class II)
An R Companion for the Handbook of Biological Statistics: pages 1-13 (before class II)

Your introduction

Statistics is the basis of most sciences
The definition of modern science?

What is science?
A theory in the empirical sciences can never be proven, but it can be falsified, meaning that it can and should be scrutinized by decisive experiments.

Hypothesis testing
Karl Popper, 1902-1994
All swans are white

Science is about rejecting the null hypothesis
Aristotle
Galilei
Leaning Tower of Pisa

Science is about rejecting the null hypothesis
Einstein
Eclipse

In biology
In genetics
Mixing of traits

Mendelian genetics
Mendel
Two copies of each gene that can be separated in the next generation, generating the 3:1 ratio

Other examples?

Deterministic vs stochastic events
Deterministic events
If I toss a coin, one face will land up
I will get up tomorrow morning
A child will grow up
Stochastic events
Heads or tails?
The exact time (minute and second) at which I would wake up naturally
The height and weight of the child
Other examples?

Phenomena in biology
Are more likely to be stochastic than physical phenomena
In the physical world
The sun rises
Planets move
Water boils
In biology
Weight and height
Disease
Life span
The outcome of your exam
Reason?

Reasons for stochasticity in life
Traits are determined by both genes and environments
The environment is heterogeneous
Most traits are affected by multiple genes
Each gene has a minor impact
Developmental strategy (body plan)
The life sciences involve a huge number of factors, which makes stochasticity ubiquitous.

How do we describe stochasticity?
Distribution!

Density function
Density function
Cumulative distribution function

Normal distribution
The bell shape
Appears everywhere in biology
Why?
Traits are determined by both genes and environments
Many genes with minor effects
Additivity
What if not?

The probability of a person being taller than 1.9 meters
If height follows a normal distribution with mean = 1.75 and standard deviation = 0.05

Descriptive statistics
Algebraic
Mean (μ)
Variance (σ²)
Standard deviation (σ)

Normal distribution
The probability of a person being taller than 1.9 meters
If height follows a normal distribution with mean = 1.75 and standard deviation = 0.05:
P = 1 - NORMDIST(1.9, 1.75, 0.05, 1) = 0.13%

The height is more than 1.9 meters
With the same distribution (mean = 1.75, standard deviation = 0.05):
What is the probability of a height of less than 1.2 meters?
What if this number is different from your intuition?
What is the probability of a height between 1.7 and 1.75 meters?

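The three height questions can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are ours, and the tails are written with `erfc` so that the very small probability below 1.2 m does not round to zero. With a standard deviation of 0.05 the tail above 1.9 m is about 0.13% (with 0.06 it would be about 0.6%).

```python
from math import erfc, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd); erfc keeps tiny lower tails accurate."""
    return 0.5 * erfc(-(x - mean) / (sd * sqrt(2)))

def normal_upper_tail(x, mean, sd):
    """P(X > x), the counterpart of 1 - NORMDIST(x, mean, sd, 1) in Excel."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2)))

MEAN, SD = 1.75, 0.05  # mean and standard deviation of height, in meters

# Taller than 1.9 m: three standard deviations above the mean.
p_taller_than_1_9 = normal_upper_tail(1.9, MEAN, SD)

# Shorter than 1.2 m: eleven standard deviations below the mean.
p_shorter_than_1_2 = normal_cdf(1.2, MEAN, SD)

# Between 1.7 m and 1.75 m: from one standard deviation below the mean up to the mean.
p_between = normal_cdf(1.75, MEAN, SD) - normal_cdf(1.7, MEAN, SD)

print(f"P(height > 1.9)        = {p_taller_than_1_9:.4%}")
print(f"P(height < 1.2)        = {p_shorter_than_1_2:.3e}")
print(f"P(1.7 < height < 1.75) = {p_between:.4f}")
```

The probability below 1.2 m comes out far smaller than most people would guess, which suggests that real heights are not exactly normal in the extreme tails.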
Can you draw
A density curve of the standard normal distribution in Excel?
A cumulative distribution curve of the standard normal distribution in Excel?

Bill Gates' visit to a bar
Median

Bill Gates' revisit to the bar
Interquartile range
Boxplot

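A small sketch of why the median and the interquartile range are useful for skewed data such as incomes. All income figures below are hypothetical and chosen only for illustration; `summarize` is our own helper name.

```python
import statistics

# Hypothetical yearly incomes (in thousand dollars) of nine bar patrons.
patrons = [30, 35, 40, 45, 50, 55, 60, 65, 70]

# Hypothetical, deliberately extreme income of one very rich visitor.
rich_visitor = 1_000_000

def summarize(incomes):
    """Return the mean, median, and interquartile range of a list of incomes."""
    q1, _, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(incomes),
        "median": statistics.median(incomes),
        "iqr": q3 - q1,
    }

before = summarize(patrons)
after = summarize(patrons + [rich_visitor])

print("before the visit:", before)
print("after the visit: ", after)
```

The mean jumps by several orders of magnitude when the visitor walks in, while the median and the interquartile range barely move.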
How do we treat stochastic data?
At a summer tea party in Cambridge, England, a guest states that tea poured into milk tastes different from milk poured into tea. Her notion is shouted down by the scientific minds of the group. But one man, Ronald Fisher, proposes to scientifically test the hypothesis.

How to test the hypothesis?
H0: The order in which milk and tea are poured makes no difference
10 cups of drink
Mixed out of the lady's sight
Let the lady tell the order of milk and tea for each cup
If H0 is correct, what is the probability that the lady gets all 10 guesses correct?
0.1%
It is unlikely that an event with such a low probability happens in a single test.
Thus, the most likely scenario is that H0 is incorrect, and there is a difference between the two orders.

What is a p-value?
The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the model is true.
The p-value can be used in statistics to reject a null hypothesis.

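For the tea-tasting test, this definition can be turned into a short calculation. A minimal sketch, assuming that under H0 each of the 10 guesses is an independent 50/50 coin flip; the function name `binomial_tail` is ours.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): results equal to or more extreme than k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# All 10 cups identified correctly: nothing is more extreme than 10 out of 10.
p_value_all_correct = binomial_tail(10, 10)

print(f"p-value for 10 correct out of 10: {p_value_all_correct:.4%}")
```

The result is 1/1024, which is the roughly 0.1% quoted for the lady getting all 10 guesses correct.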
What if
Among 10 tests, the lady succeeded in 8 of them?

Binomial distribution
First child: boy or girl
Second: B or G
Third: B or G
Eight possibilities: BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG
What is the probability of having 2 B in 3 children?

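The eight possibilities can be enumerated directly and compared with the binomial formula. A minimal sketch, assuming boys and girls are equally likely and births are independent:

```python
from itertools import product
from math import comb

# All sequences of three children, each a boy (B) or a girl (G).
outcomes = ["".join(seq) for seq in product("BG", repeat=3)]

# Keep the sequences with exactly two boys.
two_boys = [o for o in outcomes if o.count("B") == 2]

# Probability by counting equally likely outcomes.
p_enumerated = len(two_boys) / len(outcomes)

# Probability by the binomial formula with n = 3, k = 2, p = 0.5.
n, k, p = 3, 2, 0.5
p_formula = comb(n, k) * p**k * (1 - p) ** (n - k)

print(outcomes)
print(two_boys, p_enumerated, p_formula)
```

Both routes give 3/8: the factor C(3, 2) = 3 in the formula simply counts the three orderings BBG, BGB, and GBB.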
Binomial distribution
n = 3, k = 2, p = 0.5

What if
Among 10 tests, the lady succeeded in 8 of them?
What is the p-value?

Probability estimation
Alternatively, we can estimate the probability of success (E)
In this case, 80%
We can get a 95% confidence interval (CI)
If 0.5 is outside the CI, we conclude that the two orders differ

Confidence interval
How to calculate the confidence interval?
For the binomial distribution:
Variance: σ² = np(1 - p)
Standard deviation: σ = sqrt(np(1 - p))
In this case, σ = sqrt(10 * 0.8 * 0.2) = 1.26
If we use the normal distribution to approximate the binomial distribution:
95% confidence interval = [μ - 2σ, μ + 2σ] = [8 - 2.5, 8 + 2.5] = [5.5, 10.5]
5 is outside the 95% confidence interval

Law of large numbers
The estimate of the probability, 0.8, may not be accurate
The larger the sample size, the more accurate our estimate is,
so that we could potentially distinguish 50% from 60%

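The 8-out-of-10 case can be worked through in a few lines. This is only a sketch: it assumes independent guesses, uses the rough rule of plus or minus 2 standard deviations for the 95% interval, and all helper names are ours.

```python
from math import comb, sqrt

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def approx_ci_counts(n, p_hat, z=2.0):
    """Normal-approximation interval for the number of successes: mu +/- z * sigma."""
    mu = n * p_hat
    sigma = sqrt(n * p_hat * (1 - p_hat))
    return mu - z * sigma, mu + z * sigma

def ci_halfwidth_proportion(n, p_hat, z=2.0):
    """Half-width of the same interval expressed as a proportion of successes."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# One-sided p-value under H0 (p = 0.5): 8, 9, or 10 correct out of 10.
p_value_8_of_10 = binomial_tail(10, 8)

# Interval around the observed 8 successes; H0 predicts 5 successes.
low, high = approx_ci_counts(10, 0.8)

print(f"one-sided p-value for 8 of 10: {p_value_8_of_10:.4f}")
print(f"approximate 95% CI for the count: [{low:.2f}, {high:.2f}]")

# Larger samples narrow the interval around an estimated proportion of 0.6.
for n in (10, 100, 1000):
    print(n, round(ci_halfwidth_proportion(n, 0.6), 3))
```

The exact one-sided tail is 56/1024, about 0.055, even though the rough interval excludes 5. With only 10 cups the normal approximation is crude, so the two approaches need not agree exactly. The loop shows the half-width around 0.6 falling from about 0.31 at n = 10 to just under 0.10 at n = 100, which is roughly where 60% starts to be distinguishable from 50%.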
Applications of this idea
Hold your nose, and you may not be able to tell Coke from Sprite
Is a drug effective or not?
Other examples?

Number of left-handed people
If 5% of a population is left-handed, what is the probability that a 50-student class contains exactly 1 left-handed person?

Poisson distribution
How about 0, 2, 3, 4 left-handed people?
Application: when the total number is not available
Uranium (235U) radiation
Neutron
Rate: 1/sec
The probability of having exactly 1 radiation event in the next second?

Luria-Delbrück experiment
Question: Did the mutation to resistance happen BECAUSE of the presence of the virus, or even BEFORE the virus was added to the culture?
Poisson distribution
Luria-Delbrück distribution

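The left-handedness and radiation questions above can be computed side by side. A minimal sketch, assuming students are left-handed independently of one another and that radiation events follow a Poisson process at the stated rate; the function names are ours.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events) when the expected number of events is lam."""
    return lam**k * exp(-lam) / factorial(k)

n_students, p_left = 50, 0.05
lam = n_students * p_left  # expected number of left-handed students: 2.5

# Exact binomial probabilities next to their Poisson approximation.
for k in range(5):
    print(k, round(binomial_pmf(k, n_students, p_left), 4), round(poisson_pmf(k, lam), 4))

# Radiation: rate 1 per second, so the expected count in the next second is 1.
p_one_event = poisson_pmf(1, 1.0)
print(f"P(exactly 1 event in the next second) = {p_one_event:.4f}")
```

For the radiation question only the rate is known, not the total number of atoms, which is why the Poisson form is the natural choice there.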
Intuition is extremely important in statistics

Blaise Pascal, 1623-1662
Pascal's principle

Geeks' joke: Einstein, Newton, and Pascal Play Hide and Seek
One day, Einstein, Newton, and Pascal met up and decided to play a game of hide and seek. Einstein volunteered to be "It". As Einstein counted, eyes closed, to 100, Pascal ran away and hid, but Newton stood right in front of Einstein and drew a one meter by one meter square on the floor around himself. When Einstein opened his eyes, he immediately saw Newton and said, "I found you, Newton," but Newton replied, "No, you found one Newton per square meter. You found Pascal!"

Pascal's problem
The rule of the game
Two people take turns tossing a coin
Player A wins when s/he gets 3 heads
Player B wins when s/he gets 3 tails
The game has to stop when A has 2 heads and B has 1 tail, because of the King's call
How to split the bet?

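One way to approach the split is to enumerate how the interrupted game could have continued. A minimal sketch, assuming a fair coin and that every toss counts as a head for A or a tail for B; the function name `share_for_a` is ours.

```python
from fractions import Fraction
from itertools import product

def share_for_a(heads_needed, tails_needed):
    """Fair share of the bet for A, who still needs `heads_needed` heads,
    while B still needs `tails_needed` tails."""
    # The game is always decided within this many further tosses.
    max_tosses = heads_needed + tails_needed - 1
    sequences = list(product("HT", repeat=max_tosses))
    # If all remaining tosses are played out, A wins exactly when enough heads appear.
    a_wins = sum(1 for seq in sequences if seq.count("H") >= heads_needed)
    return Fraction(a_wins, len(sequences))

# At the interruption A needs 1 more head and B needs 2 more tails.
share_a = share_for_a(1, 2)
share_b = 1 - share_a

print(share_a, share_b)  # 3/4 1/4
```

Of the four equally likely continuations HH, HT, TH, and TT, only TT lets B win.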
Opinions
B: A gets 2/3 and B gets 1/3
A needs one more head, P = 1/2
B needs two more tails, P = 1/4
A: A gets 3/4 and B gets 1/4
B wins only when B gets two tails, P = 1/4
Otherwise, A wins, P = 3/4
Who is correct?

Conclusion
A: A gets 3/4 and B gets 1/4

Monty Hall problem
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
Your guess?

Monty Hall problem
Tempting answer: if the car is not behind door 3, the probabilities of it being behind door 1 and door 2 are equal, P = 1/2 for both.

Solution 1
Door 1: 1/3, Door 2: 1/3, Door 3: 1/3

Solution 2
Intuition: Consider 10000 doors
You chose door 1
The host opens 9998 doors for you, and none of them has a car behind it
Do you switch?

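The same reasoning can be written as an exact enumeration over where the car is. A minimal sketch, assuming the car is equally likely to be behind any door and that the host always opens every other door except one, never revealing the car; the function name is ours.

```python
from fractions import Fraction

def win_probabilities(n_doors=3):
    """Return (P(win by staying), P(win by switching)) for a contestant who picks door 0."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car in range(n_doors):
        p_car = Fraction(1, n_doors)  # the car is equally likely behind each door
        if car == 0:
            # The first pick was right, so staying wins and switching loses.
            stay += p_car
        else:
            # The host must leave the car's door closed, so switching wins.
            switch += p_car
    return stay, switch

stay_3, switch_3 = win_probabilities(3)
stay_10000, switch_10000 = win_probabilities(10000)

print(stay_3, switch_3)
print(stay_10000, switch_10000)
```

With three doors switching wins 2/3 of the time; with 10000 doors it wins 9999/10000 of the time, which is what the intuition above suggests.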
Monty Hall problem
Switch it!

The probability of the same birthday in a class
Consider a class with 50 people
What is the probability that at least two students have the same birthday?
Your guess?

The probability that all have different birthdays
The first person: 1
The second person: 364/365
The third person: 363/365
...
The 50th person: 316/365
P = 0.03

The answer
The probability that all have different birthdays: P = 0.03
The probability that at least two students have the same birthday: 1 - P = 0.97

The success of an experiment
Two people, A and B, are doing an experiment in my lab
According to historical records, the success rate for A is 0.8, and that for B is 0.7
Each of them does the experiment once
What is the probability of at least one success?

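Both this question and the birthday question before it rest on the complement rule, P(at least one) = 1 - P(none). A minimal sketch of both, assuming 365 equally likely birthdays and, for the experiment, that the two attempts are independent, which need not hold in practice; the function names are ours.

```python
from math import prod

def p_all_distinct_birthdays(n_people, days=365):
    """P(no two of n_people share a birthday), assuming equally likely birthdays."""
    return prod((days - i) / days for i in range(n_people))

def p_at_least_one_success(success_probs):
    """P(at least one success), assuming the attempts are independent."""
    return 1 - prod(1 - p for p in success_probs)

p_distinct = p_all_distinct_birthdays(50)
p_shared = 1 - p_distinct

p_lab = p_at_least_one_success([0.8, 0.7])

print(f"all 50 birthdays differ: {p_distinct:.4f}")
print(f"at least one shared:     {p_shared:.4f}")
print(f"at least one success:    {p_lab:.2f}")
```

If the two experimenters share reagents or equipment, the independence assumption breaks down and the product of the two failure probabilities no longer gives P(both fail).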
The success of an experiment
Consider the probability that both of them fail
P = 1 - 0.2 * 0.3 = 0.94
Any problems here?
It depends on whether the two people are doing their experiments independently!
Do they use the same set of reagents?
If so, then A's failure increases the probability of B's failure

The conditional probability P(A|B)
The probability of A given B
The probability of a girl given that the first child in the family is a boy
P(the second child is a girl | the first child is a boy)
If independent, P(2nd girl | 1st boy) = P(girl)

Probability of infection
A test detects 95% of the people with the infection (true positives)
There is a 1% probability of a false positive
The frequency of the infection is 0.5%
What is the probability of infection, given a positive test result?

Bayes' theorem
Ai = infected
B = positive in the test
P(Ai | B)

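Bayes' theorem gives P(Ai | B) = P(B | Ai) P(Ai) / [P(B | Ai) P(Ai) + P(B | not Ai) P(not Ai)]. A minimal sketch with the numbers above, assuming that the "1% probability of a false positive" means P(positive | not infected) = 0.01; the function name is ours.

```python
def posterior_infected(sensitivity, false_positive_rate, prevalence):
    """P(infected | positive test) by Bayes' theorem."""
    # Infected people who test positive.
    true_positives = sensitivity * prevalence
    # Uninfected people who nevertheless test positive.
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

posterior = posterior_infected(
    sensitivity=0.95,          # P(positive | infected)
    false_positive_rate=0.01,  # P(positive | not infected), as assumed above
    prevalence=0.005,          # P(infected)
)

print(f"P(infected | positive) = {posterior:.3f}")
```

Under these assumptions only about one positive result in three comes from an infected person, because uninfected people vastly outnumber infected ones.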
Autosomal single-locus disease
Patients
?
Normal individuals

The probability of a 4th girl in the family, given that the first 3 are all girls
Your opinion?

Genetics or stochasticity
Model I: for some genetic reason, only sperm carrying the X chromosome survive.
Model II: the births of sons and daughters are equally likely.
For a family with 3 daughters, which model is more likely?
How to calculate it quantitatively?

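One quantitative route is to compare how well each model predicts three daughters and then combine that with a prior belief via Bayes' theorem. A minimal sketch: the prior values passed in below are hypothetical choices for illustration, not figures from the course, and the function names are ours.

```python
from fractions import Fraction

def likelihood_all_girls(n_daughters, p_girl):
    """P(n_daughters girls in a row) when each birth is a girl with probability p_girl."""
    return Fraction(p_girl) ** n_daughters

def posterior_model_1(prior_model_1, n_daughters=3):
    """P(Model I | all daughters), given a prior probability for Model I."""
    like_1 = likelihood_all_girls(n_daughters, 1)               # Model I: always a girl
    like_2 = likelihood_all_girls(n_daughters, Fraction(1, 2))  # Model II: 50/50
    prior = Fraction(prior_model_1)
    return prior * like_1 / (prior * like_1 + (1 - prior) * like_2)

# How much better Model I predicts three daughters than Model II does.
likelihood_ratio = likelihood_all_girls(3, 1) / likelihood_all_girls(3, Fraction(1, 2))

# Hypothetical priors: Model I assumed rare (1%) versus equally plausible (50%).
posterior_rare = posterior_model_1(Fraction(1, 100))
posterior_even = posterior_model_1(Fraction(1, 2))

print(likelihood_ratio, posterior_rare, posterior_even)
```

The data favor Model I by a factor of 8, but if Model I is rare to begin with, Model II remains the more probable explanation for a single family with three daughters.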