Yang Song HOW TO LIE WITH PROBABILITY Outline of Talk 1 2 3 Simple Examples Averages and Expectation

Random Walks How to Lie with Probability 1 Simple Examples How to Lie with Probability Monty Hall Problem You are on a game show with three doors. One of the doors contains a car

while the other two contain monkeys. You pick a door and the host reveals a DIFFERENT door which contains a monkey. The host gives you an option. Stay with your original choice, or swap to the third door. What do you do? Stay If you stay with your original choice, then the chance you win the car is 1/3. Why? The car is equally likely to be in any door and the hosts decision to open a door containing a monkey does NOT affect your

original choice. Swap If you decide to swap to the third door, then there are two possibilities. Firstly, if the door you have chosen contains the car (probability 1/3), then you will switch to the door with a monkey. Secondly, if the door you have chosen contains a monkey (probability 2/3), then the host will open the other door with the monkey, meaning you will switch to the door with a car. In total, there is a 2/3 chance you will

switch to the door with the car! Swapping DOUBLES the chance that you will win the car! Strange Dice Your friend challenges you to a game: There are three dice, A, B and C. They are numbered as shown, with equal numbers on opposite faces. Each player selects one die and rolls it. The player who rolls the lower number must give the other player $1. Your friend lets you select first. Which die should you choose?

2+6+7=15 1+5+9=15 3+4+8=15 Die A Suppose you choose die A. Your friend decides to choose die C. There are 9 possible pairs (A, C) all equally likely. (2, 3), (2, 4), (2, 8), (6, 3), (6, 4), (6, 8), (7, 3), (7, 4), (7, 8) In total, your friend wins 5/9 of the time, while you win 4/9 of the time. Die C Suppose you choose die C. Your friend decides to choose die B.

There are 9 possible pairs (C, B) all equally likely. (3, 1), (3, 5), (3, 9), (4, 1), (4, 5), (4, 9), (8, 1), (8, 5), (8, 9) In total, your friend wins 5/9 of the time, while you win 4/9 of the time. Die B Suppose you choose die B. Your friend decides to choose die A. There are 9 possible pairs (B, A) all equally likely. (1, 2), (1, 6), (1, 7), (5, 2), (5, 6), (5, 7), (9, 2), (9, 6), (9, 7) In total, your friend wins 5/9 of the time, while you win 4/9 of the time. Whats going on? You are now very confused and have lost a lot of money to your friend. C beats A with probability 5/9.

B beats C with probability 5/9. A beats B with probability 5/9. This shows that beats is not transitive. (example: rock paper scissors) Picking first is actually a disadvantage. A variant To make you happy, your friend suggests another game. This time, you will roll your die TWICE, and your score is the sum of your two rolls. He even lets you pick second this time. What will you do? Die B

Suppose your FRIEND choose die B. You know die A beats B with probability 5/9, so you choose die A. Rolling die B twice generates the nine pairs: (1, 1), (1, 5), (1, 9), (5, 1), (5, 5), (5, 9), (9, 1), (9, 5), (9, 9) The 9 sums are (2, 6, 6, 10, 10, 10, 14, 14, 18) Rolling die A twice generates the nine pairs: (2, 2), (2, 6), (2, 7), (6, 2), (6, 6), (6, 7), (7, 2), (7, 6), (7, 7) The 9 sums are (4, 8, 8, 9, 9, 12, 13, 13, 14) Die B Suppose your FRIEND choose die B. The 9 sums of B are (2, 6, 6, 10, 10, 10, 14, 14, 18) The 9 sums of A are (4, 8, 8, 9, 9, 12, 13, 13, 14)

The A-sum will be bigger than the B-sum in 1+3+3+3+3+6+6+6+6 = 37 cases out of 81 total (9x9 for each sum (A, B)) Therefore A will beat B with probability only 37/81 and you lose money to your friend. (Note: In this example theres a 2/81 chance of a tie) Whats going on? With one roll, we have: C > A > B > C. But with two rolls, we have: C < A < B < C. In general, there exist arbitrarily large sets of dice which will beat each other in any desired pattern according to how many times the dice are rolled.

The Birthday Problem Whats the probability of sharing a birthday among n students in a class? Let d = 365 be the number of days in a year. The probability that no two people will share the same birthday is With 27 people, this probability is approximately 1/e, meaning theres a 63% chance two people will share a birthday We have used the fact that 1 x < e-x for all x > 0

2 Averages and Expectation How to Lie with Probability A math professor decided to implement a new grading scheme. 80% If over of the class scores at least 1.25 times the class average, then EVERYONE gets an A for the course.

HANG ON If 80% of the class scores at least 1.25 times the average, then the average score must be at least 0.8 * 1.25 * average which is exactly the average score In other words, the 80% of students will score a total score which is greater than the whole 100% of students Markovs Theorem This bound is

incredibly weak, but it has some interesting applications. Consequences If the average IQ is 100, then less than a third of the population can have an IQ of 300 or above. The probability that someone will have an IQ of above 200 is less than 100/200 = . Rolling 4 or more on a fair die has probability less than 3.5/4 = 7/8.

A question about Averages The batting average of a baseball player is the number of hits over the number of bats. Player As average is higher than that of Player B for the FIRST season AND the SECOND season. Can Player B still have a higher batting average overall? Note: the number of bats could be different for each player in any season. Think carefully about this problem! A question about Averages Season 1 Season 2

Total Player 1 40/100 = 0.4 30/900 = 0.033 70/1000 = 0.07 Player 2 3/10 = 0.3

1/40 = 0.025 4/50 = 0.08 Here in Season 1, we have 0.4 > 0.3, so Player 1 has a higher average. In Season 2, we have 30/900 = 0.033 > 1/40 = 0.025, so Player 1 has a higher average. In total, Player 1 has an average of 70/1000 = 0.07. Player 2 has an average of 4/50 = 0.08. It is POSSIBLE for Player 2 to have a higher overall average. Simpsons Paradox Several students applied for Math and Computer Science graduate programs for a

prestigious university Math CS Overall Male 70/100 Acceptance 70% 2/5 40%

72/105 69% Female 4/5 Acceptance 80% 50/100 50% 54/105 51% The university decided to release admission

statistics to avoid gender discrimination It says that the acceptance rate for BOTH the Math and Computer Science programs were higher for women However, much more men were accepted than women overall Can you trust the admission stats when applying to universities? Disease Detection A rare lethal disease has emerged but a new diagnostic test has been

invented. If you have the disease, the test is guaranteed to detect it. If you dont have the disease, the test will report that correctly 99% of the time. The disease affects 0.01% of the population. You take the test and it comes back positive. What is the chance you have the disease? Disease Detection There are two scenarios. Scenario 1: You have the disease and therefore test positive. (Chances are 0.0001) Scenario 2: You dont have the disease and test a false positive. (Chances are 0.9999 x 0.01 = 0.009999)

Given that you have tested positive, the chance you actually have the disease is 0.0001 / (0.0001 + 0.009999) = 0.0099 The chance you actually have the disease is less than one percent. A simple game You play a game with two of your friends Each person pays $2 to the pot Each person writes down either Heads or Tails on a piece of paper A fair coin is flipped The $6 is divided between people who guess correctly If NO ONE guesses correctly, then $2 is returned to each person Fair or Not Fair? Eight possible scenarios:

ME Since the coin is FAIR, there is a 1/2 chance of each person guessing correctly, and each scenario is equally likely with probability 1/8 Friend 1 Friend

2 Money obtained CORRECT $6 3 = $2 0 WRONG $6 2 = $3 1

CORRECT $6 2 = $3 Friend WRONG 2 1 $6 1 = $6 4

CORRECT nothing -2 WRONG nothing -2 CORRECT nothing

-2 WRONG $2 (all wrong) Gain CORRECT CORRECT ME Friend

WRONG 1 CORRECT WRONG WRONG 0 A simple game Expected Earnings 1/8 * ( 0 + 1 + 1 + 4 2 2 2 + 0 ) = $0 The game seems fair.

HOWEVER After playing ten games with your friends, you have already lost $5 After twenty games, you have lost $10 Coincidence? Or is something more sinister happening The impact of Collusion What are the odds that you would lose $10 after twenty games? How unlucky would you have to be?

After a while, you realize that your friends have made OPPOSITE guesses each time. How can this impact your earnings? Fair or Not Fair? Eight possible scenarios: ME Since the coin is FAIR, there is a 1/2 chance of each person guessing correctly, and each scenario is equally

likely with probability 1/8 Friend 1 Friend 2 Money obtained Gain CORRECT

$6 3 = $2 0 WRONG $6 2 = $3 1 CORRECT $6 2 = $3 Friend

WRONG 2 $6 1 = $6 CORRECT nothing WRONG nothing CORRECT

nothing CORRECT CORRECT ME Friend WRONG 1 CORRECT WRONG WRONG

WRONG 1 4 -2 -2 -2 $2 (all wrong) 0 Fair or Not Fair? FOUR possible scenarios: ME

Friend 1 Friend 2 Money obtained Gain WRONG $6 2 = $3 1

CORRECT $6 2 = $3 CORRECT CORRECT Each scenario is equally likely with probability 1/4 ME Friend

WRONG 1 Friend 2 1 CORRECT WRONG WRONG WRONG nothing

CORRECT nothing -2 -2 A simple game Expected Earnings 1/4 * ( 1 + 1 2 2 ) = -$0.5 You expect to lose half a dollar each turn. Real Life Impacts

Collusion can be used in many real life instances where a large pot is split between people who pick correctly For example, many people choose lottery numbers based on dates (a small set of numbers was selected by a large sample of the population) An MIT Professor (Herman Chernoff) managed to earn an expected value of +$0.07 per dollar by selecting numbers uniformly at random 3

Random Walks How to Lie with Probability The life of an ant An ant moves either left or right one inch with equal probability each second. He is currently one inch right of the Cliff of Doom. If he lands on the edge of the cliff, he falls off and dies. Does the ant necessarily have to die? If so, how many seconds is his

expected lifespan? Such a movement is known as a random walk. A simpler problem Suppose a landslide occurs and now there is a Cliff of Disaster two inches to the right. The tree diagram to the left shows his possible fates. From the diagram, we can see that he falls off the Cliff of Doom on the left with probability And off the Cliff of Disaster with

probability And the probability he hops back and forth in the middle? ZERO. The general problem Suppose that the ant is standing in position n, with cliffs n steps to the left and w-n steps to the right. There are three scenarios: fall off to left, fall off to right, stay forever. Let Rn be the probability that the ant falls off to the right given that he starts in position n. We know that Rw = 1 (already fell off) and R0 = 0 (fell off other side).

The general problem Suppose that the ant is standing in position n, not equal to w or 0. There are two scenarios: either the ant can move one step to the left, or one step to the right. If he steps left, he is in position n-1 and has a Rn-1 probability of falling off the right. If he steps right, he is in position n+1 and has a Rn+1 probability of falling off the right. This means that Rn = Rn-1 + Rn+1 A recurrence relation So far, we know

This is a linear recurrence! We can rearrange the equation to obtain Rn+1 = 2Rn Rn-1 The characteristic equation is with a double root at x = 1. The general solution has form and setting boundary conditions for n = 0 and n = w gives us a = 0, b = 1/w. This means that Rn =n/w The solution is linear, and there is only one line that passes

through the two endpoints. The general problem What about falling off the left? We can exploit symmetry to solve this. Falling off the left side in this example is the same as starting in position w-n and falling off the right. (imaging flipping the whole picture) This means that Ln = Rw-n = (w-n) / w = 1 n/w The original problem On a finite island, we have Ln + Rn = 1. This tells us that the probability of hopping around forever on the island is zero.

This means the ant eventually falls into the sea. What about our infinite plateau? We can let w approach infinity. Here we obtain that his chances of falling into the sea is: Expected lifetime (slightly harder) Whats the expected number of steps the ant will take before falling off a cliff? Let Xn be the expected number of steps (equal to lifetime in seconds) taken before falling off, given that he starts in position n. We know that Xw = X0 = 0 (fell off already). Expected lifetime

Suppose that the ant is standing in position n, not equal to w or 0. There are two scenarios: either the ant can move one step to the left, or one step to the right. If he steps left, he is in position n-1 and can expect to live for another X n-1 steps. If he steps right, he is in position n+1 and can expect to live for another X n+1 steps. This means that Xn = 1 + Xn-1 + Xn+1 He takes a step each time, explaining the extra +1.

A recurrence relation So far, we know We can rearrange the equation to obtain The characteristic equation is with a double root at x = 1. Although this looks like our previous recurrence, the solution is actually completely different. It turns out that the general solution has the form Substituting in the boundary conditions gives us a = 0 and b = w. This means that Xn = wn n2 = n(w-n)

The expected lifespan is equal to the product of the distances to the two cliffs. For those who want to know how to solve it Lets say the two sides of the cliff are represented as a and b. We have The extra 1 in our recurrence means the solution will no longer be a straight line. We can rearrange this to get The original problem

In this case, his expected lifespan is This means the ant is expected to live FOREVER, but it is certain to eventually fall off into the sea. Random Walks and Gambling A gambler has $n (positive integer) and will make $1 bets until hes either broke or has $n+100. Whats the probability of going home broke? What about earning $100? How long will it take? If the chances of winning and losing are both the same at 0.5, then this can be modelled as a random walk

between 0 and n+100 with unit steps. HOWEVER What if the chances of winning is not 0.5 but slightly lower, say 0.48 The expected earning is 0.48 * $1 + 0.52 * (-$1) = -$0.04 This means the gambler is expected to lose 4 cents with every bet Even though the gambler may experience some upward swings, the downward drift

dominates in the long run. This is known as a biased random walk Random Walks and Gambling For this example, we can model the situation as a biased random walk where the probability of going down a step is p (< 0.5) and the probability of going up a step is q = 1 p. We start with $n and gamble until we reach $0 or $w = n + m (Target)

This means that you are more likely to go down than up, which means your wealth will decrease on average. If Rn denotes the probability that our gambler will win, we have the recurrence when n is between 0 and w. Random Walks and Gambling If p = 0.5, we have a double root and obtain the previous solution for the unbiased walk. However, since the gambler is not equally likely to win or lose each bet, we have a completely different solution.

Random Walks and Gambling It turns out, that instead of getting a linear solution, we have an exponential solution. Solving the system of equations gives us Substituting this back into our original question gives us Interpreting the solution If p < 0.5, then 1 p > p (The probability of losing is greater than the probability of winning) This means that the fraction is greater than one.

If n, the starting wealth is reasonably large, then the 1s dont make much of a difference, and the expression becomes very close to (remember than w > n) For example, in a game of roulette (where p = 9 / 19), the probability of winning $100 is very close to Note that this number did not depend on n, the starting amount of wealth! What does this mean? This means, that if you want to gain $100 from gambling, then the probability of doing so is the same starting with $100, $1000 or $1000000000! For example, As long as the starting wealth is not too small, the amount of money the gambler starts with makes no difference!

Exercise: find out why only one digit changes! If n is not too small, we can make this approximation How does this happen? Firstly, there are random upward and downward swings due to luck. Secondly, there is a steady downward drift due to the small expected loss with each bet. Even starting with a billion dollars does not help! If the gambler does not start off lucky, he is doomed forever!

CHAIR Pass the parcel A group of n+1 people sit in a circle. One of the people is elected the chairman. The chairman holds a parcel, which he passes to the person either on his left, or right with equal probability. The same happens for that person and after a while everyone who has touched that parcel forms an arc.

An interesting application Eventually the arc grows until everyone EXCEPT one person has touched it. That person is the winner. Suppose that the parcel is RADIOACTIVE and that everyone who touches the parcel will die.

If you want to survive, where should you sit? Where to Sit? CHAIR N Lets label the chairs from 1 to N (the chairman has chair 0) Since the chairman initially holds the parcel, 1 N-1

2 N-2 3 you decide to sit as far away from him as possible. (in seat N/2) Will this give you the highest chance of surviving? 4

5 Where to Sit? CHAIR N Suppose you sit in seat K. 1 N-1 2 At some point, the parcel will end up in the

hands of one of your neighbors (K-1 or K+1) N-2 3 Without loss of generality assume it is K-1. Now lets cut the circle between you and your OTHER neighbor (K+1). 4 5

Does this situation LOOK FAMILIAR? The life of an ant If the parcel reaches k to the left, you die. If the parcel reaches k+1 to the right, you survive. This is EXACTLY a random walk (of the parcel) with the Cliff of Doom one step to the left and the Cliff of Disaster (in this case a good thing!) n-1 steps to the

right. This indicates something quite surprising. The probability that it reaches the right before reaching the left is 1/n. This means your chances of survival are 1/n regardless of where you sit. (Just dont get elected chairman!) ? Whats next? Theres More! How to Lie with Probability

If youve liked this, you may want to look at Random walks in two (and three) dimensions Random walks on graphs Different types of recurrences and how to solve them Take a course in probability theory Gamblers Ruin Chebyshevs Inequality and Chernoff Bounds A thought experiment Everyone write down a positive integer The person who has the smallest UNIQUE number wins! Thank You for

Hope you learnt something! Watching!