When and how can social scientists add value to data scientists?

Moshe Tennenholtz (Technion), Eyal Ert (Hebrew U), Ido Erev (Technion & WBS), Reut Apel (Technion), Ori Plonsky (Duke U)

The current research tries to facilitate synergy between research in behavioral economics and machine learning. The first part highlights a difficulty (classical behavioral studies cannot easily be used to derive effective features for prediction) and presents a choice prediction competition (CPC15) designed to address this problem. The second part challenges you to participate in a new choice prediction competition (CPC18) designed to clarify the results of the first competition.

The submission deadline is May 8, 2018.

Part 1: The difficulty and the 2015 competition (CPC15; Erev et al., 2017)

Classical research in behavioral economics focuses on clarifying deviations from rational choice. The phenomena highlighted by this research include:
- The St. Petersburg paradox (Bernoulli, 1738)
- Buying both lotteries and insurance (Friedman & Savage, 1948)
- The Allais paradox (Allais, 1953)
- Satisficing (Simon, 1955)
- The Ellsberg paradox (Ellsberg, 1961)
- Loss aversion (Kahneman & Tversky, 1979)
- The break-even effect (Thaler & Johnson, 1990)
- Underweighting of rare events (Barron & Erev, 2003)

Each of these phenomena was captured with an elegant model. However, the relationship between the different models is not clear. Al Roth clarified this problem with the "1-800 critique": with so many distinct models, it is unclear which one a practitioner calling for advice should be told to use.

Our initial effort to facilitate the development of general behavioral models involved three main steps:
1. Clarifying the relationship between the different choice phenomena (anomalies) by replicating them in a single experimental paradigm. Our results show that the classical description-based anomalies are replicable, but experience reverses or eliminates most of them.
2. Demonstrating that it is possible to reproduce the classical anomalies with a single model.

3. Organizing a choice prediction competition in which we challenged other researchers to develop a better model. The models were compared based on their predictions for a new set of 60 randomly selected choice problems.

Our analysis focused on an 11-dimensional space of choice tasks. Each problem in this space is a choice between two basic prospects:
- Option A: HA with probability pHA; LA otherwise (3 parameters).
- Option B: up to 10 outcomes, defined by 5 parameters.
- A 9th parameter, Corr, captures the correlation between the prospects.
- A 10th parameter, Amb, captures ambiguity.
- An 11th parameter, FB, captures the decision makers' (DMs') feedback. This parameter was varied within problem: the DMs faced each problem for 25 trials and got feedback after each choice from the 6th trial on.

Example of a basic experimental task (trial 1, initial screen):
Please select one of the following options:
B: 4 with p = 0.8; 0 with p = 0.2
A: 3 with certainty

Example of a basic experimental task (trial 1, limited feedback):
B: 4 with p = 0.8; 0 with p = 0.2
A: 3 with certainty
"You selected B"

Example of a basic experimental task (trial 6, initial screen):
Please select one of the following options:
B: 4 with p = 0.8; 0 with p = 0.2

A: 3 with certainty

Example of a basic experimental task (trial 6, feedback screen):
B: 4 with p = 0.8; 0 with p = 0.2
A: 3 with certainty
"You selected B; your payoff is 4. Had you selected A, your payoff would be 3."

Example of an ambiguous task (trial 1, initial screen):
Please select one of the following options:
B: 10 with p = q1; 0 with p = q2
A: 10 with p = 0.5; 0 with p = 0.5

Two reasons for the focus on choice among gambles:
1. It is the best-studied topic in behavioral economics.
2. The study of gambles captures human reactions to incentives, and this reaction is one of the challenges for machine learning tools. For example, designers of autonomous vehicles should worry about the possibility that after the development of cars that stop when pedestrians step onto the road, pedestrians will learn to exploit this behavior and step onto the road more often.
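Returning to the 11-dimensional problem space above: for concreteness, a problem in this space can be sketched in code. This is a rough rendering with our own field names, not the official CPC15 data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A sketch of a CPC-style choice problem. Field names are illustrative
# shorthand, not the official CPC15 data columns.
@dataclass
class ChoiceProblem:
    # Option A: HA with probability pHA, LA otherwise
    ha: float
    p_ha: float
    la: float
    # Option B: a list of (outcome, probability) pairs (up to 10 outcomes)
    b_outcomes: List[Tuple[float, float]]
    corr: int = 0      # correlation between the prospects
    amb: bool = False  # True if Option B's probabilities are not described
    # FB (feedback) is varied within problem: trials 6-25 include feedback

    def ev_a(self) -> float:
        return self.p_ha * self.ha + (1 - self.p_ha) * self.la

    def ev_b(self) -> float:
        return sum(p * x for x, p in self.b_outcomes)

# The basic example task above: A = 3 for sure, B = 4 with p = 0.8
prob = ChoiceProblem(ha=3, p_ha=1.0, la=0, b_outcomes=[(4, 0.8), (0, 0.2)])
print(prob.ev_a(), prob.ev_b())  # EVs: 3.0 and 3.2
```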

The Allais (common ratio) paradox / certainty effect (Allais, 1953; Kahneman & Tversky, 1979)

Problem 1: A = 3 with certainty; B = 4 with p = .8, 0 otherwise.
B-rate: 42% in Block 1 (no FB), 65% in Block 5 (with FB).
Problem 2: A = 3 with p = .25, 0 otherwise; B = 4 with p = .2, 0 otherwise.
B-rate: 61% in Block 1 (no FB), 62% in Block 5 (with FB).

Certainty effect from description. The addition of feedback increases maximization and eliminates the paradox.
[Figure: P(B), the proportion of B choices, by block (1-5).]
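A quick arithmetic check (ours, not in the original slides) of why feedback pushes choices toward B in both problems:

```python
# EV check for the two Allais problems above: B maximizes EV in both,
# which is consistent with feedback increasing the B-rate.
def ev(pairs):
    return sum(x * p for x, p in pairs)

ev_a1, ev_b1 = ev([(3, 1.0)]), ev([(4, 0.8), (0, 0.2)])              # 3.0 vs 3.2
ev_a2, ev_b2 = ev([(3, 0.25), (0, 0.75)]), ev([(4, 0.2), (0, 0.8)])  # 0.75 vs 0.8
assert ev_b1 > ev_a1 and ev_b2 > ev_a2
print("B has the higher EV in both problems")
```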

The reflection effect (Kahneman & Tversky, 1979; Simon, 1955)

Gain problem: A = 3 with certainty; B = 4 with p = .8, 0 otherwise.
B-rate: 42% in Block 1 (no FB), 65% in Block 5 (with FB).
Loss problem: A = -3 with certainty; B = -4 with p = .8, 0 otherwise.
B-rate: 49% in Block 1 (no FB), 36% in Block 5 (with FB).

Risk aversion in the gain domain and weak risk seeking in the loss domain; feedback eliminates this pattern and increases maximization.
[Figure: P(B) by block (1-5).]

Insurance, lotteries, and over- and under-weighting of rare events (Friedman & Savage, 1948; Kahneman & Tversky, 1979; Barron & Erev, 2003)

Lottery problem: A = 2 with certainty; B = 101 with p = .01, 1 otherwise (choosing B = buying the lottery).
B-rate: 55% in Block 1 (no FB), 42% in Block 5 (with FB).
Insurance problem: A = -1 with certainty (choosing A = buying insurance); B = -20 with p = .05, 0 otherwise.
B-rate: 48% in Block 1 (no FB), 63% in Block 5 (with FB).

Some overweighting of rare events before feedback, and robust underweighting with feedback.
[Figure: P(B) by block (1-5).]

Loss aversion

Problem 1: A = 0 with certainty; B = +50 with p = .5, -50 otherwise (EV = 0).
B-rate: 34% in Block 1 (no FB), 38% in Block 5 (with FB).
Problem 2: A = 13 with certainty; B = 50 with p = .6, -45 otherwise (EV = 12).
B-rate: 35% in Block 1 (no FB), 50% in Block 5 (with FB).
[Figure: P(B) by block (1-5).]

The St. Petersburg paradox (after Bernoulli, 1738)

A = 9 with certainty; B = 2 with p = .5, 4 with p = .25, 8 with p = .125, 16 with p = .0625, 32 with p = .03125, 64 with p = .015625, 128 with p = .0078125, 256 with p = .0078125.
B-rate: 38% in Block 1 (no FB), 36% in Block 5 (with FB).

Risk aversion, with and without feedback.
[Figure: P(B) by block (1-5).]
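As a sanity check (ours, not in the slides), the truncated lottery above, with outcome 2^k at probability 0.5^k for k = 1..7 and 256 taking the leftover 0.5^7, has the same expected value as the safe option:

```python
# Truncated St. Petersburg lottery: 2^k with p = 0.5^k for k = 1..7,
# plus 256 with the remaining probability 0.5^7.
outcomes = [(2 ** k, 0.5 ** k) for k in range(1, 8)] + [(256, 0.5 ** 7)]
total_p = sum(p for _, p in outcomes)
ev = sum(x * p for x, p in outcomes)
print(total_p, ev)  # prints: 1.0 9.0 -- the EV matches the safe 9
```

So choosing the sure 9 over a lottery with the same EV is risk aversion, and here feedback does not eliminate it.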

The Ellsberg paradox (Ellsberg, 1961)

A = 10 with p = 0.5, 0 otherwise; B = 10 with p = q1, 0 otherwise (q1 = 0.5, unknown to the DMs).
B-rate: 37% in Block 1 (no FB), 51% in Block 5 (with FB).

Initial ambiguity aversion; experience eliminates this effect.
[Figure: P(B) by block (1-5).]

Regret (Loomes & Sugden, 1982) and correlation (Diederich & Busemeyer, 1999) effects. P(E) = 0.5.

Problem 1: A = 6 if E, 0 otherwise; B = 8 if E, 0 otherwise.
B-rate: 97% in Block 1 (no FB), 99% in Block 5 (with FB).
Problem 2: A = 6 if E, 0 otherwise; B = 9 if not-E, 0 otherwise.
B-rate: 85% in Block 1 (no FB), 96% in Block 5 (with FB).

Weak sensitivity to regret/correlation without feedback; feedback increases the regret/correlation effect.
[Figure: P(B) by block (1-5).]

Summary of Study 1

Our results show that the classical description-based anomalies are replicable, but experience reverses or eliminates all of them except risk aversion in the St.

Petersburg problem. Experience, in all cases, decreased the impact of rare events (only in the St. Petersburg problem did the original deviation from maximization reflect underweighting of rare events). Similar results were observed in Study 2 (which examined 60 randomly selected problems).

BEAST (Best Estimate And Sampling Tools), the best model that we could find, assumes very different processes than those assumed by the leading models. It does not assume subjective weighting of subjective values (like prospect theory), and does not assume cognitive shortcuts (like the priority heuristic). Rather, it assumes an approximation of the EV plus four extra processes that involve sampling from memory using the following tools:
- Pessimism (sample the worst outcome)
- Uniform (treat all outcomes as equally likely)
- Sign (implies high sensitivity to the payoff sign)
- Unbiased (implies minimizing the probability of regret)

Feedback increases the probability of unbiased sampling, but the samples stay small. Reliance on small samples implies a bias toward the option that minimizes the probability of regret, and underweighting of rare events.

The CPC15 competition
http://departments.agri.huji.ac.il/economics/teachers/ert_eyal/competition.htm

In December 2014 we posted the results of Studies 1 and 2 (90 choice problems) on the web and challenged decision scientists to participate in a competition to predict the results of Study 3. We offered BEAST and challenged the participants to offer BEAUTY. Study 3 was run in April 2015; the submission deadline was May 17, 2015.

Competition participants: 53 registered teams, 25 submissions from 5 continents. Three classes of submissions:
- 4 subjective-functions models (CPT-like)
- 15 BEAST-like (EV plus sampling tools)
- 6 machine learning models
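The sampling tools described above can be sketched in a few lines. This is our own toy simplification with made-up parameter values, not the authors' implementation; in the full BEAST model the blend weights, sample sizes, and tool probabilities are fitted parameters, and feedback shifts weight toward the unbiased tool.

```python
import random

def draw(outcomes, tool, rng):
    """Sample one value from a prospect with one of the four tools.
    outcomes: list of (value, probability) pairs."""
    if tool == "pessimism":            # imagine the worst outcome
        return min(v for v, _ in outcomes)
    if tool == "uniform":              # treat all outcomes as equally likely
        return rng.choice([v for v, _ in outcomes])
    if tool == "sign":                 # only the payoff's sign matters
        v = draw(outcomes, "unbiased", rng)
        return (v > 0) - (v < 0)
    # "unbiased": sample from the described (objective) distribution
    r, acc = rng.random(), 0.0
    for v, p in outcomes:
        acc += p
        if r < acc:
            return v
    return outcomes[-1][0]

def beast_like_value(outcomes, sample_size=5, seed=0):
    """Blend the prospect's EV with the mean of a small mental sample."""
    rng = random.Random(seed)
    ev = sum(v * p for v, p in outcomes)
    tools = ["pessimism", "uniform", "sign", "unbiased"]
    sample = [draw(outcomes, rng.choice(tools), rng) for _ in range(sample_size)]
    return 0.5 * ev + 0.5 * (sum(sample) / len(sample))

risky = [(4, 0.8), (0, 0.2)]
print(beast_like_value(risky))  # somewhere between the EV (3.2) and a small-sample mean
```

Because the sample stays small even with feedback, rare outcomes are often missing from it, which is how the sketch (and the model) produces underweighting of rare events.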

Main results: All 12 top models were variants of BEAST. The differences between these models were statistically insignificant. The winner was Cohen's BEAST. The best machine learning model, submitted by Noti, Levi, Kolumbus, & Daniely, was almost as good as the BEAST variants.
[Figure: observed B-rates in the with-feedback blocks plotted against BEAST predictions.]

The new competitions: CPC18 (Plonsky, Apel, Erev, Ert, & Tennenholtz, in prep)

Three explanations of the fact that the machine learning (ML) submissions did not win CPC15:
1. The ML tools were designed to predict future behavior in a familiar setting (like

the Netflix recommender system). They do not have an advantage over traditional models in predicting behavior in new environments.
2. The ML submissions used ineffective features. With more careful feature selection, they could have won the competition (Plonsky, Erev, Hazan & Tennenholtz, AAAI 2017).
3. The ML submissions used suboptimal tools.

To compare these hypotheses, we invite you to participate in two related competitions. The first, CPC18a, will be an extension of CPC15: the same goal (predicting the aggregate choice rate in new problems), with a larger space of choice tasks. In CPC18a, both prospects will have up to 10 outcomes. The second, CPC18i, will focus on predicting the behavior of specific individuals in 5 specific choice problems after observing their behavior in 25 other problems. This task is similar to the best-known applications of ML (like the Netflix model), and we hope to learn how behavioral features can help in this setting.

We accept submissions in R, MATLAB, Python and SAS. The description is on the web: https://cpc18.wordpress.com/. It includes the best models that we could find as baselines, and participants are allowed to use these baselines as starting points.

Both baselines for CPC18a (predicting the aggregate) are variants of BEAST:
- BEAST.sd is a refinement of BEAST that assumes subjective detection of dominance (the error rate decreases when the same option both maximizes EV and minimizes the probability of regret).
- Psych Forest is a random-forest-based model that uses BEAST as one of its features.

The best baselines we found for CPC18i (predicting individual choice) are collaborative filtering models; they use shallower psychology.

Summary

Classical research in behavioral economics uses different elegant models to capture different anomalies. As a result, it is not easy to use this research to predict behavior. CPC15 shows that it is not difficult to develop more general behavioral models that allow useful predictions. Interestingly, the processes assumed by the winning models are very different from the processes assumed by the classical models. Rather than assuming the weighting of subjective values, or the use of simple heuristics, the winners assume high sensitivity to expected value plus four sampling tools. Theory-free ML tools provide much better predictions than the classical behavioral models, but they do not outperform the competition winners. We hope that our next competitions will clarify the best way to combine behavioral insights with ML tools.
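To make the Psych Forest idea above concrete, here is a minimal sketch of its feature construction (our own illustration: `beast_stand_in` is a naive EV-based placeholder for the real BEAST feature, and the forest itself is omitted):

```python
# Sketch of the "Psych Forest" idea: alongside the raw problem parameters,
# feed the regressor a behavioral model's prediction as one more feature.
# The stand-in below is naive; the real feature is BEAST's predicted B-rate.
def beast_stand_in(problem):
    ev_a = problem["p_ha"] * problem["ha"] + (1 - problem["p_ha"]) * problem["la"]
    ev_b = sum(x * p for x, p in problem["b_outcomes"])
    return 1.0 if ev_b > ev_a else 0.0

def feature_vector(problem):
    raw = [problem["ha"], problem["p_ha"], problem["la"]]
    return raw + [beast_stand_in(problem)]  # behavioral feature appended

problem = {"ha": 3, "p_ha": 1.0, "la": 0, "b_outcomes": [(4, 0.8), (0, 0.2)]}
print(feature_vector(problem))  # [3, 1.0, 0, 1.0]
```

A random forest regressor trained on such vectors (raw problem parameters plus the behavioral feature) would then predict the aggregate B-rate for new problems.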