# Towards Minimizing the Annotation Cost of Certified Text Classification

Mossaab Bagdouri¹, David D. Lewis², William Webber¹, Douglas W. Oard¹

¹ University of Maryland, College Park, MD, USA
² David D. Lewis Consulting, Chicago, IL, USA

## Outline

- Introduction
- Economical assured effectiveness
- Solution framework
- Baseline solutions
- Conclusion

## Goal: Economical assured effectiveness

1. Build a good classifier
2. Certify that this classifier is good
3. Use nearly minimal total annotations

(Photo courtesy of www.stockmonkeys.com)

## Notation

[Figure: training and test annotations, with the symbols $F_1$, $\hat{F}_1$, and $\alpha = 0.05$.]

## Fixed test set, growing training set

[Figure: $F_1$ and $\hat{F}_1$ against training annotations.]

Example: Collection = RCV1, Topic = M132, Frequency = 3.33%

[Table: stop criterion vs. success rate; labels Desired and $\hat{F}_1$; values 95.00%, 46.42%, 91.87%. X-axis of the accompanying plot: training documents.]

## Fixed training set, growing test set

[Figure: $F_1$ and $\hat{F}_1$ against test annotations.]

## Problem 1: Sequential testing bias

[Figure: $F_1$ against annotations, with the markers "Stop here", "Want to stop here", and "Do not stop".]

## Solution: Train sequentially, test once

[Figure: $F_1$ against training annotations; train without testing, then test only once.]
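The bias from repeated testing can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: a plain proportion stands in for $F_1$, the certification rule is a one-sided normal-approximation lower bound, and the names `lower_bound` and `false_certification_rates` are hypothetical. The classifier's true effectiveness is set exactly at the target, so any certification is a false one.

```python
import math
import random


def lower_bound(successes, n, z=1.645):
    """One-sided 95% normal-approximation lower bound on a proportion."""
    p_hat = successes / n
    return p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def false_certification_rates(true_p, target, looks=50, bucket=20,
                              runs=1000, seed=0):
    """Compare testing once (at the end) with testing after every bucket.

    Returns (rate_test_once, rate_sequential): the fraction of runs in which
    the classifier is certified by the final look only, and by any look.
    """
    rng = random.Random(seed)
    once = 0
    sequential = 0
    for _ in range(runs):
        successes = 0
        n = 0
        passed_any = False
        passed_last = False
        for _ in range(looks):
            # Annotate one more bucket of test documents.
            successes += sum(rng.random() < true_p for _ in range(bucket))
            n += bucket
            passed_last = lower_bound(successes, n) >= target
            passed_any = passed_any or passed_last
        once += passed_last
        sequential += passed_any
    return once / runs, sequential / runs


if __name__ == "__main__":
    once, sequential = false_certification_rates(true_p=0.80, target=0.80)
    print(f"test once:        {once:.3f}")
    print(f"test every batch: {sequential:.3f}")
```

Stopping at the first look that passes certifies far more often than the nominal error rate allows, which is the motivation for testing only once.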

## Problem 2: What is the size of the test set?

## Solution: Power analysis

- Observation 1, from power analysis: when the true effectiveness greatly exceeds the target, only a small test set is needed.
- Observation 2, from the shape of learning curves: each new training example provides less of an increase in effectiveness.

$\beta = 0.07$, Power $= 1 - \beta$

[Figure: $F_1$ against training documents.]
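Observation 1 can be checked with a simulation-based power analysis. The sketch below is illustrative only: it assumes precision = recall = $F_1$ to turn an $F_1$ value into confusion-cell probabilities, takes the critical value from the simulated null distribution (true $F_1$ equal to the target), and uses hypothetical names (`cell_weights`, `sample_f1`, `simulated_power`, `smallest_test_set`) and a hypothetical grid of candidate sizes.

```python
import random


def cell_weights(prevalence, f1):
    """Cell probabilities (TP, FP, FN, TN), assuming precision = recall = f1."""
    tp = prevalence * f1
    fp = prevalence * (1.0 - f1)
    fn = prevalence * (1.0 - f1)
    return [tp, fp, fn, 1.0 - tp - fp - fn]


def sample_f1(rng, weights, n):
    """Draw one random test set of n documents and return its observed F1."""
    cells = rng.choices(range(4), weights=weights, k=n)
    tp, fp, fn = cells.count(0), cells.count(1), cells.count(2)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def simulated_power(true_f1, target, prevalence, n,
                    alpha=0.05, runs=200, seed=0):
    """Estimate the chance of certifying a classifier whose F1 is true_f1."""
    rng = random.Random(seed)
    null_w = cell_weights(prevalence, target)
    null = sorted(sample_f1(rng, null_w, n) for _ in range(runs))
    # Critical value: approximate (1 - alpha) quantile under the null.
    critical = null[min(runs - 1, int((1.0 - alpha) * runs))]
    alt_w = cell_weights(prevalence, true_f1)
    hits = sum(sample_f1(rng, alt_w, n) > critical for _ in range(runs))
    return hits / runs


def smallest_test_set(true_f1, target, prevalence, power=0.93,
                      sizes=(250, 500, 1000, 2000)):
    """Return the first candidate size reaching the power, or None."""
    for n in sizes:
        if simulated_power(true_f1, target, prevalence, n) >= power:
            return n
    return None


if __name__ == "__main__":
    for true_f1 in (0.85, 0.95):
        n = smallest_test_set(true_f1, target=0.80, prevalence=0.0657)
        print(f"true F1 = {true_f1}: test set size = {n}")
```

The further the true $F_1$ sits above the target, the smaller the test set that reaches the requested power; a value close to the target may not reach it within the candidate grid at all.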

## Designing annotation minimization policies

[Figure: true $F_1$ and total cost, Training + Test (\$\$\$), against training documents.]

## Allocation policies in practice

- No closed-form solution to go from an effect size on $F_1$ to a test set size
  - Simulation methods
- True effectiveness is invisible
  - Cross-validation to estimate it
- Need to decide online
  - No access to the entire curve
- Scattered and noisy estimates

[Figure: true $F_1$ and Training + Test (\$\$\$) against training documents; Topic = C18, Frequency = 6.57%.]

## Estimating the true F1 (Cross-validation)

[Figure: several confusion matrices (TP, FP, FN, TN) computed from the training set.]

## Estimating the true F1 (Simulations)

[Figure: confusion matrix (TP, FP, FN, TN) from the training set, a posterior distribution, and a simulated confusion matrix (TP, FP, FN, TN).]
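One way to turn an observed confusion matrix into a distribution over $F_1$ is to sample from a posterior over the cell probabilities. The slides only say "Posterior distribution", so the Dirichlet posterior with a uniform prior used here is an assumption, as are the function names `f1_posterior_samples` and `f1_lower_bound`. A minimal sketch:

```python
import random


def f1_posterior_samples(tp, fp, fn, draws=2000, prior=1.0, seed=0):
    """Sample F1 from a Dirichlet posterior over the confusion cells.

    A Dirichlet draw is a set of independent Gamma draws divided by their
    sum. F1 = 2TP / (2TP + FP + FN) is a ratio of cell probabilities, so the
    normalising sum cancels and the TN cell drops out entirely.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        g_tp = rng.gammavariate(tp + prior, 1.0)
        g_fp = rng.gammavariate(fp + prior, 1.0)
        g_fn = rng.gammavariate(fn + prior, 1.0)
        samples.append(2.0 * g_tp / (2.0 * g_tp + g_fp + g_fn))
    return samples


def f1_lower_bound(tp, fp, fn, alpha=0.05, draws=2000):
    """Approximate alpha-quantile of the posterior F1 samples."""
    samples = sorted(f1_posterior_samples(tp, fp, fn, draws=draws))
    return samples[int(alpha * draws)]


if __name__ == "__main__":
    point = 2 * 80 / (2 * 80 + 10 + 10)
    print(f"point estimate: {point:.3f}")
    print(f"lower bound:    {f1_lower_bound(80, 10, 10):.3f}")
```

With ten times as many annotated documents in the same proportions, the lower bound moves closer to the point estimate, which is why the spread of the posterior matters when deciding how many annotations to buy.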

## Minimizing the annotations

[Figure: diagram labels Measure ($F_1$), Algorithm (SVM), and Infer test set size; plot of $F_1$ and Training + Test against training annotations.]

## Experiments

- Test collection: RCV1-v2
  - 29 topics with a prevalence ≥ 3%
  - 20 randomized runs per topic
- Classifier: SVMPerf
  - Off-the-shelf classifier
  - Optimizes training for $F_1$
- Settings
  - Budget: 10,000 documents
  - Power $1 - \beta = 0.93$
  - Confidence level $1 - \alpha = 0.95$
  - Documents added in buckets of 20

## Policies

[Figure: Training + Test (\$\$\$) against training documents; Topic = C18, Frequency = 6.57%.]

## Stop as early as possible

- Budget achieved 70.52% of the time
- Failure rate of 20.54% > $\beta$ (7%)
- Sequential testing bias pushed into process management

[Figure: Training + Test (\$\$\$) against training documents; Topic = C18, Frequency = 6.57%.]

## Oracle policies

- Minimum cost policy
  - Savings: 43.21% of the total annotations
  - Failure rate of 27.14% > $\beta$ (7%)
- Minimum cost for success policy
  - Savings: 38.08%

[Figure: Training + Test (\$\$\$) against training documents; Topic = C18, Frequency = 6.57%.]

## Wait-a-while policies

[Table columns: w, Cannot open (%), Success (%), Savings (%).]

[Figure: Training + Test (\$\$\$) against training documents for W=0, W=1, W=2, W=3, and Last chance; Topic = C18, Frequency = 6.57%.]

## Conclusion

- Re-testing introduces statistical bias
- Algorithm to indicate:
  - If / when a classifier can achieve a threshold
  - How many documents are required to certify a trained model
- Subroutine for policies minimizing the cost
- Possibility to save 38% of cost

Thank you!

