# Towards Minimizing the Annotation Cost of Certified Text Classification

Mossaab Bagdouri¹, David D. Lewis², William Webber¹, Douglas W. Oard¹

¹ University of Maryland, College Park, MD, USA
² David D. Lewis Consulting, Chicago, IL, USA

## Outline

- Introduction
- Economical assured effectiveness
- Solution framework
- Baseline solutions
- Conclusion

## Goal: Economical assured effectiveness

1. Build a good classifier.
2. Certify that this classifier is good.
3. Use nearly minimal total annotations.

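(Photo courtesy of www.stockmonkeys.com)

## Notation

- F1: the true effectiveness of the classifier.
- F̂1: the effectiveness estimated on a labeled test set.
- α = 0.05: the significance level.

Figure: annotations divided between a training set and a test set.

## Fixed test set, growing training set

Figure: F1 and F̂1 as training annotations grow, with the test set held fixed.

Figure: F1 and F̂1 versus the number of training documents (collection RCV1, topic M132, frequency 3.33%), with a table of stop criteria (desired F1, F̂1) and their success rates (values shown: 95.00%, 46.42%, 91.87%).

## Fixed training set, growing test set

Figure: F1 and F̂1 as test annotations grow, with the training set held fixed.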
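Both F1 and its estimate F̂1 come from a contingency table of classifier decisions against human labels. The following is a minimal Python sketch of the standard formula F1 = 2TP / (2TP + FP + FN). The `Contingency` type, the zero-denominator convention, and the example counts are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Counts from comparing classifier decisions with human labels."""
    tp: int  # labeled positive, predicted positive
    fp: int  # labeled negative, predicted positive
    fn: int  # labeled positive, predicted negative
    tn: int  # labeled negative, predicted negative (F1 ignores this cell)


def f1(c: Contingency) -> float:
    """F1 = 2TP / (2TP + FP + FN).

    Assumption: with no positive labels and no positive predictions the
    score is undefined; this sketch returns 0.0 in that case.
    """
    denom = 2 * c.tp + c.fp + c.fn
    return 0.0 if denom == 0 else 2 * c.tp / denom


if __name__ == "__main__":
    # Hypothetical test-set counts, for illustration only.
    test_counts = Contingency(tp=40, fp=10, fn=10, tn=940)
    print(f"F1-hat on this test set = {f1(test_counts):.3f}")  # 0.800
```

Keeping the true value F1 separate from the test-set estimate F̂1 is what the rest of the deck relies on: certification is a statement about F1 made from F̂1.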

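## Problem 1: Sequential testing bias

Figure: F1 versus annotations, contrasting where repeated testing on the same test set stops ("Stop here") with where we want to stop ("Want to stop here"), and marking points where the process should not stop ("Do not stop").

## Solution: Train sequentially, test once

Train without testing, then test only once.

Figure: training annotations followed by a single test.

## Problem 2: What is the size of the test set?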
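The sequential testing bias can be illustrated with a small Monte Carlo sketch. Its assumptions are ours, not the slides': effectiveness is a single proportion (for example, recall) rather than F1; certification uses a one-sided Wald lower confidence bound; each retrained classifier's per-item outcomes on the fixed test set are drawn independently at a true rate exactly equal to the target; and the process stops at the first look whose lower bound clears the target. Because the true rate equals the target, every certification here is a false one. Real successive classifiers are correlated, so the size of the inflation shown is illustrative only.

```python
import math
import random
from statistics import NormalDist


def wald_lower_bound(successes: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Wald lower confidence bound for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha)
    return p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n)


def false_certification_rate(true_rate: float, target: float, n_test: int,
                             looks: int, alpha: float = 0.05,
                             trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated runs that certify 'rate > target' at least once.

    Each run re-tests up to `looks` classifiers on a test set of size
    `n_test` and stops at the first one whose lower bound exceeds `target`.
    """
    rng = random.Random(seed)
    certified = 0
    for _ in range(trials):
        for _ in range(looks):
            # Assumption: per-item outcomes are independent draws at true_rate.
            successes = sum(rng.random() < true_rate for _ in range(n_test))
            if wald_lower_bound(successes, n_test, alpha) > target:
                certified += 1
                break  # stop at the first apparent success
    return certified / trials


if __name__ == "__main__":
    # True rate equals the target, so every certification is a false one.
    once = false_certification_rate(0.8, 0.8, n_test=200, looks=1)
    repeated = false_certification_rate(0.8, 0.8, n_test=200, looks=10)
    print(f"test once: {once:.3f}   re-test up to 10 times: {repeated:.3f}")
```

A single look falsely certifies at a rate near the nominal α, while stopping at the first of several looks certifies far more often, which is the motivation for training sequentially and testing once.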

## Solution: Power analysis

- Observation 1 (from power analysis): when the true effectiveness greatly exceeds the target, only a small test set is needed.
- Observation 2 (from the shape of learning curves): each new training example yields a smaller increase in effectiveness than earlier ones.

Settings shown: β = 0.07, power = 1 − β. Figure: F1 versus the number of training documents.
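Observation 1 can be made concrete with a textbook power calculation. The slides note that there is no closed form linking an effect size on F1 to a test set size, so this sketch substitutes the standard normal-approximation sample size for a one-sided test on a single proportion (such as recall, where n counts positive test documents), using the deck's α = 0.05 and β = 0.07. The function name `required_test_size` and the proportion stand-in are assumptions for illustration.

```python
import math
from statistics import NormalDist


def required_test_size(target: float, true_value: float,
                       alpha: float = 0.05, beta: float = 0.07) -> int:
    """Items needed for a one-sided test of H0: p <= target to reject
    with power 1 - beta when the true proportion is `true_value`.

    n = ((z_{1-alpha} sqrt(p0 q0) + z_{1-beta} sqrt(p1 q1)) / (p1 - p0))^2
    """
    if not 0.0 < target < true_value < 1.0:
        raise ValueError("requires 0 < target < true_value < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    spread = (z_alpha * math.sqrt(target * (1 - target))
              + z_beta * math.sqrt(true_value * (1 - true_value)))
    return math.ceil((spread / (true_value - target)) ** 2)


if __name__ == "__main__":
    # The further the true value exceeds the target, the smaller the test set.
    for true_value in (0.75, 0.80, 0.90):
        print(true_value, required_test_size(0.70, true_value))
```

Combined with Observation 2, this suggests the trade-off the deck exploits: a little extra training can raise effectiveness enough to shrink the required test set considerably.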

## Designing annotation minimization policies

Figure: true F1 and the total annotation cost, training + test (\$\$\$), as functions of the training set size.

## Allocation policies in practice

- There is no closed-form solution for going from an effect size on F1 to a test set size, so simulation methods are used.
- The true effectiveness is invisible, so cross-validation is used to estimate it.
- Decisions must be made online: there is no access to the entire curve, only scattered and noisy estimates.

Figure: true F1 and total cost, training + test (\$\$\$), versus training documents (topic C18, frequency 6.57%).

## Estimating the true F1 (cross-validation)

Figure: contingency tables (TP, FP, FN, TN) from cross-validation folds on the training data.
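One simple way to turn per-fold tables into a single estimate is to sum the cells across folds and compute F1 once on the pooled table. The slides do not say exactly how folds are combined, so pooling (rather than averaging per-fold F1) is an assumption in this sketch; `contingency` and `pooled_f1` are hypothetical helpers.

```python
from typing import Iterable, Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, fn, tn)


def contingency(gold: Sequence[bool], predicted: Sequence[bool]) -> Counts:
    """Contingency cells for one cross-validation fold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)
    fp = sum((not g) and p for g, p in pairs)
    fn = sum(g and (not p) for g, p in pairs)
    tn = sum((not g) and (not p) for g, p in pairs)
    return tp, fp, fn, tn


def pooled_f1(folds: Iterable[Counts]) -> float:
    """Sum the cells over all folds, then compute F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for f_tp, f_fp, f_fn, _f_tn in folds:
        tp += f_tp
        fp += f_fp
        fn += f_fn
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical held-out predictions from two folds.
    fold_a = contingency([True, True, False, False], [True, False, True, False])
    fold_b = contingency([True, False, False, True], [True, False, False, True])
    print(fold_a, fold_b, round(pooled_f1([fold_a, fold_b]), 3))
```

Pooling keeps small folds with few positives from producing extreme per-fold F1 values, which matters for low-prevalence topics.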

## Estimating the true F1 (simulations)

Figure: a posterior distribution is formed from the contingency table (TP, FP, FN, TN) on the training data, and simulated contingency tables are drawn from it.

## Minimizing the annotations

Given the effectiveness measure (F1) and the learning algorithm (SVM), infer the required test set size.
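The simulation step can be sketched by placing a Dirichlet posterior over the four cell probabilities, starting from a uniform Dirichlet(1, 1, 1, 1) prior plus the cross-validation counts, and drawing F1 values from it. The choice of prior, the function name `posterior_f1_samples`, and reporting a lower quantile as a conservative estimate are our assumptions; the slides only show that a posterior distribution over the contingency table is used.

```python
import random
import statistics
from typing import List, Sequence


def posterior_f1_samples(counts: Sequence[int], n_samples: int = 5000,
                         prior: float = 1.0, seed: int = 0) -> List[float]:
    """Draw F1 values under a Dirichlet posterior over (TP, FP, FN, TN) probabilities.

    A Dirichlet draw is obtained by sampling independent Gamma(count + prior, 1)
    variates and normalising them to sum to one.
    """
    if len(counts) != 4:
        raise ValueError("counts must be (tp, fp, fn, tn)")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
        total = sum(gammas)
        tp, fp, fn, _tn = (g / total for g in gammas)
        samples.append(2 * tp / (2 * tp + fp + fn))
    return samples


if __name__ == "__main__":
    # Hypothetical pooled cross-validation counts (tp, fp, fn, tn).
    draws = posterior_f1_samples((14, 6, 4, 76))
    print(f"posterior mean F1 = {statistics.fmean(draws):.3f}")
    print(f"5th percentile    = {statistics.quantiles(draws, n=20)[0]:.3f}")
```

The spread of these draws, rather than a single point estimate, is what would feed a simulation-based choice of test set size.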

## Experiments

- Test collection: RCV1-v2, using 29 topics with a prevalence of at least 3% and 20 randomized runs per topic.
- Classifier: SVMperf, an off-the-shelf classifier that optimizes training for F1.
- Settings:
  - Budget: 10,000 documents
  - Power 1 − β = 0.93
  - Confidence level 1 − α = 0.95
  - Documents added in buckets of 20

## Policies

Figure: total cost, training + test (\$\$\$), versus training documents for the candidate policies (topic C18, frequency 6.57%).

## Stop as early as possible

- Budget achieved in 70.52% of runs.
- Sequential testing bias is pushed into process management: the failure rate of 20.54% exceeds β (7%).

Figure: total cost, training + test (\$\$\$), versus training documents (topic C18, frequency 6.57%).

## Oracle policies

- Minimum cost policy: saves 43.21% of the total annotations, but its failure rate of 27.14% exceeds β (7%).
- Minimum cost for success policy: saves 38.08%.
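The two kinds of policy can be contrasted in a short sketch. It reads "stop as early as possible" as stopping at the first training size whose total cost (training plus required test set) fits the budget, and the oracle "minimum cost" policy as picking the cheapest such point on the true curve; both readings are our interpretation of the slides. The budget and bucket size come from the experimental settings, while `test_size_for`, the learning curve, and the toy test-size rule in the usage example are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

BUDGET = 10_000  # total annotation budget (experimental settings)
BUCKET = 20      # training documents are added in buckets of 20


def total_costs(curve: Sequence[float],
                test_size_for: Callable[[float], Optional[int]]) -> List[Optional[int]]:
    """Training + test annotations if training stopped after bucket i.

    `curve[i]` is the effectiveness after (i + 1) * BUCKET training documents;
    `test_size_for` returns None when the target cannot be certified.
    """
    costs: List[Optional[int]] = []
    for i, effectiveness in enumerate(curve):
        n_test = test_size_for(effectiveness)
        costs.append(None if n_test is None else (i + 1) * BUCKET + n_test)
    return costs


def stop_as_early_as_possible(costs: Sequence[Optional[int]]) -> Optional[int]:
    """Index of the first bucket whose total cost fits within the budget."""
    for i, cost in enumerate(costs):
        if cost is not None and cost <= BUDGET:
            return i
    return None


def minimum_cost(costs: Sequence[Optional[int]]) -> Optional[int]:
    """Oracle policy: index of the cheapest bucket within the budget."""
    feasible = [(cost, i) for i, cost in enumerate(costs)
                if cost is not None and cost <= BUDGET]
    return min(feasible)[1] if feasible else None


def toy_test_size(effectiveness: float) -> Optional[int]:
    """Hypothetical rule for a 0.70 target: test size shrinks as the gap grows."""
    if effectiveness <= 0.70:
        return None
    return math.ceil(20 / (effectiveness - 0.70) ** 2)


if __name__ == "__main__":
    # Hypothetical true learning curve, one value per bucket of 20 documents.
    costs = total_costs([0.65, 0.72, 0.80, 0.85, 0.86], toy_test_size)
    print(costs)
    print("stop early at bucket", stop_as_early_as_possible(costs))
    print("oracle minimum cost at bucket", minimum_cost(costs))
```

The gap between the two chosen buckets is where annotations can be saved. A minimum-cost-for-success variant would presumably also exclude points where the single final test fails; that step is not modeled here.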

Figure (minimum cost for success policy): total cost, training + test (\$\$\$), versus training documents.

## Wait-a-while policies

Figure: total cost, training + test (\$\$\$), versus training documents for wait-a-while policies with W = 0, 1, 2, 3 and a "Last chance" policy (topic C18, frequency 6.57%). A table reports, for each w: Cannot open (%), Success (%), and Savings (%).

## Conclusion

- Re-testing introduces statistical bias.
- An algorithm to indicate:
  - if and when a classifier can achieve a threshold;
  - how many documents are required to certify a trained model.
- A subroutine for policies that minimize the cost.
- Possibility to save 38% of the cost.

Towards Minimizing the Annotation Cost of Certified Text Classification. Thank you!
