Towards Minimizing the Annotation Cost of Certified Text Classification

Mossaab Bagdouri (1), David D. Lewis (2), William Webber (1), Douglas W. Oard (1)
(1) University of Maryland, College Park, MD, USA
(2) David D. Lewis Consulting, Chicago, IL, USA

Outline
- Introduction
- Economical assured effectiveness
- Solution framework
- Baseline solutions
- Conclusion

Goal: Economical assured effectiveness
1. Build a good classifier
2. Certify that this classifier is good
3. Use nearly minimal total annotations
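(Photo courtesy of www.stockmonkeys.com)

Notation
[Figure: estimated F1 and true F1 plotted against annotations, with the training and test sets marked. Symbols shown: α = 0.05, estimated F1 (F1 with a hat), true F1.]

Fixed test set, growing training set
[Figure: estimated F1 and true F1 vs. training annotations.]

Fixed test set, growing training set
Collection = RCV1, Topic = M132, Freq = 3.33%
[Figure: F1 vs. training documents, with a small table headed "Stop criterion" and "Success". Entries: Desired, estimated F1; values 95.00%, 46.42%, 91.87% (row alignment lost in extraction).]

Fixed training set, growing test set
[Figure: estimated F1 and true F1 vs. test annotations.]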
Problem 1: Sequential testing bias
[Figure: estimated and true F1 vs. annotations. Labels: "Stop here", "Want to stop here", "Do not stop".]

Solution: Train sequentially, test once
[Figure: F1 vs. training annotations. Label: "Test only once".]
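The bias from repeated testing can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not in the slides: effectiveness is modeled as plain accuracy rather than F1, the classifier is fixed while the test set grows, and a one-sided z-test at α = 0.05 is used at every look. The function name and all parameter values are hypothetical.

```python
import math
import random


def false_certification_rates(target=0.8, n_looks=20, batch=50,
                              z=1.645, trials=2000, seed=0):
    """Return (single_look_rate, repeated_look_rate).

    The simulated classifier has a true accuracy exactly equal to the
    target, so any certification "above target" is a false alarm.
    """
    rng = random.Random(seed)
    single = 0
    repeated = 0
    for _ in range(trials):
        correct = 0
        n = 0
        passed_any = False
        passed_last = False
        for _ in range(n_looks):
            # Annotate one more batch of test documents.
            correct += sum(rng.random() < target for _ in range(batch))
            n += batch
            # One-sided z-test: is observed accuracy significantly above target?
            se = math.sqrt(target * (1 - target) / n)
            passed_last = (correct / n - target) / se > z
            passed_any = passed_any or passed_last
        single += passed_last    # test once, on the full test set
        repeated += passed_any   # certify at the first look that passes
    return single / trials, repeated / trials


if __name__ == "__main__":
    once, many = false_certification_rates()
    print(f"test once: {once:.3f}   test at every look: {many:.3f}")
```

Because the final look is one of the repeated looks, the repeated-look rate can never be lower than the single-look rate; in this setup it is noticeably higher, which is the effect the slide describes.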
Problem 2: What is the size of the test set?
[Figure: training and test sets.]

Solution: Power analysis
- Observation 1, from power analysis: when the true effectiveness greatly exceeds the target, only a small test set is needed.
- Observation 2, from the shape of learning curves: each new training example provides less of an increase in effectiveness.
[Figure: F1 vs. training documents, annotated with β = 0.07 and Power = 1 - β.]
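Observation 1 can be made concrete with a simulation-based power analysis. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes precision equals recall (so both equal F1), a known prevalence, a critical value taken from the simulated null distribution, and a coarse doubling search over test set sizes. All function names are hypothetical; the defaults α = 0.05 and power 0.93 mirror the values used in the deck.

```python
import random
from collections import Counter


def cell_probs(f1, prevalence):
    """Cell probabilities (TP, FP, FN, TN), assuming precision == recall == f1."""
    tp = prevalence * f1
    fp = fn = prevalence * (1 - f1)
    return [tp, fp, fn, 1 - tp - fp - fn]


def sample_f1(probs, n, rng):
    """Draw n test documents and return the F1 observed on that sample."""
    counts = Counter(rng.choices(("tp", "fp", "fn", "tn"), weights=probs, k=n))
    denom = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denom if denom else 0.0


def power(n, true_f1, target, prevalence, alpha, sims, rng):
    """Estimate the chance that a classifier with true_f1 passes the test."""
    null_probs = cell_probs(target, prevalence)
    null = sorted(sample_f1(null_probs, n, rng) for _ in range(sims))
    # Critical value: a classifier sitting exactly at the target passes
    # with (simulated) probability of at most alpha.
    critical = null[min(sims - 1, int((1 - alpha) * sims))]
    alt_probs = cell_probs(true_f1, prevalence)
    passes = sum(sample_f1(alt_probs, n, rng) > critical for _ in range(sims))
    return passes / sims


def required_test_size(true_f1, target, prevalence=0.1, alpha=0.05,
                       desired_power=0.93, sims=400, start=50,
                       max_n=51200, seed=0):
    """Smallest size on a doubling grid that reaches desired_power, else None."""
    rng = random.Random(seed)
    n = start
    while n <= max_n:
        if power(n, true_f1, target, prevalence, alpha, sims, rng) >= desired_power:
            return n
        n *= 2
    return None


if __name__ == "__main__":
    for f1 in (0.9, 0.8):
        print(f"true F1 = {f1}: test size {required_test_size(f1, target=0.7)}")
```

With a target of 0.7, a classifier whose true F1 is 0.9 needs a smaller test set than one whose true F1 is 0.8. The doubling grid keeps the search cheap at the price of overshooting the minimal size by up to a factor of two.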
Designing annotation minimization policies
[Figure: total annotation cost, Training + Test ($$$), and true F1 vs. training annotations.]

Allocation policies in practice
- No closed-form solution to go from an effect size on F1 to a test set size: simulation methods
- True effectiveness is invisible: cross-validation to estimate it
- Need to decide online
- No access to the entire curve
- Scattered and noisy estimates
[Figure: Training + Test ($$$) and true F1 vs. training documents. Topic = C18, Frequency = 6.57%.]

Estimating the true F1 (Cross-validation)
[Figure: repeated TP / FP / FN / TN contingency tables computed on the training set.]

Estimating the true F1 (Simulations)
[Figure: a TP / FP / FN / TN contingency table from the training set, a posterior distribution, and a simulated TP / FP / FN / TN table.]
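The two estimation steps above can be sketched together: pool the per-fold contingency tables from cross-validation, then simulate F1 values from a posterior over the four cells. This is a minimal sketch, assuming a Dirichlet posterior with a uniform prior over (TP, FP, FN, TN); the slides do not specify the posterior, and the function names and fold counts are hypothetical.

```python
import random


def pool_folds(folds):
    """Sum per-fold (tp, fp, fn, tn) tuples into one contingency table."""
    return tuple(sum(cell) for cell in zip(*folds))


def f1_from_cells(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def posterior_f1_samples(counts, draws=5000, prior=1.0, seed=0):
    """Sorted F1 values simulated from a Dirichlet(counts + prior) posterior."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a set of normalised Gamma variates; F1 is a
        # ratio of cells, so the normalising constant cancels out.
        tp, fp, fn, _tn = (rng.gammavariate(c + prior, 1.0) for c in counts)
        samples.append(f1_from_cells(tp, fp, fn))
    return sorted(samples)


if __name__ == "__main__":
    # Hypothetical counts from three cross-validation folds.
    folds = [(18, 4, 5, 173), (20, 3, 6, 171), (17, 5, 4, 174)]
    pooled = pool_folds(folds)
    samples = posterior_f1_samples(pooled)
    print("point estimate:", round(f1_from_cells(*pooled[:3]), 3))
    print("5% posterior quantile:", round(samples[len(samples) // 20], 3))
```

A low quantile of the simulated values gives a conservative estimate of the true F1, which is the kind of quantity a test-set-size calculation could take as input.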
Minimizing the annotations
[Figure: inputs Measure (F1) and Algorithm (SVM); step "Infer test set size"; plot of F1 vs. training annotations with the training and test sets marked.]

Experiments
- Test collection: RCV1-v2
  - 29 topics with a prevalence ≥ 3%
  - 20 randomized runs per topic
- Classifier: SVMPerf
  - Off-the-shelf classifier
  - Optimizes training for F1
- Settings
  - Budget: 10,000 documents
  - Power 1 - β = 0.93
  - Confidence level 1 - α = 0.95
  - Documents added in buckets of 20
Policies
[Figure: Training + Test ($$$) vs. training documents. Topic = C18, Frequency = 6.57%.]

Stop as early as possible
- Budget achieved 70.52% of the time
- Sequential testing bias pushed into process management
- Failure rate of 20.54% > β (7%)
[Figure: Training + Test ($$$) vs. training documents. Topic = C18, Frequency = 6.57%.]

Oracle policies
- Minimum cost policy: savings of 43.21% of the total annotations; failure rate of 27.14% > β (7%)
- Minimum cost for success policy: savings of 38.08%
[Figure: Training + Test ($$$) vs. training documents. Topic = C18, Frequency = 6.57%.]

Wait-a-while policies
[Table: columns w, Cannot open (%), Success (%), Savings (%); values lost in extraction.]
[Figure: Training + Test ($$$) vs. training documents for W=0, W=1, W=2, W=3 and "Last chance". Topic = C18, Frequency = 6.57%.]

Conclusion
- Re-testing introduces statistical bias
- Algorithm to indicate:
  - If / when a classifier can achieve a threshold
  - How many documents are required to certify a trained model
- Subroutine for policies minimizing the cost
- Possibility to save 38% of the cost

Thank you!