Paired Sampling in Density-Sensitive Active Learning

Pinar Donmez, joint work with Jaime G. Carbonell
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Outline
  Problem setting
  Motivation
  Our approach
  Experiments
  Conclusion

Setting
  X: feature space; label set Y = {-1, +1}
  Data D ~ X x Y
  D = T ∪ U
  T: training set, U: unlabeled set
  T is small initially, U is large
  Active learning: choose the most informative samples to label
  Goal: high performance with the fewest labeling requests

Motivation
  Optimize the decision boundary placement
  Sampling disproportionately on one side may not be optimal
  Maximize the likelihood of straddling the boundary with paired samples
  Three factors affect sampling:
    Local density
    Conditional entropy maximization
    Utility score

Illustrative Example
  Left figure (paired sampling): significant shift in the current hypothesis, large reduction in version space
  Right figure (single point sampling): small shift in the current hypothesis, small reduction in version space

Density-Sensitive Distance
  Cluster hypothesis: the decision boundary should NOT cut clusters
    Squeeze distances in high-density regions
    Increase distances in low-density regions
  Solution: density-sensitive distance
    Find the weakest link along each path in a graph G
    A better way to avoid outliers (i.e. a very short edge in a long path)
    Chapelle & Zien (2005)
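A minimal sketch of a path-based, density-sensitive distance in the spirit of the Chapelle & Zien (2005) reference above. The exact formula used in the talk is not preserved, so everything here is an assumption: the function name, the fully connected graph, the softening parameter rho, and the edge cost exp(rho * d) - 1, which makes one long hop far more expensive than several short ones.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_sensitive_distances(points, rho=2.0):
    """All-pairs path-based distances (hypothetical helper, not the authors' code).

    Each edge costs exp(rho * d) - 1, so paths made of short hops through
    dense regions are cheap and a single long hop is expensive.
    """
    n = len(points)
    cost = [[math.expm1(rho * euclidean(points[i], points[j])) for j in range(n)]
            for i in range(n)]
    # Floyd-Warshall: cheapest path cost between every pair of points.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
    # Map the path cost back to a distance-like scale.
    return [[math.log1p(c) / rho for c in row] for row in cost]

# Two clusters on a line: {0, 1, 2} and {10, 11}.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
D = density_sensitive_distances(points, rho=2.0)
```

In this toy example the distance between the first and third points (same cluster) drops below its Euclidean value of 2, while the direct hop across the empty gap keeps its Euclidean length, so clusters are pushed relatively further apart.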

Density-Sensitive Distance
  Apply MDS (multi-dimensional scaling) to the density-sensitive distances to obtain a Euclidean embedding
    Find the eigenvalues and eigenvectors of [matrix not preserved]
    Pick the first p eigenvectors s.t. [condition not preserved]

Active Sampling Procedure
  Given a training set T in MDS space:
  1. Train a logistic regression classifier on T
  2. For all candidate pairs, compute the pairwise score S
  3. Choose the pair with the maximum score
  4. Repeat 1-3

Details of the Scoring Function S
  Two components of S:
  1. Likelihood of a pair having opposite labels (straddling the decision boundary)
  2. Utility of the pair
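Steps 2 and 3 of the procedure can be sketched as a search over all candidate pairs. The slides do not preserve how S combines its two components, so the product used here is an assumption, as are the names select_pair, sq_dist and pair_entropy. The two toy components (squared distance as a stand-in for the opposite-label likelihood, summed entropy as a stand-in for the utility) are hypothetical, and the posteriors are assumed to come from the classifier trained in step 1.

```python
import math
from itertools import combinations

def binary_entropy(p):
    """Entropy in bits of a binary posterior p = P(y = +1 | x)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def select_pair(points, posteriors, likelihood_opposite, utility):
    """Return the best-scoring index pair (i, j) and its score.

    ASSUMPTION: score = likelihood of opposite labels * utility of the pair.
    """
    best_pair, best_score = None, -math.inf
    for i, j in combinations(range(len(points)), 2):
        score = (likelihood_opposite(points[i], points[j])
                 * utility(posteriors[i], posteriors[j]))
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score

# Hypothetical stand-ins for the two components of S.
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pair_entropy(p_i, p_j):
    return binary_entropy(p_i) + binary_entropy(p_j)

# Four unlabeled points on a line with posteriors from a trained classifier.
points = [(0.0,), (0.4,), (1.6,), (2.0,)]
posteriors = [0.001, 0.4, 0.6, 0.999]
pair, score = select_pair(points, posteriors, sq_dist, pair_entropy)
```

With these toy inputs the selected pair is the two uncertain points on either side of the boundary, rather than the two most distant but confidently classified points.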

  By the cluster assumption, the decision boundary should not cut clusters => points in different clusters are likely to have different labels
  In the transformed space, points in different clusters have low similarity (large distance)
  Thus, we can estimate the likelihood of opposite labels from the pairwise distance

An Analysis Justifying our Claim
  Pairwise distances are divided into bins
  Pairs are assigned to bins according to their distances
  For each bin, the relative frequency of pairs with opposite class labels is computed
  The graph (empirically) shows that the likelihood of two points having opposite labels increases monotonically with the pairwise distance between them
  * The graph is plotted on the g50c dataset

Utility Function
  Two components:
    Local density: depends on the number of close neighbors and their proximity
    Conditional entropy: for binary problems, [formula not preserved]

Uncertainty-Weighted Density
  Captures:
    the density of a given point
    the information content of its neighbors
  Novelty:
    each neighbor's contribution is weighted by its uncertainty
    reduces the effect of highly certain neighbors
    dense points with highly uncertain neighbors become important

Utility Function
  The utility of a pair is [formula not preserved], with terms annotated as:
    regularization
    information content (entropy) of the pair
    proximity-weighted information content of neighbors
  The pair with the maximum score is selected

Experimental Data
  Six binary datasets

Experiment Setting
  For each data set:
    start with 2 labeled data points (1 +, 1 -)
    run each method for 20 iterations
    results averaged over 10 runs

Baselines
  Uncertainty Sampling
  Density-only Sampling
  Representative Sampling (Xu et al. 2003)
  Random Sampling

Results

Conclusion
  Our contributions:
    combine uncertainty, density, and dissimilarity across the decision boundary
    proximity-weighted conditional entropy selection is effective for active learning
  Results show our method:
    significantly outperforms baselines in error reduction
    needs fewer labeling requests than others to achieve the same performance

Thank You!
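As a closing illustration, the uncertainty-weighted density from the Utility Function slides can be sketched as follows. The talk's formula is not preserved, so this is only one plausible reading: each of the k nearest neighbors contributes its binary entropy, weighted by a proximity term exp(-squared distance). The function names, the choice of k, and the exponential weighting are all assumptions.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary posterior p = P(y = +1 | x)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def uncertainty_weighted_density(i, points, posteriors, k=2):
    """Proximity-weighted uncertainty of the k nearest neighbors of point i.

    ASSUMPTION: proximity weight is exp(-squared distance); uncertainty is
    the binary entropy of each neighbor's posterior.
    """
    neighbors = sorted(
        (sq_dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )[:k]
    return sum(math.exp(-d2) * binary_entropy(posteriors[j])
               for d2, j in neighbors)

# Two equally dense clusters: one uncertain, one confidently classified.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
posteriors = [0.5, 0.5, 0.5, 0.99, 0.99, 0.99]
uncertain_cluster = uncertainty_weighted_density(0, points, posteriors)
certain_cluster = uncertainty_weighted_density(3, points, posteriors)
```

Both clusters are equally dense, but the point whose neighbors are highly uncertain receives the larger value, which matches the slide's remark that highly certain neighbors should count for less.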
