I2B2 Shared Task 2011 Coreference Resolution in Clinical Text

David Hinote
Carlos Ramirez

What is coreference resolution?

Nouns, pronouns, and phrases that refer to the same object, person, or idea are coreferent.
o Example: "Alexander was playing soccer yesterday. He fell and broke his knee."
o "Alexander", "He", and "his" refer to the same person, so they are said to be coreferent.

The i2b2 Challenge

I2B2: "Informatics for Integrating Biology and the Bedside"
This program has issued an NLP challenge involving coreference resolution. The challenge is to find coreferential relations within a given medical document.

The concepts that can be coreferent are all annotated. There are 5 classes of concepts:
o Problem
o Person
o Treatment
o Test
o Pronoun

Concept Mentions

People: any mention that refers to a person or group of people
o Dr. Lightman, the patient, cardiology
Problems: a mention that refers to the reason the subject of the document is in the hospital
o Heart attack, blood pressure, broken leg
Tests: tests performed by doctors
o EKG, temperature, CAT scan
Treatments: solutions to the problem mentions, or work performed to cure patients
o Brain surgery, ice pack, Tylenol
Pronouns can refer to any of the four other types of mentions.

Approaches for Competing

Using tools already made and publicly available
o Stanford NLP
o BART Coreference
o LingPipe
o CherryPicker
o Reconcile
o ARKRef
o Apache OpenNLP
Coding our own coreference tool

Other Coreference Tools

We obtained versions of other coreference tools and tested them on our data. All the tools we found were either still in their initial development stages or were built for a specific purpose and left alone afterward (e.g. coreference on the MUC datasets). Testing shows that, at best, the other tools we found do not perform acceptably on our data. After attempts to train other tools on our data failed, we felt it best to code our own approach.

[Figure: Other Tools Statistics]

Algorithm

Because the data we are working on is so specific, we chose a rule-based approach to coreference resolution. This means that we try to learn the characteristics of each kind of coreferent link ourselves and program a method for that link manually. We examine the concepts in a file and, when a pair meets our criteria, write a method that links them. The idea is to create rules that are specific, yet general enough to apply to similar mentions in all documents.
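The rule-based idea described above can be sketched in Java as follows. This is only a minimal illustration, not our actual code: the class name RuleBasedLinker, the Mention fields, the Rule interface, and the single example rule (same concept type and same text, ignoring case) are all hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RuleBasedLinker {

    /** A concept mention as annotated in the input (fields are illustrative). */
    static final class Mention {
        final String text;
        final String type; // e.g. "person", "problem", "test", "treatment", "pronoun"

        Mention(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** One hand-written rule: decides whether two mentions should be linked. */
    interface Rule {
        boolean links(Mention a, Mention b);
    }

    /** Hypothetical example rule: same concept type and identical text, ignoring case. */
    static final Rule SAME_TYPE_SAME_TEXT = (a, b) ->
            a.type.equals(b.type)
            && a.text.toLowerCase(Locale.ROOT).equals(b.text.toLowerCase(Locale.ROOT));

    /** Returns index pairs {i, j} with i < j for every pair accepted by at least one rule. */
    static List<int[]> findLinks(List<Mention> mentions, List<Rule> rules) {
        List<int[]> links = new ArrayList<>();
        for (int i = 0; i < mentions.size(); i++) {
            for (int j = i + 1; j < mentions.size(); j++) {
                for (Rule rule : rules) {
                    if (rule.links(mentions.get(i), mentions.get(j))) {
                        links.add(new int[] {i, j});
                        break; // one matching rule is enough for this pair
                    }
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Mention> mentions = List.of(
                new Mention("the chest x-ray", "test"),
                new Mention("Tylenol", "treatment"),
                new Mention("The chest X-ray", "test"));
        for (int[] link : findLinks(mentions, List.of(SAME_TYPE_SAME_TEXT))) {
            System.out.println(mentions.get(link[0]).text + " <-> " + mentions.get(link[1]).text);
        }
    }
}
```

Keeping each rule behind a small interface means a new rule can be added without touching the pairing loop, and a rule that turns out to be too specific can be removed just as easily.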

Our Application

To help visualize coreferent links and see which links our program detects, we use a GUI created with Java. We develop the program using the Mercurial version control system, which lets us keep each other's code up to date. The program uses our coded algorithms to determine coreferent links between the given concept mentions. It displays coreferent links as lines:
o Blue for true links.
o Red for links that are detected by our algorithm.

Our Application

Programmed in Java, our application can use databases and the internet to gather information about the concept mentions being tested. We have set up a database to hold data that gives meaning to the concept mentions being tested, or to certain key words in a sentence that contains a mention. If words or phrases meet our criteria, they can be added to the appropriate table straight from the program window. For each mention, the program also extracts information from Google.com searches, which can give it a wealth of information about the mention.

[Figure: Sample file]
[Figure: Viewing Concepts & I2B2 Chain]
[Figure: File with both UHD and I2B2 Links Shown]

[Figure: Statistics for our System]

Progress

We are currently at an F1 score of around 75%, averaged over all test files. Most algorithms for resolving coreference tend to have accuracy in the 60% range. With the time we have left, we expect to raise this score.
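For reference, F1 is the harmonic mean of precision and recall. The sketch below shows one way such a score could be computed per file and then averaged. It assumes a simple link-level count, which is an assumption of this example: the slides do not say which coreference metric variant is used, and the counts in main are made up for illustration.

```java
public class F1Score {

    /**
     * F1 from link counts: links found that are also in the gold standard,
     * all links the system produced, and all links in the gold standard.
     */
    static double f1(int correct, int systemTotal, int goldTotal) {
        if (correct == 0 || systemTotal == 0 || goldTotal == 0) {
            return 0.0; // avoid division by zero; nothing was matched
        }
        double precision = (double) correct / systemTotal;
        double recall = (double) correct / goldTotal;
        return 2 * precision * recall / (precision + recall);
    }

    /** Unweighted mean of per-file F1 scores. */
    static double averageOverFiles(double[] perFileF1) {
        if (perFileF1.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double score : perFileF1) {
            sum += score;
        }
        return sum / perFileF1.length;
    }

    public static void main(String[] args) {
        // Made-up counts for two files, purely for illustration.
        double fileA = f1(3, 4, 6);
        double fileB = f1(9, 10, 10);
        System.out.printf("file A F1 = %.3f%n", fileA);
        System.out.printf("file B F1 = %.3f%n", fileB);
        System.out.printf("average   = %.3f%n", averageOverFiles(new double[] {fileA, fileB}));
    }
}
```

Averaging per-file scores weights every document equally, so a short file with few links counts as much as a long one.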

We still haven't added detection for "Treatment" type concepts, which constitute a significant percentage of the concepts not found when computing our F1 score. Detection for "Test" type concepts still needs work.

Current Work: Test Mentions

Precision on "Test" type concepts is relatively low (30%).
o Mainly this is because many of the tests involve specific body parts (e.g. "chest x-ray" and "chest CT" are sometimes linked by our rules).
o Tests also often involve times (e.g. "an x-ray was performed on 5 Aug." would link with "the x-ray on... December 10, 2010").
o They also involve position (e.g. "x-ray on left lung" and "x-ray on right lung").

Current Work: Problem Mentions

Work on these mentions is about 50% complete. To finish, a few more database tables will need to be set up and certain types of medical vocabulary loaded into them. We will also need a system for finding phrases that are made of different words but mean the same thing, i.e. a thesaurus.

Possible Future Problems

The main risk with a rule-based approach is that our rules might be too specific to work with the contest data once it's distributed. Given the execution speed of our program, we should have enough time to make any necessary modifications in the three days between the contest data being sent and the results being submitted.
There is also a slight problem in that our application is made for a very specific purpose and is probably hard to generalize beyond the context of medical documents. Most coreference resolution tools are this way, though.
Not being able to code fast enough!

Future Necessities

A reliable way to find the temporal setting of a particular sentence
o Did an injury described happen 20 years ago, or is the doctor giving instructions for a future case? These are not coreferent even though they may use the same word.
Thesaurus work
o Finding phrases that mean the same thing but use completely different words.
Output
o The program does not yet output files in the I2B2 competition format; we will add this feature as the competition deadline draws near.
