Linking Entities in #Microposts

Romil Bansal, Sandeep Panem, Priya Radhakrishnan, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad
7th April 2014
[email protected]

Introduction

Entity Linking is the task of associating entity name mentions in text with the correct referent entities in a knowledge base, with the goal of understanding and extracting useful information from the document. Entity Linking can help with various IR tasks such as document classification and clustering, tag recommendation, and relation extraction.

Motivation

Social media platforms such as Twitter are a source of a wide variety of information. Identifying entities in tweets can help with tasks such as tracking products and events. Because tweets are short and noisy, they lack sufficient context for an entity mention to be disambiguated completely. We therefore enhance the context by combining the local context of the entity with information that other users share about the entity on social media such as Twitter.

Related Work

Various approaches for tweet entity linking have been proposed in the past.

Liu et al. [ELFT13] use mention-entry similarity, entry-entry similarity, and mention-mention similarity to resolve a set of mentions from tweets simultaneously. Meij et al. [ASMP12] link entities in tweets using n-gram, tweet, and concept features. Guo et al. [TLNL13] model entity linking as a structured learning problem, jointly learning mention detection and entity linking.

Our Approach (System Architecture)

Mention Detection

Mention Detection is the task of detecting phrases in the text that could be linked to possible entities in the knowledge base. We used POS patterns from the ARK POS Tagger [POST11] coupled with the T-NER Tagger [NERT11] to find the mentions in the given text.

1. ARK POS Tagger: Extract all sequences of proper nouns, and label the longest continuous sequence as a mention.
2. T-NER POS Tagger: Extract chunks with at least one proper noun, and label them as mentions.

Merging Mentions: Merge the entity lists from the two systems. In case of conflict, select the longest possible sequence as the entity mention in the text.

Entity Disambiguation

Entity Disambiguation is the task of selecting the correct candidate from the list of possible candidates for a given entity mention. We treated entity disambiguation as a ranking problem. We extracted ranked entities using three different methods and then merged the ranked lists using a machine learning model.

1. Wikipedia Based Measure (M1): Extract the entities that best match the title and body text of Wikipedia pages, and rank them by the similarity of the Wikipedia page to the mention.
2. Google Cross-Wiki Based Measure (M2): Extract and rank the entities based on the similarity between the mention and the anchor text [CLDE12] used across various web pages to refer to a Wikipedia entity.
3. Twitter Popularity Based Measure (M3): Extract the entities based on the similarity between the anchor text and the text used when referring to the mention in other tweets on Twitter.

Entity Disambiguation (cont.)

The ranked lists from the three models (Wikipedia Based (M1), Google Cross-Wiki Based (M2), and Twitter Popularity Based (M3)) are merged using a LambdaMART model. LambdaMART [ABIR10] combines MART and LambdaRank to generate an overall ranking model that combines the ranks of the three individual measures. The top-ranked entity is taken as the disambiguated entity for the given entity mention.
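The mention-merging rule from the Mention Detection step (pool the mentions from both taggers and, on conflict, keep the longest sequence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mentions are represented as half-open token spans, treats any token overlap as a "conflict", and breaks length ties by earliest start. The function names, the span representation, and the example tweet are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumed representation)


def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_mentions(ark: List[Span], tner: List[Span]) -> List[Span]:
    """Union two mention lists; when spans overlap, keep the longest one."""
    # Longest spans first; ties broken by earliest start (tie-break is an assumption).
    candidates = sorted(set(ark) | set(tner), key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for span in candidates:
        # Accept a span only if it does not collide with a longer, already kept span.
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept)


# Hypothetical tagger outputs for an example tweet.
tokens = "Watching Manchester United at Old Trafford".split()
ark_spans = [(1, 3), (5, 6)]   # "Manchester United", "Trafford"
tner_spans = [(1, 2), (4, 6)]  # "Manchester", "Old Trafford"

merged = merge_mentions(ark_spans, tner_spans)
print([" ".join(tokens[s:e]) for s, e in merged])
```

In this example the shorter spans "Manchester" and "Trafford" are dropped because each overlaps a longer span proposed by the other tagger.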

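The rank-merging step can be illustrated with a small sketch. Note the substitution: the system described above learns the combination with LambdaMART, whereas the code below uses a fixed weighted sum of reciprocal-rank features purely as a stand-in, so that the example stays self-contained. The feature design, the weights, and the candidate lists for the mention "Apple" are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def rank_features(candidate: str, ranked_lists: Sequence[List[str]]) -> List[float]:
    """Reciprocal rank of the candidate in each list; 0.0 when it is absent."""
    feats = []
    for ranking in ranked_lists:
        if candidate in ranking:
            feats.append(1.0 / (ranking.index(candidate) + 1))
        else:
            feats.append(0.0)
    return feats


def fuse(ranked_lists: Sequence[List[str]], weights: Sequence[float]) -> List[str]:
    """Order the union of candidates by a weighted sum of their rank features.

    The linear scorer is a stand-in for a learned ranking model such as LambdaMART.
    """
    candidates = {c for ranking in ranked_lists for c in ranking}
    scores: Dict[str, float] = {
        c: sum(w * f for w, f in zip(weights, rank_features(c, ranked_lists)))
        for c in candidates
    }
    # Highest score first; ties broken by name so the output is deterministic.
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Hypothetical ranked candidates for the mention "Apple" from M1, M2, and M3.
m1 = ["Apple_Inc.", "Apple", "Apple_Records"]
m2 = ["Apple", "Apple_Inc."]
m3 = ["Apple_Inc.", "Apple_Records"]
weights = [0.4, 0.2, 0.4]  # illustrative values, not learned from data

fused = fuse([m1, m2, m3], weights)
print(fused[0])  # the top-ranked candidate is taken as the disambiguated entity
```

A learned ranker would replace the fixed weights with a model trained on the annotated tweets, using per-measure features like these as input.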
Dataset

The #Microposts2014 NEEL Challenge dataset is used for evaluating the system.

2.3K tweets, manually annotated
70% training, 30% testing

Results

Entity Mention Detection and Entity Disambiguation

Method                    Accuracy
ARK POS Tagger            77%
T-NER POS Tagger          92%
ARK + T-NER (Merged)      98%

Table 1: Performance for Mention Detection

Method        F1-measure
M1            0.335
M2            0.100
M3            0.194
M1+M2         0.335
M2+M3         0.244
M1+M3         0.405
M1+M2+M3      0.512

Table 2: Performance for Entity Disambiguation

Conclusion

For effective entity linking, mention detection in tweets is important. We improve the accuracy of detecting mentions by combining two Twitter POS taggers. We resolve multiple mentions, abbreviations, and spelling variations of a named entity using Wikipedia and the Google Cross-Wiki Dictionary. We also use the popularity of an entity on Twitter to improve the disambiguation. Our system performed well, with an F1 score of 0.512 on the given dataset.

References

[TLNL13] S. Guo, M.-W. Chang, and E. Kiciman. To Link or Not to Link? A Study on End-to-End Tweet Entity Linking. In Proc. of the Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2013.

[ASMP12] E. Meij, W. Weerkamp, and M. de Rijke. Adding Semantics to Microblog Posts. In WSDM 2012. ACM, 2012.

[ELFT13] X. Liu, Y. Li, H. Wu, M. Zhou, F. Wei, and Y. Lu. Entity Linking for Tweets. In Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013.

[NERT11] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named Entity Recognition in Tweets: An Experimental Study. In Proc. of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.

[POST11] K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. In Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (ACL-HLT), pages 42-47, 2011.

[CLDE12] V. I. Spitkovsky and A. X. Chang. A Cross-Lingual Dictionary for English Wikipedia Concepts. In Proc. of the 8th Intl. Conf. on Language Resources and Evaluation (LREC), 2012.

[ABIR10] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval, 13(3):254-270, Jun 2010.
