Statistical Natural Language Processing
Source: Richard Khoury (Laval U.)
Instructor: B. John Oommen

Objectives
- Overview of standard NLP techniques
- Build a question-answering AI
- Text representation, classification, information retrieval, question-answering

Natural Language Processing
- Branch of AI that builds computer agents able to use human languages
- Automated translation
- Automated summarization
- Text classification & clustering
- Text generation
- Speech understanding & generation
- Author recognition
- Information retrieval & question-answering
- Sentiment detection
- Event detection in social media
- Spam filtering
- Digital assistants
- Etc.

Statistical NLP
- Uses a corpus of language to learn typical usage probabilities
- Develops algorithms that make the best (e.g. most probably correct) decision
- Other alternatives: neural-network NLP, rule-based NLP, fuzzy NLP

Statistical NLP
- Modelling language use with probabilities
- "But I don't count probabilities when I talk!" Actually, you do
- Zipf's law of least effort: people try to do the minimum amount of effort, but they are smart about it
  - If more effort now means much less effort later on, they spend the effort now
- So what is the effort to minimize when using language?

Least Effort in Language
- Imagine two people: Tina talks all the time, Lester listens all the time
- What's the effort in talking? Choosing the words to say with the correct meaning. Least effort = one word with all meanings
- What's the effort in listening? Choosing the correct meaning of the words heard. Least effort = one meaning per word
- Tina and Lester have exactly opposite least efforts
- But in real life, no one talks or listens all the time
- Natural languages evolve to balance out the effort of talking and listening

Least Effort in Language
- Natural languages have:
  - A few words that are used a lot and have flexible meanings
  - A lot of words that have precise meanings and are rarely used

Zipf-Mandelbrot Law
- Count the frequency of each word in a natural-language corpus of any language
- Plot frequency rank vs. frequency count
- The result is a power-law distribution: $f(r) = \dfrac{P}{(r + \rho)^B}$
- English: $P = 10^{5.4}$, $\rho = 100$, $B = 1.15$
- Foundation of statistical NLP!

Zipf-Mandelbrot Law (example)
- Two articles from The Charlatan, 8 January 2014
  - "Sprott unveils new Master of Accounting program": 331 words, 174 distinct words, 122 words used once
  - "CUSA Midterm Review: have they kept their campaign promises": 611 words, 250 distinct words, 145 words used once
[Figure: bar charts of the word-frequency distributions of the two articles]

Zipf-Mandelbrot Law (example)

"Sprott unveils new Master of Accounting program":
  Word          Frequency
  the           22
  to            13
  program       11
  said          9
  of            9
  Herauf        7
  a             7
  students      6
  in            6
  professional  6
  by            5

"CUSA Midterm Review: have they kept their campaign promises":
  Word          Frequency
  the           35
  to            24
  of            19
  and           18
  a             16
  CUSA          15
  is            11
  it            10
  Odunayo       9
  in            9
  students      8

Zipf-Mandelbrot Law
- A few words that are used a lot and have flexible meanings
  - The 10 most frequent words are 30% of each article
  - "the" (10 definitions), "to" (14 definitions), "of" (12 definitions), "and" (15 definitions)
  - A short list of stopwords that can be filtered out
- A lot of words that have precise meanings and are rarely used
  - More than half the words are used only once
  - Includes: campaign, faculty, graduate, administration, vegetables, president, education, ...
- And some exceptions! program, students, professional, CUSA, Herauf, Odunayo
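The rank-frequency relationship above is easy to check on any text. Below is a minimal Python sketch (not from the slides); the sample sentence, the tokenizer and the function names are illustrative, and the constants default to the English values quoted above.

```python
# A minimal sketch (not the slides' code): count word frequencies, rank them,
# and compare with the Zipf-Mandelbrot curve f(r) = P / (r + rho)^B.
from collections import Counter
import re

def rank_frequencies(text):
    """Return (rank, observed count) pairs, most frequent word first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))

def zipf_mandelbrot(rank, P=10**5.4, rho=100, B=1.15):
    """Predicted frequency for a given rank (English constants from the slide)."""
    return P / (rank + rho) ** B

sample = "the cat sat on the mat and the dog sat on the cat"
for rank, count in rank_frequencies(sample)[:5]:
    print(rank, count, round(zipf_mandelbrot(rank), 1))
```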

Bag of Words
- The word + frequency representation of a text is the bag-of-words model
- Assumes words are independent of each other
- Assumes word order does not matter
- Sometimes enhanced by including collocations: pairs of words used together (e.g. "wall street", "make up")
- Creates the document vector, e.g. for the Sprott article above:
  Word          Frequency
  the           22
  to            13
  program       11
  said          9
  of            9
  Herauf        7
  a             7
  students      6
  in            6
  professional  6
  by            5

Bag of Words Classification
- Predicting the topic of a document given its words
- Common task in NLP: spam filtering, genre classification (libraries, websites), reading level assessment (education), etc.
- Active research area
- Supervised learning: trained on correct examples
  - Training corpus of classified documents
  - The AI learns features useful for classification

Bag of Words Classification
Training (see the sketch after these steps):
1. Build the word list of the entire corpus
2. Build a bag of words for each category
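A minimal Python sketch of the two training steps above; the (category, text) input format and the names tokenize/train_bags are assumptions for illustration.

```python
# A minimal sketch of training steps 1-2, assuming each training document is a
# (category, text) pair. Function names are illustrative.
from collections import Counter
import re

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train_bags(labelled_docs):
    """Step 1: word list of the whole corpus; step 2: one bag of words per category."""
    vocabulary = set()
    bags = {}                                   # category -> Counter of word counts
    for category, text in labelled_docs:
        words = tokenize(text)
        vocabulary.update(words)
        bags.setdefault(category, Counter()).update(words)
    return sorted(vocabulary), bags

docs = [("sports", "the ravens won the game"),
        ("arts", "the art show featured new films")]
vocab, bags = train_bags(docs)
print(bags["sports"]["game"])   # 1
```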

Example: 5 Charlatan articles each on sports & arts
- "The best and brightest films of December", "Aboriginal Service Centre hosts storytelling night", "Author talks queerness in art history", "Monthly concert series welcomes minors", "Venus Envy hosts second dirty art show", "Midseason review: Women's hockey", "Midseason review: Women's basketball", "Midseason review: Men's basketball", "Men's hockey team edges York in exhibition play", "Killeen practices with the Senators"
- 2418 instances of 1233 words; a few of the category counts:

           Art  Show  Films  Film  Ottawa  Years  Said  Ravens  Game  Team
  Sports    0    1     0      0     4       1      26    32      28    17
  Arts     24   11     8      6     9       1      30     0       2     2

Bag of Words Classification
Testing:
1. Build the word vector of the new document d
2. Compute the cosine similarity with the vector of each category c:
   $sim(d, c) = \dfrac{d_0 c_0 + d_1 c_1 + \ldots + d_{N-1} c_{N-1}}{\sqrt{d_0^2 + d_1^2 + \ldots + d_{N-1}^2}\,\sqrt{c_0^2 + c_1^2 + \ldots + c_{N-1}^2}}$
3. Classify into the most similar category
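The testing procedure can be sketched as follows, reusing tokenize, vocab and bags from the training sketch above; the helper names are again illustrative.

```python
# A minimal sketch of the testing steps, reusing tokenize, vocab and bags from
# the training sketch above.
import math
from collections import Counter

def to_vector(counter, vocab):
    return [counter.get(word, 0) for word in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def classify(text, vocab, bags):
    d = to_vector(Counter(tokenize(text)), vocab)
    # the most similar category wins
    return max(bags, key=lambda c: cosine(d, to_vector(bags[c], vocab)))

print(classify("a great game for the ravens", vocab, bags))   # sports
```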

Bag of Words Classification

           Art  Show  Films  Film  Ottawa  Years  Said  Ravens  Game  Team
  Sports    0    1     0      0     4       1      26    32      28    17
  Arts     24   11     8      6     9       1      30     0       2     2

Can we do better?
- We already filter out stopwords (determiners, articles, and pronouns)
- Group together different forms of the same word

  - Word stemming: remove prefixes, suffixes and endings of words, keeping only the stem or root
- Add words missing from some documents
  - Expansion: add synonyms, dictionary definitions, collocations, ... of the keywords
  - Especially useful for short texts (e.g. query expansion)
- Weight words
  - Tell apart significant and insignificant words

TFIDF
- Recall the Zipf-Mandelbrot law: significant words are generally rare words with an unusually high occurrence in a document
- So we need the general occurrence: the proportion of documents d in corpus C that contain word w
  Inverse Document Frequency: $IDF(w, C) = \log \dfrac{|C|}{|\{d \in C : w \in d\}|}$
- And the occurrence in document $d_i$
  Term Frequency: $TF(w, d_i) = \dfrac{count(w, d_i)}{|d_i|}$

TFIDF

Word counts:
           Art   Show  Films  Film  Ottawa  Years  Said  Ravens  Game   Team
  Sports    0     1     0      0     4       1      26    32      28     17
  Arts     24    11     8      6     9       1      30     0       2      2

TFIDF weights:
           Art   Show  Films  Film  Ottawa  Years  Said  Ravens  Game   Team
  Sports   0     0.15  0      0     0.61    0.15   4.00  13.32   4.30   2.61
  Arts     9.00  1.52  3.00   2.25  1.24    0.13   4.16  0       0.28   0.28

Computation notes:
- |Sports| = 1147, |Arts| = 1271, |Corpus| = 3
- Assume a 3rd category that is 0 everywhere; otherwise log(2/2) = 0. Not an issue for real corpora
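A minimal sketch of the TF and IDF definitions above, assuming each document is already tokenized; the toy corpus and function names are illustrative.

```python
# A minimal sketch of the TF and IDF definitions above. Each document is a
# list of tokens; names are illustrative.
import math
from collections import Counter

def tf(word, doc):
    """Term frequency: occurrences of word in doc, relative to doc length."""
    return Counter(doc)[word] / len(doc)

def idf(word, corpus):
    """Inverse document frequency: log(|C| / number of documents containing word)."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / containing) if containing else 0.0

def tfidf(word, doc, corpus):
    return tf(word, doc) * idf(word, corpus)

corpus = [["the", "ravens", "won", "the", "game"],
          ["the", "art", "show", "opened"],
          ["a", "new", "game", "for", "the", "team"]]
print(round(tfidf("ravens", corpus[0], corpus), 3))   # rare word, high weight
print(round(tfidf("the", corpus[0], corpus), 3))      # common word, weight 0
```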

(The TFIDF values in the table above are scaled by $10^{-3}$.)

Naive Bayes Classification
- Most popular alternative to bag of words
  $P(C \mid d) = P(C \mid w_{d,0}, \ldots, w_{d,N-1}) \propto P(C) \prod_{i=0}^{N-1} P(w_{d,i} \mid C)$
- P(C): proportion of documents per category in the corpus
- P(w_{d,i} | C): proportion of word w_i per category in the training corpus
- Classify into the most probable category

Naive Bayes Classification
- $P(C \mid d) \propto P(C) \prod_{i=0}^{N-1} P(w_{d,i} \mid C)$
- P(w_i | C) is computed from observation in the corpus
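A minimal sketch of this Naive Bayes scoring. Add-one (Laplace) smoothing is included so unseen words do not zero out the product, which is exactly the problem discussed next; the data format and function names are illustrative.

```python
# A minimal sketch of the Naive Bayes scoring above, with add-one (Laplace)
# smoothing so that unseen words do not zero out the product (see next slide).
import math
from collections import Counter

def train_nb(labelled_docs):
    """Estimate P(C) and P(w|C) counts from (category, tokens) training pairs."""
    doc_counts, word_counts, vocab = Counter(), {}, set()
    for category, tokens in labelled_docs:
        doc_counts[category] += 1
        word_counts.setdefault(category, Counter()).update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def classify_nb(tokens, doc_counts, word_counts, vocab):
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, counts in word_counts.items():
        # log P(C) + sum of log P(w|C); add-one smoothing over the vocabulary
        score = math.log(doc_counts[category] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in tokens:
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best

docs = [("sports", ["ravens", "won", "game"]), ("arts", ["art", "show", "films"])]
print(classify_nb(["game", "tonight"], *train_nb(docs)))   # sports
```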

- What if word w_i never occurs in a category? Multiplication by zero!
- Given that most words occur rarely (recall Zipf-Mandelbrot), this will happen a lot
- Probability smoothing: decrease the probability of observed words and distribute it to unobserved words

Information Retrieval
- Retrieve the text documents in a corpus that contain the requested (query) information
- How? Text classification (BoW or NB) identifies document topics and measures the similarity of documents
- IR query = short document: find the most similar documents in the corpus!

Information Retrieval
- Massive and unstructured document corpus: institutional databases, Internet collections, Wikipedia
- How to speed up IR search?
- Inverted index
  - Database with words as keys and documents as attributes
  - Faster retrieval of documents using keywords
- Positional information
  - Position of words within the document
  - Easier to retrieve collocations
  - Find documents with multiple keywords in proximity of each other

Information Retrieval
- How to find and rank relevant documents?
- Vector Space Model (VSM)
  - Most widely used IR method
  - Each document's N-word-long bag of words is a vector in N-dimensional space
  - Cosine distance between the query BoW and each document BoW d
  - Most similar documents are most relevant
  - Can be made more accurate using TFIDF, etc.
  - Inverted index useful to quickly find the documents for the cosine
[Figure: query and document vectors d1, d2 plotted in a 2-D word space (word0, word1)]

Information Retrieval
- Problem: short queries might not use the same words as documents
- Word co-occurrence

  - Some words are frequently used in the same context
  - If the query has one word and the document has the other, VSM won't match them

           d0  d1  d2  d3  d4  d5  d6  d7  d8  d9
  Art       7   1  15   0   0   0   0   0   0   0
  Show      9   1   0   1   0   0   1   0   0   0
  Films     0   0   0   0   8   0   0   0   0   0
  Ravens    0   0   0   0   0   7   7   5   6   9
  Game      0   0   0   0   2   0   8   5  11   4
  Team      0   0   0   0   2   3   3   3   2   3

Information Retrieval
- Solution: Latent Semantic Indexing (LSI)
- Vector space with latent semantic dimensions
  - Co-occurring words projected into the same dimension
  - Non-co-occurring words projected into orthogonal dimensions

- Dimensionality reduction
  - VSM has N dimensions for N words
  - LSI has k dimensions for k semantic meanings, k << N

Information Retrieval with LSI
- Singular Value Decomposition (SVD): LSI computation method
- Represents matrix A (in word space) as a matrix Â (in semantic space)
  - Matrix A is the document/word matrix with N words and d documents
  - Matrix Â is the semantic-space matrix with N words and d documents
- Sorts dimensions by importance: makes it easy to reduce by cropping the less useful dimensions

Information Retrieval with LSI
- Visualizing the SVD transformation
[Figure: documents rotated from the word space into the semantic space]

Information Retrieval with LSI
Reduced Singular Value Decomposition (one of many SVD algorithms):
1. Document/word matrix A with N words/dimensions
2. Decompose the matrix as $A = U \Sigma V^T$
   - Rotates the dimensions to the orientation of largest variation
   - Matrix $\Sigma$ contains the ordered singular values measuring the variation of each dimension
3. Keep the k dimensions in $\Sigma$ whose value is above a threshold, i.e. only the dimensions with noteworthy variation: $A_k = U_k \Sigma_k V_k^T$
4. Rescale to k dimensions
   - Word/document matrix: $\hat{A} = \Sigma_k^{-1} U_k^T A$
   - Document or IR query: $\hat{d} = \Sigma_k^{-1} U_k^T d$

Information Retrieval with LSI
- Reduced SVD computation: $A = U \Sigma V^T$

- U is the eigenvectors of $AA^T$, sorted by decreasing order of eigenvalue
- V is the eigenvectors of $A^TA$, sorted by decreasing order of eigenvalue
- $\Sigma$ is the square roots of the eigenvalues, sorted by decreasing order, in a diagonal matrix

Information Retrieval with LSI
- Reduced SVD example, using the document/word matrix A from the co-occurrence example above
  - Eigenvalues of $AA^T$: [455.58, 294.40, 70.95, 64.32, 50.45, 1.30]
  - $\Sigma = diag(21.34, 17.16, 8.42, 8.02, 7.10, 1.14)$, their square roots
[Slide shows the full $AA^T$, $A^TA$ and U matrices for this example.]

Information Retrieval with LSI
- Reduced SVD example: rescaling into 2D
- Simply crop the matrices to k = 2!

  $U_2^T \approx \begin{bmatrix} 0.01 & 0.03 & 0.04 & 0.69 & 0.66 & 0.29 \\ 0.96 & 0.29 & 0.00 & 0.01 & 0.01 & 0.01 \end{bmatrix}$ (columns: Art, Show, Films, Ravens, Game, Team), $\Sigma_2 = \begin{bmatrix} 21.34 & 0 \\ 0 & 17.16 \end{bmatrix}$

Information Retrieval with LSI
- Rescaling the word/document matrix to 2D with the projection above
- Each of the ten documents becomes a point (Dimension 1, Dimension 2) in the 2-D semantic space

Information Retrieval with LSI
- Rescaling IR queries to 2D the same way
  - Query 1, 10x "game", projects to roughly (6.60, 0.10)
  - Query 2, 10x "show", projects to roughly (0.30, 2.90)

Information Retrieval with LSI
- SVD example: visualization in 2D
- Easy to find relevant documents for IR using Euclidean distance, cosine similarity, clustering, etc.
[Figure: the ten documents and the two queries plotted in the 2-D semantic space; the arts and sports articles form separate clusters and each query lands near its cluster.]
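The reduced-SVD pipeline above can be sketched with NumPy on the six-word example matrix. This follows the $\hat{d} = \Sigma_k^{-1} U_k^T d$ projection from the slides; the matrix A, variable names and sample query are taken from the example, but signs and exact values may differ since an SVD is only unique up to the sign of each dimension.

```python
# A minimal sketch of reduced SVD for LSI, following the steps above with
# NumPy. Signs and exact values may differ from the slides.
import numpy as np

# Word/document matrix A (rows: Art, Show, Films, Ravens, Game, Team).
A = np.array([[7, 1, 15, 0, 0, 0, 0, 0,  0, 0],
              [9, 1,  0, 1, 0, 0, 1, 0,  0, 0],
              [0, 0,  0, 0, 8, 0, 0, 0,  0, 0],
              [0, 0,  0, 0, 0, 7, 7, 5,  6, 9],
              [0, 0,  0, 0, 2, 0, 8, 5, 11, 4],
              [0, 0,  0, 0, 2, 3, 3, 3,  2, 3]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
k = 2                                              # keep the 2 strongest dimensions
U_k, S_k_inv = U[:, :k], np.diag(1.0 / s[:k])

def project(vec_or_matrix):
    """Map word-space columns into the k-dimensional semantic space."""
    return S_k_inv @ U_k.T @ vec_or_matrix

docs_2d = project(A)                    # every document as a 2-D point
query = np.zeros(6); query[4] = 10      # "10x game"
print(np.round(project(query), 2))      # lands on the sports side of the space
```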

Question Answering
- IR: the query is words, the retrieved information is relevant documents
- QA: the query is a question, the retrieved information is an answer found in relevant documents
- Requires additional steps beyond IR:
  - Question analysis: precisely determine what is asked for
  - Passage retrieval: find texts likely to contain answers
  - Answer selection: pick the best answer from the passages
  - Answer presentation: give the answer to the user

Question Analysis
- The AI needs to understand what the user is asking about
  - What type of information? About what subject?
- Problem: the query is only a few words (1 to 4 on average after stopword removal)

Question Analysis
- Question type or answer type: what type of information is asked for?
  - E.g.: person, location, date, quantity, definition, yes/no, ...
  - No standard type list
- Not as simple as it sounds:
  - "Who was Napoleon?" vs. "Who defeated Napoleon?"
  - "What French emperor was defeated at Waterloo?" vs. "What year was a French emperor defeated at Waterloo?" vs. "What was the Battle of Waterloo?"

Question Analysis
Question type classification approaches (a pattern-matching sketch follows this slide):
- Bayesian classification
  $P(C \mid q) = P(C \mid w_0, \ldots, w_{N-1}) \propto P(C) \prod_{i=0}^{N-1} P(w_i \mid C)$
  where P(C) is the probability of a question type and P(w_i | C) is the probability of a query word given a question type
- Lexicon sorting keywords by query types
  - Classify the question based on its keywords
  - Recall Zipf's law: a few words are used most often; the lexicon can work with only ~100 common words
- Pattern matching
  - E.g. "what {is|are} ..." → definition
  - Can match the entire query or a subset of words (the informer span)
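A minimal sketch of the pattern-matching approach, using regular expressions; the pattern list and type labels below are illustrative rather than a standard set.

```python
# A minimal sketch of pattern-matching question-type classification.
# The patterns and type labels are illustrative, not a standard list.
import re

QUESTION_PATTERNS = [
    (r"^what\s+(is|are)\b",             "definition"),
    (r"^(who|what person)\b",           "person"),
    (r"^(where|what (city|country))\b", "location"),
    (r"^(when|what year)\b",            "date"),
    (r"^how (many|much)\b",             "quantity"),
]

def question_type(question):
    q = question.lower().strip()
    for pattern, qtype in QUESTION_PATTERNS:
        if re.search(pattern, q):
            return qtype
    return "unknown"

print(question_type("What is a semantic network?"))        # definition
print(question_type("Who defeated Napoleon?"))             # person
print(question_type("What year was Napoleon defeated?"))   # date
```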

Question Analysis
- Dealing with few words: picking out important information
- Named Entity Recognition (NER)
  - A proper name used in the query is usually very important
  - Compare words to a database of named entities (can be constructed from Wikipedia)

Question Analysis
- Dealing with few words: inferring more words
- Query expansion
  - Problem: "Who killed Abraham Lincoln?" and "President Lincoln was assassinated by John Wilkes Booth." have only one word in common
  - Add more words to the query to help IR
  - Add synonyms: kill → murder, assassinate, take down, defeat
  - Problem: some words have multiple meanings
    - Lincoln → president, capital of Nebraska; mutton → sheep

Passage Retrieval
- We can already retrieve documents with IR
- But for QA we need to retrieve specific passages with possible answers
  - "Who killed Abraham Lincoln?"
  - The answer is not the Wikipedia page on Abraham Lincoln
  - The answer is not the paragraph about the Lincoln assassination
  - The answer is the passage "President Lincoln was assassinated by John Wilkes Booth."
- Different algorithms are possible based on the information available to us

Passage Retrieval (example)
- If we have:
  - Important query keywords (from NER, word weights such as TFxIDF, etc.)
  - Inverted index & positional information (from IR: index of corpus words linked to documents, with the exact position of each word in the document)
- Algorithm: take a window of words in the document around each keyword

Passage Retrieval (example)
- If we have:
  - Question patterns & the correct pattern for the query (from question type classification)
  - Relevant documents (from IR)
- Algorithm: find matching answer patterns in the document
  - Simple modification of the question pattern, e.g. the question pattern "what {is|are} X" becomes the answer pattern "X {is|are} ..."
  - Can be enhanced with grammar rules and synonyms of less important words
  - "Who killed Abraham Lincoln?" → "Abraham Lincoln was {killed|assassinated|murdered} by ..."

Answer Selection
- Passage retrieval will generate a lot of passages
  - Some will have the same answer written differently
  - Some will have different/conflicting answers
  - Some are irrelevant
- Problem: how to find the answer?
  - Need to rate the confidence of individual passages
  - Need to compare/contrast different passages

Answer Selection
Passage / answer rating:

- Determine the system's confidence in an individual answer
- Does the answer fit the question?
  - Important question words (especially named entities) appear in the passage
  - Keywords in the passage match the question semantics (distance in an ontology, WordNet)
  - The answer in the passage matches its syntactic role in the question
- Does the answer fit the predicted answer type?
  - But the answer type classifier can make mistakes
- Geospatial & temporal relevance of the answer
  - Useful for cell phone QA, virtual assistants, etc.

Answer Selection
- Deal with the same answer written differently
  - The system needs lists of synonyms, abbreviations, idioms, alternate spellings, etc.
  - Merge answers together & combine their ratings
  - Can serve as the democratic rating below
- Deal with different/conflicting answers
  - Answer rating includes answer frequency
    - Democratic: the correct answer is the one most frequently cited in different sources
    - But don't score the same source multiple times!
  - Answer rating includes source reliability
    - Elitism: the answer from a trusted source is the correct one

Answer Presentation
- What is returned by the QA system?
  - Single best answer vs. list of possible answers
  - Include or not the confidence rating of answers
  - Include or not the sources of the answer
  - Reword the answer properly or copy-paste the text

Answer Presentation
- How to reword answers properly
- Syntactic answer templates
  - Detect the question syntax: words' part-of-speech and word order
  - A corresponding answer template is filled in from question and answer words
  - Advantage: human-like answers
- Semantic answer templates
  - Tag the semantic content in the answer, e.g. core answer, complementary information, justifications, constraints, etc.
  - The template selects which content to display and where
  - Advantage: can be tailored to devices or needs without losing critical info

Example of QA Systems: IBM Watson
- Passage retrieval contains the correct answer for 85% of questions in the top 250 candidates
  - Query parsing, semantic role labelling, NER, etc.
  - Evidence: passage search featuring the candidate answer in the context of the question
- Special type of question analysis to handle Jeopardy clues
- Aggregate score of 50 features including taxonomy, geospatial, temporal, source reliability, gender, name consistency, relation, passage support, etc.
- Merges multiple instances of the same answer

Example of QA Systems: Ephyra
- Ephyra uses filters for answer selection (a sketch of such a pipeline follows)
- The selection and order of the filters can be changed
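The slide only names Ephyra's configurable filter chain; as a rough illustration of the idea (not Ephyra's actual API), an answer-selection pipeline could look like the following, with hypothetical filter names and a made-up (answer, score) candidate format.

```python
# A rough illustration of a configurable answer-selection filter chain, in the
# spirit of the slide above. NOT Ephyra's actual API; names are hypothetical.
from collections import Counter

def boost_frequent(candidates):
    # "democratic" rating: answers supported by more passages score higher
    freq = Counter(a.lower().strip() for a, _ in candidates)
    return [(a, s * freq[a.lower().strip()]) for a, s in candidates]

def drop_duplicates(candidates):
    # keep the first (already boosted) copy of each distinct answer
    seen, kept = set(), []
    for answer, score in candidates:
        key = answer.lower().strip()
        if key not in seen:
            seen.add(key)
            kept.append((answer, score))
    return kept

def sort_by_score(candidates):
    return sorted(candidates, key=lambda c: -c[1])

FILTERS = [boost_frequent, drop_duplicates, sort_by_score]   # selection and order can change

def select_answer(candidates):
    for f in FILTERS:
        candidates = f(candidates)
    return candidates[0] if candidates else None

candidates = [("John Wilkes Booth", 0.7), ("john wilkes booth", 0.6), ("Robert E. Lee", 0.5)]
print(select_answer(candidates))   # ('John Wilkes Booth', 1.4)
```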

Example of QA Systems: LCC Chaucer-2
- 2nd place in the TREC 2007 QA competition
- Series of queries on each topic

Summary: Natural Language Processing
- Zipf-Mandelbrot Law
- Bag of Words representation: stopwords, word stemming, collocations, TFxIDF
- Classification: Bag of Words & cosine distance, Naive Bayes
- Information retrieval: inverted index, Vector Space Model, Latent Semantic Indexing, text segmentation
- Question Answering: question analysis, passage retrieval, answer rating, answer presentation, examples

Further Readings
- U. Côté Allard, R. Khoury, L. Lamontagne, J. Bergeron, F. Laviolette, and A. Bergeron-Guyard, "Optimizing Question-Answering Systems Using Genetic Algorithms", FLAIRS-28 (2015).
- F.B. Dian Paskalis and M.L. Khodra, "Word sense disambiguation in information retrieval using query expansion", ICEEI, pp. 1-6 (2011).
- D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, C. Welty, "Building Watson: An Overview of the DeepQA Project", AI Magazine, 31:3, pp. 59-79 (2010).
- S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Gîrju, V. Rus and P. Morarescu, "FALCON: Boosting knowledge for answer engines", Proceedings of TREC-9, pp. 479-488 (2000).
- A. Hickl, K. Roberts, B. Rink, J. Bensley, T. Jungen, Y. Shi, J. Williams, "Question answering with LCC's Chaucer-2 at TREC 2007", Proceedings of TREC-16 (2007).
- J. Luo, B. Meng and X. Tu, "Selecting good expansion terms based on similarity kernel function", NLP-KE, pp. 156-160 (2011).
- B. Magnini, M. Speranza and V. Kumar, "Towards Interactive Question Answering: An Ontology-Based Approach", IEEE ICSC '09, pp. 612-617 (2009).
- B. J. Oommen, R. Khoury and A. Schmidt, "Text Classification Using Novel Anti-Bayesian Techniques", ICCCI-7 (2015).
- M. Razmara, A. Fee and L. Kosseim, "Concordia University at the TREC 2007 QA track", Proceedings of TREC-16 (2007).
- N. Schlaefer, R. Gieselmann, T. Schaaf, A. Waibel, "A pattern learning approach to question answering within the Ephyra framework", LNAI 4188, pp. 687-694, P. Sojka, I. Kopecek, K. Pala (eds.), Springer (2006).
- D. Shen, R. Pan, J.-T. Sun, J.J. Pan, K. Wu, J. Yin, and Q. Yang, "Q2C@UST: our winning solution to query classification in KDDCUP 2005", ACM SIGKDD, vol. 7, no. 2, pp. 100-110 (2005).
- S. Song and Y.-N. Cheah, "An unsupervised center sentence-based clustering approach for rule-based question answering", IEEE ISCI, pp. 125-129 (2011).

Thank You
