Entity Linking System

Problem Statement

Given a text and knowledge base, find all the entity mentions in the text(Recognize) and then link them to the corresponding correct entry in the knowledge base(Disambiguate).

There are two parts to entity linking:

  • Named-entity recognition
  • Disambiguation

Applications:

  • Semantic search
  • Content analysis
  • Question answering systems/chatbots/virtual assistants, like Alexa, Siri, and Google assistant

Metrics

  • Offline metrics
    Offline metrics will be aimed at improving/measuring the performance of the entity linking component.

    • Name Entity Recognition (NER)

      • The tagging scheme: IOB2 (aka BIO)
        • B - Beginning of entity
        • I - inner token of a multi-token entity
        • O - non-entity
      • Performance measurement
        • $Precision = \frac{no. ;of; correctly ;recognized ;named ;entities}{no. ;of; total ;recognized ;named ;entities}$
        • $Recall = \frac{no. ;of; correctly ;recognized ;named ;entities}{no. ;of; named ;entities ;in ;corpus}$
        • $F1-score = 2\frac{precisionrecall}{precision+recall} $
    • Disambiguation

      • Precision
      • Recall does not apply here as each entity is going to be linked (to either an object or Null).
    • Named-entity linking component

      • F1-score as the end-to-end metric

        • True positive: an entity has been correctly recognized and linked.
        • True negative: a non-entity has been correctly recognized as a non-entity or an entity that has no corresponding entity in the knowledge base is not linked.
        • False positive: a non-entity has been wrongly recognized as an entity or an entity has been wrongly linked.
        • False negative: an entity is wrongly recognized as a non-entity, or an entity that has a corresponding entity in the knowledge base is not linked.
      • Micro-averaged metrics

        • Aggregate the contributions of all documents to compute the average metrics.
        • Focus on the overall performance, do not care if the performance is better for a certain set.
        • $Precision = \frac{\sum^{n}{i=1}TP_i}{\sum^{n}{i=1}TP_i + \sum^{n}_{i=1}FP_i}$
        • $Recall = \frac{\sum^{n}{i=1}TP_i}{\sum^{n}{i=1}TP_i + \sum^{n}_{i=1}FN_i}$
        • Micro-averaged F1-score
      • Macro-averaged metrics

        • Compute the metric independently for each document and takes the average (equal weightiage).
        • $Precision = \frac{\sum_{i=1}^n P_{di}}{n}{where; n= no. ;of; documents,; P{di}= precision; over; document; i}$
        • $Recall = \frac{\sum_{i=1}^n R_{di}}{n}{where; n= no. ;of; documents,; R{di}= recall; over; document; i}$
        • Macro-averaged F1-score
  • Online metrics
    Online metrics will be aimed at improving/measuring the performance of the larger system by using a certain entity linking model as its component. (A/B experiments)
    • Search engines: i.e. session success rate
    • Virtual Assistants: i.e. user satisfaction

Architecture

  • Model generation path

    • Named Entity Recognition
      • Responsible for building models to recognize entities (i.e. person, organizations)
    • Named Entity Disambiguation
      • Candidate generation
        • Reduce the size of the knowledge base to a smaller subset of candidate entities
      • Linking
    • Metrics
      • NER and NED seperately, Entity linking as a whole.
  • Model execution path

Modeling

  • NER Modeling

    • Contextualized text representation
      • ELMo
        • Raw word vectors: character-level CNN or Word2Vec
        • Bi-LSTM with forward/backword pass: “Shallow” bi-directional since the two pass LSTMs are trained independently and concatenated naively, therefore, the left and right contexts can not be used ‘simultaneously’.
      • BERT
        • The transformer layer will take advantage of contexts from both directions, but with problems - “sees itself indirectly”.
        • Prediction Objectives: Masked language modeling (MLM) and Next sentence prediction (NSP)
    • Contextual embedding as features
      • Utilize contextual embeddings generated by BERT as features (transfer learning) in our NER classifier.
      • The model will not be modified, only the final layer output is used as embedding features.
    • Fine-tuning embeddings
      • Take the pre-trained models and fine-tune them based on our NER dataset to improve the classifier quality.
  • Disambiguation Modeling

    • Candidate generation

      • Higher recall by building an index where terms are mapped to knowledge base entities.
      • Methods:
        • Based on terms’ names and their concatenations.
        • Make use of referrals and anchor text.
        • Embedding methods + K Nearest Neighbors for a particular term.
    • Linking

      • Higher precision.
      • The prior: for a certain entity mention, how many times the candidate entity under consideration is actually being referred to.

Author

Jie Sun

Posted on

2024-01-12

Updated on

2024-03-31

Licensed under

Comments