Artificial intelligence has shown promising results in many areas, revolutionizing our daily lives. Machines can now understand human language thanks to natural language processing (NLP), one of the most promising areas of AI research. It is at the heart of many technologies we use every day, including search engines, chatbots, spam filters, grammar checkers, voice assistants, and social media monitoring tools.
Entity linking (EL) is the task of automatically connecting entities mentioned in text to their corresponding entries in a knowledge base, such as Wikidata, a collection of facts about those entities.
Entity linking is a typical first step in natural language processing (NLP) applications, including question answering, information retrieval, and natural language understanding. It is essential for connecting unstructured text to knowledge bases, enabling access to a wealth of curated facts.
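To make the task concrete, here is a minimal sketch of what an entity linker produces: it finds mentions in raw text and maps each to a knowledge base identifier. The toy knowledge base, the `link_mentions` function, and the exact-string matching are illustrative simplifications, not the ReFinED API (real systems use learned mention detection and disambiguation).

```python
# Toy knowledge base mapping surface names to Wikidata-style entries.
# Exact-match lookup is a deliberate simplification for illustration.
TOY_KB = {
    "Paris": {"id": "Q90", "description": "capital of France"},
    "Amazon": {"id": "Q3884", "description": "e-commerce company"},
}

def link_mentions(text: str, kb: dict) -> list[dict]:
    """Return (mention, span, entity_id) records for mentions found in `text`."""
    links = []
    for name, entry in kb.items():
        pos = text.find(name)
        if pos != -1:
            links.append({"mention": name,
                          "span": (pos, pos + len(name)),
                          "entity_id": entry["id"]})
    return links

print(link_mentions("Amazon opened a new office in Paris.", TOY_KB))
```

A real linker must also handle ambiguity (e.g. "Amazon" the company vs. the river), which is exactly where entity descriptions and types come into play.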
Experiments on standard datasets show that current EL systems perform exceptionally well. In practical applications, however, they fall short for the following reasons:
- They are computationally intensive, which increases the cost of large-scale processing.
- Most EL systems are designed to link to a specific knowledge base (usually Wikipedia), which makes them difficult to adapt to other knowledge bases.
- The most effective approaches cannot link text to entities added to the knowledge base after training (a task known as zero-shot EL), so they require continuous retraining to stay up to date.
In work presented on the NAACL 2022 industry track, the Amazon team unveiled a new EL system called ReFinED that addresses all three issues. Building on this work, they also introduce a new approach that feeds additional knowledge base information to the model to increase accuracy. ReFinED outperforms the previous state of the art on standard EL datasets by an average of 3.7 points in F1 score, a metric that accounts for both false positives and false negatives.
ReFinED can perform zero-shot entity linking, generalizing to massive knowledge bases such as Wikidata, which has 15 times more entities than Wikipedia. The system is efficient and effective at extracting entities from web-scale datasets, and the model has been successfully deployed within Amazon. It combines speed, accuracy, and scale.
ReFinED performs EL using fine-grained entity types and entity descriptions. Despite relying on a single, simple Transformer-based encoder, it outperforms state-of-the-art designs across five EL datasets.
ReFinED is roughly 60x faster, and therefore roughly 60x cheaper to run, than comparable previous models, because it performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions in a document in a single forward pass.
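The efficiency idea can be sketched in a few lines: run the (expensive) encoder once, then let several lightweight task heads reuse the resulting token embeddings. Everything below is a hedged toy illustration of that sharing pattern, with random matrices standing in for the encoder and heads; it is not the actual ReFinED architecture.

```python
import random

random.seed(0)
D, NUM_TYPES, NUM_ENTITIES = 8, 4, 5  # toy dimensions

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

W_mention = rand_matrix(D, 2)               # toy mention-detection head
W_type = rand_matrix(D, NUM_TYPES)          # toy fine-grained typing head
entity_embs = rand_matrix(NUM_ENTITIES, D)  # toy entity embeddings

def single_forward_pass(token_embs):
    """All three sub-tasks reuse the same token embeddings (one encoder pass)."""
    mention_logits = matmul(token_embs, W_mention)
    type_logits = matmul(token_embs, W_type)
    # Similarity of each token against every entity embedding (transpose first).
    entity_scores = matmul(token_embs, [list(c) for c in zip(*entity_embs)])
    return mention_logits, type_logits, entity_scores

tokens = rand_matrix(6, D)  # stand-in for encoder output on six tokens
m, t, e = single_forward_pass(tokens)
print(len(m), len(m[0]), len(t[0]), len(e[0]))  # 6 2 4 5
```

The alternative, running a separate model (and a separate encoder pass) per sub-task or per mention, is what makes many earlier pipelines expensive at web scale.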
While working on this method, the researchers found that candidate entities for a mention sometimes could not be distinguished using the knowledge base's entity descriptions and types alone.
To overcome this shortcoming, the researchers continued their experiments and proposed a method that draws on additional knowledge base information about the candidate entities. In their second paper, "Improving Entity Disambiguation by Reasoning over a Knowledge Base", they explain that they added a second mechanism that lets the model predict relationships between pairs of mentions in the text and use that information for disambiguation.
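The intuition behind using relationships between mentions can be illustrated with a toy rescoring step: if the candidates chosen for two different mentions are directly related in the knowledge base, that pairing is rewarded. The relation table, the `rescore` function, and the candidate ID `Q_paris_texas` are all hypothetical stand-ins, not the paper's actual model (which learns these interactions rather than applying a fixed bonus).

```python
# Toy relation table: Paris (Q90) is the capital of France (Q142).
# "Q_paris_texas" below is a made-up distractor ID for illustration.
KB_RELATIONS = {("Q90", "Q142"): "capital of"}

def rescore(cands_a, cands_b, base_scores, bonus=1.0):
    """Boost candidate pairs that are connected by a KB relation.

    cands_a / cands_b: candidate entity IDs for two mentions.
    base_scores: {(a, b): score} from a context-only disambiguator.
    """
    rescored = {}
    for a in cands_a:
        for b in cands_b:
            s = base_scores.get((a, b), 0.0)
            if (a, b) in KB_RELATIONS or (b, a) in KB_RELATIONS:
                s += bonus  # reward the KB-consistent pairing
            rescored[(a, b)] = s
    return rescored

# Context alone slightly favors the distractor, but the KB relation flips it.
scores = rescore(["Q90", "Q_paris_texas"], ["Q142"],
                 {("Q_paris_texas", "Q142"): 0.5})
print(max(scores, key=scores.get))  # ('Q90', 'Q142')
```

This is the kind of signal descriptions and types alone cannot provide: two mentions of lookalike entities become distinguishable because only one candidate pair is consistent with the knowledge graph.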
With this technique, the team improved on the state of the art by 12.7 F1 points on the ShadowLink dataset, which focuses on especially difficult cases, and by an average of 1.3 F1 points across six datasets commonly used in the literature.
This article is a summary written by Marktechpost staff based on the research paper 'ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub, and reference article.