Benchmark Six Sigma Forum
Message added by Mayank Gupta,

Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that identifies entities in a text and classifies them into predefined categories such as names of people, organizations, locations, dates, monetary values, etc. This helps extract meaningful information from unstructured text data.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Sachin Tanwar on 23rd Oct 2024.

 

Applause for all the respondents - Narendra Purushothama, Sachin Tanwar, Deep Dave.

Featured Replies

Q 714. How do Named Entity Recognition (NER) systems handle ambiguous terms, and what techniques can enhance their accuracy in real-world applications? Try running this through different large language models (LLMs) and share the varied responses as examples. Feel free to compare their outputs for added insights!

 


Solved by Sachin Tanwar

NER systems often encounter ambiguous terms, which can lead to incorrect entity classifications. These ambiguities can arise due to various reasons, such as:

  • Polysemy: A word having multiple meanings. For example, in customer complaint analysis, "satisfied" can refer to the customer service representative's response or to the product or technology, and it can even signal dissatisfaction when the customer writes "not satisfied".
  • Homonymy: Words with the same spelling but different meanings. "Firm" can refer to a business, such as a financial institution, or describe something solid and strong.
  • Contextual ambiguity: The meaning of a term depends on the surrounding context. In image-to-text extraction, when we try to read a storefront name from a nameplate, "The Burger House" might be the store name, while "The Burger House Special" might be a dish on the menu.

 

Techniques mentioned by different LLMs

GPT: Transfer learning, domain-specific training and data augmentation

Gemini: Contextual (window-based features, long short-term memory), lexical (gazetteers, part-of-speech tagging)

Claude: Contextual (window-based approaches examining surrounding words, long-range dependencies using attention mechanisms, syntactic parsing to understand grammatical relationships), disambiguation strategies (statistical modeling of entity co-occurrence, domain-specific rules and gazetteers, word sense disambiguation techniques), accuracy enhancement methods, etc.

 

The real-world example we worked on is image-to-text extraction to match storefront names, using LSTM and a gradient boosting method to eliminate contextual ambiguity. Text with a low benchmark score is eliminated in each iteration, and we finally end up with words that match with 80%+ accuracy.

 

Example: Extract the text from the image using OCR --> Build a bag of words --> Eliminate all usual abbreviations and other probably incorrect words --> Build a correlation model based on the type of business to remove contextual ambiguity --> Finally arrive at useful text and match it with storefront names.
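
A rough sketch of the clean-up and matching end of such a pipeline is given below. It is not the LSTM or gradient boosting model mentioned above: it swaps in plain string similarity from Python's standard library, and the noise-word list, function names and storefront names are assumptions made up for illustration. Only the 80% cut-off comes from the post.

```python
from difflib import SequenceMatcher

# Hypothetical noise words: common abbreviations and generic signage terms.
NOISE_WORDS = {"ltd", "pvt", "inc", "open", "closed", "est", "tel"}


def clean_tokens(ocr_text):
    """Lowercase the OCR output, strip punctuation, and drop noise words."""
    tokens = []
    for raw in ocr_text.split():
        token = "".join(ch for ch in raw.lower() if ch.isalnum())
        if token and token not in NOISE_WORDS:
            tokens.append(token)
    return tokens


def best_storefront_match(ocr_text, known_names, threshold=0.8):
    """Return (name, score) for the closest known storefront name,
    or None when no name reaches the similarity threshold."""
    candidate = " ".join(clean_tokens(ocr_text))
    best_name, best_score = None, 0.0
    for name in known_names:
        reference = " ".join(clean_tokens(name))
        score = SequenceMatcher(None, candidate, reference).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None


# Made-up storefront list and a noisy OCR reading with a dropped letter.
stores = ["The Burger House", "City Pharmacy"]
match = best_storefront_match("The Burgr House Ltd. OPEN", stores)
```

Here the misspelt reading still clears the threshold for "The Burger House", while unrelated text returns None.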

 

 

  • Solution

Named Entity Recognition systems are used to identify specific entities in text, such as people, places, or organizations. More often than not, though, these systems are held back by ambiguity. A word can have more than one meaning, so ambiguity arises when the system is not certain which meaning fits a particular context.

Strategies for Handling Ambiguity:

Contextual Analysis: NER systems take into account the words around a potentially ambiguous term to work out what that term means. Consider the word "Orange", which could refer to either a fruit or a company providing logistics support. If it is surrounded by words like "warehouse" or "inventory", it is more likely to be identified as the company.
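
As a toy illustration of a context window, the sketch below counts cue words within a few tokens of "Orange". The cue lists, labels and function name are assumptions for illustration; a real system would learn such signals from labelled data instead of hard-coding them.

```python
# Hypothetical cue words; a real system would learn these from labelled data.
COMPANY_CUES = {"warehouse", "inventory", "shipment", "logistics"}
FRUIT_CUES = {"juice", "peel", "ripe", "eat"}


def classify_orange(sentence, window=3):
    """Guess what 'Orange' means from the words within `window` tokens of it."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    if "orange" not in tokens:
        return None
    idx = tokens.index("orange")
    # Words to the left and right of the ambiguous term.
    context = tokens[max(0, idx - window):idx] + tokens[idx + 1:idx + 1 + window]
    company_hits = sum(t in COMPANY_CUES for t in context)
    fruit_hits = sum(t in FRUIT_CUES for t in context)
    if company_hits > fruit_hits:
        return "ORGANIZATION"
    if fruit_hits > company_hits:
        return "NOT_AN_ENTITY"
    return "UNKNOWN"


company_case = classify_orange("Orange updated its warehouse inventory today.")
fruit_case = classify_orange("She peeled a ripe orange for juice.")
```

With no cue word in range the function returns "UNKNOWN", which is the case where a fixed window falls short and longer-range context is needed.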

Gazetteers: These are lists of entities along with their types. If a word can be found in a gazetteer, then the system is more likely to identify it as the listed entity.
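
A gazetteer lookup can be sketched as a dictionary from known names to entity types, matched longest phrase first. The entries and the function name below are hypothetical and only meant to show the idea.

```python
# Hypothetical gazetteer: lowercase entity name -> entity type.
GAZETTEER = {
    "narendra modi": "PERSON",
    "vadnagar": "LOCATION",
    "apple": "ORGANIZATION",
}


def gazetteer_tag(text, gazetteer=GAZETTEER, max_len=3):
    """Scan left to right and tag the longest phrase found in the gazetteer."""
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).strip(".,!?")
            label = gazetteer.get(phrase.lower())
            if label:
                match = (phrase, label, n)
                break
        if match:
            found.append((match[0], match[1]))
            i += match[2]
        else:
            i += 1
    return found


tags = gazetteer_tag("Narendra Modi was born in Vadnagar.")
```

A lookup like this would also tag "apple" in "I ate an apple" as an organization, which is why gazetteers are usually combined with context rather than used alone.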


Machine Learning: Advanced NER systems use machine learning algorithms to learn from large, labeled datasets of text. The algorithms identify patterns and relationships that allow the system to make better predictions.

Techniques for Improving Accuracy:

Quality of Training Data: The quality of the training data is critical. If noisy and inconsistent data are fed to the system, it will most likely produce incorrect results.

Feature Engineering: Building informative features helps the system better understand the context in which a word is being used. Useful features include the word's part of speech, whether it is capitalized, and its distance from other entities.
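
The kind of hand-built features described above can be sketched as a per-token dictionary. The feature names and the function are assumptions for illustration, and part of speech is left out because it would need a separate tagger.

```python
def token_features(tokens, i, entity_positions=()):
    """Build a simple feature dictionary for the token at index i.

    entity_positions holds the indices of tokens already tagged as entities.
    """
    token = tokens[i]
    distances = [abs(i - p) for p in entity_positions if p != i]
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "is_sentence_start": i == 0,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
        "distance_to_nearest_entity": min(distances) if distances else None,
    }


tokens = ["Apple", "manufactures", "iPhone"]
first = token_features(tokens, 0)
last = token_features(tokens, 2, entity_positions=[0])
```

Dictionaries like these are what a classical sequence classifier would consume, one per token.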


Ensemble Methods: Accuracy can be improved by combining multiple NER systems. Each system has its own strengths and weaknesses, and combining them reduces the errors made by any individual system.

Domain Knowledge: In specialized domains such as medicine or law, adding domain knowledge helps the system understand the nuances of the language used.
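
One simple way to combine systems is a per-token majority vote, sketched below. The three label sequences are made-up outputs from hypothetical systems; real ensembles often weight systems or vote on whole entity spans instead.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several NER systems by majority vote.

    predictions is a list of label sequences, one per system, all of the
    same length. On a tie, the label that appears first among the tied
    ones (in system order) wins.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all systems must label the same number of tokens")
    combined = []
    for labels in zip(*predictions):
        label, _count = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined


# Made-up outputs for the tokens ["Apple", "manufactures", "iPhone"].
system_a = ["ORG", "O", "PRODUCT"]
system_b = ["O", "O", "PRODUCT"]
system_c = ["ORG", "O", "O"]
combined = majority_vote([system_a, system_b, system_c])
```

Each system makes one mistake here, but no two systems make the same one, so the vote recovers the intended labels.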

By employing these strategies and techniques, NER systems can become more accurate and reliable in real-world applications.

Named Entity Recognition (NER) is a technology used in Natural Language Processing (NLP) to identify and classify entities in text. For example, given the sentence “Narendra Modi was born in Vadnagar”, NER should identify “Narendra Modi” as a person and “Vadnagar” as a location.

That was a simple example, so let’s add some ambiguity. Take the sentence “Apple manufactures iPhone”. Here, NER should identify “Apple” as the name of an organization and not as a fruit. Hence, NER tools should be able to identify entities such as names of people, places, organizations and dates in the right context, even when a term is ambiguous as in this example.

As humans, we can tell Apple the fruit from Apple the mobile manufacturing company by looking at the context in which the word is used. Let’s see how NER systems in Natural Language Processing deal with ambiguity, according to different LLMs:

 

1.      GPT-4: Contextual analysis using pre-trained language models such as BERT or GPT-3, through which the context is understood and accuracy is improved.

 

2.      Google Bard: Data augmentation & context-aware models.

 

3.      LLaMA: Heuristic rules

 

4.      Claude: Contextual embeddings, attention mechanisms, multi-task learning and knowledge graph

 

To summarize, the common basis across all the LLM responses is contextual analysis, pre-trained models, heuristic rules and attention mechanisms.

While none of the answers is complete in the true sense, the one from Sachin Tanwar comes closest and hence has been selected as the winner.

 

I recommend going through all answers to get a more complete perspective.
