Visualizing NLP with Pretrained Models – spaCy and StanfordNLP

Natural Language Processing (NLP) is a crucial aspect of understanding and processing human language using computational methods. In this tutorial, we will explore two popular NLP libraries – spaCy and StanfordNLP – and demonstrate their capabilities using pretrained models.

spaCy – English NLP

Let’s start with spaCy and an English example. We’ll use a snippet about Donald John Trump and visualize various linguistic features.

import spacy

# Load the spaCy English model
# (install first with: python -m spacy download en_core_web_sm)
en = spacy.load("en_core_web_sm")

text = ("Donald John Trump (born June 14, 1946) is the 45th and current president of "
        "the United States. Before entering politics, he was a businessman and television personality.")

# Tokenize the text
doc_en = en(text)

# Display sentences and tokens
list(doc_en.sents)

The text is tokenized into sentences and individual tokens. Each token exposes attributes such as orth_ (the original text), lemma_ (the base form), pos_ (the coarse-grained part of speech), and tag_ (the fine-grained tag).

from IPython.display import HTML, display
import tabulate

# Display each token with its lemma, coarse POS and fine-grained tag
tokens = [[t.orth_, t.lemma_, t.pos_, t.tag_] for t in doc_en]
display(HTML(tabulate.tabulate(tokens, tablefmt='html')))

Named Entity Recognition (NER) with spaCy

spaCy provides pretrained models for named entity recognition. Let’s identify entities in our text.

# Identify named entities
entities = [(t.orth_, t.ent_iob_, t.ent_type_) for t in doc_en]
display(HTML(tabulate.tabulate(entities, tablefmt='html')))

Entities like “Donald John Trump,” “June 14, 1946,” “45th,” and “the United States” are recognized with their respective types (PERSON, DATE, ORDINAL, GPE).
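The ent_iob_ values mark where each entity begins (B), continues (I), or where a token lies outside any entity (O). As a minimal illustration of how this scheme works, the helper below (a hypothetical function written for this tutorial, not a spaCy API) groups (token, IOB, type) triples like those in the table back into entity spans:

```python
# Group IOB-tagged tokens into entity spans.
# extract_entities is a hypothetical helper for illustration, not part of spaCy.
def extract_entities(tagged):
    """tagged: list of (token, iob, ent_type) triples."""
    entities, current, current_type = [], [], None
    for token, iob, ent_type in tagged:
        if iob == "B":                      # a new entity begins
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], ent_type
        elif iob == "I":                    # the current entity continues
            current.append(token)
        else:                               # "O": outside any entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

sample = [("Donald", "B", "PERSON"), ("John", "I", "PERSON"),
          ("Trump", "I", "PERSON"), ("(", "O", ""),
          ("born", "O", ""), ("June", "B", "DATE"),
          ("14", "I", "DATE"), (",", "I", "DATE"), ("1946", "I", "DATE")]
print(extract_entities(sample))
# → [('Donald John Trump', 'PERSON'), ('June 14 , 1946', 'DATE')]
```

In practice you would simply read doc_en.ents, which spaCy builds from these same IOB tags; the sketch just makes the encoding explicit.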

Dependency Parsing with spaCy

The dependency parser in spaCy helps analyze grammatical relations between tokens.

# Dependency parsing
syntax = [[token.text, token.dep_, token.head.text] for token in doc_en]
display(HTML(tabulate.tabulate(syntax, tablefmt='html')))

This shows the grammatical relations between tokens, revealing the sentence’s structure.
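To see how the dep_/head pairs encode a tree, here is a small sketch (a hypothetical helper written for this tutorial, not a spaCy API) that walks from any token up to the sentence root using (text, dep, head) triples like those in the table above. It assumes each token text is unique in the sentence, which holds for short examples:

```python
# Walk from a token up to the ROOT of its sentence.
# path_to_root is a hypothetical helper for illustration, not part of spaCy.
def path_to_root(syntax, start):
    """syntax: list of (text, dep, head) triples; start: a token text."""
    heads = {text: (dep, head) for text, dep, head in syntax}
    path = [start]
    while True:
        dep, head = heads[path[-1]]
        if dep == "ROOT":           # in spaCy, the root is its own head
            return path
        path.append(head)

# Simplified parse of "Trump is the president": 'is' is the root.
parse = [("Trump", "nsubj", "is"),
         ("is", "ROOT", "is"),
         ("the", "det", "president"),
         ("president", "attr", "is")]
print(path_to_root(parse, "the"))
# → ['the', 'president', 'is']
```

On a real doc you would use token.head directly (e.g. follow token.head until token.head is token), but the table form makes the tree structure easy to inspect.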

StanfordNLP – Dutch NLP

Now, let’s switch to StanfordNLP and process a Dutch sentence about Charles Michel.

import stanfordnlp

# Download the Dutch model (if not already downloaded)
# stanfordnlp.download('nl')

# Load the StanfordNLP Dutch pipeline
nl_stanford = stanfordnlp.Pipeline(lang="nl")

# "Charles Michel is the prime minister of Belgium."
text_nl = "Charles Michel is de eerste minister van België."
doc_nl_stanford = nl_stanford(text_nl)

Combining spaCy and StanfordNLP

You can combine the strengths of both libraries. The spacy_stanfordnlp wrapper (installed with pip install spacy-stanfordnlp) wraps a StanfordNLP pipeline in a spaCy-compatible Language object, so you can use spaCy's API on StanfordNLP's annotations.

from spacy_stanfordnlp import StanfordNLPLanguage

# Create a combined pipeline
nl_combined = StanfordNLPLanguage(nl_stanford)
doc_nl_combined = nl_combined(text_nl)

# Display combined information
info = [(t.orth_, t.lemma_, t.pos_, t.tag_) for t in doc_nl_combined]
display(HTML(tabulate.tabulate(info, tablefmt='html')))

This combination provides Dutch lemmatization, part-of-speech tagging, and dependency parsing.

Enhancing with spaCy’s NER

You can extend the combined pipeline with spaCy’s Named Entity Recognition.

nl_combined = StanfordNLPLanguage(nl_stanford)

# Reuse the pretrained NER component from the English model (spaCy v2 API)
nl_ner = en.get_pipe("ner")
nl_combined.add_pipe(nl_ner)

# Make sure the entity label string is known to the new pipeline's vocab
nl_combined.vocab.strings.add("PER")

doc_nl_combined = nl_combined(text_nl)

# Display enhanced information
info = [(t.orth_, t.lemma_, t.pos_, t.tag_, t.ent_iob_, t.ent_type_) for t in doc_nl_combined]
display(HTML(tabulate.tabulate(info, tablefmt='html')))

This shows how you can leverage the strengths of both libraries for a more comprehensive NLP analysis.

Conclusion

spaCy and StanfordNLP both offer powerful NLP capabilities with pretrained models for multiple languages, and combining their strengths yields a more robust solution for multilingual tasks. Explore further, experiment with different languages, and discover what these libraries can do for understanding and processing natural language.
