An overview of Natural Language Processing

Rutuja Wanjari
7 min read · Aug 8, 2018


Natural Language Processing (NLP) is a field of computer science that deals with applying linguistic and statistical algorithms to text in order to extract meaning in a way that is very similar to how the human brain understands language.

To learn NLP, I would strongly recommend that you just relax and take a chill pill! Learning is a lot easier when you are relaxed, and you can focus on whichever area you like.

So what exactly is Natural Language Processing?

NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

“Apart from common word processor operations that treat text like a mere sequence of symbols, NLP considers the hierarchical structure of language: several words make a phrase, several phrases make a sentence and, ultimately, sentences convey ideas,” John Rehling, an NLP expert at Meltwater Group, said in How Natural Language Processing Helps Uncover Social Media Sentiment. “By analyzing language for its meaning, NLP systems have long filled useful roles, such as correcting grammar, converting speech to text and automatically translating between languages.”

NLP is commonly used for text mining, machine translation, and automated question answering.

Why is NLP important?

Large volumes of textual data

Natural language processing helps computers communicate with humans in their own language and scales other language-related tasks. For example, NLP makes it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important.

Today’s machines can analyze more language-based data than humans, without fatigue and in a consistent, unbiased way. Considering the staggering amount of unstructured data that’s generated every day, from medical records to social media, automation will be critical to fully analyze text and speech data efficiently.

Structuring a highly unstructured data source

Human language is astoundingly complex and diverse. We express ourselves in infinite ways, both verbally and in writing. Not only are there hundreds of languages and dialects, but within each language is a unique set of grammar and syntax rules, terms and slang. When we write, we often misspell or abbreviate words, or omit punctuation. When we speak, we have regional accents, and we mumble, stutter and borrow terms from other languages.

While supervised and unsupervised learning, and specifically deep learning, are now widely used for modeling human language, there’s also a need for syntactic and semantic understanding and domain expertise that are not necessarily present in these machine learning approaches. NLP is important because it helps resolve ambiguity in language and adds useful numeric structure to the data for many downstream applications, such as speech recognition or text analytics.

Steps in Natural Language Processing of a text

1. Lexical Analysis

It involves identifying and analyzing the structure of words. The lexicon of a language is its collection of words and phrases. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.
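
As a minimal sketch of this step, here is how sentence and word tokenization might look with Python’s NLTK toolkit (the example text is made up, and the “punkt” tokenizer data is assumed to be available for download):

```python
# Minimal lexical-analysis sketch using NLTK: split raw text into sentences
# and words. Assumes nltk is installed; the 'punkt' tokenizer data is fetched here.
import nltk
nltk.download("punkt", quiet=True)

from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLP lets computers read text. It also lets them hear speech."

sentences = sent_tokenize(text)                 # chunk of text -> sentences
words = [word_tokenize(s) for s in sentences]   # each sentence -> words

print(sentences)   # ['NLP lets computers read text.', 'It also lets them hear speech.']
print(words[0])    # ['NLP', 'lets', 'computers', 'read', 'text', '.']
```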

2. Syntactic Analysis

It involves analyzing the words of a sentence for grammar and arranging them in a way that shows the relationships among them. A sentence such as “The school goes to boy” is rejected by an English syntactic analyser.
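
A rough illustration of the first half of this step, using NLTK’s part-of-speech tagger (the tagger data is assumed to be downloadable; a full grammar check would need a parser on top of this):

```python
# Minimal syntactic-analysis sketch using NLTK part-of-speech tagging.
# A full parser would go further and check the tag sequence against a grammar.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

from nltk import pos_tag, word_tokenize

sentence = "The school goes to boy"
print(pos_tag(word_tokenize(sentence)))
# e.g. [('The', 'DT'), ('school', 'NN'), ('goes', 'VBZ'), ('to', 'TO'), ('boy', 'NN')]
# A grammar-aware parser would reject this sequence as ill-formed English.
```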

3. Semantic Analysis

It draws the exact, dictionary meaning from the text and checks the text for meaningfulness. This is done by mapping syntactic structures onto objects in the task domain. The semantic analyser disregards sentences such as “hot ice-cream”.
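
As a small sketch of dictionary-level meaning, one could look up word senses in WordNet through NLTK (the “wordnet” corpus is assumed to be available for download):

```python
# Minimal semantic-analysis sketch: look up dictionary senses with WordNet.
import nltk
nltk.download("wordnet", quiet=True)

from nltk.corpus import wordnet as wn

for synset in wn.synsets("ice_cream"):
    print(synset.name(), "-", synset.definition())
# e.g. ice_cream.n.01 - frozen dessert containing cream and sugar and flavoring
# A fuller analyser could combine such sense information with selectional
# constraints to flag contradictions like "hot ice-cream".
```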

4. Discourse Integration

The meaning of any sentence depends upon the meaning of the sentence just before it, and it in turn shapes the meaning of the sentence that immediately follows.
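
A toy sketch of the idea, using a naive “most recent noun” heuristic rather than a real coreference system (the sentences are invented):

```python
# Toy discourse-integration sketch: decide what "It" refers to in
# "The committee approved the budget. It passed unanimously." by picking the
# most recent noun of the previous sentence. A naive heuristic for illustration only.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

from nltk import pos_tag, word_tokenize

previous_sentence = "The committee approved the budget."
nouns = [word for word, tag in pos_tag(word_tokenize(previous_sentence)) if tag.startswith("NN")]

print('"It" most likely refers to:', nouns[-1] if nouns else None)
# "It" most likely refers to: budget
```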

5. Pragmatic Analysis

During this step, what was said is re-interpreted in terms of what was actually meant. It involves deriving those aspects of language which require real-world knowledge.

Business applications for natural language processing

1. Neural machine translation

Natural language processing software learns language the way a person does; think of early machine translation (MT) as a toddler. Over time, more words get added to an engine, and soon there’s a teenager who won’t shut up. Machine translation quality is inherently dependent on the number of words you give it, which takes time and originally made MT hard to scale.

Fortunately, for businesses that don’t want to wait for an engine to “grow up,” there’s neural machine translation. In 2016, Microsoft’s Bing Translator became one of the first to launch the technology; Google Translate and Amazon Translate now offer competing systems. Before neural, machine translation engines operated in only one direction — say, Spanish into English. If you wanted to translate from English into Spanish, you had to start over with a different data set. And if you wanted to add a third language, well, that was crazy. But with neural machine translation, engineers can cross-apply data. This radically speeds up development, taking a machine translation engine from zero to amazing in months instead of years. As a result, businesses can safely use MT to translate low-impact content: product reviews, regulatory docs that no one reads, email.
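
As a hedged sketch of what calling a neural translation model looks like in code today, one could use a pretrained model through the Hugging Face transformers pipeline (the library, the default model and the printed output are assumptions of this example, not part of the systems named above):

```python
# Hedged neural machine translation sketch using a pretrained model via the
# Hugging Face transformers pipeline (assumes transformers plus a backend such
# as PyTorch are installed; the default model is downloaded on first use).
from transformers import pipeline

translator = pipeline("translation_en_to_fr")   # default pretrained EN->FR model

result = translator("Neural machine translation made it practical to scale across languages.")
print(result[0]["translation_text"])
# e.g. "La traduction automatique neuronale a permis de passer à l'échelle entre les langues."
```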

2. Text Analysis Platform

Text Analysis (Text mining) is the process of exploring and analyzing large amounts of unstructured text data aided by software that can identify concepts, patterns, topics, keywords and other attributes in the data. It’s also known as text analytics, although some people draw a distinction between the two terms; in that view, text analytics is an application enabled by the use of text mining techniques to sort through data sets.

Through techniques such as categorization, entity extraction, sentiment analysis and others, text mining extracts the useful information and knowledge hidden in text content. In the business world, this translates into being able to reveal insights, patterns and trends in even large volumes of unstructured data. In fact, it’s this ability to push aside all of the non-relevant material and provide answers that is leading to its rapid adoption, especially in large organizations.
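
A minimal sketch of this kind of mining, assuming scikit-learn is available: pull out the most characteristic terms of each document with TF-IDF (the documents below are invented):

```python
# Minimal text-mining sketch: surface the most characteristic terms of each
# document in a small collection using TF-IDF weighting.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The delivery was late and the package arrived damaged.",
    "Great customer service, my issue was resolved in minutes.",
    "The new pricing plan is confusing and support never answers.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Print the three top-weighted terms for each document.
for i in range(len(docs)):
    weights = tfidf[i].toarray().ravel()
    top_terms = [t for t, w in sorted(zip(terms, weights), key=lambda x: -x[1]) if w > 0][:3]
    print(f"doc {i}: {top_terms}")
```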

3. Chatbots

If machine translation is one of the oldest natural language processing examples, chatbots are the newest. Bots streamline functionality by integrating in programs like Slack, Skype, and Microsoft Teams. When they first came on the scene, chatbots were consumer-facing. For example, if you typed “pizza” into Facebook Messenger, a Domino’s bot would ask to take your order. While touch points like these can help drive B2C sales, in a B2B world no one wants purchasing reminders interrupting them in Slack.
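
A toy sketch of the keyword-triggered pattern such bots start from (the intents and replies are invented; real bots layer NLP-based intent classification on top of this):

```python
# Toy rule-based chatbot sketch of the kind that gets wired into Slack or
# Messenger. The keywords and canned replies are invented for illustration.
INTENTS = {
    "pizza": "Sure! What size pizza would you like to order?",
    "hours": "We're open from 11am to 10pm every day.",
    "refund": "Sorry to hear that. Let me connect you with a human agent.",
}

def reply(message: str) -> str:
    text = message.lower()
    for keyword, response in INTENTS.items():
        if keyword in text:
            return response
    return "Sorry, I didn't catch that. Could you rephrase?"

print(reply("Can I get a pizza delivered?"))
# Sure! What size pizza would you like to order?
```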

4. Hiring tools

In the HR space, natural language processing software has long helped hiring managers sort through resumes. Using the same techniques as Google search, automated candidate sourcing tools scan applicant CVs to pinpoint people with the required background for a job. But — like early machine translation — the sorting algorithms these platforms used made a lot of mistakes. Say an applicant called herself a “business growth brainstormer” instead of an “outside sales rep”: her resume wouldn’t show up in the results, and your company would overlook a creative, client-driven candidate.

Today’s systems move beyond exact keyword match. Scout, for example, addresses the synonym issue by searching for HR’s originally provided keywords, then using the results to identify new words to look for. Extrapolating new terms (like “business growth”) keeps qualified candidates from slipping through the cracks. And since women and minorities often use language differently, the process helps make sure their resumes don’t get overlooked either.
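
As a hedged sketch of matching beyond exact keywords (not Scout’s actual method), one could score resumes against a job description with TF-IDF cosine similarity; the job text and resume snippets below are invented:

```python
# Sketch of going beyond exact keyword match: score resumes against a job
# description with TF-IDF cosine similarity (scikit-learn assumed installed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job = "outside sales representative focused on business growth and client relationships"
resumes = [
    "business growth brainstormer who builds lasting client relationships",
    "backend software engineer experienced in distributed systems",
]

vectors = TfidfVectorizer().fit_transform([job] + resumes)
scores = cosine_similarity(vectors[0], vectors[1:]).ravel()

for resume, score in zip(resumes, scores):
    print(f"{score:.2f}  {resume}")
# The "business growth brainstormer" scores higher than the unrelated resume,
# even though it never uses the exact phrase "outside sales rep".
```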

5. Conversational search

Like the workplace chatbot Talla, Second Mind wants to answer all your employees’ questions. But this tool isn’t a bot: it’s a voice-activated platform that listens in on company meetings for trigger phrases like “what are” and “I wonder.” When it hears them, Second Mind’s search function whirs into action, seeking an answer for the rest of your sentence.

Say, for example, you’re in a board meeting and someone asks, “What was the ROI on that last year?” Silently, Second Mind would scan company financials — or whatever else was asked about — then display the results on a screen in the room. Founder Kul Singh says the average employee spends 30 percent of the day searching for information, costing companies up to $14,209 per person per year. By streamlining search in real-time conversation, Second Mind promises to improve productivity.
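
A toy sketch of the trigger-phrase idea (the trigger list is invented, and the actual search step is left out):

```python
# Toy sketch of trigger-phrase detection for conversational search: watch a
# transcript for phrases like "what are" or "I wonder" and hand the rest of
# the utterance to a search backend (the search itself is omitted here).
TRIGGERS = ("what are", "what was", "i wonder")

def detect_query(utterance: str):
    text = utterance.lower()
    for trigger in TRIGGERS:
        if trigger in text:
            # Everything after the trigger becomes the search query.
            return text.split(trigger, 1)[1].strip(" ?.")
    return None

print(detect_query("What was the ROI on that last year?"))
# the roi on that last year
```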

Recommended NLP Books for Beginners

  • Speech and Language Processing: “The first of its kind to thoroughly cover language technology — at all levels and with all modern technologies — this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora.”
  • Foundations of Statistical Natural Language Processing: “This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.”
  • Handbook of Natural Language Processing: “The Second Edition presents practical tools and techniques for implementing natural language processing in computer systems. Along with removing outdated material, this edition updates every chapter and expands the content to include emerging areas, such as sentiment analysis.”
  • Statistical Language Learning (Language, Speech, and Communication): “Eugene Charniak breaks new ground in artificial intelligence research by presenting statistical language processing from an artificial intelligence point of view in a text for researchers and scientists with a traditional computer science background.”
  • Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. “This is a book about Natural Language Processing. By “natural language” we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.”
  • Speech and Language Processing, 2nd Edition: “An explosion of Web-based language techniques, merging of distinct fields, availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology — at all levels and with all modern technologies — this text takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora.”
  • Introduction to Information Retrieval: “As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval systems. However, during the last decade, relentless optimization of information retrieval effectiveness has driven web search engines to new quality levels where most people are satisfied most of the time, and web search has become a standard and often preferred source of information finding.”
