Natural Language Processing, 2nd Edition

Video description

5 Hours of Video Instruction

Overview

Natural Language Processing LiveLessons covers the fundamentals of Natural Language Processing in a simple and intuitive way, empowering you to add NLP to your toolkit. Using the powerful NLTK package, it gradually works through the basics of text representation, cleaning, topic detection, regular expressions, and sentiment analysis before moving on to the Keras deep learning framework to explore more advanced topics such as text classification and sequence-to-sequence models. After successfully completing these lessons you’ll be equipped with a fundamental and practical understanding of state-of-the-art Natural Language Processing tools and algorithms.

Jupyter Notebook

About the Instructor

Bruno Goncalves is a senior data scientist working at the intersection of data science and finance who has been programming in Python since 2005. For the past 10 years, his work has focused on NLP, computational linguistics applications, and social networks.

Skill Level

  • Intermediate

Learn How To

  • Represent text
  • Clean text
  • Understand named entity recognition
  • Model topics
  • Conduct sentiment analysis
  • Utilize text classification
  • Understand word2vec word embeddings
  • Define GloVe
  • Apply transfer learning to NLP
  • Apply language detection

Who Should Take This Course

  • Data scientists with an interest in natural language processing

Course Requirements

  • Basic algebra, calculus, and statistics, plus programming experience

Lesson Descriptions

Lesson 1, Text Representations: The first step in any NLP application is the tokenization and representation of text through one-hot encodings and bag of words. Not all words are meaningful, so the next step is to remove uninformative stopwords and identify the most relevant words for your application using TF-IDF. You then identify n-grams. Finally, you learn how word embeddings can be used as semantically meaningful representations, and the lesson closes with a practical demo.
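
To make the Lesson 1 ideas concrete, here is a minimal sketch of stopword removal, bag of words, TF-IDF, and bigrams. It uses only the Python standard library rather than the NLTK tooling the course relies on, and it is not taken from the course materials. The toy corpus, the stopword list, and all function names are assumptions made for illustration, and the TF-IDF weighting shown is just one of several common variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stopword list, chosen only for illustration.
DOCS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
STOPWORDS = {"the", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def bag_of_words(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def bigrams(tokens):
    """Return consecutive token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Return one {word: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = bag_of_words(tokens)
        scores.append({
            w: (c / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores


SCORES = tf_idf(DOCS)
```

In this toy corpus, "cat" appears in two of the three documents, so within the first document it scores lower than "sat", which appears in only one.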

Lesson 2, Text Cleaning: Lesson 2 builds on the text representations of Lesson 1 by applying stemming and lemmatization to identify the roots of words and reduce the size of the vocabulary. Next comes deploying regular expressions to identify words fitting specific patterns. The lesson finishes up by demoing these techniques.
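
As a rough illustration of the Lesson 2 topics, the sketch below strips a few suffixes to approximate stemming and uses regular expressions to find words that fit specific patterns. The suffix list and both patterns are assumptions made for this example. Production stemmers, such as the Porter stemmer shipped with NLTK, apply many more rules than this crude version.

```python
import re

# Hypothetical suffix list; real stemmers apply far more rules than this.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Illustrative patterns: hashtags and four-digit years starting with 19 or 20.
HASHTAG = re.compile(r"#\w+")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

text = "Walking and walked in 2021 #nlp #python"
stems = [crude_stem(w) for w in ["walking", "walked", "walks", "sing"]]
hashtags = HASHTAG.findall(text)
years = YEAR.findall(text)
```

The three inflected forms of "walk" collapse to the same root, which is how stemming shrinks the vocabulary, while the length check leaves a short word like "sing" untouched.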

Lesson 3, Named Entity Recognition: In named entity recognition you develop approaches to tag words by the part of speech to which they correspond. You also identify meaningful groups of words by chunking and chinking before recognizing the named entities that are the subject of your text. The lesson ends with a demonstration of the entire pipeline from raw text to named entities.

Lesson 4, Topic Modeling: Lesson 4 is about developing ways of identifying the main subject or subjects of a text. It begins by exploring explicit semantic analysis to find documents mentioning a specific topic and then turns to clustering documents according to topics. Latent semantic analysis provides yet another powerful way to extract meaning from raw text, as does latent Dirichlet allocation. Non-negative matrix factorization enables you to identify latent dimensions in the text, make recommendations, and measure similarities. Finally, a hands-on demo guides you through the process of using all of these techniques.

Lesson 5, Sentiment Analysis: After identifying the topics covered in a document, the next step is to extract sentiment information. In other words, what kind of sentiments are being expressed? Are the words used positive or negative? The lesson then considers how to handle negations and modifiers and how to use corpus-based approaches to define the valence of each word, as demonstrated in the lesson-ending demo.
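
One simple way to picture valence, negations, and modifiers working together is a lexicon-based scorer like the sketch below. The valence lexicon, the negation list, and the modifier weights are all made up for illustration. A corpus-based approach would derive the valences from data instead, and the course may handle negation differently.

```python
# Hypothetical lexicon, negations, and modifier weights for illustration only.
VALENCE = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}
MODIFIERS = {"very": 1.5, "slightly": 0.5}


def sentiment(text):
    """Sum word valences, letting negations flip and modifiers scale
    the next valence-bearing word."""
    score = 0.0
    sign, scale = 1.0, 1.0
    for word in text.lower().split():
        if word in NEGATIONS:
            sign = -sign                 # flip polarity of the next valence word
        elif word in MODIFIERS:
            scale *= MODIFIERS[word]     # strengthen or weaken it
        elif word in VALENCE:
            score += sign * scale * VALENCE[word]
            sign, scale = 1.0, 1.0       # reset once a valence word is scored
        # Words outside all three sets are skipped and leave any pending
        # negation or modifier in place.
    return score
```

With these assumed values, `sentiment("not very bad")` is positive because the negation flips the scaled negative valence, and `sentiment("not a good movie")` is negative because the unknown word "a" does not cancel the pending negation.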

Lesson 6, Text Classification: In this lesson you learn how to use feedforward networks and convolutional neural networks to classify the sentiment of movie reviews as a test case for how to deploy machine learning approaches in the context of NLP. The lesson also discusses further applications of this approach before proceeding with a hands-on demo.

Lesson 7, Sequence Modelling: Lesson 7 builds on the foundations laid in the previous lesson to explore the use of recurrent neural network architectures for text classification. It starts with the basic RNN architecture before moving on to gated recurrent units and long short-term memory. It also includes a discussion of auto-encoder models and text generation. The lesson wraps up with a demo.

Lesson 8, Applications: This course has focused on some fundamental and not-so-fundamental tools of natural language processing. This final lesson considers specific applications and advanced topics. Perhaps one of the most important developments in NLP in recent years is the popularization of word embeddings in general and word2vec in particular. The lesson delves deeper into vector representations of words and concepts and how semantic relations can be expressed through vector algebra. GloVe is the main competitor to word2vec, so this lesson also explores its advantages and disadvantages. Also discussed are the potential applications of transfer learning to NLP and the question of language detection. The lesson finishes with a demo.
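
The idea of expressing semantic relations through vector algebra can be sketched with a handful of hand-built vectors. The two-dimensional vectors below are invented for this example so that one axis loosely tracks royalty and the other gender. They are not real word2vec or GloVe embeddings, which are learned from large corpora and typically have a hundred or more dimensions.

```python
import math

# Hypothetical 2-d "embeddings", hand-built for illustration only.
VECTORS = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Answer "a is to b as c is to ?" by finding the word whose vector
    is closest (by cosine similarity) to vec(b) - vec(a) + vec(c)."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = analogy("man", "king", "woman")
```

With these toy vectors, king minus man plus woman lands exactly on the vector for "queen", so that is the word the function returns.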

About Pearson Video Training

Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Sams, and Que. Topics include IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.

Product information

  • Title: Natural Language Processing, 2nd Edition
  • Author(s): Bruno Goncalves
  • Release date: October 2021
  • Publisher(s): Pearson
  • ISBN: 0137670222