NLP & spaCy

Getting started with Spacy in Python

Code for this video (URL)

spaCy linguistic features (URL)

A Code-First Introduction to Natural Language Processing

Speech and Language Processing (3rd ed. draft), by Dan Jurafsky and James H. Martin

Stanford Stanza

Stanza provides pretrained NLP models for a total of 66 human languages. Pretrained models in Stanza fall into two categories, based on the datasets they were trained on:

  1. Universal Dependencies (UD) models, which are trained on the UD treebanks and cover tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, and dependency parsing;
  2. NER models, which support named entity tagging for 8 languages, and are trained on various NER datasets.
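The UD functionality listed above can be sketched roughly as follows. This is a minimal illustration, not Stanza's recommended setup: the helper names (`word_rows`, `run_ud_pipeline`), the processor string, and the choice of French as the default language (picked only because it has multi-word tokens) are assumptions made for the example. `run_ud_pipeline` needs the `stanza` package installed and downloads models on first use.

```python
# Processor names for the UD pipeline described above (an assumed selection).
UD_PROCESSORS = "tokenize,mwt,pos,lemma,depparse"


def word_rows(doc):
    """Flatten a parsed document into (text, lemma, upos, head, deprel) tuples."""
    return [
        (word.text, word.lemma, word.upos, word.head, word.deprel)
        for sentence in doc.sentences
        for word in sentence.words
    ]


def run_ud_pipeline(text, lang="fr"):
    """Fetch the models for `lang` if needed, annotate `text`, return word rows."""
    import stanza  # imported lazily so word_rows can be used without Stanza

    stanza.download(lang)  # no-op if the models are already cached
    nlp = stanza.Pipeline(lang=lang, processors=UD_PROCESSORS)
    return word_rows(nlp(text))
```

Calling `run_ud_pipeline("Nous avons atteint la fin du sentier.")` should return one tuple per syntactic word, with `head` giving the 1-based index of the governing word (0 for the root).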

Stanford CoreNLP – Natural language software

Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get the quotes people said, etc.

CoreNLP Server

Starting with version 3.6.0, CoreNLP includes a simple web API server, and the CoreNLP Server documentation describes how to set it up. The server provides both a convenient graphical interface to your CoreNLP installation and an API for calling CoreNLP from any programming language. If you are writing a new wrapper so that CoreNLP can be used from another language, the recommended approach is to build it on the CoreNLP Server.
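Calling the server from Python might look like the sketch below, which uses only the standard library. It assumes a server is already running at `http://localhost:9000` (the default port), passes the annotator list as a JSON `properties` query parameter, and posts the raw text as the request body. The function names, the default annotator list, and the explicit `text/plain` content type are choices made for this example, not something the source prescribes.

```python
import json
import urllib.parse
import urllib.request


def build_request(text, annotators="tokenize,ssplit,pos,lemma,ner",
                  base_url="http://localhost:9000"):
    """Build a POST request: properties in the query string, text as the body."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{base_url}/?{query}",
        data=text.encode("utf-8"),
        # Sent as plain text so the body is not treated as form data (assumption).
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def token_rows(response):
    """Pull (word, lemma, pos, ner) tuples out of the server's JSON output."""
    return [
        (tok.get("word"), tok.get("lemma"), tok.get("pos"), tok.get("ner"))
        for sentence in response.get("sentences", [])
        for tok in sentence.get("tokens", [])
    ]


def annotate(text, **kwargs):
    """Send `text` to a running server and return the decoded JSON response."""
    request = build_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `token_rows(annotate("Stanford is in California."))` would give one tuple per token. Fields that a chosen annotator set does not produce come back as `None`.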

The Natural Language Toolkit