

Modern NLP with spaCy
Dec 9, 2019
Ines Montani and Matthew Honnibal, core developers of the NLP library spaCy and co-founders of Explosion AI, dive into the world of natural language processing. They share spaCy's journey, highlighting its adoption and unique features. The duo discusses the critical role of data annotation and efficient tooling in machine learning, and demystifies complex NLP models using a four-step framework for understanding them. They also touch on practical workflows, ethical considerations, and the vibrant community surrounding NLP.
AI Snips
spaCy's Origin
- Matthew Honnibal started his PhD in 2005 and worked on NLP research.
- Seeing companies struggle with his research code, he aimed to create more production-ready tools.
spaCy's Name
- spaCy's name originated from its initial focus on tokenization, splitting text based on spaces.
- The name also reflects its speed and use of Cython.
Tokenization Challenges
- Tokenization in NLP involves splitting text into words, but it becomes complex with punctuation and contractions.
- Different languages have different definitions of tokens; a token isn't always a word.
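The punctuation and contraction problem above can be illustrated with a small sketch. This is not spaCy's tokenizer: it uses only Python's standard library, and the pattern, the function names `naive_tokenize` and `rule_tokenize`, and the choice to split "Don't" into "Do" + "n't" are assumptions made for illustration. It also covers English only, which is itself an instance of the second point: the rules would not carry over to other languages.

```python
import re

# Hypothetical English-only rules for illustration; not spaCy's actual rules.
TOKEN_PATTERN = re.compile(
    r"n't\b"                     # negation suffix as its own token: "n't"
    r"|'(?:s|re|ve|ll|d|m)\b"    # common clitics: 's, 're, 've, 'll, 'd, 'm
    r"|\w+(?=n't\b)"             # word stem before "n't": "Do" in "Don't"
    r"|\w+"                      # any other run of word characters
    r"|[^\w\s]",                 # any single punctuation character
    re.IGNORECASE,
)


def naive_tokenize(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays glued to words."""
    return text.split()


def rule_tokenize(text: str) -> list[str]:
    """Split off punctuation and a few English contractions."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = "Don't split on spaces only, it's tricky!"
    print(naive_tokenize(sentence))
    print(rule_tokenize(sentence))
```

On the sample sentence, whitespace splitting leaves "only," and "tricky!" as single tokens, while the rule-based version separates the comma and exclamation mark and breaks "Don't" and "it's" into two tokens each. A production tokenizer needs many more rules and exceptions than this.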