What is it about computational communication science?

Observing Opinions: What is Pre-Processing?

Sep 9, 2025

19:10

Episode notes

In this episode, Prof. Jamal Abdul Nasir from the University of Galway reveals why pre-processing is the backbone of all text analysis. He breaks down key steps like defining documents, tokenization, removing stop words, unification, and stemming vs. lemmatization. Jamal also explains unigrams vs. bigrams and how modern NLP techniques like byte-pair encoding are changing the game. Plus, he shares practical tips for making your pre-processing transparent and reproducible, helping your research stand strong and scale up.
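For readers who want a concrete picture of the steps named above, here is a minimal sketch of tokenization, stop-word removal, and unigram vs. bigram extraction. It is not taken from the episode: the function names, the regular expression, and the tiny stop-word list are all illustrative assumptions, and stemming, lemmatization, and byte-pair encoding are left out because they usually rely on a dedicated library.

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects
# should use and document a fuller, published list).
STOP_WORDS = {"the", "is", "of", "a", "and", "all"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def preprocess(text):
    """Run the steps in a fixed, documented order."""
    tokens = remove_stop_words(tokenize(text))
    return {"unigrams": tokens, "bigrams": ngrams(tokens, 2)}

if __name__ == "__main__":
    print(preprocess("Pre-processing is the backbone of all text analysis."))
```

Keeping each step in its own small function, applied in one fixed order, is one way to make the pipeline easy to report and to rerun, which is the transparency and reproducibility point the episode notes raise.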
