Lingthusiasm - A podcast that's enthusiastic about linguistics

98: Helping computers decode sentences - Interview with Emily M. Bender

Nov 22, 2024

In this conversation, Dr. Emily M. Bender, a linguistics professor at the University of Washington and co-host of Mystery AI Hype Theater 3000, explores how computers process language. She contrasts how humans and computers learn language and discusses the sometimes hilarious failures that result when a computer lacks real-world grounding. Emily also emphasizes paying attention to communities rather than just data, the labour practices behind large language models, and the origin of the Bender Rule: always state the language you're working on, even if it's English.

ANECDOTE

From Startup Job To Grammar Matrix

Emily Bender described building a Japanese grammar at a startup to automate customer service responses.
That work led her to create the Grammar Matrix starter kit to help build grammars for many languages.

INSIGHT

Language Is Many Separate Problems

Solving “language” is vague because language involves many distinct skills like sound, syntax, semantics, and pragmatics.
True human-like understanding requires far more than linguistics alone, including multimodal and pragmatic reasoning.

ADVICE

Work With Communities First

Engage with communities first and ask what computational tools they want before building systems for their languages.
Train local community members as computational linguists rather than extracting their data without consent.

When a human learns a new word, we're learning to attach that word to a set of concepts in the real world. When a computer "learns" a new word, it is creating some associations between that word and other words it has seen before, which can sometimes give it the appearance of understanding, but it doesn't have that real-world grounding, which can sometimes lead to spectacular failures: hilariously implausible from a human perspective, just as plausible from the computer's.

In this episode, your host Lauren Gawne gets enthusiastic about how computers process language with Dr. Emily M. Bender, who is a linguistics professor at the University of Washington, USA, and cohost of the podcast Mystery AI Hype Theater 3000. We talk about Emily's work trying to formulate a list of rules that a computer can use to generate grammatical sentences in a language, the differences between that and training a computer to generate sentences using the statistical likelihood of what comes next based on all the other sentences, and the further differences between both those things and how humans map language onto the real world. We also talk about paying attention to communities not just data, the labour practices behind large language models, and how Emily's persistent questions led to the creation of the Bender Rule (always state the language you're working on, even if it's English).

Click here for a link to this episode in your podcast player of choice: episodes.fm/1186056137/episode/dGFnOnNvdW5kY2xvdWQsMjAxMDp0cmFja3MvMTk2NDIxOTY5OQ

Read the transcript here: lingthusiasm.com/post/767803835730231296/transcript-episode-98

Announcements:

The 2024 Lingthusiasm Listener Survey is here! It's a mix of questions about who you are as our listener, as well as some fun linguistics experiments for you to participate in. If you have taken the survey in previous years, there are new questions, so you can participate again this year. Take the survey here: bit.ly/lingthusiasmsurvey24

In this month's bonus episode we get enthusiastic about three places where we can learn things about linguistics! We talk about two linguistically interesting museums that Gretchen recently visited: the Estonian National Museum, as well as Mundolingua, a general linguistics museum in Paris. We also talk about Lauren's dream linguistics travel destination: Martha's Vineyard. Join us on Patreon now to get access to this and 90+ other bonus episodes. You'll also get access to the Lingthusiasm Discord server where you can chat with other language nerds. Sign up here: patreon.com/posts/115117867

Also, Patreon now has gift memberships! If you'd like to get a gift subscription to Lingthusiasm bonus episodes for someone you know, or if you want to suggest them as a gift for yourself, here's how to gift a membership: patreon.com/lingthusiasm/gift

For links to things mentioned in this episode: lingthusiasm.com/post/767803572750581760/lingthusiasm-episode-98-helping-computers-decode