2min chapter

Unstructured with Brian Raymond - Weaviate Podcast #48!

Weaviate Podcast

CHAPTER

How to Clean a Natural Language Data File

The first thing you want to do is partition and extract the natural language data from a particular file. And even then, as a data scientist, you're still not ready yet. You've got to stage it. There are all sorts of artifacts that wind their way in here: sentence fragments, weird white space and characters, all sorts of little gremlins that just burn up time on cleaning. We spend a lot of time thinking about that cleaning step.
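
To make the cleaning step concrete, here is a minimal sketch in Python using only the standard library. It is not the Unstructured library's API. The names `clean_text`, `is_fragment`, and `stage`, and the three-word fragment threshold, are hypothetical choices for illustration. It assumes the partition step has already produced a list of text strings.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Normalize odd characters and whitespace in one extracted text element."""
    # Fold compatibility characters (non-breaking spaces, ligatures) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    chars = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf":
            continue  # drop zero-width and other invisible format characters
        # Turn control characters (newlines, tabs, form feeds) into spaces.
        chars.append(" " if category == "Cc" else ch)
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", "".join(chars)).strip()


def is_fragment(text: str, min_words: int = 3) -> bool:
    """Assumed heuristic: very short elements count as sentence fragments."""
    return len(text.split()) < min_words


def stage(elements: list[str]) -> list[str]:
    """Clean each element, then drop empty strings and fragments."""
    cleaned = (clean_text(element) for element in elements)
    return [text for text in cleaned if text and not is_fragment(text)]
```

A word-count cutoff is a blunt way to catch fragments and will also drop short but legitimate lines such as headings, so a real pipeline would likely tune or replace that rule.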
