Unstructured with Brian Raymond - Weaviate Podcast #48!

Weaviate Podcast

CHAPTER

How to Clean a Natural Language Data File

The first thing you want to do is partition and extract the natural language data from a particular file. But as a data scientist, you're still not ready yet. You've got to stage it. All sorts of artifacts wind their way in here: sentence fragments, weird whitespace and characters, all sorts of little gremlins that just burn up time on cleanup. We spend a lot of time thinking about that cleaning step.
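To make the cleaning step concrete, here is a minimal sketch in plain Python of the kind of staging described above. It is not the Unstructured library's API; the function names `clean_text`, `drop_fragments`, and `stage`, and the `min_words` threshold, are hypothetical and chosen only for illustration. It assumes the partition step has already produced a list of text chunks.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Normalize odd characters and whitespace in one extracted chunk."""
    # NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", text)
    # Drop control and format characters (NUL bytes, zero-width spaces), keeping newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse any run of whitespace (including newlines) into a single space.
    return re.sub(r"\s+", " ", text).strip()


def drop_fragments(chunks, min_words=3):
    """Discard chunks too short to be a useful sentence (hypothetical heuristic)."""
    return [c for c in chunks if len(c.split()) >= min_words]


def stage(chunks, min_words=3):
    """Clean every chunk, then filter out fragments and empty leftovers."""
    cleaned = [clean_text(c) for c in chunks]
    return drop_fragments(cleaned, min_words)


if __name__ == "__main__":
    raw = ["Page 3", "  The quick\u00a0brown   fox jumps. ", "\u200b"]
    print(stage(raw))
```

A word-count cutoff is a crude way to catch sentence fragments; a real pipeline would likely tune it per document type or use a more careful heuristic.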

