
635: The Perils of Manually Labeling Data for Machine Learning Models
Super Data Science: ML & AI Podcast with Jon Krohn
Why Not Just Use Elasticsearch?
We believe that all these problems share the same relative footprint, where I have bytes that are laid out in some way that makes sense to the machine and to the users. We had to constrain the grammar that people were able to use to query data so that we could then simulate it. So what I can do is I can actually take the letters ABC and turn them into a finite state machine. And you could either access the data directly, or you could go through the nodes of this graph, which show everything that has a greater than 90% likelihood of being labeled over an extended period of time. Like a probabilistic model for predicting how much money will be lost if someone loses their job.
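
To make the "letters ABC" remark concrete, here is a minimal sketch of how a literal string might be turned into a finite state machine. It is not the speaker's implementation: the function names `build_literal_fsm` and `matches` are hypothetical, and the sketch assumes the simplest case, an exact match on a fixed string, rather than the constrained query grammar mentioned in the episode.

```python
def build_literal_fsm(pattern: str) -> list[dict[str, int]]:
    """Build a transition table: state i moves to state i + 1 on pattern[i].

    Hypothetical helper for illustration; state len(pattern) is accepting.
    """
    return [{ch: i + 1} for i, ch in enumerate(pattern)]


def matches(fsm: list[dict[str, int]], text: str) -> bool:
    """Return True if the machine is in its accepting state after reading text."""
    state = 0
    for ch in text:
        if state == len(fsm):
            # Already accepting, but there is more input: reject.
            return False
        next_state = fsm[state].get(ch)
        if next_state is None:
            # No transition for this character: reject.
            return False
        state = next_state
    return state == len(fsm)


fsm = build_literal_fsm("ABC")
print(matches(fsm, "ABC"))
print(matches(fsm, "ABD"))
```

Each character consumes one transition, so the match runs in a single pass over the input without backtracking, which is one reason a restricted query grammar can be attractive.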