The Importance of Evals in Product Development

The benchmarking stuff has been a little frustrating for us because people are gaming it in all sorts of ways, including training on it. For example, with the 3B model that we trained, we did multiple A/B tests and we got a net improvement of 50% over, like, Salesforce CodeGen. That was way better. That's the delta between CodeGen and our fine-tuned model. And I'd be curious to see, if one plays with the data mix, does that affect the completion rate? Because getting this is really tough.
