The Importance of Evals in Product Development
The benchmarking stuff has been a little frustrating for us because people are gaming it in all sorts of ways, including training on it. For example, with the 3B that we trained, we ran multiple A/B tests and got a net improvement of 50% over, like, Salesforce CodeGen. That was way better than the delta between CodeGen and our fine-tuned model. And I'd be curious to see, if one plays with the data mix, does that affect the completion rate? Because getting this right is really tough.