
03: The Next Generation of LLMs with Jonathan Frankle of MosaicML
Replit AI Podcast
00:00
The Importance of Evals in Product Development
The benchmarking stuff has been a little frustrating for us, because people are gaming it in all sorts of ways, including training on it. For example, with the 3B model that we trained, we did multiple A/B tests and got a net improvement of 50% over Salesforce CodeGen. That was way better, and that was the delta between CodeGen and our fine-tuned model. I'd be curious to see whether playing with the data mix affects the completion rate, because getting this right is really tough.