The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of SourceGraph

Latent Space: The AI Engineer Podcast

The Importance of Data Pre-processing in Model Performance

Data pre-processing is crucial to model performance because the quality of the input data directly shapes the output. In code RAG (retrieval-augmented generation), pre-processing that extracts the key components of code opens up more options for tuning the system. The compiler's structural understanding of source code is a valuable resource that code generation systems should not discard, despite advances in data-driven models. A data pre-processing engine dissects the entire codebase into small semantic units, which gives a robust solution without compromising customer data privacy. Protecting proprietary IP is a genuine concern, as unauthorized access to training sets can lead to the extraction and exploitation of institutional knowledge.
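To make "dissecting a codebase into small semantic units" concrete, here is a minimal sketch. It is not the pipeline discussed in the episode, whose implementation is not described in this summary. It assumes Python source files and uses the standard-library `ast` module as a stand-in for compiler-level structural understanding. The function name `split_into_units` and the fields of each unit are hypothetical, chosen only for illustration.

```python
import ast


def split_into_units(source: str) -> list[dict]:
    """Split Python source into top-level function and class units.

    Hypothetical helper: each unit carries its name, kind, line range,
    source text, and docstring, so a retriever could index it on its own.
    """
    tree = ast.parse(source)
    units = []
    for node in tree.body:
        # Keep only top-level definitions; imports and loose statements are skipped.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            units.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Exact source text of the definition (decorators are not included).
                "text": ast.get_source_segment(source, node),
                "docstring": ast.get_docstring(node),
            })
    return units


SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

class Greeter:
    def greet(self, name):
        return f"hello {name}"
'''

units = split_into_units(SAMPLE)
```

Each unit can then be embedded or indexed separately, and because the parsing runs locally over the codebase, nothing in this step requires sending the code into a model's training set. A production system would likely need a multi-language parser and finer-grained units (methods, call sites, type definitions); this sketch covers top-level Python definitions only.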
