Is finetuning GPT4o worth it? — with Alistair Pullen, Cosine (Genie)

Latent Space: The AI Engineer Podcast

Approach to Advanced Code Retrieval

Traditional code retrieval methods perform poorly because code and natural language differ so much semantically that the most relevant snippets often score low. A typical example is embedding a natural-language query and ranking code snippets by cosine similarity. An initial fix was to train a model that generates a hypothetical code snippet from the English query, then embed that snippet (rather than the query) for the similarity check. Although simple, this method improved retrieval performance substantially.

Genie took this further with a self-play mechanism, in which the model learns from its training data how to retrieve code the way a developer naturally explores a codebase. It interacts with the file system, identifies candidate files, and uses navigation tools such as 'go to definition' and 'find references' to move through the code. Genie can still fall back on keyword search, semantic search, and graph analysis, but its primary strategy draws on experience learned from the dataset. This makes its retrieval more efficient and closer to how a developer works.
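The hypothetical-snippet idea can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and is not Genie's actual components: the bag-of-words `embed` stands in for a learned embedding model, the three-file `CODE_INDEX` is invented, and `generate_hypothetical_code` is a hypothetical placeholder for the fine-tuned model that drafts code from English. The sketch only shows the data flow: embed a drafted snippet instead of the raw query, then rank files by cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical toy "repository": file name -> code snippet (invented for illustration).
CODE_INDEX = {
    "db.py": "def open_connection(url):\n    return engine.connect(url)",
    "cache.py": "def evict_expired(cache, now):\n"
                "    return {k: v for k, v in cache.items() if v.expires > now}",
    "auth.py": "def verify_password(user, password):\n"
               "    return hash_password(password) == user.password_hash",
}


def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a learned embedding model (an assumption).

    Identifiers such as verify_password split into sub-words because the
    regex only matches runs of letters.
    """
    return Counter(token.lower() for token in re.findall(r"[A-Za-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical_code(query: str) -> str:
    """Hypothetical stand-in for the model that drafts a code snippet from an English query.

    A real system would call a fine-tuned model here; this canned lookup only
    illustrates the data flow and falls back to the raw query when unknown.
    """
    drafts = {
        "how do we check someone's login?":
            "def check_login(user, password):\n"
            "    return hash_password(password) == user.password_hash",
    }
    return drafts.get(query, query)


def retrieve(query: str, use_hypothetical: bool) -> list[tuple[str, float]]:
    """Rank indexed files by cosine similarity to the query or to a drafted snippet."""
    probe = generate_hypothetical_code(query) if use_hypothetical else query
    probe_vec = embed(probe)
    scores = [(name, cosine(probe_vec, embed(code))) for name, code in CODE_INDEX.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    q = "how do we check someone's login?"
    print("raw query:        ", retrieve(q, use_hypothetical=False))
    print("hypothetical code:", retrieve(q, use_hypothetical=True))
```

With the raw English query, every file scores 0.0 because the query shares no tokens with any snippet. With the drafted snippet, `auth.py` ranks first because the draft uses the same vocabulary as the code it is looking for. A real embedding model is far less brittle than this token overlap, but the same gap between English and code is what the technique targets.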
