Traditional code retrieval, such as ranking snippets by cosine similarity between an embedded natural-language query and embedded code, performs poorly because code and natural language differ so much semantically that relevant snippets are often missed. An initial fix was to train a model to generate a hypothetical code snippet from the English query and to embed that snippet for the similarity check instead. Although straightforward, this improved performance substantially. Genie then went further with a self-play mechanism, in which the model learns directly from its training data how to retrieve code, mimicking the way a developer naturally explores a codebase. It interacts with the file system, identifies candidate files, and uses navigation such as 'go to definition' and 'references' to traverse the code. Genie can still use keyword search, semantic search, and graph analysis, but its primary method is retrieval behaviour learned from the dataset, which makes it more efficient and more developer-like.
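The hypothetical-snippet step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not Genie's actual implementation: `embed` is a toy bag-of-tokens embedding standing in for a real embedding model, `generate_hypothetical_code` is a hard-coded stand-in for the trained generator, and the two-file corpus is invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): counts of lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_code(query: str) -> str:
    """Hypothetical stand-in for a model trained to write code for a query.

    A real system would call the trained generator; this lookup table only
    exists so the sketch runs without any model.
    """
    canned = {
        "read a json file from disk": (
            "def load_json(path):\n"
            "    with open(path) as f:\n"
            "        return json.load(f)"
        ),
    }
    # Fall back to the raw query when no canned snippet is available.
    return canned.get(query.lower(), query)


def retrieve(query: str, corpus: dict, use_hypothetical: bool = True) -> str:
    """Return the name of the corpus file most similar to the probe text."""
    probe = generate_hypothetical_code(query) if use_hypothetical else query
    probe_vec = embed(probe)
    return max(corpus, key=lambda name: cosine(probe_vec, embed(corpus[name])))


# Invented corpus: one file that does the job, one that merely shares
# English words with the query.
CORPUS = {
    "io_utils.py": (
        "def parse_config(path):\n"
        "    with open(path) as handle:\n"
        "        return json.load(handle)"
    ),
    "docs.py": (
        "# how to read a file from disk\n"
        "def print_help():\n"
        "    print('read the manual')"
    ),
}
QUERY = "Read a JSON file from disk"

direct_hit = retrieve(QUERY, CORPUS, use_hypothetical=False)
hypothetical_hit = retrieve(QUERY, CORPUS, use_hypothetical=True)
print("direct query:", direct_hit)
print("hypothetical snippet:", hypothetical_hit)
```

In this toy setup the raw English query is pulled toward the file whose comment shares its wording, while the generated snippet is pulled toward the file whose code has the same shape. That is the gap between natural language and code that the summary describes, reproduced at a very small scale.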
