
Getting to the top of the GPT Store (and building an AI-native search engine, too)
Artificial Ignorance
Introduction
This chapter delves into the founding story of Consensus AI, a search engine for scientific papers, by two college football teammates who aimed to leverage AI to enhance accessibility to scientific knowledge. It explores their journey from ideation to the public launch of the AI-powered search engine.
Back in January, I wrote about what business models might make sense with OpenAI’s GPT Store:
As part of that, I included a screenshot of the featured GPTs, which included Consensus, the number one “Research & Analysis” GPT since day one. That said, the company existed well before the GPT Store - they’ve been working for years on an AI-native search engine for research papers. They’re also a part of the latest AI Grant cohort.
So, I was pleasantly surprised to learn that one of Consensus's founders, Christian Salem, was a reader! I was equally impressed with our conversation - while I’ve included some key takeaways below, it’s a pretty wide-ranging interview covering RAG and search engine architecture, the value of the GPT Store, how the NFL thinks about product management (and might use AI in the future), and more.
Three key takeaways
Vector search is not a silver bullet. Rather, it’s another tool that engineers can use as they build search infrastructure. It enables some new capabilities, but it also comes with tradeoffs.
Search engines are so much more than just semantic similarity. The vector databases, they're amazing - some of these encoding models are getting better and better. But there's so much that goes into finding relevant documents that isn't just about the distance between two points in vector space. There's things like phrase matching, applying filters, finding synonyms. Sometimes we suggest queries to users. There's fuzzy matching - if you screw up and did a typo, it's kind of weird, but a lot of the vector and encoding models are not as good at typos and fuzzy matching can play tricks on them.
There’s real value in the GPT Store. I was surprised to discover that the GPT Store has driven many new users for Consensus, and they currently have ~50% higher retention than other channels. That said, Consensus can capture some value by converting users to paid subscriptions, which isn’t true of free services.
So actually we get some of our best users from ChatGPT, which is not something that I would have predicted when we set out doing this. But I think the last time I looked, our day 30 retention is, I want to say like 50 percent higher for users who we acquired through ChatGPT than every other channel. … It's actually turned into awesome users for us who not only use us in ChatGPT, but then come back to our website and use the web application day after day, and many of those users have converted to our premium subscription. So, so far it's resulted in a ton of value.
LLMs are not one size fits all. Christian’s comments on using multiple models were similar to the ones from my interview with Andrew Lee - to build a fast, efficient system, you’ll likely need different LLMs for different use cases. There are so many tradeoffs around speed, cost, and quality that it’s hard for one model to win at all three.
We were just counting this out the other day, 15 features powered by LLMs are in the product. Only three of them use OpenAI models. The rest all use open source models that we hand fine tune.
…
When we're assessing which models to use for a new feature or a new task within the product, I think there's a few really important criteria to go through. One, how similar is the task to OpenAI training data? If the task is, “hey, take a bunch of text and summarize it,” GPT-4 is so good at that task. It's seen that over and over and over again. So many users have asked it to do that in ChatGPT. And so they have RLHF on that. And for like a super basic summarization task that is very similar to some core GPT behavior.
…
Another thing that you obviously have to look at is cost. So, when you ask a question in Consensus for the top 20 papers that we return, we always do a RAG answer from the paper relevant to your question. That is very similar to something that GPT-4 could probably do pretty well. However, it would not be, economical to make 20 OpenAI calls on every single search for both premium and free users.
And three things you might not know:
* There still isn’t a great way for LLMs to parse structured PDF data, especially in table format. In the case of Consensus, it’s still a human-powered task.
* The GPT Store has started testing monetization with a handful of US-based creators but is not yet broadly available.
* While the NFL may look like any other company, its "shareholders" are the 32 team owners, meaning new product launches often have to get the approval of owners and their friends and family.
Artificial Ignorance is reader-supported. If you found this interesting or insightful, consider becoming a free or paid subscriber.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.ignorance.ai/subscribe


