The Role of LLMs in Causal Reasoning
The paper says LLMs are good tools for complementing human causal reasoning. They apparently do well on all these benchmarks, but you were still left with questions like: what did they do well at? What is it about those questions that you feel you would need access to the model weights or training datasets or other things to really suss out?

Well, okay. For one thing, we didn't really see competitive results on these benchmarks with respect to state-of-the-art baselines until we got to a certain level of model size. So essentially text-davinci-003, GPT-3.5 Turbo, GPT-4 is when we started seeing comparable or better-than-baseline results.