CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Evaluating Language Models in Cyber Threat Intelligence

This chapter explores how several language models are evaluated on cyber threat intelligence tasks. It highlights the performance of models such as ChatGPT-4 and Llama, and notes challenges shared by all models, including misattribution and hallucination. The discussion also covers the role of benchmarks and structured evaluation tasks in improving accuracy and informing analysts about model limitations.
