
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Evaluating Language Models in Cyber Threat Intelligence
This chapter explores how various language models are evaluated on cyber threat intelligence tasks. It covers the performance of models such as ChatGPT-4 and Llama, and stresses challenges shared by all models, such as misattributions and hallucinations. The discussion also addresses why benchmarks and structured evaluation tasks matter for improving accuracy and for informing analysts about model limitations.