Hi everyone!
If you’re a new subscriber or listener, welcome.
If you’re not new, you’ve probably noticed that things have slowed down a bit on our end recently. Hugh Zhang, Andrey Kurenkov, and I sat down to recap some of The Gradient’s history, where we are now, and how things will look going forward.
To summarize and give some context:
The Gradient has been around for about six years now – we began as an online magazine, and started producing our own newsletter and podcast about four years ago. With a team of volunteers (we take in a bit of money through Substack, which covers subscriptions to the tools we need and lets us pay ourselves a bit), we’ve been able to keep this going for quite some time.
Our team has less bandwidth than we’d like right now (and I’ll admit that at least some of us are running on fumes…), so we’ll be making a few changes:
* Magazine: We’re going to be scaling down our editing work on the magazine. While we won’t be accepting pitches for unwritten drafts for now, if you have a full piece that you’d like to pitch to us, we’ll consider posting it. If you’ve reached out about writing and haven’t heard from us, we’re really sorry. We’ve tried a few different arrangements to manage the pipeline of articles we have, but it’s been difficult to make it work. We still want this to be a place to promote good work and writing from the ML community, so we intend to continue using this Substack for that purpose. If we have more editing bandwidth on our team in the future, we want to continue doing that work.
* Newsletter: We’ll aim to continue the newsletter as before, but with a new “Best from the Community” section highlighting posts from the ML community. We’ll set up a way for you to submit articles you’d like featured; for now, you can reach us at editor@thegradient.pub.
* Podcast: I’ll be continuing this (at a slower pace), but will eventually transition it away from The Gradient given its expanded range of topics. If you’re interested in following along, it might be worth subscribing on another player like Apple Podcasts or Spotify, or using the RSS feed.
* Sigmoid Social: We’ll keep this alive as long as there’s financial support for it.
If you like what we do and/or want to help us out in any way, do reach out to editor@thegradient.pub. We love hearing from you.
Timestamps
* (00:00) Intro
* (01:55) How The Gradient began
* (03:23) Changes and announcements
* (10:10) More Gradient history! On our involvement, favorite articles, and some plugs
Some of our favorite articles!
There are so many, so this is very much a non-exhaustive list:
* NLP’s ImageNet moment has arrived
* The State of Machine Learning Frameworks in 2019
* Why transformative artificial intelligence is really, really hard to achieve
* An Introduction to AI Story Generation
* The Artificiality of Alignment (I didn’t mention this one in the episode, but it should be here)
Places you can find us!
Hugh:
* Papers/things mentioned!
* A Careful Examination of LLM Performance on Grade School Arithmetic (GSM1k)
* Planning in Natural Language Improves LLM Search for Code Generation
Andrey:
Daniel:
* Personal site (under construction)