
Reoffending rates, Welsh taxes and the menopause
More or Less: Behind the Stats
Corpus Linguistics - What Do You Find?
Corpus linguistics is a methodology that uses large electronic collections of what we call naturally occurring language data. The really big modern corpora tend to run into the billions of words, so the biggest ones that we can use do tend to come from the web. Across the different corpora that I looked at, it's around 80% "data is" versus 20% "data are", and that seems to be replicated across different verbs as well. You can figure out how often people say "data is" versus how often they say "data are" by looking at their comments on social media sites such as Twitter or Facebook. But data does definitely take a plural verb much more often in genres like academic writing.
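The comparison described above comes down to counting two competing patterns in a corpus and turning the counts into proportions. The sketch below shows one way to do this. It assumes the corpus is simply a list of plain-text strings. The function names, the verb pairs and the toy corpus are all hypothetical. Plain regular-expression matching is used here as a simple stand-in for the part-of-speech tagging a real corpus tool might rely on.

```python
import re
from collections import Counter

# Hypothetical verb pairs: (singular form, plural form) that could follow "data".
VERB_PAIRS = {
    "be (present)": ("is", "are"),
    "be (past)": ("was", "were"),
    "have": ("has", "have"),
}

def count_agreement(texts, singular, plural):
    """Count 'data <singular>' vs 'data <plural>' across an iterable of texts."""
    singular, plural = singular.lower(), plural.lower()
    # \b keeps words like "metadata" or "ares" from matching.
    pattern = re.compile(
        rf"\bdata\s+({re.escape(singular)}|{re.escape(plural)})\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts[singular], counts[plural]

def singular_share(texts, singular, plural):
    """Fraction of matches using the singular verb, or None if nothing matched."""
    sing, plur = count_agreement(texts, singular, plural)
    total = sing + plur
    return sing / total if total else None

# Toy, made-up corpus for illustration only (not real usage data).
corpus = [
    "The data is clear.",
    "These data are noisy.",
    "Data is everywhere.",
    "the data is on the web",
    "Our data was lost.",
    "The data were collected in 2010.",
]

if __name__ == "__main__":
    for label, (sing, plur) in VERB_PAIRS.items():
        print(label, singular_share(corpus, sing, plur))
```

This sketch only matches "data" when it is immediately followed by the verb. It would therefore miss sentences where other words separate the subject from its verb, such as "the data collected last year are". Comparing genres, for example social media against academic writing, would mean running the same count separately on each sub-corpus.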