
Bits & Atomen: When Does AI Speak Flemish? (Live at the Dag van de Wetenschap)
Nov 28, 2025
Annelies Duerinckx, a researcher with Scivil, discusses the Mar-Alee project, which aims to improve AI recognition of Flemish dialects by collecting diverse speech data. Linguist Melissa Schuring shares insights into the practical and ethical challenges of recording children's speech for research. Movement scientist Jelle Habay addresses transparency in scientific practice, recounting his own failed experiments and explaining why researchers should publish setbacks as well as successes. The conversation highlights the need for inclusive data to improve language technology, as well as the complexities of scientific research.
Why Flemish AI Struggles
- Large speech models perform poorly on Flemish because their training data is dominated by Netherlands Dutch and broadcast speech.
- Collecting diverse, spontaneous Flemish voice data improves recognition of regional dialects and youth language.
Collect Spontaneous Speech, Not Scripts
- Record spontaneous speech in the Maralee app rather than reading from scripts, so that natural dialectal variation is captured.
- Ask open questions, such as 'What did you see today?', so contributors speak freely and produce varied data.
Kids, Mics, And Toilet Surprises
- Melissa Schuring explained that recording children brings unexpected problems, such as microphones left running in the toilet.
- In some sessions, about 10% of the recordings contained toilet noises, which had to be removed manually for ethical reasons.
