We recently attended the all-remote Interspeech 2020. Each of us took notes on what we did overall. But instead of posting those or writing a what-we-think-about-the-conference style post, we decided to ask each team member to describe, in a few sentences, what interested them the most. Here are the individual responses:
I really liked the ZeroSpeech challenge. As usual, they had a few really interesting unsupervised problems and solutions. I find their tracks really ambitious, as evidenced by this statement on their website: “… infants learn to speak their native language, spontaneously, from raw sensory input, without supervision from text or linguists. It should be possible to do the same in machines”.
The target for next year, 2021, is Spoken Language Modeling. I am looking forward to that too.
The Meta Learning Tutorial on day 1 was a detailed session on the topic. The promise of performing well on a set of tasks with less data had my attention. The presenters covered the introduction, utility, and comparison of this approach, along with its impact on tasks like speaker verification, keyword spotting, emotion recognition and, my special interest, conversational AI.
A new thing here was Computational Paralinguistics, which covers the non-content parts of speech. Given my interest in the stylistic parts of speech, this was particularly interesting. The papers presented many ideas relevant to building a better voicebot, such as uncertainty-aware methods for multiple labels, Autism Quotient as a perception feature, and predicting CSAT scores from sentiment.
Interspeech had some great sessions, ranging from discussions of more fundamental concepts related to Speech Processing in the Brain, Phonetics and Phonology to novel ideas on Training Strategies for ASR like Semantic Word Masking, efficient vocoder implementations for faster Neural Waveform Synthesis, and Automatic Prosody Analysis for Non-Semantic Speech Representations. It helped me connect a lot of dots and exposed me to some great ideas we should be exploring in our work.
I really liked attending the Speech Emotion Recognition tracks. They covered a multitude of topics, including self-supervised learning methods and non-semantic representations. Overall, the sessions were very balanced, with a lot of interaction among the attendees and the presenters. The speech signal representation track was pretty fun too, with some really interesting papers on voice casting and universal non-semantic representations.