November 11th
Juan Godino-Llorente, Universidad Politécnica de Madrid, Spain.
Title of the talk: Past and present (and future) of speech technologies for the screening of Parkinson’s disease
Abstract:
Parkinson’s disease (PD) is a progressive neurological disorder that significantly impacts motor functions, including speech, leading to dysarthric patterns often characterized by reduced loudness, monotony, imprecise articulation, and a breathy or hoarse vocal quality. Over the last decade, this has guided the development of promising approaches aimed at the early diagnosis and objective monitoring of PD, leveraging advanced machine learning and signal processing algorithms to analyze the acoustic trace. In this respect, early research focused on fundamental speech characteristics, such as pitch, loudness, and articulation precision, as initial indicators of PD, with traditional methods relying on statistical analysis of these features. The need for efficient, non-invasive diagnostics spurred interest in automating speech analysis, though early systems often struggled with accuracy and reliability due to limited datasets and less refined feature extraction techniques. In recent years, however, the field has advanced with the application of deep learning and neural network models, allowing for more sophisticated feature extraction and improved classification performance. Contemporary approaches use complex algorithms and cross-linguistic studies to capture the subtle speech production anomalies linked to PD. These systems now show promise in detecting early-stage PD and monitoring disease progression, even in cross-lingual scenarios, providing valuable support to clinical assessments. Despite these advances, however, current techniques remain limited and require significant additional effort before they can be translated to the clinical setting.
In this context, this talk will review the past and present of some of the main contributions to the field developed over the last decade at Universidad Politécnica de Madrid, Spain, and will also present some ideas for the near future.
Bio: Juan I. Godino-Llorente was born in Madrid, Spain, in 1969. He received the B.Sc. and M.Sc. degrees in Telecommunications Engineering and the Ph.D. degree in Computer Science in 1992, 1996, and 2002, respectively, all from Universidad Politécnica de Madrid (UPM), Spain. From 1996 to 2003 he was with UPM as an Assistant Professor in the Circuits and Systems Engineering Dept. From 2003 to 2005 he was with the Signal Theory and Communications Dept. at the University of Alcalá. In 2005 he rejoined UPM, serving as Head of the Circuits and Systems Engineering Dept. from 2006 to 2010. Since 2011 he has been a Full Professor in the field of Signal Theory and Communications. In 2006, he won an associate professor position in a national qualifying competitive call with 130 candidates, in which he was ranked 1st.
During the 2003-2004 academic term he was a Visiting Professor at Salford University, Manchester, UK, and in 2016 he was a Visiting Researcher at the Massachusetts Institute of Technology, USA, funded by a Fulbright grant. He has served as an editor for the IEEE Journal of Selected Topics in Signal Processing, the IEEE Trans. on Audio, Speech and Language Processing, the Speech Communication journal, and the EURASIP Journal on Advances in Signal Processing, and has been a member of the scientific committees of INTERSPEECH, IEEE ICASSP, EUSIPCO, BIOSIGNALS, and other top-ranked events for more than 10 years. He has been an invited speaker at several international advanced schools and has delivered invited talks at different universities and events, including Harvard University, Johns Hopkins University, Tampere University, and the National University of Colombia.
He has chaired the 3rd Advanced Voice Function Assessment Workshop and the 1st and 2nd Automatic Assessment of Parkinsonian Speech Workshops. Likewise, since 2004 he has served on various expert panels of the European Commission, and he was the national coordinator of COST Action 2103, funded by the European Science Foundation. He is also an expert evaluator of research proposals for the Spanish, Portuguese, Latvian, Polish, Israeli, Czech, Icelandic, Romanian, Belgian, and Norwegian research agencies.
He has published more than 70 papers in international journals included in the Science Citation Index and more than 50 in top-ranked conferences; his publications have attracted more than 5,500 citations (h-index = 40). He has also led 10 competitive projects and 12 research projects financed by companies and public institutions, receiving in total more than €2.5M in research funding from competitive calls and from industry.
This work has been recognized through the BSc Thesis Extraordinary Award (1992); the UPM Extraordinary PhD Award (2001/2002); the 2004 Award for Research or Technological Development for Professors of the UPM; the 2002 “SIDAR-Universal Access” Award; being a finalist for the 2009 Best Paper Award of the IEEE Engineering in Medicine and Biology Conference; the 2010 and 2018 Best Research Paper Awards of the Spanish Excellence Network on Speech Technology; being a finalist for the 2012 Best Demo Award of the same network; and the 2015 Entrepreneur Award of IEEE Spain with the startup IngeVox (2008). Moreover, he has been appointed a Fulbright Scholar, a Senior Member of the IEEE, an ELLIS member, and an honorary professor at the National University of Colombia.
November 12th
Andreas Stolcke, Uniphore, USA.
Title of the talk: Large Language Models for Speech Processing
Abstract:
The talk will summarize various lines of work that aim to leverage the knowledge encoded in LLMs for speech recognition and related tasks. One approach is for LLMs to postprocess ASR outputs, either to rerank or to edit (correct) hypotheses. We show that this is possible even without fine-tuning LLMs for the task, via instruction prompting and in-context learning.
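To make the idea concrete, here is a minimal sketch of zero-shot N-best reranking via instruction prompting, in the spirit of the approach described above. The function llm_complete is a hypothetical stand-in for any text-completion API, and the prompt wording is illustrative, not taken from the talk.

# Sketch: LLM-based ASR N-best reranking via instruction prompting.
# `llm_complete` is a hypothetical callable wrapping any LLM completion API.

from typing import Callable, List

def build_rerank_prompt(nbest: List[str]) -> str:
    """Ask the LLM to choose (or correct) the most plausible transcription
    from an ASR N-best list; no fine-tuning, just an instruction prompt."""
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "The following are candidate transcriptions of the same utterance, "
        "produced by a speech recognizer. Output only the most plausible "
        "transcription, correcting obvious recognition errors if needed.\n\n"
        f"{numbered}\n\nBest transcription:"
    )

def rerank_with_llm(nbest: List[str], llm_complete: Callable[[str], str]) -> str:
    return llm_complete(build_rerank_prompt(nbest)).strip()

# Example usage with a trivial mock in place of a real LLM call:
if __name__ == "__main__":
    nbest = [
        "recognize speech with this new display",
        "wreck a nice beach with this nudist play",
    ]
    mock_llm = lambda prompt: nbest[0]  # stand-in for a real LLM
    print(rerank_with_llm(nbest, mock_llm))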
Another line of work is to augment LLMs originally trained on text with acoustic information, making them attend to cues that are unique to speech and conversation, while also leveraging pretrained acoustic embedding models. This yields multimodal models that process speech as more than transduced text, while still exploiting the long-span language “understanding” capabilities LLMs are known for. In one case, we show that LLMs can use acoustic information to model utterance sentiment and thereby improve word prediction. In another, we use LLMs to predict where in a conversation one should take a turn, back-channel, or continue listening.
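One common fusion pattern consistent with this description is to project a pretrained acoustic embedding into the text model's embedding space and prepend it as a soft token. The sketch below illustrates that pattern only; the dimensions, the small encoder standing in for an LLM, and the three-way turn-taking head (take turn / back-channel / keep listening) are all illustrative assumptions, not the architecture from the talk.

# Sketch: prefix fusion of a pretrained acoustic embedding with text tokens.
import torch
import torch.nn as nn

class AcousticPrefixFusion(nn.Module):
    def __init__(self, acoustic_dim=512, text_dim=768, vocab_size=32000):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, text_dim)      # stand-in for LLM token embeddings
        self.acoustic_proj = nn.Linear(acoustic_dim, text_dim)   # map audio embedding into text space
        self.encoder = nn.TransformerEncoder(                    # small stand-in for the LLM body
            nn.TransformerEncoderLayer(d_model=text_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.turn_head = nn.Linear(text_dim, 3)  # take-turn / back-channel / keep-listening

    def forward(self, token_ids, acoustic_emb):
        text = self.token_emb(token_ids)                         # (B, T, D)
        prefix = self.acoustic_proj(acoustic_emb).unsqueeze(1)   # (B, 1, D) soft "acoustic token"
        fused = torch.cat([prefix, text], dim=1)                 # prepend acoustic prefix
        hidden = self.encoder(fused)
        return self.turn_head(hidden[:, 0])                      # predict from the prefix position

# Example with random inputs:
model = AcousticPrefixFusion()
logits = model(torch.randint(0, 32000, (2, 16)), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 3])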
Finally, I will show that while LLMs are powerful, they are still limited and far from having common sense or general intelligence, so enthusiasm about them should be tempered.
Bio: Andreas Stolcke is a distinguished scientist at Uniphore. He obtained his PhD from UC Berkeley and then worked as a researcher/scientist at SRI International, Microsoft, and Amazon. His research interests include computational linguistics, language modeling, speech recognition, speaker recognition and diarization (keeping track of multiple speakers), and paralinguistics (e.g., sentiment and emotion recognition), with over 300 papers and patents in these areas. His open-source SRI Language Modeling Toolkit is widely used in academia. For over 30 years, Andreas has had a strong track record of inventing novel algorithms for speech and language processing. He is a Fellow of the IEEE, the International Speech Communication Association, and the Asia-Pacific Artificial Intelligence Association.