KEYNOTE SPEAKERS

Professor João Magalhães

Full Professor at the Computer Science Department of Universidade NOVA de Lisboa and national co-Director of the CMU Portugal partnership.

João Magalhães holds a Ph.D. degree (2008) from Imperial College London, UK. His research aims to move vision and language AI closer to the way humans understand and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining, and multimodal information representation. He is currently coordinating the creation of the sovereign LLM AMALIA and has previously coordinated and participated in several research projects (national, EU-FP7 and H2020), in which he pursues robust and generalizable methods across different domains. He is regularly involved in review panels, the organization of international conferences, and program committees. His work and that of his group have been awarded, or nominated for, several honours and distinctions, most notably the 1st prize in the Amazon Alexa Taskbot Challenge 2022. He was the General Chair of ECIR 2020 and ACM Multimedia 2022 and Honorary Chair for ACM Multimedia Asia 2021, and he will be the PC chair of ACM Multimedia 2026.

Title of the talk: Multimodal Conversational Assistance of Complex Manual Tasks

Abstract

Conversational agents have become an integral part of our daily routines, aiding humans in various tasks. Helping users in real-world manual tasks is a complex and challenging paradigm, where it is necessary to leverage multiple information sources, provide several multimodal stimuli, and correctly ground the conversation in a helpful and robust manner. In this talk, I will describe TWIZ, a conversational AI assistant that is helpful, multimodal, knowledgeable, and engaging, and that is designed to guide users towards the successful completion of complex manual tasks. To achieve this, we focused our efforts on three main research questions: (1) Humanly-Shaped Conversations, by providing information in a knowledgeable way; (2) Multimodal Stimulus, making use of various modalities including voice, images, and videos; and (3) Zero-shot Conversational Flows, to improve the robustness of the interaction to unseen scenarios. TWIZ is an assistant capable of supporting a wide range of unseen tasks. It leverages Generative AI methods to deliver several innovative features such as creative cooking, video navigation through voice, and the robust PlanLLM, a Large Language Model trained for dialoguing about complex manual tasks.