{"id":778,"date":"2026-04-24T10:51:23","date_gmt":"2026-04-24T10:51:23","guid":{"rendered":"https:\/\/iberspeech.tech\/?page_id=778"},"modified":"2026-05-11T10:48:45","modified_gmt":"2026-05-11T10:48:45","slug":"keynotes","status":"publish","type":"page","link":"https:\/\/iberspeech.tech\/2026\/keynotes\/","title":{"rendered":"KEYNOTES"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; custom_padding_last_edited=&#8221;off|desktop&#8221; admin_label=&#8221;Hero&#8221; _builder_version=&#8221;4.24.2&#8243; background_enable_color=&#8221;off&#8221; custom_margin=&#8221;|||&#8221; custom_padding=&#8221;50px||0px|||&#8221; custom_padding_tablet=&#8221;130px||130px|&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; width=&#8221;100%&#8221; max_width=&#8221;1280px&#8221; custom_padding=&#8221;|15px|3px|15px|false|false&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.24.2&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.24.2&#8243; _module_preset=&#8221;default&#8221; header_font=&#8221;|||on|||||&#8221; custom_margin=&#8221;||1px|||&#8221; global_colors_info=&#8221;{}&#8221;]<\/p>\n<h1><span style=\"color: #0c71c3;\">KEYNOTE SPEAKERS<\/span><\/h1>\n<p>[\/et_pb_text][et_pb_divider color=&#8221;#0c71c3&#8243; divider_weight=&#8221;3px&#8221; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; width=&#8221;50%&#8221; module_alignment=&#8221;left&#8221; global_colors_info=&#8221;{}&#8221;][\/et_pb_divider][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_team_member 
name=&#8221;Professor Jo\u00e3o Magalh\u00e3es&#8221; position=&#8221;Full Professor at the Computer Science Dep. at Universidade NOVA de Lisboa and national co-Director of the CMU Portugal partnership.&#8221; image_url=&#8221;https:\/\/iberspeech.tech\/2026\/wp-content\/uploads\/2026\/04\/Joao-Magalhaes-profile.jpeg&#8221; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; header_font_size=&#8221;28px&#8221; position_font_size=&#8221;18px&#8221; custom_margin=&#8221;0px||||false|false&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p style=\"text-align: justify;\"><span data-teams=\"true\"><\/span><\/p>\n<p style=\"text-align: justify;\"><span data-teams=\"true\">Jo\u00e3o Magalh\u00e3es holds a Ph.D. degree (2008) from Imperial College London, UK. His research aims to move vision and language AI closer to the way humans understand it and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining and multimodal information representation. He is currently coordinating the creation of the sovereign LLM AMALIA, and, in the past, has coordinated and participated in several research projects (national, EU-FP7 and H2020) where he pursues robust and generalizable methods in different domains. He is regularly involved in review panels, organization of international conferences and program committees. His work and that of his group have received, or been nominated for, several honours and distinctions, most notably the 1st prize in the Amazon Alexa Taskbot Challenge 2022. 
He was the General Chair of ECIR 2020 and ACM Multimedia 2022, Honorary Chair for ACM Multimedia Asia 2021 and will be the PC chair of ACM Multimedia 2026.<\/span><\/p>\n<p>[\/et_pb_team_member][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_toggle title=&#8221;Title of the talk: %22Multimodal Conversational Assistance of Complex Manual Tasks%22&#8243; open=&#8221;on&#8221; toggle_icon=&#8221;&#x43;||divi||400&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; title_level=&#8221;h3&#8243; title_font=&#8221;|||||on|||&#8221; body_font_size=&#8221;15px&#8221; custom_margin=&#8221;||30px|||&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<h4 style=\"text-align: justify;\"><span data-teams=\"true\">Abstract<\/span><\/h4>\n<p style=\"text-align: justify;\"><span data-teams=\"true\">Conversational agents have become an integral part of our daily routines, aiding humans in various tasks. Helping users in real-world manual tasks is a complex and challenging paradigm, where it is necessary to leverage multiple information sources, provide several multimodal stimuli, and be able to correctly ground the conversation in a helpful and robust manner. In this talk I will describe TWIZ, a conversational AI assistant that is helpful, multimodal, knowledgeable, and engaging, and designed to guide users towards the successful completion of complex manual tasks. 
To achieve this, we focused our efforts on three main research questions: (1) Humanly-Shaped Conversations, by providing information in a knowledgeable way; (2) Multimodal Stimulus, making use of various modalities including voice, images, and videos; and (3) Zero-shot Conversational Flows, to improve the robustness of the interaction to unseen scenarios. TWIZ is an assistant capable of supporting a wide range of unseen tasks &#8212; it leverages Generative AI methods to deliver several innovative features such as creative cooking, video navigation through voice, and the robust PlanLLM, a Large Language Model trained for dialoguing about complex manual tasks.<\/span><\/p>\n<p>[\/et_pb_toggle][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;%22default%22&#8243; custom_padding=&#8221;%220px||2px|||%22&#8243; global_colors_info=&#8221;%22{}%22&#8243;][et_pb_row _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221;][et_pb_column _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; type=&#8221;4_4&#8243;][et_pb_team_member name=&#8221;Prof. Dr. 
Elisabeth Andr\u00e9&#8221; position=&#8221;Full Professor of Computer Science, Chair of Human-Centered Artificial Intelligence, Faculty of Applied Informatics, Augsburg University&#8221; image_url=&#8221;https:\/\/iberspeech.tech\/2026\/wp-content\/uploads\/2026\/05\/elisabeth-small.png&#8221; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; header_font_size=&#8221;28px&#8221; position_font_size=&#8221;18px&#8221; custom_margin=&#8221;0px||||false|false&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243; title_text=&#8221;elisabeth-small&#8221;]<\/p>\n<p style=\"text-align: justify;\"><span data-teams=\"true\"><\/span><\/p>\n<p style=\"text-align: justify;\"><span data-teams=\"true\">Elisabeth Andr\u00e9 is a Full Professor of Computer Science and the Founding Chair of Human-Centered Artificial Intelligence at Augsburg University, Germany. A global leader in multimodal human-machine interaction and social signal processing, her work has been foundational in enabling machines to perceive and respond to human speech, gestures, and emotions in a natural, socially intelligent manner. Her research is dedicated to the development of &#8220;believable&#8221; virtual agents and social robots capable of sophisticated, human-like dialogue. For her pioneering contributions to artificial emotional intelligence, she was awarded the Gottfried Wilhelm Leibniz Prize by the German Research Foundation (DFG). As Germany\u2019s most prestigious research honor, the prize recognizes her trailblazing role in bridging the gap between Artificial Intelligence and Human-Computer Interaction to create technology that is more intuitive and empathetic. Beyond her technical research, Professor Andr\u00e9 is recognized as one of the most influential voices in the field. 
In 2024, Manager Magazin named her one of the 15 most important women in AI in Germany, and in 2019, she was honored by the National Society for Informatics (GI) as one of the ten most influential figures in the history of German AI. Her long-standing impact on the community was further recognized with the ICMI Sustained Accomplishment Award and the 2025 AI Visionary TIGER Award. Professor Andr\u00e9 is a member of the German Academy of Sciences (Leopoldina) and the CHI Academy, and is a EurAI Fellow. She currently co-leads the &#8220;Work\/Qualification and Human-Machine Interaction&#8221; group for Germany&#8217;s National Platform for Artificial Intelligence.<\/span><\/p>\n<p>[\/et_pb_team_member][et_pb_toggle title=&#8221;Title of the talk: %22From Speech to Multimodal Interaction: Guardrails for Socially-Interactive AI%22&#8243; open=&#8221;on&#8221; toggle_icon=&#8221;&#x43;||divi||400&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; title_level=&#8221;h3&#8243; title_font=&#8221;|||||on|||&#8221; body_font_size=&#8221;15px&#8221; custom_margin=&#8221;||30px|||&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<h4 style=\"text-align: justify;\"><span data-teams=\"true\">Abstract<\/span><\/h4>\n<p style=\"text-align: justify;\"><span data-teams=\"true\">When speech moves from transcription to multimodal interaction, its requirements change fundamentally. In such settings, systems must operate under ambiguity and uncertainty while accounting for social context and application-specific constraints. This talk focuses on the design of guardrails that go beyond generic content filtering. I will present approaches for risk identification, evaluation, and mitigation, including mechanisms for uncertainty handling, policy compliance, and improving the reliability of LLM-based systems in multimodal interaction. 
These challenges and solutions are illustrated through three complementary perspectives from interdisciplinary projects: social coaching through role-play with robots (CAIDA) and virtual agents for children (CONFIDENCE), where particular care must be taken to avoid stigmatization of marginalized groups and psychologically harmful interaction scenarios; language-enabled robotics in healthcare (REGINA), where systems must ensure confidentiality, correctness, and certifiability while remaining accessible through natural language and low-code\/no-code interfaces; and CAR-bench, a benchmark for evaluating consistency, uncertainty awareness, and capability awareness in multi-turn, tool-using LLM agents for in-car assistant scenarios.<\/span><\/p>\n<p>[\/et_pb_toggle][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>KEYNOTE SPEAKERS Jo\u00e3o Magalh\u00e3es holds a Ph.D. degree (2008) from Imperial College London, UK. His research aims to move vision and language AI closer to the way humans understand it and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining and multimodal information representation. 
He [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-778","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>KEYNOTES - Iberspeech<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/iberspeech.tech\/2026\/keynotes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"KEYNOTES - Iberspeech\" \/>\n<meta property=\"og:description\" content=\"KEYNOTE SPEAKERS Jo\u00e3o Magalh\u00e3es holds a Ph.D. degree (2008) from Imperial College London, UK. His research aims to move vision and language AI closer to the way humans understand it and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining and multimodal information representation. He [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/iberspeech.tech\/2026\/keynotes\/\" \/>\n<meta property=\"og:site_name\" content=\"Iberspeech\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-11T10:48:45+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/iberspeech.tech\/2026\/keynotes\/\",\"url\":\"https:\/\/iberspeech.tech\/2026\/keynotes\/\",\"name\":\"KEYNOTES - Iberspeech\",\"isPartOf\":{\"@id\":\"https:\/\/iberspeech.tech\/2026\/#website\"},\"datePublished\":\"2026-04-24T10:51:23+00:00\",\"dateModified\":\"2026-05-11T10:48:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/iberspeech.tech\/2026\/keynotes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/iberspeech.tech\/2026\/keynotes\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/iberspeech.tech\/2026\/keynotes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/iberspeech.tech\/2026\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"KEYNOTES\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/iberspeech.tech\/2026\/#website\",\"url\":\"https:\/\/iberspeech.tech\/2026\/\",\"name\":\"Iberspeech\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/iberspeech.tech\/2026\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"KEYNOTES - Iberspeech","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/iberspeech.tech\/2026\/keynotes\/","og_locale":"en_US","og_type":"article","og_title":"KEYNOTES - Iberspeech","og_description":"KEYNOTE SPEAKERS Jo\u00e3o Magalh\u00e3es holds a Ph.D. 
degree (2008) from Imperial College London, UK. His research aims to move vision and language AI closer to the way humans understand it and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining and multimodal information representation. He [&hellip;]","og_url":"https:\/\/iberspeech.tech\/2026\/keynotes\/","og_site_name":"Iberspeech","article_modified_time":"2026-05-11T10:48:45+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/iberspeech.tech\/2026\/keynotes\/","url":"https:\/\/iberspeech.tech\/2026\/keynotes\/","name":"KEYNOTES - Iberspeech","isPartOf":{"@id":"https:\/\/iberspeech.tech\/2026\/#website"},"datePublished":"2026-04-24T10:51:23+00:00","dateModified":"2026-05-11T10:48:45+00:00","breadcrumb":{"@id":"https:\/\/iberspeech.tech\/2026\/keynotes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/iberspeech.tech\/2026\/keynotes\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/iberspeech.tech\/2026\/keynotes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/iberspeech.tech\/2026\/"},{"@type":"ListItem","position":2,"name":"KEYNOTES"}]},{"@type":"WebSite","@id":"https:\/\/iberspeech.tech\/2026\/#website","url":"https:\/\/iberspeech.tech\/2026\/","name":"Iberspeech","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/iberspeech.tech\/2026\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/pages\/778","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2
\/pages"}],"about":[{"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/comments?post=778"}],"version-history":[{"count":13,"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/pages\/778\/revisions"}],"predecessor-version":[{"id":1988,"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/pages\/778\/revisions\/1988"}],"wp:attachment":[{"href":"https:\/\/iberspeech.tech\/2026\/wp-json\/wp\/v2\/media?parent=778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}