An Outlook for AI Innovation in Multimodal Communication Research

In Vincent G. Duffy (ed.), Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management (HCII 2024), pp. 182–234 (2024)

Abstract

In the rapidly evolving landscape of multimodal communication research, this paper explores the transformative role of machine learning (ML), particularly multimodal large language models, in tracking, augmenting, annotating, and analyzing multimodal data. Building on the foundations laid in our previous work, we examine the capabilities that have emerged over the past few years. Integrating ML allows researchers to gain richer insights from multimodal data, enabling a deeper understanding of human (and non-human) communication across modalities. In particular, augmentation methods have become indispensable because they facilitate the synthesis of multimodal data and further increase the diversity and richness of training datasets. In addition, ML-based tools have accelerated annotation processes, reducing human effort while improving accuracy. Continued advances in ML and the proliferation of more powerful models promise even more sophisticated analyses of multimodal communication, e.g., through models like ChatGPT, which can now “understand” images. This makes it all the more important to assess what these models can achieve now or in the near future, and what will remain beyond their reach. We also acknowledge the ethical and practical challenges associated with these advances, emphasizing the importance of responsible AI and data privacy. We must take care to ensure that benefits are shared equitably and that technology respects individual rights. In this paper, we highlight advances in ML-based multimodal research and discuss what the near future holds. Our goal is to provide insights into this research stream for both the multimodal research community, especially in linguistics, and the broader ML community. In this way, we hope to foster collaboration in an area that is likely to shape the future of technologically mediated human communication.

Similar books and articles

Models for anodic and cathodic multimodalities.Juliana Bueno-Soler - 2012 - Logic Journal of the IGPL 20 (2):458-476.

Added to PP
2024-06-04

Author Profiles

Reetu Bhattacharjee
University of Münster
Jens Lemanski
University of Münster