Speech-Based Interaction

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (30 January 2022) | Viewed by 9349

Special Issue Editor


Prof. Dr. Roger K. Moore
Guest Editor
Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
Interests: spoken language processing; vocal interactivity in-and-between humans, animals and robots

Special Issue Information

Dear Colleagues,

Recent years have seen tremendous growth in the availability of speech-based interaction with a range of ‘intelligent’ systems. The integration of spoken language technology into smartphones and smart speakers has led to an explosion of voice-based applications. Primarily thanks to advances in deep learning, the performance of component technologies such as automatic speech recognition and speech synthesis has improved immensely, and this has facilitated the deployment of hugely popular commercial systems such as Apple’s Siri and Amazon’s Alexa. Interacting with such voice-enabled personal agents is, however, relatively restricted in comparison with human–human spoken conversation, and much is still to be learnt about how to model and/or implement effective speech-based interaction in a human–machine context. Contemporary interactions tend to be short one-off transactions based on simple question answering or spoken commands, some way short of the rich social and informational exchanges that characterise conversational interaction between human interlocutors. This Special Issue aims to draw together the latest research results in this area, with a particular emphasis on the open challenges facing speech-based interaction between humans and machines.

Potential authors are encouraged to submit original research papers that capture state-of-the-art and emerging approaches in speech-based interaction, position papers that provide theoretical insights and identify critical challenges, and case studies that illustrate the outstanding research issues. Suitable topics include, but are not limited to:

  • Conversational user interfaces;
  • Multimodal interaction;
  • Interaction management (including turn-taking and inter-agent coupling);
  • Spoken language dialogue systems;
  • Incremental spoken language processing;
  • Multi-party spoken interaction;
  • Speech-based human–robot/agent interaction;
  • Human factors issues (such as anthropomorphism, mismatched agents and usability/habitability);
  • Ethical issues (including misrepresentation and bias);
  • Human versus machine spoken language capabilities (such as grounding, empathy and theory of mind);
  • Sustained versus one-off spoken interactions (including learning and memory);
  • Speech-based interaction in real-world environments.

Prof. Dr. Roger K. Moore
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (3 papers)

Research

15 pages, 305 KiB  
Article
An Enactivist Account of Mind Reading in Natural Language Understanding
by Peter Wallis
Multimodal Technol. Interact. 2022, 6(5), 32; https://0-doi-org.brum.beds.ac.uk/10.3390/mti6050032 - 29 Apr 2022
Viewed by 1811
Abstract
In this paper we apply our understanding of the radical enactivist agenda to the classic AI-hard problem of Natural Language Understanding. When Turing devised his famous test the assumption was that a computer could use language and the challenge would be to mimic human intelligence. It turned out playing chess and formal logic were easy compared to understanding what people say. The techniques of good old-fashioned AI (GOFAI) assume symbolic representation is the core of reasoning and by that paradigm human communication consists of transferring representations from one mind to another. However, one finds that representations appear in another’s mind without appearing in the intermediary language. People communicate by mind reading it seems. Systems with speech interfaces such as Alexa and Siri are of course common, but they are limited. Rather than adding mind reading skills, we introduced a “cheat” that enabled our systems to fake it. The cheat is simple and only slightly interesting to computer scientists and not at all interesting to philosophers. However, reading about the enactivist idea that we “directly perceive” the intentions of others, our cheat took on a new light and in this paper we look again at how natural language understanding might actually work between humans.
(This article belongs to the Special Issue Speech-Based Interaction)

18 pages, 1633 KiB  
Article
Exploring Data-Driven Components of Socially Intelligent AI through Cooperative Game Paradigms
by Casey Bennett, Benjamin Weiss, Jaeyoung Suh, Eunseo Yoon, Jihong Jeong and Yejin Chae
Multimodal Technol. Interact. 2022, 6(2), 16; https://0-doi-org.brum.beds.ac.uk/10.3390/mti6020016 - 17 Feb 2022
Cited by 7 | Viewed by 2769
Abstract
The development of new approaches for creating more “life-like” artificial intelligence (AI) capable of natural social interaction is of interest to a number of scientific fields, from virtual reality to human–robot interaction to natural language speech systems. Yet how such “Social AI” agents might be manifested remains an open question. Previous research has shown that both behavioral factors related to the artificial agent itself as well as contextual factors beyond the agent (i.e., interaction context) play a critical role in how people perceive interactions with interactive technology. As such, there is a need for customizable agents and customizable environments that allow us to explore both sides in a simultaneous manner. To that end, we describe here the development of a cooperative game environment and Social AI using a data-driven approach, which allows us to simultaneously manipulate different components of the social interaction (both behavioral and contextual). We conducted multiple human–human and human–AI interaction experiments to better understand the components necessary for creation of a Social AI virtual avatar capable of autonomously speaking and interacting with humans in multiple languages during cooperative gameplay (in this case, a social survival video game) in context-relevant ways.
(This article belongs to the Special Issue Speech-Based Interaction)

9 pages, 224 KiB  
Communication
A Review of Automated Speech-Based Interaction for Cognitive Screening
by Costas Boletsis
Multimodal Technol. Interact. 2020, 4(4), 93; https://0-doi-org.brum.beds.ac.uk/10.3390/mti4040093 - 17 Dec 2020
Cited by 2 | Viewed by 3534
Abstract
Language, speech and conversational behaviours reflect cognitive changes that may precede physiological changes and offer a much more cost-effective option for detecting preclinical cognitive decline. Artificial intelligence and machine learning have been established as a means to facilitate automated speech-based cognitive screening through automated recording and analysis of linguistic, speech and conversational behaviours. In this work, a scoping literature review was performed to document and analyse current automated speech-based implementations for cognitive screening from the perspective of human–computer interaction. At this stage, the goal was to identify and analyse the characteristics that define the interaction between the automated speech-based screening systems and the users, potentially revealing interaction-related patterns and gaps. In total, 65 articles were identified as appropriate for inclusion, from which 15 articles satisfied the inclusion criteria. The literature review led to the documentation and further analysis of five interaction-related themes: (i) user interface, (ii) modalities, (iii) speech-based communication, (iv) screening content and (v) screener. Cognitive screening through speech-based interaction might benefit from two practices: (1) implementing more multimodal user interfaces that facilitate, amongst others, speech-based screening and (2) introducing the element of motivation in the speech-based screening process.
(This article belongs to the Special Issue Speech-Based Interaction)