A fundamental behavioral and cognitive capability of humanoid robots is speech, as spoken language is the primary means of communication between humans. However, communication between people, and between humans and robots, is not based on speech alone; rather, it is a rich multimodal process combining spoken language with a variety of nonverbal behaviors such as eye gaze, gestures, tactile interaction, and emotional cues. This chapter gives an overview of the state of the art in language and speech capabilities in robots (i.e., the “speech interface”), using multimodal approaches. The chapter considers the different levels of analysis in language studies. The computational solutions for the phonetic, lexical, and syntactic levels are general to linguistic analysis and do not require specific consideration from a robotics point of view. Other aspects of language analysis, such as semantics and pragmatics, however, have specific peculiarities in robotics, given their relationship to the difficult problem of “symbol grounding.” In robot language research, two main approaches have been used for the design of speech interfaces: one is based on standard, predefined natural language processing (NLP) techniques, and the other is based on learning methods. The chapter introduces the main NLP methods used in robot language research and subsequently examines the speech interfaces based on such methods, also considering their use in multimodal interfaces. After this, we look at language-learning approaches, distinguishing between developmental learning systems, in which the robot goes through a series of developmental training phases inspired by human language learning, and machine learning approaches, in which a set of learning techniques is used to engineer communication capabilities via training of multimodal speech interfaces. Finally, a critical assessment of the current state of the art is given and future lines of work are identified.
ASJC Scopus subject areas
- Computer Science (General)