Concurrent talking in immersive virtual reality: On the dominance of visual speech cues

Gonzalez-Franco, M; Maselli, A; Florencio, D; Smolyanskiy, N; Zhang, Z

doi:10.1038/s41598-017-04201-x

Humans are good at selectively listening to specific target conversations, even in the presence of multiple concurrent speakers. We study how auditory-visual cues modulate this selective listening, using immersive Virtual Reality technologies with spatialized audio. Exposing 32 participants to an Information Masking Task with concurrent speakers, we find significantly more errors in the decision-making processes triggered by asynchronous audiovisual speech cues. More precisely, the results show that when the Target speaker's lips are matched to the audio of a secondary (Mask) speaker, participants’ comprehension error rates increase severely. In a control experiment (n = 20), we further explore the influence of the visual modality on auditory selective attention. The results show a dominance of visual-speech cues, which effectively turn the Mask into the Target and vice versa. These results reveal a disruption of selective attention that is triggered by bottom-up multisensory integration. We frame the findings within theories of sensory perception and cognitive neuroscience. A supplementary experiment validates the VR setup by replicating previous results from this literature.