SpeechBubbles: Enhancing Captioning Experiences for Deaf and Hard-of-Hearing People in Group Conversations

Intro clip:

In this work, we designed and implemented SpeechBubbles, a real-time speech recognition interface on an augmented reality head-mounted display (HMD). It aims to provide a desirable caption visualization for deaf and hard-of-hearing (DHH) users, so that they can engage more seamlessly in speech-based group communication.


To better understand the problems DHH people face in daily group conversations with hearing people, and to explore possible solutions, we conducted semi-structured interviews and a participatory design session with 8 DHH people.

We collected participants’ feedback and categorized it into four key design dimensions: speaker association, amount of content, ordering of utterances, and out-of-view speakers.

We came up with several designs for each dimension, which are described in turn below. To iterate on our designs efficiently, we used video prototypes displayed on an LCD computer monitor to demonstrate the different designs.

For speaker association and amount of content, we proposed one-line, two-line, and three-line scrolling-text bubble-like visualizations and compared them with traditional captions.
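The difference between the one-, two-, and three-line variants can be illustrated with a minimal Python sketch. It assumes a bubble that wraps the recognized text to a fixed character width and shows only the most recent lines; the class name `ScrollingBubble` and the parameters `max_lines` and `width` are hypothetical and are not taken from the SpeechBubbles implementation.

```python
import textwrap


class ScrollingBubble:
    """Hypothetical caption bubble that shows only the newest lines of text."""

    def __init__(self, max_lines=3, width=20):
        self.max_lines = max_lines  # assumed: 1, 2, or 3 visible lines
        self.width = width          # assumed: characters per line
        self.text = ""

    def append(self, words):
        # Add newly recognized words to the running transcript.
        self.text = (self.text + " " + words).strip()

    def visible_lines(self):
        # Wrap the whole transcript, then keep only the last max_lines lines,
        # so older lines scroll out of the bubble.
        lines = textwrap.wrap(self.text, width=self.width)
        return lines[-self.max_lines:]


bubble = ScrollingBubble(max_lines=2, width=10)
bubble.append("one two three")
bubble.append("four five six")
print(bubble.visible_lines())
```

With `max_lines=1` the same transcript would show only its final wrapped line, which is the single-line variant.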

The results showed that the bubble-like visualization is preferred over the traditional captioning style, and that, for the bubble design, the multi-line scrolling-text style is preferred over the single-line one.

For ordering of utterances, we proposed numbered and rising designs for the bubble-like visualization and compared them with the aforementioned multi-line scrolling-text bubble.

The results showed that the rising effect is preferred over the other designs for bubble-style captioning in users’ daily conversations.

For out-of-view speakers and related information, our hint design focuses on two aspects:

For the out-of-view speakers’ utterances, we proposed 1) ellipses indicating that someone is talking, 2) partial content of what the speaker said, and 3) complete content of what the speaker said.

For the out-of-view speakers’ location, we proposed 1) bidirectional, which indicates only whether the speaker is to the left or right of the user, 2) numeric angle, which presents the angle between the speaker and the user from 0 to 180 degrees, and 3) egocentric, which maps the plane parallel to the user onto the real-world plane on which the user is standing, so that the bottom of the view denotes positions behind the user.
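The three location hints can be sketched in Python as functions of the speaker's bearing. This is an illustration under stated assumptions, not the SpeechBubbles implementation: the bearing is assumed to be measured in degrees from the user's facing direction, with positive values to the user's right, and all function names are hypothetical.

```python
import math


def normalize_bearing(deg):
    """Wrap an angle to [-180, 180); positive means to the user's right (assumed)."""
    return (deg + 180.0) % 360.0 - 180.0


def bidirectional_hint(bearing_deg):
    # Only report which side of the user the speaker is on.
    return "right" if normalize_bearing(bearing_deg) >= 0 else "left"


def numeric_angle_hint(bearing_deg):
    # Report the unsigned angle between the user's heading and the speaker,
    # from 0 (straight ahead) to 180 (directly behind).
    return abs(normalize_bearing(bearing_deg))


def egocentric_hint(bearing_deg):
    # Map the bearing to a point on a unit circle drawn in the view:
    # x > 0 is right, y > 0 is the top of the view (in front of the user),
    # y < 0 is the bottom of the view (behind the user).
    theta = math.radians(normalize_bearing(bearing_deg))
    return (math.sin(theta), math.cos(theta))


for bearing in (-90, 45, 180):
    print(bearing, bidirectional_hint(bearing),
          numeric_angle_hint(bearing), egocentric_hint(bearing))
```

The bidirectional hint discards the most information, which may be part of why it is easy to read at a glance; the egocentric hint preserves the full direction, including whether the speaker is behind the user.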

The results showed that hint bubbles that show speech content and are placed bidirectionally are the most preferred design.

We summarized the overall preferred designs and implemented them as SpeechBubbles on the Microsoft HoloLens (an AR interface).

We invited 6 DHH people to join a group conversation in order to compare SpeechBubbles with traditional captions on the HoloLens. The results showed that our prototype is preferred over traditional captions.

Please refer to our paper, video, and talk for details.

Publication / Media