SpeechBubbles: Enhancing Captioning Experiences for Deaf and Hard-of-Hearing People in Group Conversations



Intro clip:


In this work, we designed and implemented SpeechBubbles, a real-time speech recognition interface on an augmented reality head-mounted display (HMD). It aims to provide a desirable caption visualization for deaf and hard-of-hearing (DHH) users, so that they can engage more seamlessly in speech-based group communication.

A proof-of-concept demonstration of SpeechBubbles. The user (right) equipped with a Microsoft HoloLens views speech bubbles adjacent to two speakers (left and middle).

To better understand the problems DHH people face during daily group conversations with hearing people, we conducted semi-structured interviews and participatory design sessions with 8 DHH people to explore possible solutions.

Ideal designs drawn by participants during the co-design process

We collected participants’ feedback and categorized it into four key design dimensions:

Design exploration for each of the four dimensions: 1) speaker association, 2) amount of content, 3) order of utterances, 4.a) out-of-view caption, and 4.b) out-of-view speaker’s location.

We came up with several designs for each dimension, which are described in turn below. To iterate on our designs efficiently, we used video prototypes displayed on an LCD computer monitor to demonstrate the different designs.

Design of one-line, two-line, and three-line text-bubble and caption visualizations.

For speaker association and amount of content, we proposed one-line, two-line, and three-line scrolling-text, bubble-like visualizations and compared them with the traditional caption.

The results showed that the bubble-like visualization was preferred over the traditional captioning style, and that the multi-line scrolling-text style was preferred over the single-line one for the bubble design.
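As an illustration (not our actual implementation), the multi-line scrolling-text behavior can be sketched as a small fixed-height text buffer: newly recognized words fill the current line, and once the bubble’s line limit is reached the oldest line scrolls out. The names (ScrollingBubble, append_words) and the character-count wrapping policy below are assumptions made for the sketch.

```python
from collections import deque

class ScrollingBubble:
    """Minimal sketch of a multi-line scrolling-text bubble.

    Keeps only the most recent `max_lines` wrapped lines of one speaker's
    transcript, so older text scrolls out as new words arrive.
    Illustrative only; not the SpeechBubbles implementation.
    """

    def __init__(self, max_lines=3, chars_per_line=28):
        self.max_lines = max_lines
        self.chars_per_line = chars_per_line
        self.lines = deque(maxlen=max_lines)  # oldest committed line drops off automatically
        self.current = ""                     # line currently being filled

    def append_words(self, text):
        """Add newly recognized words, wrapping onto a new line when the current one is full."""
        for word in text.split():
            candidate = (self.current + " " + word).strip()
            if len(candidate) <= self.chars_per_line:
                self.current = candidate
            else:
                self.lines.append(self.current)  # commit the full line
                self.current = word

    def render(self):
        """Return the lines that would be drawn inside the bubble."""
        visible = list(self.lines) + [self.current]
        return visible[-self.max_lines:]


# Example: a three-line bubble receiving streaming recognition results.
bubble = ScrollingBubble(max_lines=3)
bubble.append_words("Hi everyone, thanks for joining the meeting today,")
bubble.append_words("let's start with a quick round of updates.")
print("\n".join(bubble.render()))
```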

Design of scrolling-text, numbered, and rising bubble visualization.

For the order of utterances, we proposed numbered and rising designs for the bubble-like visualization, and compared them with the aforementioned multi-line scrolling-text bubble.

The results showed that the rising effect was preferred over the other designs for bubble-style captioning in users’ daily conversations.
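The rising effect can likewise be sketched in a few lines: each new utterance appears as a fresh bubble near the speaker, while earlier bubbles shift upward and fade, so vertical position encodes recency. The sketch below is illustrative only; the slot height and fade step are assumed values, not parameters of our prototype.

```python
from dataclasses import dataclass, field

@dataclass
class UtteranceBubble:
    text: str
    height: float = 0.0    # vertical offset above the speaker anchor, in metres
    opacity: float = 1.0

@dataclass
class RisingStack:
    """Sketch of the 'rising' ordering effect for one speaker (illustrative values)."""
    slot_height: float = 0.08
    fade_step: float = 0.35
    bubbles: list = field(default_factory=list)

    def add_utterance(self, text):
        # Older bubbles move up one slot and become more transparent.
        for b in self.bubbles:
            b.height += self.slot_height
            b.opacity = max(0.0, b.opacity - self.fade_step)
        # The newest utterance always sits closest to the speaker's face.
        self.bubbles.append(UtteranceBubble(text))
        # Drop bubbles that have fully faded out.
        self.bubbles = [b for b in self.bubbles if b.opacity > 0.0]


stack = RisingStack()
for utterance in ["How was your weekend?", "We went hiking.", "Sounds great!"]:
    stack.add_utterance(utterance)
for b in stack.bubbles:
    print(f"{b.height:.2f} m above anchor, opacity {b.opacity:.2f}: {b.text}")
```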

Design of ellipsis, partial, and complete utterance presentations for out-of-view speech visualization. Design of bidirectional, numeric-angle, and egocentric methods for visualizing an out-of-view speaker's location.

For out-of-view speakers and related information, our hint design focuses on two different aspects:

For the design of the out-of-view speakers’ utterances, we proposed 1) ellipses for indicating when someone is talking, 2) partial content of what the speaker said, and 3) complete content of what the speaker said.

For the design of the out-of-view speakers’ location, we proposed 1) bidirectional, which only indicates whether the speaker is located to the left or right of the user; 2) numeric angle, which presents the angle between the speaker and the user’s viewing direction, from 0 to 180 degrees; and 3) egocentric, which projects the real-world plane around the user onto the user’s view plane, so that the bottom of the view denotes a position behind the user.
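To make the location cues concrete, the sketch below shows one way to derive them from the user’s head pose and a speaker’s position: the signed horizontal angle between the gaze direction and the speaker yields the left/right cue for the bidirectional design, and its absolute value (0 to 180 degrees) yields the value shown by the numeric-angle design. The coordinate conventions and function names are assumptions for illustration, not our HoloLens code.

```python
import math

def relative_bearing(user_pos, user_forward, speaker_pos):
    """Signed horizontal angle (degrees) from the user's gaze direction to a speaker:
    negative means the speaker is to the user's left, positive to the right.
    Coordinate convention (x right, z forward, y up) is an assumption for this sketch.
    """
    # Direction from the user to the speaker, ignoring height differences.
    to_speaker = (speaker_pos[0] - user_pos[0], speaker_pos[2] - user_pos[2])
    forward = (user_forward[0], user_forward[2])
    # Signed 2D cross product, oriented so a speaker on the right gives a positive angle.
    cross = forward[1] * to_speaker[0] - forward[0] * to_speaker[1]
    dot = forward[0] * to_speaker[0] + forward[1] * to_speaker[1]
    return math.degrees(math.atan2(cross, dot))

def bidirectional_cue(angle_deg):
    """'Bidirectional' design: only report left vs. right."""
    return "left" if angle_deg < 0 else "right"

def numeric_angle_cue(angle_deg):
    """'Numeric angle' design: unsigned angle from 0 to 180 degrees."""
    return abs(angle_deg)


# Example: user at the origin looking along +z; speaker behind and to the left.
angle = relative_bearing((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, -1.0))
print(bidirectional_cue(angle), f"{numeric_angle_cue(angle):.0f} degrees")  # left 135 degrees
```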

The results showed that hint bubbles containing the speech content and placed bidirectionally were the most preferred design.

Optimal SpeechBubbles visualization: rising bubbles coupled with complete utterances and bidirectional visualization for out-of-view information.

We summarized the overall preferred design and implemented it as SpeechBubbles on the Microsoft HoloLens AR interface.

We invited 6 DHH people to join group conversations in order to compare SpeechBubbles with the traditional caption on the HoloLens. The results showed that our prototype was preferred over the traditional caption.

The live demo AR view augments conversations between two people using bubble-like visualizations anchored near each speaker's face. A bubble rises once new utterances are visualized.

Please refer to our paper, video and talk for detailed information.


Publication / Media