Third Space: Creating Social Soundscapes in an Asocial Time

George Wu
Published in Voice Tech Global · 9 min read · Jun 30, 2020


A crowded cafe on a sunny day.

“The third place is a concept in sociology and urban planning that recognizes the role these semi-public, semi-private places play in fostering social association, community identity and civic engagement.” — Setha Low

Overview

Third Space was created by Voice Tech Global’s Civic Lab to address the growing number of people in socially isolated settings by recreating the soundscapes of places where people gather to work and connect, such as a coffee shop, a library, or an office. You can try it out now on any device with Google Assistant by saying:

“Ok Google, talk to Third Space.”

The Beginning

The Team

Civic Lab is a cross-disciplinary group of designers, engineers, and data scientists aimed at creating voice apps for public institutions and non-profits. This project was an offshoot of a collaboration with the Toronto Public Library.

Initial Research + Ideation

As part of our initial collaboration with the Library, we conducted 16+ in-depth interviews with patrons, held ongoing conversations with TPL, and ran an ideation workshop with 30+ interested library patrons to discover the opportunities a voice app could bring to library visitors in general.

For our ideation, we recruited not only the brainpower of our team, but also 30+ interested library patrons.

We filtered over 200 ideas down to 18 final concepts, which were then prioritized by impact and effort to narrow down which project we wanted to tackle first:

200+ ideas narrowed down to 18 and then put on a Prioritization Matrix.

…And then the Pandemic happened.

During our research, some of our interviewees had mentioned going to the library to find the right environment to focus or work. As such, one idea we had was to simulate the sounds of being in a library.

When Covid-19 hit, it quickly became obvious that things couldn’t continue as they had before, both in the world at large and amongst our team. Social isolation, in both our work and our social lives, was a common conversational thread. As such, that one idea suddenly gained a much higher potential impact.

A few of us tried listening to similar soundscapes ourselves. It was a stunning experience, after weeks of lockdown, to listen to the sounds of a library, or of festivals, or of children playing in a park. We had forgotten how important being in a social environment could be.

With these impactful experiences, combined with the relative ease of development (or so we thought), it struck us as a quick win for our first app for library users.

Setting our Goals

There were a few key things we knew we wanted to accomplish with this app:

  1. Help those who were having trouble working and focusing whilst in isolation.
  2. Convey a sense of social presence.
  3. Make it simple/straightforward.

Exploring the Existing Landscape

Competitive Analysis

Before we went any further, it was important for us to spend several days investigating what was already out there to see if what we were pitching was viable.

On the Alexa Skills and Google Actions directories, we found a tremendous number of audio generators; however, almost all of them targeted things like relaxation or meditation.

A crowded audio generator market amongst Alexa Skills and Google Actions.

On the web and mobile, we found several web-based ambient-sound generators and mobile apps, as well as YouTube channels that played relaxing soundscapes. We were pleased to see the overwhelming response from commenters, which seemed to validate the value of these kinds of soundscapes. In fact, we were among them.

So where did we fit?

At first glance, it seemed that there were a lot of similar existing products, particularly in the web and app space, and the Alexa Skills and Google Actions directories were flooded with ambient sounds. However, we found no other actions or skills on voice platforms focused on social presence.

Getting Started

Pivot: Going beyond the library

While we based our initial ideas around taking people to different library soundscapes around the world, we realized two things:

  • Sourcing quality long-form audio would be difficult, particularly from a multitude of libraries during a pandemic.
  • One of our design goals was to broaden people’s exposure to communal environments, which was not exclusive to libraries.

Even in our interviews with patrons, we learned that “a good place to work” meant different things to different people, and that a library was only one option. As such, we decided to also include other social places people go to work and focus. We settled on a coffee shop, a library, and an office for our MVP.

Conversation Mapping

Progression from our first script to our final Conversation Map.

Throughout the process, we had been mapping out the flow of conversation between the user and the action. We had originally started with sample dialogue prior to our pivot, and as the idea came into focus, our team went through several drafts before arriving at a final Conversation Map. We tried to accommodate as many different responses as possible, as is the brute-force nature of designing voice applications today — for example, accommodating 100+ ways one might ask to go to a coffee shop.
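To give a feel for that brute-force coverage, here is a minimal sketch of expanding a few phrase slots into dozens of training utterances for a “go to the coffee shop” intent. The slot lists and names below are illustrative, not the actual Third Space data.

```python
import itertools

# Hypothetical phrase slots for a "go to coffee shop" intent.
VERBS = ["take me to", "go to", "bring me to", "play"]
ARTICLES = ["a", "the", ""]
PLACES = ["coffee shop", "cafe", "coffeehouse"]

def expand_utterances(verbs, articles, places):
    """Combine the slot lists into a flat list of sample utterances."""
    phrases = []
    for verb, article, place in itertools.product(verbs, articles, places):
        # Drop empty slots (e.g. the article-free "go to coffee shop").
        words = [w for w in (verb, article, place) if w]
        phrases.append(" ".join(words))
    return phrases

utterances = expand_utterances(VERBS, ARTICLES, PLACES)
# 4 verbs x 3 articles x 3 places = 36 variants from just three small slots
```

Even this toy grammar yields 36 variants; layering in politeness markers and fillers is how the count climbs past 100.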

System Architecture: Moving beyond Dialogflow

System Architecture

We knew early on, based on our requirements, that relying solely on Dialogflow, Google’s built-in action builder, would not be flexible enough for what we wanted to design. Luckily, we had custom middleware that was created for Voice Tech Global’s own skill. (Coming soon!) Invocations and responses would go through Dialogflow, and any custom fulfillments would be sent off to our own fulfillment handler on AWS Lambda in order to return the acoustically rich media and dynamic responses that we wanted.
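The routing described above can be sketched as a Lambda-style webhook: Dialogflow matches the intent and POSTs a request, and the handler dispatches to a custom fulfillment. This is an assumed, minimal shape — the intent names, replies, and dispatch table are placeholders, not the real middleware.

```python
import json

def coffee_shop_fulfillment(query_result):
    """Hypothetical custom fulfillment for one intent."""
    return {"fulfillmentText": "Welcome to the coffee shop."}

# Dispatch table mapping Dialogflow intent display names to fulfillments.
FULFILLMENTS = {"coffee_shop": coffee_shop_fulfillment}

def lambda_handler(event, context=None):
    """Entry point: route the matched Dialogflow intent to its fulfillment."""
    body = json.loads(event["body"])
    query_result = body["queryResult"]
    intent = query_result["intent"]["displayName"]
    handler = FULFILLMENTS.get(intent)
    if handler is None:
        response = {"fulfillmentText": "Sorry, I didn't catch that."}
    else:
        response = handler(query_result)
    return {"statusCode": 200, "body": json.dumps(response)}
```

Keeping fulfillment behind a webhook like this is what lets the action return responses Dialogflow alone can’t build.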

Our Soundscapes

Design Objectives:

We wanted to provide users with realistic social environments that would let them work, focus, or relax. We searched widely for royalty-free and completely free ambient sounds until we arrived at the long-form recordings that fit our needs.

Technical Implementation:

We had originally referenced Nick Felker’s excellent article on overlaying SSML audio to craft soundscapes as a starting point for our own implementation. However, the 240-second limit Google imposes on SSML audio was messy to work around.

In the end, we opted to use media responses: we had no need to overlay multiple sounds for our soundscapes, media responses are better suited for long-form playback, and they provide a visible player on devices with screens. We did, however, reuse the idea of overlaying SSML audio, as you’ll see later in our dynamic responses.
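For readers unfamiliar with the format, here is a hedged sketch of the media response payload shape we relied on, built as plain JSON for the Dialogflow-era Actions on Google API; the names and URL are placeholders. Note that a media response must be accompanied by a simple spoken response.

```python
def build_media_response(name, description, audio_url):
    """Return a richResponse payload that plays a long-form soundscape."""
    return {"payload": {"google": {"richResponse": {"items": [
        # A media response has to be preceded by a simple response.
        {"simpleResponse": {"textToSpeech": description}},
        {"mediaResponse": {
            "mediaType": "AUDIO",
            "mediaObjects": [{
                "name": name,
                "description": description,
                "contentUrl": audio_url,  # long-form recording, no 240s cap
            }],
        }},
    ]}}}}

payload = build_media_response(
    "Coffee Shop", "A busy cafe.", "https://example.com/audio/cafe.mp3")
```

On devices with screens, this same payload is what renders the visible player mentioned above.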

How Testing & Technical Constraints Informed our Final Design

After creating alpha builds based on our conversation map and testing internally, our designs continued to evolve — especially as we learned how to work within the technical constraints of Google Actions. As always, we used our initial goals (as seen above) as our guide to making design changes.

Adjustment #1: Using Pre-recorded Dialogue

Design Objectives:

We knew that we wanted to convey a sense of social presence, and we felt that the synthesized voice in our alpha ran counter to that. The Calm action was brought up as an example of an action that used pre-recorded dialogue to humanize the experience.

Technical Implementation:

Since I had previous audio experience, I was asked to record a large chunk of dialogue from our Conversation Map. This was an extensive undertaking, with multiple takes for every line to get the feel right. However, once the early recordings were shared back with the group, we felt confident going all in on pre-recorded dialogue.

Old and busted.
New hotness.

Adjustment #2: Joining Others in a Soundscape

Design Objectives:

As the pandemic wore on, I, like many others, really started tuning into things like live Instagram feeds and the Reddit Public Access Network.

I loved the sense of presence of being able to drop into the middle of other people’s lives in real time, and we wanted to see if our action could capture this by at least letting users know how many others were “joining” them in these soundscapes.

Technical Implementation:

We started by digging through Google’s documentation to find out if this was even possible. One option was to ask the user’s permission to share personal details, but in the end we found the solution with the fewest roadblocks: simply counting the number of times people “visit” a soundscape, without asking for any personal information.
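That anonymous counting can be sketched as a per-soundscape tally with no user identity attached; the in-memory class below is an illustrative stand-in for whatever persistent store the real action uses.

```python
import threading

class VisitCounter:
    """Tallies anonymous visits per soundscape; stores no user identity."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()  # webhook calls may arrive concurrently

    def join(self, soundscape):
        """Record a visit and return how many others arrived before it."""
        with self._lock:
            self._counts[soundscape] = self._counts.get(soundscape, 0) + 1
            return self._counts[soundscape] - 1  # exclude the current visitor

counter = VisitCounter()
counter.join("library")           # first visitor hears about 0 others
others = counter.join("library")  # second visitor hears about 1 other
```

The only thing stored is the tally itself, which is what let us skip the permission prompt entirely.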

Another problem was that the number of users was dynamic, and my pre-recorded voice was not. After some initial tests of me recording recited numbers (which was definitely not going to work), the idea of a “conversational” hand-off between my voice and the synthetic voice was brought up and implemented, which can be heard here:
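In SSML terms, that hand-off amounts to playing a pre-recorded clip and letting the synthesized voice finish the sentence with the live count. This is a sketch under assumed file names, not the action’s actual markup:

```xml
<speak>
  <!-- Pre-recorded narration carries the fixed part of the line... -->
  <audio src="https://example.com/SSML/others-joining-intro.m4a"/>
  <!-- ...then the synthetic voice fills in the dynamic count,
       templated in by the fulfillment handler. -->
  3 other people are here with you.
</speak>
```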

Adjustment #3: Dynamic Responses + Earcons

Design Objectives:

Examples of some of our variable responses for introducing people to each soundscape.

Even though we wanted to limit our MVP to a small number of environments, that didn’t mean we didn’t want to keep things fresh for returning users. The team went about creating multiple intros for each environment.
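Picking among those intros is a one-liner per visit; the sketch below uses placeholder lines rather than the actual recorded scripts, and accepts an injectable random source for testing.

```python
import random

# Placeholder intro narrations, several per soundscape.
INTROS = {
    "coffee_shop": [
        "Pulling you up a chair by the window.",
        "The espresso machine is already humming.",
        "Grab a seat, your order is on the way.",
    ],
}

def pick_intro(soundscape, rng=random):
    """Return a randomly chosen intro narration for the given soundscape."""
    return rng.choice(INTROS[soundscape])
```

Rotating intros like this is a cheap way to keep a small set of environments from feeling canned on repeat visits.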

The clever idea of inserting earcons, such as a door chime upon starting playback of the coffee shop environment, was brought up. We thought it would be a quick win and put it in the spec.

Technical Implementation:

Here’s where we used a bit of the layering that Nick Felker had introduced us to. The short-form, layerable nature of SSML allowed us to overlay the “intro” earcon with the pre-recorded welcome phrase in a natural way prior to playing back the soundscape. Our SSML responses used the <par> tag to overlay the earcon (a door chime in this case) with a welcome narration, as well as the begin attribute to allow a slight offset in the playback of the two sounds:

<speak>
  <par>
    <media fadeOutDur="1s">
      <audio src="https://example.com/SSML/door-chime-sound.mp3"/>
    </media>
    <media begin="2s">
      <audio src="https://example.com/SSML/coffeeshop-intro-narration.m4a"/>
    </media>
  </par>
</speak>

Here it is in audio form:

Conclusion

After weeks spent refining our design and 38 commits to the codebase, we launched our MVP of Third Space, now available on any device that supports Google Assistant. Learning to design within Google Actions’ technical constraints while staying true to our core goal of helping those in isolation was a challenging but invaluable experience for the team.

Check out voicetechglobal.com/third-space, where you can leave feedback via the chat button so we can continue to improve the action, donate to Covid-19 relief, or buy us a coffee!

And of course, make sure to try it out yourself and join us in Third Space by simply asking your Google Home (with or without a display), Android phone, or iOS device with the Google Assistant app:

“Ok Google, talk to Third Space.”

Third Space on Google Assistant devices with displays

Acknowledgements

A big thank you to everyone who contributed to this project: Tim Bettridge, Vatsal Shah, Claire Son, Vivian Fu, Millani Jayasingkam, Samaher Ramzan, Jessie Sun, Patrick O’Neill, Nikhil Kardale, Gayathiri Murugamoorthy, Sharon Johnson, Ali Angco, RJ Mojica, Aimee Reynolds, Polina Cherkashnya, and Guy Tonye.

Special thanks to Aimee Reynolds, our copy editor: https://www.linkedin.com/in/aimee-reynolds-a259a619b/

Be sure to check us out and leave feedback at: www.voicetechglobal.com/third-space.

George Wu

Designer, coder, & educator forged from the world of startup and agency life. Senior Product Designer @ Caper, VUI Contributor @ Voice Tech Global.