Ethically Collecting Conversations With People that have Cognitive Impairments

Improving the Accessibility of Voice Assistants: Doing Things Right

Angus Addlesee
Towards Data Science


This is a streamlined abridgement of my paper with Pierre Albert, published at LREC’s Workshop on Legal and Ethical Issues in Human Language Technologies 2020. If you use any of this guide in your research, please do cite our paper titled “Ethically Collecting Multi-Modal Spontaneous Conversations with People that have Cognitive Impairments”:

Harvard:
Addlesee, A. and Albert, P., 2020. Ethically Collecting Multi-Modal Spontaneous Conversations with People that have Cognitive Impairments. LREC Workshop on Legal and Ethical Issues in Human Language Technologies.
BibTeX:
@inproceedings{addlesee2020ethically,
title={Ethically Collecting Multi-Modal Spontaneous Conversations with People that have Cognitive Impairments},
author={Addlesee, Angus and Albert, Pierre},
booktitle={LREC Workshop on Legal and Ethical Issues in Human Language Technologies},
year={2020}
}

Getting ethical approval to collect a crucial corpus took me over a year to complete. This was relatively fresh ground to tread, but I hope other researchers want to work on the accessibility of voice assistants for people with all varieties of cognitive impairments. This practical guide aims to help future researchers, like me, collect these valuable datasets quickly without compromising any ethical considerations or data security.

Introduction

Over a year ago now, I decided that I wanted to work to make voice assistants (Siri, Alexa, etc…) more accessible for people with dementia. To begin this project, I (with two of my supervisors) first detailed some of the critical challenges that need to be tackled if we are to make progress towards this goal. One huge hurdle is the lack of suitable data… so let’s just collect it?

I first decided exactly what data was needed and worked to organise the task, recording location, recording equipment, etc. I planned to capture both audio and video recordings of friendly, off-the-cuff conversations with people that have dementia. As a mathematician/computer scientist, however, there was one major job that I was not experienced enough to do without assistance: the ethical approval.


I spent over a year talking to experts from the NHS, charities, dementia research centres, Psychology departments, ethics boards, (the list goes on) to ensure this data collection was completed both ethically and securely.

This article presents what I learned from these experts as a practical guide, along with details of a device, “CUSCO”, developed by Pierre Albert to capture and store this data as securely as possible.

The Data

I have discussed why this data is important previously but in the context of this guide, I will expand on that before briefly describing exactly what data we are collecting.

Natural Speech

When we have natural everyday conversations with one another, we don’t talk cleanly; our speech is littered with beautiful conversational phenomena:

  • Hesitations & Pauses - “I went to that great coffee shop… Peppers!”
  • Filled Pauses - “I went to that great coffee shop… umm… Peppers!”
  • Repairs - “Then go to the left… no no, to the right!”
  • Repetition - “It was so so so good!”
  • Backchannels - (someone on the phone) “Yep… uhu… hmm…”

When we speak to voice assistants, however, we strip all of these phenomena from our speech. This simplified style of speaking is called Computer Talk. To illustrate, which of these would you say to a voice assistant (called Jacob)?

  • “Jacob, my hands are wet so could you set a timer for… umm… I don’t know… a a couple minutes please.”
  • “Jacob, set a timer for two minutes”

We all know which one because…

We adapt to the capabilities of the system! We learn what functionalities are available and how best to speak in order to make them work. For older adults (with a lack of experience using voice assistants) and those with cognitive impairments, these “Smart Devices” can be extremely frustrating. This is because voice assistants do not understand natural communication, with all the above phenomena (and more), which is exactly how people have always communicated.

Natural Visual Communication

We don’t just communicate with our words either; we use visual cues throughout every in-person conversation. For example:

  • “What is that?” - Impossible to answer without a point or look.
  • “Huh” - They could look confused or surprised (“Huh, interesting!”)
  • Silence - Nodding to agree or shaking head in disagreement.

You are hopefully thinking of many more examples and realising just how much valuable information we present visually. This information is ignored entirely by most voice assistants (and fair enough, they don’t have cameras), but it could be extremely beneficial when interpreting certain user groups.

People with dementia, for example, may pause for longer than expected while thinking of a word. If the voice assistant could see that this user was clearly thinking of a word, it could wait patiently. Alternatively, as I explored with a group of MSc students, it could suggest the most likely completions.

Pepper - source

Voice assistants that utilise visual feeds are becoming more commonplace as they are adapted for blind users (as I did with another group of MSc students) and embedded into assistive robots. If these visual cues prove to benefit older adults and people with cognitive impairments as well, then we must explore how to use them.

Getting into the Details

In order to study how people with cognitive impairments naturally interact, we need to analyse natural conversations with people that have the relevant cognitive impairment.

If we consider dementia, for example, there are two regularly used datasets:

  • The Pitt Corpus - The participants describe a picture, so there is no conversation; it is simply a monologue. There is also no video.
  • The Carolina Conversations Collection - The collection is very large but limited when searching for people with dementia (Luz et al. report 21). The conversations are interviews with memory-based questions, so they are not entirely natural conversations. Finally, the audio isn’t very high quality and the dementia examples do not have video.

We therefore want to capture high quality audio and video recordings of natural, unplanned conversations in which one participant has dementia. These conversations will not be interviews and will not ask for any personal information. Just a fun chat!

To do this, we are working with the creator of a task designed to do exactly that. Sofia de la Fuente Garcia created a variant of the map task to elicit spontaneous conversations with people that have dementia.

Using this task, a healthy participant will sit opposite a person that has dementia for a casual conversation. Both participants have a map with the same locations, but only the person with dementia can see the possible routes through the imaginary land:

Map with routes shown - de la Fuente Garcia et al., 2019

The healthy participant is the only one who knows which locations the pair need to visit. They therefore need to collaborate through conversation to go on the journey together.

This conversation is what we want to capture, but how do we do that ethically and securely?

Ethical Considerations

Now we are getting to the core of the article - how to practically collect this valuable data, with people that have cognitive impairments, ethically.

In this section I cover: consent, participant comfort (ensuring the participant does not feel uneasy), participant recruitment, and optional cognitive assessments. I then cover data security (including a new device) in the final section.

Consent

People with cognitive impairments are considered vulnerable participants and a witness is therefore required to watch over the consent process. The witness should be a family member or carer and has to sign a ‘witness of consent form’ after a vulnerable participant consents to taking part in the study.

Every participant is given a ‘participant information sheet (PIS)’ at least a week prior to the data collection. It contains all information about the study (what it will involve, the benefits of taking part, what data will be stored, etc.). Providing the PIS early gives every participant the opportunity to read, digest, ask questions, and understand the information. You should stress that all questions are welcome and that participation is entirely voluntary. It is key to note here that questions can be put not only to members of the research team but also to family members, carers, or GPs.

In my opinion, providing the PIS early is one of the most crucial steps in this process. If you feel uncomfortable providing the PIS early, why? What in your procedure is making you uncomfortable and why do you not want people noticing it? This step is, in essence, a final check: you will know when you are happy to take questions and consider your study ‘ethical’.

Distribution of required documents

On the day of collection, both participants and the witness of consent are given their respective consent forms (as illustrated above). Participant consent forms summarise the key points in the PIS and confirm that the participant has read and understood it. The witness then signs their ‘Witness of Consent Form’ that confirms that the participant with a cognitive impairment understood the PIS, had all of their questions answered, and willingly consented to take part in the study.

Immoral researchers could attempt to trick a person with a cognitive impairment (for example, offering to make them a cup of tea after they “sign a quick form”) or elicit personal information (for example, asking about their previous medical history). To ensure this cannot happen, the witness also signs to confirm that the researcher did not attempt to elicit personal information, mislead, or trick the participant.

Participant Comfort

Participants are spending their valuable time helping with research but could feel stressed about taking part, especially those with cognitive impairments. Ensuring people have a comfortable experience is therefore of paramount importance.

Even before taking part, the PIS should contain as much information as possible to prevent unnecessary stress. For example, it can highlight the following about the task:

  • It requires no preparation.
  • It is not a medical examination.
  • We want to record a natural conversation, so it is intended to be fun.
  • Recording can be stopped (or paused) at any time without giving a reason.
  • There is no right or wrong answer.
  • There is no time limit.

Some people may feel uncomfortable stopping the study, even if they are feeling distressed. A family member or carer should witness the task for this reason (usually the same person who witnessed the consent stage). The witness can also stop or pause the recording at any point without giving a reason. As a researcher, it is crucial to understand the importance of this witness. Different cognitive impairments, and even different people with the same cognitive impairment, have distinct signals to indicate distress. Family members and carers are significantly more experienced than any researcher at identifying whether a particular person is uncomfortable, because they know that exact person.

Location is another very important matter to consider on the topic of participant comfort.

An Alzheimer Scotland cafe in a resource centre, designed to be an accessible community hub. This is one of the most fascinating rooms I have been in. It looks and feels like a stylish modern cafe but every single item is thoroughly thought through and designed to be accessible! A perfect example of inclusive design.

A suitable location is relatively easy to find as spontaneous conversations can take place almost anywhere. For participant comfort and availability of a witness however, it is best to collaborate with a business or charity focused on the cognitive impairment of interest. To engage with their communities, these organisations usually have drop-in centres that people can visit for social activities, support, and classes (for example, the picture above).

These centres are perfect locations to run tasks as people are very comfortable in them. The staff also know the potential participants and can therefore be witnesses and help with recruitment (discussed below).

Most accessible locations are suitable but working with a charity, to carry out the study in one of their centres, is the best option for participant comfort when possible.

Participant Recruitment

Collaborating with a relevant organisation is vital when recruiting vulnerable participants, in addition to the benefits around participant comfort. These organisations can reach out to their community and assist with recruitment of suitable participants in a safe and friendly manner. Collaboration costs the organisation valuable time, however, so it is important to explain the motivations behind the data collection and how it aims to benefit people.

The healthy participants also need to be recruited in the correct way. It is common to compensate research participants with small rewards, such as gift cards, but it is not advised in this case. People with cognitive impairments will be using their time to contribute to research by taking part in the task. The healthy participant will ideally be motivated by the contribution to society and not some end reward.

Someone who does not care about the motivations behind the research could rush through the task for a gift card, devaluing the vulnerable participant’s time. This cannot happen if there is no monetary reward offered for taking part.

Optional Cognitive Assessments

Collecting multi-modal recordings of these conversations is extremely time-consuming. It is therefore worthwhile to share this data (which we will do, as detailed in the data security section).

For use by other researchers in certain fields, such as Psychology, cognitive assessment results have huge benefits. Another corpus (for example, spatial navigation or memory based studies) that performed the same cognitive assessment could be merged to reveal unknown connections.

There are also worries to consider before including such a task however. For example, one task that is suitable for our data collection is the Addenbrooke’s Cognitive Examination (ACE-III) as it is commonly used, low-tech, and quick to perform. Here is an example question:

An example question from the ACE-III

In the UK, NHS training must be passed in order to run this test. The same applies to other tests: all required training must be completed prior to collection. The assessment itself should not be recorded audio-visually; only the conversation task should be captured.

One downside to highlight is that the ACE-III is used by GPs to screen people while diagnosing dementia. Therefore, participants may recall doing this task and be reminded of the stressful times around their diagnosis. This could upset a participant and in addition, retaking the test may highlight how they have declined in cognitive performance since first completing it.

Each cognitive impairment has a range of tests to scrutinise and it is very valuable to include a cognitive assessment. A person’s well-being should be prioritised however, so only run tests after careful consideration and the relevant training.

Data Security

The task is chosen, participants are recruited and understand the study, a comfortable location is set, witnesses are ready, the consent process is complete, and any optional cognitive assessments are over. Now it is finally time to press record.

Securely Recording Conversations

In general, conversations involving vulnerable people or medical personnel are full of sensitive data. People often disclose personally identifiable information during a conversation (for example, mentioning their children’s names or medical history). This concern is even stronger in conversations involving people with cognitive impairments, who may be less able to control the information they disclose.

Standard recording systems (e.g. audio recorders and video cameras) are not secure devices, so they cannot be used to capture sensitive data. Furthermore, recorded data can easily be accessed on standard systems, so they cannot be used to store sensitive data either. The ethical and legal consequences of a data breach must be accounted for if a standard device is lost or stolen, highlighting the need for a secure approach.

A new system - “CUSCO” - was developed by Pierre Albert to satisfy the requirements stemming from the ethical assessment regarding data collection of sensitive material.

The device ensures the security of the recorded data by encrypting recorded streams in real time, and it allows the collection of a range of modalities, including audio and video. The data is encrypted using VeraCrypt, dedicated open-source software that has undergone a security audit vouching for the correct implementation of the encryption algorithms. Collected data can only be accessed with the key generated for each project, ensuring security of the corpus during all the phases of its life: collection, transport, exchange and storage.

The CUSCO device is developed with a much more general purpose in mind, capturing any sensitive in-situ conversation (for example a GP consultation).

Risk prevention and mitigation are therefore fundamental: even if the device is compromised or stolen during recording, the entire dataset of previously recorded conversations and any recording in progress remain secure.

The software of the device itself is organised around a modular design:

Main components of the CUSCO device.

Each stream, corresponding to a modality (video, audio, 3D) or a function (Voice Activity Detection), is controlled by a dedicated module in charge of setting the configuration, checking the state of required elements (presence of the appropriate device), and managing the recording.
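To make the modular idea concrete, here is a minimal sketch of what one module per stream could look like. This is not the CUSCO source code: the class names (StreamModule, FakeStream), the start_all helper, and the configuration values are all hypothetical, and the example only mirrors the three responsibilities described above (configuration, device check, recording control).

```python
from abc import ABC, abstractmethod


class StreamModule(ABC):
    """One recording stream (a modality or a function). Hypothetical interface."""

    def __init__(self, name, config):
        self.name = name
        self.config = dict(config)  # per-stream settings, e.g. sample rate
        self.recording = False

    @abstractmethod
    def device_present(self):
        """Return True if the hardware this stream needs is attached."""

    def start(self):
        # Refuse to record if the required device is missing.
        if not self.device_present():
            raise RuntimeError(f"{self.name}: required device not found")
        self.recording = True

    def stop(self):
        self.recording = False


class FakeStream(StreamModule):
    """Stand-in module whose device availability is fixed at construction."""

    def __init__(self, name, config, present):
        super().__init__(name, config)
        self._present = present

    def device_present(self):
        return self._present


def start_all(modules):
    """Start every module, or none at all if any required device is missing.

    Returns the names of modules whose device is missing (empty on success).
    """
    missing = [m.name for m in modules if not m.device_present()]
    if missing:
        return missing
    for m in modules:
        m.start()
    return []
```

Checking every device before starting any stream is a design choice made for this sketch: it avoids ending up with a partial recording (for example, video without audio) when one piece of equipment is unplugged.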

For our use-case, analysing multi-modal conversations, we are using two depth cameras, a high-quality table microphone, and a microphone array to facilitate speaker diarisation in post-processing (segmentation of the audio and attribution to each speaker).

Here is an early trial-run of our setup:

Arash Eshghi and I trialling the task with the CUSCO device (kindly highlighted by Pierre Albert).

We are using a table microphone because lapel microphones need to be attached to the participants, which can be invasive and has the potential to cause distress.

The hardware of the device uses common off-the-shelf elements, while the software is open-source (here are the design schemes and software). The need for such a device extends beyond the conversations that we detail in this paper to any sensitive recordings that should be encrypted live. Such use cases include recordings of: GP consultations, interactions with children, and discussions with private companies. In stricter data collections where even the researchers are not allowed access to the raw data, the device provides capabilities for the collection of anonymised audio and visual features. The researcher could therefore only access an abstracted indirect description of the interaction that cannot be used to reconstruct the original signal.

Securely Handling and Sharing Data

Once the conversations have been recorded securely, they remain encrypted on CUSCO (detailed above) and can therefore be transported.

The research team then need to remove any personal information that may have been disclosed during the conversations. To do this, the audio is silenced and the video blurred (around the mouth) whenever sensitive information is uttered. Blurring video reduces the accuracy of visual behaviour annotation, but privacy takes precedence to avoid possible participant identification. The transcription therefore cannot contain any sensitive information (to ensure this, it should not be transcribed from the original recordings). Personal information will not be shared and it is important to highlight this in the participant information sheet (PIS). These processed recordings are now considered anonymous as the participants are only identifiable by personal contacts (thus, an unknown researcher cannot identify the participant).
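The audio-silencing step can be sketched in a few lines. This is an illustrative example only, not our actual processing pipeline: the function name silence_intervals is hypothetical, and it assumes the audio is a plain sequence of samples and that an annotator has already marked the sensitive passages as (start, end) times in seconds. Video blurring is not covered here.

```python
import math


def silence_intervals(samples, sample_rate, intervals):
    """Return a copy of `samples` with each (start_s, end_s) interval zeroed.

    The start is rounded down and the end rounded up to whole samples, so
    the redaction errs on the side of removing slightly more audio.
    """
    redacted = list(samples)  # never modify the original recording in place
    for start_s, end_s in intervals:
        if end_s < start_s:
            raise ValueError("interval end precedes start")
        first = max(0, math.floor(start_s * sample_rate))
        last = min(len(redacted), math.ceil(end_s * sample_rate))
        for i in range(first, last):
            redacted[i] = 0
    return redacted
```

For example, with a toy signal of ten samples at 2 samples per second, silencing the interval from 1.0 s to 2.5 s zeroes samples 2, 3 and 4 and leaves the rest untouched. Transcribing from the output of a step like this, rather than from the original recording, is what keeps the sensitive material out of the transcript.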

The contact details of a member of the research team should be included in the PIS so that participants can request the deletion, and subsequent removal, of their data. If stated in the PIS, the anonymised recordings and associated transcriptions can be shared with other relevant researchers through centralised archives that control their use, and results can be published in research papers.

For example, we have decided to store our corpus in DementiaBank as it is a shared database of multimedia interactions for the study of communication in dementia. Access to the data in DementiaBank is password protected and restricted to members of the DementiaBank consortium group. Researchers that would benefit from access to this data can request to join this group (after a vetting process) and therefore benefit from the corpus.

Conclusion

Collecting these conversations from people with cognitive impairments is a vital step towards creating more accessible and natural voice assistants. To ensure this is done ethically, there are many factors that need to be considered and I hope this practical ethical guide can assist researchers who want to navigate the many ethical challenges in order to collect and release similar datasets. CUSCO can also be used to securely capture, transport and exchange this data when needed.

It took me over a year to get to the point of collection (then delayed by coronavirus) but I hope this guide helps the next researchers capture similar conversations significantly faster, without sacrificing ethical considerations or data security. Accessibility in conversational AI will hopefully boom in popularity and I really hope this helps!

Acknowledgements

This work was sponsored by Wallscope and The Data Lab.

Thank you to Alzheimer Scotland for spending time with me, supporting this project and overall being fantastic! Even though this has been postponed due to Covid, I will come to complete this study when it is safe to do so and your time will not have been wasted.

Another huge thank you to all the people I spoke to over the past year about this project. There are a countless (figuratively) number of you! I noted every piece of advice you gave for this guide and I’m sure we’ll all be happy if this helps just one other person :)


Research Associate at Heriot-Watt University. Studying a PhD in Artificial Intelligence. Contact details at http://addlesee.co.uk/