By Genevieve Wanucha
As published in Dimensions Magazine - Spring 2019
Ten years ago, Jim Schwoebel got a memorable phone call from his mother. She was crying. His brother had been hospitalized after a psychotic break at the age of twenty. Looking back, there had been warning signs: his brother had gone to his primary care physician eleven times complaining of headaches and unclear thoughts. No psychiatric specialist was able to flag his true problem. During that time, Schwoebel took to exploring his collection of eight hundred voicemails from his brother, saved up over the years. As a biomedical engineer and future CEO of NeuroLex Laboratories, Schwoebel wondered if this lengthy verbal record might hold any clues to the speaker’s developing schizophrenia. Could there have been a way to catch this condition in one of those eleven clinic visits, from just subtle changes in his voice?
In fact, researchers were just about to find the answer. Starting in 2007, a team from Columbia University and IBM recruited a group of 34 adolescents at a high risk of psychosis and followed them over three years. Five of the participants experienced a psychotic episode. Early in the study, the researchers had recorded voice samples from the participants. They analyzed all the sound clips using voice analysis technology designed to detect the changes in word usage known to emerge in early psychosis. In 2015, the team reported that, using only voice, they could predict with 100% accuracy which participants developed psychosis. Although based on a small sample size, this result far outperformed paper-and-pencil interviews. This use of voice as a biomarker for detecting disease is exactly the kind of tool that could help people like Schwoebel’s brother.
A wealth of information about health is encoded in our speech. “Personally, I’m fascinated by voice because it is such a complex task,” says UW Medicine’s Dr. Reza Hosseini Ghomi, MD, Director of the UW DigiPsych Lab and clinician in the UW Memory and Brain Wellness Center. “Think about how much precision and coordination of muscles and brain regions are involved to produce voice, and various diseases can subtly or acutely affect one’s voice and use of language.” Early on, emerging disorders may make your voice wobble so mildly that the change is not detectable to the human ear. In Parkinson’s disease, speech changes include loss of volume, speaking in short rushes, and sounding more monotone. Different changes come about during different stages of illness. In developing schizophrenia, words in sentences become less related to each other, while in Alzheimer’s disease, there are long pauses between words, trouble finding words, and use of more pronouns, such as “it,” “that,” and “them,” instead of specific nouns and names.
Now, researchers are applying machine learning-based voice recognition technology, such as that developed for Amazon’s voice home-assistant, Alexa, to identify voice patterns that are specific to different neurological diseases. And, as with Alexa, people can give voice samples from the comfort of their home. Biotech companies are working to bring to market technology that uses voice samples to monitor a patient’s health in the clinic or remotely, through smartphone apps or wearables. The hope is to use voice data to create voice biomarkers: non-invasive, inexpensive tools for tracking changes in symptoms and response to medication. What could be simpler than telling an app what you had for breakfast?
There is no FDA-approved digital voice biomarker technology currently on the market for clinical purposes, mostly because the field is so new and needs more data. “The field of digital biomarkers is still very fragmented because there are no standards for voice recording or an organizing force,” says Dr. Hosseini Ghomi, who is also Chief Medical Officer at NeuroLex Laboratories, working alongside Jim Schwoebel to create voice biomarker technology. “There isn’t a national digital biomarker association like there is an Alzheimer’s Association, for example, so we need to bring researchers and industry stakeholders together.” In a sign of the field’s momentum, its very first academic journal, Digital Biomarkers, published by Karger, launched in 2017 to provide a dedicated home for all digital biomarker work. Voice is one form of digital biomarker under investigation, along with finger tapping speed, sleep movements, and walking.
Digital biomarker technology, in general, provides clinicians with new avenues to capture symptoms and functional changes in everyday life. “So often our patients describe symptoms that we do not directly observe in the clinic,” says Dr. Carolyn Parsey, PhD, a neuropsychologist at the UW Memory and Brain Wellness Center. “Real-time data capture, such as through wearable sensors and home-related technologies, collect this data so we can understand what goes on in a typical day. Voice analysis is just one more way that we can capture changes, perhaps before they are noticeable enough to come into the clinic. This means earlier diagnosis, earlier intervention, and better outcomes for patients and caregivers alike.”
For clinicians, voice biomarkers could help solve intractable issues in the care and management of neurological disorders, which are hard to address in a 30-minute clinic appointment. The first goal is improved disease management at home. “I think voice biomarkers have the potential to offer something revolutionary in terms of accessibility and level of improvement for the patient,” says Dr. Hosseini Ghomi. “Voice biomarkers are non-invasive, affordable, and can be used at home.” As a clinician treating patients from the five WWAMI states of the Northwest (Washington, Wyoming, Alaska, Montana, and Idaho), he desperately needs a way to monitor patients in places like rural Montana who may not be able to travel to the specialty clinic without hardship. Voice biomarkers could help him modify and customize treatment plans, adjust doses of medication, and recommend injury prevention interventions, all from hundreds of miles away.
At the DigiPsych Lab and NeuroLex Laboratories, Dr. Hosseini Ghomi has been working to develop machine learning models that can translate voice data into a diagnostic tool for neurological diseases, so far focused on Parkinson’s disease. He currently uses voice samples from a mobile observational study conducted by Sage Bionetworks. The study, which started in 2015 and is still running, is conducted purely through an iPhone app, mPower, which collects health, motion, and voice data on users with and without Parkinson’s disease. People can download the app and consent to the research. Prompted by text message every day, participants complete activities such as cognitive games, finger tapping, walking exercises, and saying ‘ahhh’. The mPower study data, consisting of survey responses and mobile sensor measurements, is available for any researcher to use and analyze in studies.
Recently, Dr. Hosseini Ghomi and his team used 65,000 voice samples from about 6,000 people in the mPower study. These were 10-second files of people saying ‘ahhhh.’ After removing background noise, the team passed the raw audio through two algorithms to extract the frequencies and acoustic features—terms familiar to musicians, such as pitch, jitter, shimmer, prosody, loudness, and harmonics. They then fed the audio into their own machine learning models, which are tools to extract meaningful patterns out of massive amounts of busy, raw data.
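For readers curious what two of those acoustic features measure, here is a minimal Python sketch. It is not the team’s pipeline or either of their algorithms. It builds a synthetic ‘ahhh’ out of sine-wave cycles and then computes jitter (cycle-to-cycle wobble in timing) and shimmer (cycle-to-cycle wobble in loudness) using the common “local” definitions. The function names, the 16 kHz sample rate, and the 150 Hz pitch are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (an assumed value for this sketch)


def synthesize_vowel(periods, amplitudes, sample_rate=SAMPLE_RATE):
    """Build a toy 'ahhh': one sine cycle per (period, amplitude) pair."""
    samples = []
    cycle, cycle_start = 0, 0.0
    for n in range(int(sum(periods) * sample_rate)):
        t = n / sample_rate
        # Advance to the cycle that contains time t.
        while cycle < len(periods) - 1 and t >= cycle_start + periods[cycle]:
            cycle_start += periods[cycle]
            cycle += 1
        phase = (t - cycle_start) / periods[cycle]
        samples.append(amplitudes[cycle] * math.sin(2 * math.pi * phase))
    return samples


def measure_cycles(samples, sample_rate=SAMPLE_RATE):
    """Return per-cycle periods and peak amplitudes, split at upward zero crossings."""
    crossing_times, crossing_indices = [], []
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        if prev < 0 <= cur:
            # Linear interpolation gives a sub-sample crossing time.
            frac = -prev / (cur - prev)
            crossing_times.append((n - 1 + frac) / sample_rate)
            crossing_indices.append(n)
    periods = [b - a for a, b in zip(crossing_times, crossing_times[1:])]
    peaks = [max(abs(x) for x in samples[i:j])
             for i, j in zip(crossing_indices, crossing_indices[1:])]
    return periods, peaks


def local_perturbation(values):
    """Mean absolute change between neighbours, relative to the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def voice_features(samples, sample_rate=SAMPLE_RATE):
    """Summarize a recording as mean pitch, local jitter, and local shimmer."""
    periods, peaks = measure_cycles(samples, sample_rate)
    return {
        "pitch_hz": len(periods) / sum(periods),  # 1 / mean period
        "jitter": local_perturbation(periods),    # timing wobble
        "shimmer": local_perturbation(peaks),     # loudness wobble
    }


def toy_voice(n_cycles=60, pitch_hz=150.0, jitter=0.0, shimmer=0.0):
    """Alternate long/loud and short/quiet cycles around a base pitch (hypothetical)."""
    base = 1.0 / pitch_hz
    signs = [1 if i % 2 == 0 else -1 for i in range(n_cycles)]
    periods = [base * (1 + s * jitter) for s in signs]
    amplitudes = [1 + s * shimmer for s in signs]
    return synthesize_vowel(periods, amplitudes)


if __name__ == "__main__":
    print("steady voice:  ", voice_features(toy_voice()))
    print("unsteady voice:", voice_features(toy_voice(jitter=0.01, shimmer=0.03)))
```

In this toy setup, the unsteady voice differs from the steady one by only a few percent from cycle to cycle, yet its jitter and shimmer values come out clearly higher. Real recordings are far messier, which is why studies like this one clean the audio first and rely on dedicated feature-extraction tools and machine learning models rather than simple zero-crossing counts.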
The team was able to tell people in the Parkinson’s disease group from those in the control group 85% of the time. Their models performed better than the 74% average accuracy of clinical diagnosis by non-specialist doctors and the 80% average accuracy of movement disorder specialists. “So that’s exactly what we’re trying to get: a dynamic, frequency-based voice biomarker for Parkinson’s disease,” says Dr. Hosseini Ghomi. Because the models performed so accurately with such short, simple audio clips, he believes that even denser data sets with spoken words could provide a superior voice biomarker. The findings were published in the IEEE Xplore Digital Library and presented at the 2018 IEEE Signal Processing in Medicine and Biology Symposium. The group has recently submitted work demonstrating the ability to detect differences in disease severity and even depression symptoms in patients with Parkinson’s disease using only voice.
Dr. Hosseini Ghomi ultimately wants to create FDA-approved voice biomarkers that are truly specific to a disease process—meaning that they can reliably tell the difference between early Parkinson’s, Alzheimer’s, ALS, and frontotemporal degeneration, and help confirm diagnoses. First, researchers need to deeply understand how the vocal changes in these disorders depart from healthy people’s voices.
To begin, they need a lot more voice samples. A whole lot more. “As a field, we have to get together and pool our data,” says Dr. Hosseini Ghomi. “We also need a uniform standard of collecting voice samples.” Currently, the field’s projects use different file types, recorders, microphones, and quality levels. Going further, he advocates for a national effort that builds on the way biomarkers are already collected and used for research. “We need a national voice sample repository.” For example, NIH Alzheimer’s Disease Research Centers could collect voice samples from participants in their longitudinal cohorts, in addition to brain scans, blood, and spinal fluid.
For all of the simplicity and ease of voice biomarker technology, the field faces complicated hurdles, such as the lack of a national data resource and privacy concerns. Beyond clips of people saying ‘ahhhh,’ voice samples from commercial technology are impossible to completely de-identify (that is, to keep a person’s identity from being connected to their information). Not all patients will want to give voice samples that may contain personal information or let apps have access to their daily activities. However, the mPower study has shown a high rate of participant consent for researchers to use the data collected on its Parkinson’s disease smartphone app. “These are signs that our patients and people around the country stand together to help find solutions for neurodegenerative diseases,” says Dr. Hosseini Ghomi.
For now, Dr. Hosseini Ghomi and his fellow clinicians of the UW Memory and Brain Wellness Center hope that research institutions will collaborate with industry to create a national database of voice samples, the terabyte version of those eight hundred voicemails from Jim Schwoebel’s brother. At least, they could sound it out.