AI can predict those at risk of psychosis with 93% accuracy by examining speech


A machine-learning method discovered a hidden clue in people’s language predictive of the later emergence of psychosis — the frequent use of words associated with sound.

The findings, by scientists at Emory University and Harvard University, were published in the journal npj Schizophrenia.

The researchers also developed a new machine-learning method to more precisely quantify the semantic richness of people’s conversational language, a known indicator for psychosis.

Their results show that automated analysis of the two language variables — more frequent use of words associated with sound and speaking with low semantic density, or vagueness — can predict whether an at-risk person will later develop psychosis with 93 percent accuracy.

Even trained clinicians had not noticed how people at risk for psychosis use more words associated with sound than the average, although abnormal auditory perception is a pre-clinical symptom.

“Trying to hear these subtleties in conversations with people is like trying to see microscopic germs with your eyes,” says Neguine Rezaii, first author of the paper.

“The automated technique we’ve developed is a really sensitive tool to detect these hidden patterns. It’s like a microscope for warning signs of psychosis.”

Rezaii began work on the paper while she was a resident at Emory School of Medicine’s Department of Psychiatry and Behavioral Sciences. She is now a fellow at Harvard Medical School’s Department of Neurology.

“It was previously known that subtle features of future psychosis are present in people’s language, but we’ve used machine learning to actually uncover hidden details about those features,” says senior author Phillip Wolff, a professor of psychology at Emory. Wolff’s lab focuses on language semantics and machine learning to predict decision-making and mental health.

“Our finding is novel and adds to the evidence showing the potential for using machine learning to identify linguistic abnormalities associated with mental illness,” says co-author Elaine Walker, an Emory professor of psychology and neuroscience who researches how schizophrenia and other psychotic disorders develop.

The onset of schizophrenia and other psychotic disorders typically occurs in the early 20s, with warning signs — known as prodromal syndrome — beginning around age 17.

About 25 to 30 percent of youth who meet criteria for a prodromal syndrome will develop schizophrenia or another psychotic disorder.

Using structured interviews and cognitive tests, trained clinicians can predict psychosis with about 80 percent accuracy in those with a prodromal syndrome.

Machine-learning research is among the many ongoing efforts to streamline diagnostic methods, identify new variables, and improve the accuracy of predictions.

Currently, there is no cure for psychosis.

“If we can identify individuals who are at risk earlier and use preventive interventions, we might be able to reverse the deficits,” Walker says.

“There are good data showing that treatments like cognitive-behavioral therapy can delay onset, and perhaps even reduce the occurrence of psychosis.”

For the current paper, the researchers first used machine learning to establish “norms” for conversational language.

They fed a computer software program the online conversations of 30,000 users of Reddit, a social media platform where people have informal discussions about a range of topics.

The software program, known as Word2Vec, uses an algorithm to change individual words to vectors, assigning each one a location in a semantic space based on its meaning.

Those with similar meanings are positioned closer together than those with far different meanings.
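As a rough illustration of what "closer together" means in such a space, the sketch below compares hand-made three-dimensional vectors using cosine similarity. The vectors, the words, and the function names are invented for this example. Real Word2Vec vectors are learned from text and typically have hundreds of dimensions.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real Word2Vec vectors are learned from a corpus, not written by hand.
TOY_VECTORS = {
    "whisper": [0.9, 0.1, 0.0],
    "voice":   [0.8, 0.2, 0.1],
    "chant":   [0.7, 0.3, 0.0],
    "table":   [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(word, vectors, k=2):
    """Return the k other words most similar to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]

neighbors = nearest_neighbors("whisper", TOY_VECTORS)
print(neighbors)
```

In this toy space the sound-related words point in nearly the same direction, so they rank as each other's nearest neighbors, while "table" sits far away.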



The Wolff lab also developed a computer program to perform what the researchers dubbed “vector unpacking,” or analysis of the semantic density of word usage.

Previous work has measured semantic coherence between sentences.

Vector unpacking allowed the researchers to quantify how much information was packed into each sentence.

After generating a baseline of “normal” data, the researchers applied the same techniques to diagnostic interviews of 40 participants. Trained clinicians had conducted the interviews as part of the multi-site North American Prodrome Longitudinal Study (NAPLS), funded by the National Institutes of Health. NAPLS is focused on young people at clinical high risk for psychosis. Walker is the principal investigator for NAPLS at Emory, one of nine universities involved in the 14-year project.


The automated analyses of the participant samples were then compared to the normal baseline sample and the longitudinal data on whether the participants converted to psychosis.

The results showed that higher than normal usage of words related to sound, combined with a higher rate of using words with similar meaning, meant that psychosis was likely on the horizon.

Strengths of the study include the simplicity of using just two variables — both of which have a strong theoretical foundation — the replication of the results in a holdout dataset, and the high accuracy of its predictions, at above 90 percent.

“In the clinical realm, we often lack precision,” Rezaii says.

“We need more quantified, objective ways to measure subtle variables, such as those hidden within language usage.”

Rezaii and Wolff are now gathering larger data sets and testing the application of their methods on a variety of neuropsychiatric diseases, including dementia.

“This research is interesting not just for its potential to reveal more about mental illness, but for understanding how the mind works — how it puts ideas together,” Wolff says. “Machine learning technology is advancing so rapidly that it’s giving us tools to data mine the human mind.”

Language offers a privileged view into the mind: it is the basis by which we infer others’ thought processes, such that disorganized language is considered to reflect disorder in thought.

Language disturbance is prevalent in schizophrenia and is related to functional disability, given that an individual needs to think and speak clearly in order to maintain friends and a job [1].

In schizophrenia, the speaker “violates the syntactical and semantic conventions which govern language usage”, yielding reduction in syntactic complexity (concrete speech, poverty of content) and loss of semantic coherence, e.g. the disruption in flow of meaning in language (derailment, tangentiality) [2].

This language disturbance is an early core feature of schizophrenia, evident in subtle form prior to initial psychosis onset, in cohorts of both familial [3] and clinical [4-7] high‐risk youths, as assessed using clinical ratings.

Beyond clinical ratings, there has been an effort to characterize early subtle language disturbances in clinical high‐risk (CHR) individuals using linguistic analysis, with the aim of improving prediction.

Bearden et al. [8] applied manually coded linguistic analyses to brief speech transcripts in a CHR cohort, finding that both semantic features (illogical thinking) and reduction in syntactic complexity (poverty of speech) predicted psychosis onset with an accuracy of 71%, as compared with 35% accuracy for clinical ratings.

Psychosis onset was also predicted by reduced referential cohesion, such that the use of pronouns and comparatives (“this” or “that”) frequently did not clearly indicate who or what was previously described.

While this manual linguistic approach appears to be superior to clinical ratings in psychosis prediction, it depends on predefined measures that may not capture other subtle language features.

Therefore, we have used automated natural language processing methods to analyze speech in CHR cohorts.

These are probabilistic linguistic analyses based on the computer’s acquisition of vocabulary (semantics) and learning of grammar (syntax) through machine‐learning algorithms trained on very large bodies of text, enabled by exponential increases in computing power, and the flood of text that arrived with the Internet.

For semantics, a common approach is latent semantic analysis, in which a word’s meaning is learned based on its co‐occurrence with other words, inspired by theories of vocabulary acquisition [9,10].

In this analysis, each word is assigned a multi‐dimensional semantic vector, such that the cosine between word‐vectors represents the semantic similarity between words. Grouping of successive word‐vectors can be used to estimate the semantic coherence of a narrative.
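A minimal sketch of that idea follows, assuming each sentence is represented by the average of its word vectors and coherence is the cosine between consecutive sentence vectors. The two-dimensional vectors, the example sentences, and the function names are hypothetical. Latent semantic analysis derives its vectors from co-occurrence statistics over a large corpus, and published studies may group word vectors differently.

```python
import math

# Hypothetical 2-dimensional word vectors, written by hand for illustration.
WORD_VECTORS = {
    "dog": [1.0, 0.0], "barked": [0.9, 0.1],
    "puppy": [0.95, 0.05], "howled": [0.85, 0.15],
    "taxes": [0.0, 1.0], "rose": [0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def coherence_scores(sentences, word_vectors):
    """Cosine between each sentence vector and the one that follows it.

    Assumes every word in every sentence has an entry in `word_vectors`.
    """
    sentence_vecs = [
        mean_vector([word_vectors[w] for w in sentence])
        for sentence in sentences
    ]
    return [cosine(a, b) for a, b in zip(sentence_vecs, sentence_vecs[1:])]

narrative = [["dog", "barked"], ["puppy", "howled"], ["taxes", "rose"]]
scores = coherence_scores(narrative, WORD_VECTORS)
print(scores, min(scores))
```

The first pair of sentences stays on topic and scores close to 1, while the jump to an unrelated topic produces a much lower score. Taking the minimum over a narrative is one simple way to flag its least coherent transition.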

Latent semantic analysis has been applied to speech in schizophrenia, finding an association of decreased semantic coherence with clinical ratings of thought disorder and functional impairment, and with abnormal task‐related activation in language circuits [11,12].

For syntax, part‐of‐speech tagging is used to determine sentence length and rates of usage of different parts of speech [13,14].
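The sketch below shows the kind of summary statistics this can produce, assuming the tagging step has already been done. The hand-tagged tokens, the simplified tag labels, and the function name are all hypothetical. In practice a trained tagger would supply the tags.

```python
from collections import Counter

# Each sentence is a list of (word, tag) pairs. The tags are assigned by hand
# with simplified labels; a real pipeline would get them from a trained tagger.
TAGGED_SENTENCES = [
    [("I", "PRON"), ("heard", "VERB"), ("a", "DET"), ("loud", "ADJ"), ("noise", "NOUN")],
    [("it", "PRON"), ("stopped", "VERB")],
    [("the", "DET"), ("room", "NOUN"), ("was", "VERB"), ("quiet", "ADJ")],
]

def summarize_syntax(tagged_sentences):
    """Return mean sentence length (in tokens) and the rate of each tag."""
    lengths = [len(sentence) for sentence in tagged_sentences]
    total_tokens = sum(lengths)
    tag_counts = Counter(
        tag for sentence in tagged_sentences for _, tag in sentence
    )
    return {
        "mean_sentence_length": total_tokens / len(lengths),
        "tag_rates": {tag: n / total_tokens for tag, n in tag_counts.items()},
    }

summary = summarize_syntax(TAGGED_SENTENCES)
print(summary["mean_sentence_length"])
print(summary["tag_rates"])
```

Rates are expressed per token so that speakers who produce different amounts of speech can be compared on the same scale.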

In an earlier proof‐of‐principle study in a narrative‐based protocol with a small CHR cohort, we used both latent semantic analysis and part‐of‐speech tagging, with machine learning, to identify a classifier of psychosis that comprised minimum semantic coherence, shortened sentence length, and a decrease in the use of determiner pronouns (e.g., “that” or “which”) to introduce dependent clauses [15].

These three features were correlated with but outperformed clinical ratings in prediction of psychosis.

Funding: The work was supported by grants from the National Institutes of Health and a Google Research Award.

Emory Health Sciences
Media Contacts: 
Carol Clark – Emory Health Sciences
Image Source:
The image is in the public domain.

Original Research: Open access
“A machine learning approach to predicting psychosis using semantic density and latent content analysis”. Neguine Rezaii, Elaine Walker & Phillip Wolff.
npj Schizophrenia. doi:10.1038/s41537-019-0077-9

