University of Massachusetts Amherst researchers have invented a portable surveillance device powered by machine learning – called FluSense – which can detect coughing and crowd size in real time, then analyze the data to directly monitor flu-like illnesses and influenza trends.
The FluSense creators say the new edge-computing platform, envisioned for use in hospitals, healthcare waiting rooms and larger public spaces, may expand the arsenal of health surveillance tools used to forecast seasonal flu and other viral respiratory outbreaks, such as the COVID-19 pandemic or SARS.
Models like these can be lifesavers by directly informing the public health response during a flu epidemic.
These data sources can help determine the timing for flu vaccine campaigns, potential travel restrictions, the allocation of medical supplies and more.
“This may allow us to predict flu trends in a much more accurate manner,” says co-author Tauhidur Rahman, assistant professor of computer and information sciences, who advises Ph.D. student and lead author Forsad Al Hossain.
Results of their FluSense study were published Wednesday in the Proceedings of the Association for Computing Machinery on Interactive, Mobile, Wearable and Ubiquitous Technologies.
To give their invention a real-world tryout, the FluSense inventors partnered with Dr. George Corey, executive director of University Health Services; biostatistician Nicholas Reich, director of the UMass-based CDC Influenza Forecasting Center of Excellence; and epidemiologist Andrew Lover, a vector-borne disease expert and assistant professor in the School of Public Health and Health Sciences.
The FluSense platform processes audio from a low-cost microphone array, together with thermal imaging data, using a Raspberry Pi and a neural computing engine.
It stores no personally identifiable information, such as speech data or distinguishing images.
In Rahman’s Mosaic Lab, where computer scientists develop sensors to observe human health and behavior, the researchers first developed a lab-based cough model.
Then they trained a deep neural network classifier to draw bounding boxes around the people in thermal images, and then to count them.
“Our main goal was to build predictive models at the population level, not the individual level,” Rahman says.
They placed the FluSense devices, encased in a rectangular box about the size of a large dictionary, in four healthcare waiting rooms at UMass’s University Health Services clinic.
From December 2018 to July 2019, the FluSense platform collected and analyzed more than 350,000 thermal images and 21 million non-speech audio samples from the public waiting areas.
The researchers found that FluSense was able to accurately predict daily illness rates at the university clinic. Multiple and complementary sets of FluSense signals “strongly correlated” with laboratory-based testing for flu-like illnesses and influenza itself.
According to the study, “the early symptom-related information captured by FluSense could provide valuable additional and complementary information to current influenza prediction efforts,” such as the FluSight Network, which is a multidisciplinary consortium of flu forecasting teams, including the Reich Lab at UMass Amherst.
The FluSense device houses these components. The image is credited to UMass Amherst.
“I’ve been interested in non-speech body sounds for a long time,” Rahman says.
“I thought if we could capture coughing or sneezing sounds from public spaces where a lot of people naturally congregate, we could utilize this information as a new source of data for predicting epidemiologic trends.”
Al Hossain says FluSense is an example of the power of combining artificial intelligence with edge computing, the frontier-pushing trend that enables data to be gathered and analyzed right at the data’s source.
“We are trying to bring machine-learning systems to the edge,” Al Hossain says, pointing to the compact components inside the FluSense device.
“All of the processing happens right here. These systems are becoming cheaper and more powerful.”
The next step is to test FluSense in other public areas and geographic locations.
“We have the initial validation that the coughing indeed has a correlation with influenza-related illness,” Lover says. “Now we want to validate it beyond this specific hospital setting and show that we can generalize across locations.”
Climate warming trends have been accelerating over the last few decades. The world’s nine warmest years in the time period from 1850 to 2017 have all occurred in the last twelve years, with a total increase of approximately 0.97°C in the average annual air temperature for the time period from 1880 to 2017 (1).
This ostensibly small increase in average global temperature is nevertheless responsible for significant changes in the worldwide weather patterns and associated effects on society through sea level rise (and associated erosion) and increased frequency and intensity of flooding, droughts (with associated wildfires and crop failures) and freezing rain events (2).
Of particular importance to Canada, climate warming is even more acute at higher latitudes and in the winter months (3). Over the past 70 years, the overall annual average temperature in Canada has increased by 1.8°C (4), with an average winter temperature increase of 3.4°C (4).
In some areas in the northwest, this increase has been even higher. Because climate change affects not only temperatures but precipitation patterns, Canada is experiencing generally drier conditions in the west and above average precipitation in the east (4).
Climate-driven changes to temperature and precipitation are known to affect the risk of infectious disease transmission. Climate change is modifying the range distributions of disease vectors (i.e. ticks and mosquitoes) and reservoir populations (i.e. birds, rodents and deer) that participate in the transmission of pathogens from ticks and mosquitoes to humans as climate suitability for vector and reservoir populations changes (5,6).
For example, the increase in cases of Lyme disease in Canada reflects the northward expansion of the range of the black-legged tick vector, Ixodes scapularis, in the United States (US) and into southern Canada, as climate change has made Canada more conducive to the establishment of tick populations (7,8).
This expansion of the area where the vectors and their reservoirs can thrive means not only an increased risk of sporadic infectious disease but also an increased likelihood that these vectors, and the diseases that they carry, can become endemic (6,9–11).
In addition, climate change is influencing the mobility patterns of people and goods. An increase in “climate refugees”, people displaced when their lives and/or livelihoods are at risk from extreme weather events, is expected (11).
Refugees, often from geographical areas where infectious diseases are more common and with different vaccination schedules and practices, may inadvertently bring these diseases into Canada (12).
Tourism is also affected by climate change, as changes in both home and travel destinations influence the push and pull of factors motivating people to travel and the potential for disease spread (13–15).
Vectors and pathogens can inadvertently be transported through shipments by air, land and sea (16–18). Land and sea containers are known to support the invasion of mosquitoes because larvae can develop in trapped standing water, and if no water exists, eggs can withstand desiccation for weeks to months (19,20).
Air travel has also been responsible for travellers carrying infections into new areas. In Canada, returning travellers have brought with them the Zika virus and have also sparked an outbreak of severe acute respiratory syndrome (SARS) coronavirus (15,21,22).
Thus, the increased risk of infectious diseases under climate change poses an important public health challenge, and work is underway to monitor, assess and predict its impact.
In the past, public health management has depended on notifiable disease reporting surveillance systems to detect outbreaks, monitor disease progression and inform prevention and mitigation policies. However, traditional surveillance systems are typically characterized by delays in the reporting and analysis of the data and the communication of the results.
To address the need for closer to real-time surveillance of emerging issues and earlier insight on potential health impacts, two risk assessment strategies have been, and are being, developed: event-based surveillance (EBS) systems, which increasingly incorporate artificial intelligence; and risk modelling.
The objective of this overview is to describe these two risk assessment strategies and how they can inform public health actions to prevent, detect and mitigate climate change-driven increases in infectious diseases.
Event-based surveillance systems
Event-based surveillance systems use a variety of open-source internet data and assessment techniques to identify disease threats (23,24). Typical sources include online newswires, social media and other internet data streams, in multiple languages, which are monitored to detect early-warning signals of threats to public health.
These systems have proven to be more timely in comparison with conventional surveillance data sources from laboratory results or hospitals (25), and can be used in conjunction with conventional surveillance systems to enhance early warning of public health threats (26).
The more quickly signals from an evolving outbreak are identified, the more quickly the outbreak can be tracked and a public health response can be planned and implemented (27).
There are three types of EBS systems: moderated; partially moderated; and fully automated (28). The level of automation influences how the information flow in EBS systems is managed from the open-source internet data from news aggregators (e.g. Factiva, Google News, Moreover, Baidu), Rich Site Summary (RSS) and social media feeds from official and unofficial sources (e.g. Twitter for US Centers for Disease Control and general public), and validated official reports (e.g. World Health Organization, US Centers for Disease Control).
The Program for Monitoring Emerging Diseases (ProMED) is an example of a moderated system and was at the forefront of EBS development over 25 years ago (29,30). ProMED is run by volunteer analysts (expert curators) who monitor and select news articles, validate the content and notify subscribers of noteworthy infectious disease events.
Strengths of this system include a low level of noise (few irrelevant or false signals), open access and a broad reach. However, volunteers do not cover all populations at risk, volunteer biases can influence the moderation of events and volunteers do not have the resources (nor are they expected) to provide detailed information giving situational awareness for assessing the threat level (29).
The Global Public Health Intelligence Network (GPHIN) is a partially moderated system that was developed by the Government of Canada, in collaboration with the World Health Organization, four years after ProMED (31–33).
GPHIN access is restricted to agencies with health-related mandates. Artificial intelligence (AI) algorithms in GPHIN automatically process a stream of two to three thousand news articles per day, which are then moderated by 12 expert analysts who identify and issue alerts for threats using tacit contextual information (e.g. historic context, market trends, travel bans and climate anomalies).
An example of the usefulness of GPHIN dates back to early 2003 when analysts identified reports from China referring to increased sales of antiviral therapies just before the global onset of the SARS epidemic (34). Unlike ProMED, GPHIN benefits from multi-staged filtering using AI and trained analysts.
Artificial intelligence enables processing of a larger data stream, and analysts have the resources to provide information for situational awareness. Both ProMED and GPHIN can function in multiple languages; however, it is expensive for GPHIN to add in other languages because of the cost to hire analysts with language fluency (33).
Fully automated systems include the European Commission’s Medical Information System (MedISys), Pattern-based Understanding and Learning System (PULS) and HealthMap.
These systems are open to the public, but also offer restricted-access features that serve the needs of health agencies, such as private discussion forums, increased functionality and data processing of commercial sources (35,36).
Fully automated systems are faster at processing data and less expensive to operate than moderated systems. The main drawback is a higher level of noise, meaning that there is an increased risk of identifying false threats (37,38). The EBS systems can be connected in synergistic ways to address this risk (39).
For example, MedISys uses low-noise data from ProMED and GPHIN, and uses more advanced language processing algorithms from PULS. PULS extracts information about events identified in the MedISys stream and then returns these data to MedISys (36,40). The different types of EBS systems are summarized in Table 1.
Table 1: Summary of some event-based surveillance systems
|Type of system||System||Established||Open access|
|Moderated system (a)||Program for Monitoring Emerging Diseases (ProMED) (29,30)||In 1994 as a nonprofit organization||Yes|
|Partially moderated system (b)||Global Public Health Intelligence Network (GPHIN) (31–33)||In 1998 through partnership between the Government of Canada and World Health Organization||No; available to partnered health agencies|
|Fully automated system (c)||Medical Information System (MedISys) (36,41,42)||In 2004 by the European Commission||Yes|
|Fully automated system (c)||HealthMap (35,38,40,43)||In 2006 by Boston Children’s Hospital||Yes|
|Fully automated system (c)||Pattern-based Understanding and Learning System (PULS) (36,44,45)||In 2007 by the Department of Computer Science, University of Helsinki, Finland||Yes|
a A moderated system: volunteer expert-curators identify, review and validate sources and create the reports
b A partially-moderated system: automatically acquires, categorizes, and filters sources. Expert-curators moderate the subset of sources and create the reports
c A fully-automated system: automatically acquires, categorizes, filters and reports the health-related sources
Artificial intelligence applications
The ability of EBS systems to quickly and accurately detect threats (such as outbreaks of infectious diseases) has been revolutionized by artificial intelligence applications for data processing.
Open-source internet data are considered “unstructured” in the sense that news articles, blogs, tweets, etc., provide a narrative describing an event. The text, numbers and dates are not organized in a data model, such as a database, that can be used for automated event detection and risk modelling; therefore, open-source data must be processed to extract and structure information about what happened, where it happened, when it happened and to whom it happened.
The EBS systems use natural language processing (NLP) methods to process and understand event narratives (46–48). Natural language processing is a field of research dedicated to understanding human discourse (49).
Early methods include the sublanguage approach, where rules and patterns are used to interpret and classify the vocabulary, syntax and semantics of the unstructured narrative. The EBS systems maintain taxonomies that match predefined terms and their synonyms to those found in the data sources.
Much like with a conventional literature search, taxonomic classification of narratives can identify health-related articles by searching for related terms (e.g. human influenza A synonyms include H1N1, swine flu, California flu, human influenza and influenza A) (50).
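The taxonomic matching described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used by any EBS system: the `TAXONOMY` dictionary, `build_patterns` and `classify` are hypothetical names, and the single taxonomy entry reuses the influenza A synonyms listed in the text.

```python
import re

# Hypothetical taxonomy: canonical term -> synonyms (taken from the example in the text).
TAXONOMY = {
    "human influenza A": [
        "H1N1", "swine flu", "California flu", "human influenza", "influenza A",
    ],
}

def build_patterns(taxonomy):
    """Compile one case-insensitive, word-bounded regex per canonical term."""
    patterns = {}
    for canonical, synonyms in taxonomy.items():
        # Longest terms first so multi-word synonyms are preferred in the alternation.
        terms = sorted({canonical, *synonyms}, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[canonical] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def classify(text, patterns):
    """Return the canonical terms whose synonyms appear in the text."""
    return [name for name, pattern in patterns.items() if pattern.search(text)]

PATTERNS = build_patterns(TAXONOMY)
```

Calling `classify("Officials confirm new swine flu cases", PATTERNS)` tags the headline with the canonical term, while a headline containing none of the synonyms returns an empty list. A sketch like this also shows the limitation noted in the next paragraph: every new disease or synonym has to be added to the taxonomy by hand.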
The sublanguage approach for identifying health-related data in EBS systems is effective but also has drawbacks. Taxonomies are not easily generalizable and must be developed for each disease being monitored and kept up-to-date as language evolves and new discoveries about diseases are made. In this light, NLP has established a strong foundation in using machine learning (ML) methods.
Machine learning is a subset of AI that uses algorithms, such as statistical models, to perform a specific task without explicit instructions, relying instead on patterns and inference. The EBS systems gather open-source internet data (feeds and web queries) and then filter these data through a combination of the sublanguage approach and ML methods, where the latter are used to perform more complex tasks for analysing syntax, semantics, morphology, pragmatics and discourse (51).
For example, ML methods can be used to determine the difference between non-health related articles (e.g. “Bieber fever” refers to avid supporters of Justin Bieber) and those discussing an infectious disease outbreak (43,51,52).
Machine learning methods can also be used to distinguish between ambiguities in dates and locations, such as past and present outbreaks in articles that discuss historical context (53,54). Novel applications for ML methods are also being developed, such as structuring disease case information into epidemiological line lists (a listing of individuals affected by the disease and related information; i.e. health status, sex, location, date of onset, hospitalized) that can be used in outbreak investigations and risk modelling (55).
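To make the idea of an epidemiological line list concrete, the following sketch shows one possible record structure using the fields named above. The class name `LineListEntry`, the helper `onset_counts` and the three example records are all hypothetical and invented purely for illustration; they are not drawn from any real outbreak or from the cited work.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class LineListEntry:
    """One row of a line list; fields follow those named in the text."""
    case_id: int
    health_status: str
    sex: Optional[str]              # None when the source article does not say
    location: str
    date_of_onset: Optional[date]   # None when the onset date is unknown
    hospitalized: Optional[bool]

def onset_counts(entries):
    """Count cases per onset date (a simple epidemic curve), skipping unknown dates."""
    counts = {}
    for entry in entries:
        if entry.date_of_onset is not None:
            counts[entry.date_of_onset] = counts.get(entry.date_of_onset, 0) + 1
    return dict(sorted(counts.items()))

# Made-up records, for illustration only.
EXAMPLE_ENTRIES = [
    LineListEntry(1, "recovered", "F", "City A", date(2019, 1, 3), False),
    LineListEntry(2, "ill", None, "City A", date(2019, 1, 3), True),
    LineListEntry(3, "ill", "M", "City B", None, None),
]
```

Once narratives are structured this way, downstream steps such as counting cases by onset date become simple aggregations, which is what makes line lists useful as input to outbreak investigations and risk models.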
Once the information from open-source internet data has been processed into a data model, the event can then be reviewed and reported, as appropriate; furthermore, additional data analytics can be performed to communicate the current and predicted impact of the health threat. A summary of information flow from data collection, processing, analytics and reporting for EBS systems is presented in Table 2.
Table 2: Information flow from open-source internet data in event-based surveillance systems
|EBS||Data collection||Data processing||Data analytics||Reporting|
|Moderated systems||Human analysts search and identify open-source internet data for health-related concern||Human analysts review, filter and designate the threat level of the event||None||Reports on health-related threats are communicated through email and posted on EBS system website|
|Partially moderated and fully automated systems||Automated feed of open-source internet data||Taxonomic classification and ML algorithms filter and classify events based on their metadata (e.g. type of threat, location and date). ML algorithms score the level of relevancy. In partially moderated systems, highly scored data sources are curated by human analysts||Analytic techniques evolve with time and differ among EBS systems. Current techniques include the following: mapping of geo-tagged events; bar plots showing changes over time to keyword counts, number of identified articles and expected and observed number of disease cases; word clouds showing importance of keyword terms; alert notices given sudden increases to case counts, reliability of sources and/or number of unique sources||Reports on health-related threats are communicated through email and posted on EBS system website and notified to appropriate web application user communities|
Abbreviations: EBS, event-based surveillance; ML, machine learning
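One of the analytic techniques listed in Table 2 is an alert raised on a sudden increase in case counts. A minimal sketch of such a rule is given below, assuming a simple moving baseline: a day is flagged when its count exceeds the mean of the preceding window by more than `k` standard deviations. The function name, the window length and the threshold `k` are assumptions made for this illustration; the source does not specify how any particular EBS system computes its alerts.

```python
from statistics import mean, stdev

def sudden_increase_alerts(counts, window=7, k=3.0):
    """Return indices of days whose count exceeds baseline mean + k * standard deviation.

    counts: daily case (or article) counts, oldest first.
    window: number of preceding days used as the baseline (must be at least 2).
    k:      how many standard deviations above the baseline mean triggers an alert.
    """
    alerts = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[day] > threshold:
            alerts.append(day)
    return alerts
```

With a stable baseline of two to four reports per day, a jump to 20 reports is flagged, while the day after is not, because the spike itself has by then inflated the baseline. Real systems would likely also weigh source reliability and the number of unique sources, as the table notes.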
Risk modelling
An important advancement for risk assessment is increasing the variety of data being used in modelling approaches. Risk modelling in the context of infectious diseases is the process of identifying and characterizing factors in individuals or populations that increase their vulnerability to contracting disease (e.g. age, proximity to outbreak).
Statistical inference is a well-grounded and informative risk modelling approach that includes regression analysis. This method is used to determine how risk factors (explanatory variables) are associated with the outcome of interest (e.g. number of reported cases). Regression models, and statistical inference in general, are developing to include information from open-source internet data.
An early example was the inclusion of search engine query data from Google Flu Trends as a predictor of the number of reported physician visits for flu-like illnesses (56). The resulting model was then used to predict the number of seasonal influenza cases one to two weeks into the future; however, this approach was not as effective in predicting outbreaks outside of the traditional flu season because of associations being identified with search query trends not related to seasonal influenza (e.g. winter basketball season) (57).
Subsequent work improved the accuracy of predicting seasonal influenza trends by using additional sources of open-source data (e.g. Twitter) and expanding the regression method to benefit from ML algorithms that can find complex associations among the outcome and explanatory variables (58).
Furthermore, regression modelling for the risk of infection has improved by including, in addition to open-source internet data, additional explanatory variables (e.g. climate and meteorological data from satellite imagery) that account for the presence, movement and distribution of pathogens, vectors, reservoir populations and infected people (59,60).
For example, in China, the expected number of cases of hand, foot and mouth disease in children was best predicted by including data on weekly temperature and precipitation as well as data on hand, foot and mouth disease-related queries from the Chinese Baidu search engine (61).
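The regression idea can be illustrated with a small ordinary least squares fit in which case counts are explained by temperature, precipitation and search query volume. This is a sketch under strong simplifying assumptions: the cited studies may well use other model families (count data are often modelled with Poisson-type regression), and the function names `fit_ols` and `predict` are hypothetical.

```python
def fit_ols(rows, y):
    """Fit y ~ b0 + b1*x1 + ... + bp*xp by solving the normal equations.

    rows: one tuple of explanatory variables per observation.
    y:    the observed outcomes (e.g. weekly reported cases).
    Returns [b0, b1, ..., bp]. Assumes the explanatory variables are not collinear.
    """
    design = [[1.0, *row] for row in rows]          # prepend the intercept column
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(p)]
        + [sum(x[i] * target for x, target in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict(coefficients, row):
    """Predicted outcome for one observation's explanatory variables."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))
```

Each fitted coefficient estimates how the expected number of cases changes per unit change in one explanatory variable with the others held fixed, which is how such a model expresses the association between risk factors and the outcome.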
Another dominant risk modelling approach is the use of compartmental models to mathematically simulate transmission dynamics of a population; that is, the flow of individuals among health states, such as susceptible (S), infectious (I) and recovered (R).
For example, SIR models require defining parameters for the rate at which infectious individuals recover (the inverse of the infectious period) and the rate of infectious contacts. It is then possible to estimate whether an infection introduced into a population will become epidemic, and to characterize the prevalence of the disease over time.
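A minimal SIR simulation is sketched below using a simple fixed-step (Euler) update. The symbols are assumptions introduced for this illustration: `beta` is the rate of infectious contacts per day and `gamma` is the inverse of the infectious period in days. The ratio `beta / gamma` is the basic reproduction number; in a fully susceptible population an outbreak grows only if it exceeds 1.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Simulate a closed-population SIR model with Euler steps.

    beta:  rate of infectious contacts per day (assumed name).
    gamma: recovery rate per day, i.e. 1 / infectious period (assumed name).
    Returns a list of (time, S, I, R) tuples, starting at time 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history

def basic_reproduction_number(beta, gamma):
    """Average number of secondary cases per case in a fully susceptible population."""
    return beta / gamma
```

For example, `simulate_sir(0.5, 0.25, 990, 10, 0, days=100)` corresponds to a four-day infectious period and a reproduction number of 2, so the number of infectious individuals rises before declining. The I column of the returned history gives the prevalence over time.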
The compartmental modelling approach has more recently developed to simulate transmission dynamics among multiple populations (meta-populations). This requires the inclusion of mobility data to define the rate of individuals moving among populations (62). Human mobility at a meta-population level can be considered as the movement of people in a connected network of cities and countries.
These data can be obtained from mobile phone call records and air traffic passenger volumes (63,64). Through meta-population modelling, it is possible to identify the travel routes through which pathogens may spread or be carried to Canada, as well as to determine the likelihood of these events (65,66).
For example, the Zika virus is estimated to have first appeared in Brazil between August 2013 and April 2014 by infected travellers entering the country at Rio de Janeiro, Brasilia, Fortaleza and/or Salvador; and this introduction was followed by epidemics in Haiti, Honduras, Venezuela and then Colombia (21).
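The meta-population idea can be sketched by coupling several SIR populations through a mobility matrix. In this illustration, `mobility[a][b]` is an assumed per-capita daily rate of travel from population `a` to population `b`; in practice such rates would be derived from data such as air passenger volumes. The function name and all parameter values are hypothetical, and the model is deliberately simplified (for instance, travellers relocate permanently rather than returning home).

```python
def simulate_metapop_sir(beta, gamma, populations, mobility, days, dt=0.1):
    """Simulate SIR dynamics in several populations linked by travel.

    populations: list of [S, I, R] per population (patch).
    mobility:    mobility[a][b] = per-capita daily rate of moving from a to b.
    Returns the final [S, I, R] of each population.
    """
    state = [[float(x) for x in patch] for patch in populations]
    n_patches = len(state)
    for _ in range(int(round(days / dt))):
        new_state = [patch[:] for patch in state]
        for a in range(n_patches):
            s, i, r = state[a]
            n = s + i + r
            # Local transmission and recovery within patch a.
            new_infections = beta * s * i / n * dt if n > 0 else 0.0
            new_recoveries = gamma * i * dt
            new_state[a][0] -= new_infections
            new_state[a][1] += new_infections - new_recoveries
            new_state[a][2] += new_recoveries
            # Travel from patch a to every other patch, for each health state.
            for b in range(n_patches):
                if a != b:
                    for c in range(3):
                        flow = mobility[a][b] * state[a][c] * dt
                        new_state[a][c] -= flow
                        new_state[b][c] += flow
        state = new_state
    return state
```

Starting with infections in only one of two populations, a non-zero mobility rate is enough to seed an outbreak in the second population, whereas with zero mobility the second population remains unaffected. Comparing runs with different mobility matrices is one simple way to explore which travel routes matter most for importation.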