As connected devices such as voice assistants, security cameras, and smart appliances grow in popularity, the homes and offices where they are installed become increasingly filled with a dense web of Wi-Fi signals.
A new study from University of Chicago and University of California, Santa Barbara researchers finds that external attackers can use inexpensive technology to turn these ambient signals into motion detectors, monitoring activity inside a building without being detected themselves.
With only a small, commercially available Wi-Fi receiver, an attacker from outside the target site can measure the strength of signals emitted from connected devices and monitor a site remotely for motion, sensing whether a room is occupied.
The research, led by UChicago computer scientists Heather Zheng and Ben Zhao, details how these attacks work as well as potential defenses.
“It’s what we call a silent surveillance attack,” said Zheng, a Neubauer Professor of Computer Science at the University of Chicago and expert on networking, security and wireless technologies.
“It’s not just about privacy, it’s more about physical security protection. By just listening to existing Wi-Fi signals, someone will be able to see through the wall and detect whether there’s activity or where there’s a human, even without knowing the location of the devices.
They can essentially do a monitoring surveillance of many locations. That’s very dangerous.”
The research builds upon earlier findings that exposed the ability to “see through walls” using Wi-Fi signals.
However, previous methods detected indoor activity by sending signals into the building and measuring how they are reflected back to a receiver, a method that would be easy to detect and defend against.
The new approach requires only “passive listening” to a building’s existing Wi-Fi signals, does not need to transmit any signals or break encryption, and grows more accurate when more connected devices are present, raising significant security concerns.
“The worrisome thing here is that the attacker has minimal cost, can stay silent without emitting any signal, and still be able to get information about you,” Zheng said.
Connected devices typically do not communicate with the internet directly; instead, they regularly transmit signals to an access point, a hardware device such as a router. When a person walks near either device in this exchange, the signal changes subtly, and that perturbation can be detected by a nearby receiver "sniffing" the signal. That is enough information for an observer to determine, with very high accuracy, whether a person (or large animal, the researchers add) is in the room.
Because most building materials do not block the propagation of Wi-Fi signals, the receiver does not even need to be in the same room or building as the access point or connected devices to pick up these changes.
These Wi-Fi sniffers are available off the shelf and inexpensive, typically less than $20. They’re also small and unobtrusive, easy to hide near target locations, and passive—sending no signal that could be detected by the target.
The researchers also suggested different methods to block this surveillance technique. One protection would be to insulate buildings against Wi-Fi leakage; however, this would also prevent desirable signals, such as from cellular towers, from entering. Instead, they propose a simple technical method where access points emit a “cover signal” that mixes with signals from connected devices, producing false data that would confuse anyone sniffing for Wi-Fi signatures of motion.
“What the hacker will see is that there’s always people around, so essentially you are creating noise, and they can’t tell whether there is an actual person there or not,” Zheng said. “You can think about it as a privacy button on your access point; you click it on and sacrifice a little bit of the bandwidth, but it protects your privacy.”
Zheng hopes that router manufacturers will consider introducing this privacy feature in future models; some of those firms have announced new features that use a similar method for motion detection, marketed as a home security benefit. The UChicago research has already received attention from Technology Review, Business Insider and other tech publications, raising awareness of this new vulnerability.
The study also reflects a growing research area in the Department of Computer Science, examining issues around increasingly prevalent connected “Internet of Things” devices. The IoT Security and Privacy Group, which includes Zhao and Zheng and additional faculty members including Nick Feamster, Blase Ur, and Marshini Chetty, will investigate both the benefits and potential vulnerabilities of these technologies, and a new IoT Lab in the Center for Data and Computing provides devices for researchers and students to hack and study for research.
Making sense of crowd tracking data is far from trivial. Individuals can have unique movement behaviors, even as some crowd-level characteristics are maintained. Making sense of this type of data is even harder when positions are approximate and detections are sparse.
To simplify tracking data, we can separate it into periods of stops and moves [1,2]. This is a fundamental step that makes it possible to answer many questions that would be intractable given the raw dataset, such as: "What are the most interesting locations?" or "How many people are traveling in pairs or small groups?"
In this paper, we explore the problem of splitting an individual’s trace into periods of stops and movements. We concentrate on WiFi tracking, a specific implementation of mobility data collection using radio frequency signals, which takes advantage of the fact that WiFi devices are ubiquitous and always with us. We expect that the discussed methods can easily be implemented for other radio frequency technologies, such as Bluetooth.
The problem of distinguishing stops from movements is not new; it has already been explored for GPS tracking. When visualizing GPS datasets, stop periods appear as positions randomly scattered around the stop location, which is why some stop-detection algorithms are based on clustering methods. We have identified three such methods in the existing literature: stay point detection [3,4], Cbsmot , and Dbsmot . These methods utilize different properties of a trace: direction, speed, and distance. Unlike GPS traces, which reach decimeter accuracy and can be recorded at high frequency, WiFi traces are sparse and have a positional accuracy of about 100 m. These differences can translate into different performance of the algorithms on WiFi datasets.
For large-scale crowd monitoring, technologies based on the detection of smartphones prevail. The most popular ones make use of call records  and WiFi sensing . Using call records scales better, but the data are sparser because detections are recorded only when a person makes a phone call or sends an SMS. The positioning accuracy is in the order of kilometers, the range of the GSM tower. In contrast, WiFi has a limited communication range, of about 100 m. WiFi-enabled smartphones also transmit more, as they try to connect to different networks or as installed apps try to communicate over the network, increasing their chances of being detected.
WiFi tracking is performed by using a set of sensors deployed across the detection area. The sensors are simple WiFi-enabled devices, such as WiFi routers, specifically configured to record detections of WiFi devices (smartphones). The sensors listen for all WiFi frames and extract the MAC address from them. The time, hashed MAC address (for privacy reasons), and the position of the sensor form a tuple that describes a detection. By having a set of detections for one device, we can trace its path through the area covered by the sensors.
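In code, a detection of this kind might be modeled as follows. This is a minimal sketch: the field names and the choice of SHA-256 for hashing the MAC address are illustrative assumptions, not the platform's actual implementation.

```python
import hashlib
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    """One sensor observation: when, who (hashed), and where the sensor is."""
    time: float                      # timestamp of the captured frame
    hashed_mac: str                  # MAC address, hashed for privacy
    sensor_pos: Tuple[float, float]  # (x, y) position of the recording sensor

def hash_mac(mac: str) -> str:
    """Hash the MAC address so the raw identifier is never stored."""
    return hashlib.sha256(mac.lower().encode()).hexdigest()

det = Detection(time=1461513600.0,
                hashed_mac=hash_mac("AA:BB:CC:DD:EE:FF"),
                sensor_pos=(12.5, 40.0))
```

A device's trace is then simply the time-sorted list of all `Detection` tuples sharing the same `hashed_mac`.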
When using WiFi tracking, the tracked objects do not participate actively. WiFi devices are configured by default to send WiFi frames regularly, exposing the device address and location. The unobtrusiveness of the method and the pervasiveness of smartphones make WiFi tracking scalable and inexpensive to deploy. Unfortunately, the sparsity and low accuracy of WiFi tracking datasets get even worse when we are dealing with large crowds. The human body blocks the electromagnetic signal, and signals from multiple devices interfere with each other. WiFi devices make use of low-level collision avoidance techniques to communicate even under high interference. However, sensors are not part of the normal WiFi communication and cannot make use of these techniques.
In the next section, we go into detail on the limits of WiFi tracking and on how sparsity and low positional accuracy affect the analysis. Furthermore, we describe some of the causes of the sparsity of the data.
To our knowledge, we are the first to measure the accuracy of stop/move partition algorithms on datasets constructed using RF signals, particularly WiFi. Previous work has analyzed WiFi traces , but only in specific contexts: datasets gathered indoors, where the received signal strength indicator (RSSI) could be used to perform trilateration and improve positioning accuracy. When performing WiFi tracking in outdoor settings, we found that in complex situations (such as the presence of large crowds), RSSI values are too erratic and devices are rarely detected simultaneously by enough sensors to apply trilateration. Furthermore, we prove that for the types of datasets we explore, perfect accuracy in determining stops and moves can be impossible: even with knowledge of the correct labeling, the sparsity of the data does not permit partitioning the trace so that each second is labeled correctly.
Lastly, we bring improvements to the most promising of the three methods, stay point detection. These improvements use the tracking dataset itself to build a better estimate of the closeness of sensors, as opposed to the Euclidean distance. As an advantage, without the Euclidean distance, the locations of sensors are not needed, easing the deployment of tracking platforms where, for example, routers acting as sensors are already deployed and their locations were never accurately recorded.
WiFi Tracking Datasets and Their Limitations
A WiFi tracking platform consists of several sensors. The sensors’ locations are known and are used to estimate the position of detected devices. These platforms make use of the assumption that most people carry a WiFi-enabled device such as a smartphone.
Building a new WiFi sensing platform is inexpensive, as there are multiple low-cost options for WiFi-capable devices that can act as sensors. Furthermore, existing WiFi networks with multiple access points or routers can easily be configured to act as a WiFi tracking platform.
The sensors are set to listen for all WiFi frames (as defined by the IEEE 802.11 standard) and to record the time, encrypted address and the sensors’ own positions. The tuple <time, address, position> represents a detection, and the set of all detections, from all sensors, for a device represents the device’s trace.
Not all frames sent by devices are captured. For a frame to be captured, the device needs to be close enough to the sensor so that the signal is strong enough when it reaches it. The ideal range for WiFi is 100 m, but it is reduced because of obstructions or it is extended due to tunneling effects. Only frames that are complete and correctly decoded can be recorded; faulty frames are dropped by the network module or the operating system. Interference from other WiFi devices broadcasting at the same time or noise from the environment disrupts the WiFi frames and makes it impossible for them to be correctly captured.
The range at which WiFi signals are detectable is determined by the transmission power and the antenna. There is high variation between devices based on the manufacturer, impurities in the metals that form the antenna, and the software running on the device. In some cases, the transmission frequency can even be affected by the battery level, installed applications, or the screen status of the mobile device, as shown in the work by the authors of . In contrast, GPS tracking can be done at a constant, high frequency.
Our own experiments and a review of the literature revealed that frames are correctly detected far more frequently in indoor scenarios. For 90% of the data acquired from indoor WiFi tracking , the inter-detection time was less than one second. In contrast, for our outdoor experiment, with a high number of people, only 20% of the data had an inter-detection time of less than one second.
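The inter-detection statistic quoted above can be computed directly from a device's timestamp sequence. The sketch below uses made-up timestamps purely for illustration.

```python
def gap_fraction_below(times, threshold=1.0):
    """Fraction of inter-detection gaps shorter than `threshold` seconds.
    `times` is the sorted list of detection timestamps for one device."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(g < threshold for g in gaps) / len(gaps)

# Toy outdoor-style trace: mostly minute-scale gaps between detections.
ts = [0.0, 0.5, 60.0, 125.0, 125.8, 300.0]
print(gap_fraction_below(ts))  # 2 of 5 gaps are under one second
```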
We performed a large-scale experiment in the city of Assen, The Netherlands, in 2016. The data were gathered during a festival that attracted more than 150,000 tourists. With such a large crowd, it was expected that WiFi quality would drop. Part of this drop is caused by an abundance of control frames, which use up a large portion of the available bandwidth.
One example from our dataset shows how difficult it can be to make sense of WiFi tracking data. Take a person who sits between three sensors for a long time. The device carried by this person is detected by all three sensors, but almost never at the same time. The time between detections is in the order of minutes, as can be observed in Figure 1a, which makes it impossible to apply trilateration. Worse, at some point the device is detected by a fourth sensor. If we trace the path based solely on detections, we obtain the one from Figure 1b: the device, although most likely static, appears to be moving chaotically. Most of the devices in our dataset exhibit this type of behavior.
We have discovered that throughout the dataset, fewer than 4% of detections were recorded by at least two sensors simultaneously, and fewer than 0.5% share the same sequence number. This means that about 3.5% of detections, although recorded simultaneously, correspond to different frames. This is possible because multiple frames are sent every second, and our system was configured to record the time with a precision of one second.
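Under the assumption that each detection is a (timestamp, device, sensor, sequence-number) tuple recorded with one-second precision, the simultaneity figure can be computed as sketched below (toy data, not the actual dataset).

```python
from collections import defaultdict

def simultaneous_fraction(detections):
    """Fraction of detections whose (device, second) key was recorded
    by at least two different sensors."""
    sensors_by_key = defaultdict(set)
    for t, device, sensor, _seq in detections:
        sensors_by_key[(device, int(t))].add(sensor)
    hits = sum(1 for t, device, _s, _seq in detections
               if len(sensors_by_key[(device, int(t))]) >= 2)
    return hits / len(detections)

# Toy data: "d1" is heard by two sensors within the same second, but the
# sequence numbers differ, so these are detections of different frames.
dets = [(10.2, "d1", "s1", 7), (10.8, "d1", "s2", 8), (11.0, "d2", "s1", 3)]
print(simultaneous_fraction(dets))
```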
Applying trilateration to the part of the data with simultaneous detections is not feasible. To apply trilateration, we need an estimate of the distance between each sensor and the device. In theory, the RSSI decreases as the distance between the sensor and the tracked device increases. However, in our dataset, RSSI values have high variation. RSSI takes values between −80 and −20, and for devices that we determined to be static (through visual analysis and use of the manufacturer identifier), it has a mean standard deviation of nine, even though we expected such devices to almost always be detected with the same RSSI. Worse still, transmission power differs from device to device, meaning we cannot compare values across devices.
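To see why such RSSI variation defeats distance estimation, consider the standard log-distance path-loss model. This model is not taken from the paper, and the reference power and path-loss exponent below are assumed values chosen only for illustration.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exp=2.5):
    """Invert the log-distance model RSSI(d) = RSSI(1 m) - 10*n*log10(d)
    to estimate distance in meters from a received power in dBm."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exp))

# A standard deviation of nine (as observed for static devices) turns one
# true distance into wildly different estimates:
for rssi in (-50, -59, -41):  # nominal reading, then +/- one sigma
    print(round(rssi_to_distance(rssi), 1))
```

Because the estimate depends exponentially on the RSSI, a nine-unit swing changes the inferred distance severalfold, which is why trilateration fails on noisy outdoor readings.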
RSSI quality is dependent on the environment. The authors of  managed to use RSSI and trilateration to determine how much time people spend together in a dining hall. Because there were only a few individuals and the experiment took place indoors, the data were less noisy. The authors had 95% of detections at less than two minutes between them, while we have only 83% of detections with a gap smaller than two minutes. Although the difference does not seem significant, gaps add up when considering the sparsity of the data.
We analyzed the source of sparsity from WiFi-tracking datasets with a simple experiment. In one of our offices, we set up two WiFi sensors and configured them to record all the frames they received. The sensors were placed on a table, 50 cm apart.
The two sensors recorded frames for one hour, on the same WiFi channel. Figure 2a,b represents the detections at Sensors 1 and 2, respectively. Each dot represents a detection; the Ox axis represents time; and Oy represents the device that transmitted the frame. We identify the devices based on the MAC address inside the frame. It is possible for MAC addresses to be randomly generated, but for simplicity, we assume each MAC address corresponds to a device. These two figures show a significant difference between what the two sensors detect. To better visualize this difference, we extracted the set of detections made at only one of the sensors and represent them in Figure 2c. In ideal circumstances, Figure 2c would be empty. What the graphs do not show is a large number of detections for devices (probably just MAC addresses) observed by only one of the sensors. These detections represent a significant part of the data, and adding them would have cluttered the graphs.
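The set shown in Figure 2c, detections seen by exactly one of the two sensors, is a straightforward symmetric set difference. Representing detections as (device, second) keys here is an assumption matching the one-second recording precision.

```python
def one_sensor_only(dets_a, dets_b):
    """Detections recorded by exactly one of two co-located sensors.
    If both sensors captured identical frames, the result would be empty."""
    a, b = set(dets_a), set(dets_b)
    return (a - b) | (b - a)

# Toy (device, second) detection sets for the two sensors.
s1 = {("d1", 10), ("d1", 11), ("d2", 10)}
s2 = {("d1", 10), ("d2", 10), ("d2", 12)}
print(sorted(one_sensor_only(s1, s2)))  # frames missed by one sensor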
Because the sensors are placed close to each other, we expect them to receive almost the same frames. Frames detected by one sensor, but not the other exist because of interference and environment noise, which cause the signals to be malformed, resulting in frames that cannot be decoded and recorded.
The percentage of lost frames is different from device to device, based on position and environment. Eight devices were inside the same office with the sensors. These were laptops, tablets, and smartphones. We name them “our devices” (OD). Devices detected by both sensors are called “common devices” (CD). Finally, we have the group of “all detections”. We do not know anything about devices outside the initial group of eight inside the office.
In Figure 2d, we show in red the percentage of detections recorded at only one of the sensors. Each bar represents one sensor and a specific subset of detections. Detections are grouped along two dimensions: by device set ("our devices", "common devices", and all devices) and by frame type (all frames versus only probe request frames).
Devices outside the office, which have the highest chance of being affected by environmental conditions, are the ones with the lowest number of detections recorded at both sensors. The percentage of unique detections reaches 70%, meaning most frames are not received by both sensors. The percentage of detections at both sensors improves when we consider only probe request frames. This is because a communication stream can trigger many detections, all recorded at only one sensor, whereas probe request frames are management frames and are not part of normal communication.
The conclusion that most frames are not detected is supported by our observation that 50% of the frames captured during the experiment had the retransmission flag set. In the outdoor festival environment, frame loss can be even higher. When conducting WiFi tracking, this high frame loss causes the sparsity in traces. Furthermore, sensors with overlapping detection areas are placed more than 70 m apart, making it unlikely that frames sent by devices in the overlapping area are detected by both sensors. This creates the appearance of a back-and-forth movement like the one in the trace in Figure 1b.
Determining Periods of Stops and Movements
A trace of an individual extracted from a tracking dataset consists of a set of timestamped locations. When these points have a regular, high frequency and high positional accuracy, placing them on a map reveals the paths taken by the individual, the places and the time at which they were visited. However, when the data are sparse and the positioning accuracy is low, as is the case for WiFi traces, making any sense of the data in this raw form is difficult.
An important step in simplifying traces is to partition the data into stop and move periods. We have selected three algorithms that can partition traces generated using GPS data.
We chose these three algorithms because they use different attributes of the movement data: distance, speed, and direction. By analyzing them on the datasets we obtained using WiFi tracking, we aim not only to identify which solution fits best, but also to understand which attribute is more useful in understanding this type of dataset. The algorithms are:
- Cbsmot , a modification of the popular clustering algorithm Dbscan. Cbsmot uses the time and distance between consecutive detections, which allows it to detect areas where the speed of movement is low.
- Dbsmot  is inspired by Cbsmot, but instead of speed, it takes the direction of movement into account. The idea is that when the device is moving, detections form a roughly straight line, whereas when it is standing still, detections scatter around it, so a path drawn from one detection to the next changes direction sharply.
- Stay point detection [3,4] is based on the idea that points generated while a device is static are confined to an area whose size is given by the accuracy of the measuring technique. When a device reports a new position farther away than this limit, it must have moved. We take the first point in the trace as a pivot and update the pivot when a new location far enough away is found.
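The pivot-based stay point procedure described above can be sketched as follows. The distance and duration thresholds are assumed values, with the 100 m default chosen to match the positional accuracy of WiFi traces.

```python
import math

def stay_points(trace, dist_thresh=100.0, time_thresh=300.0):
    """Stay point detection on a time-sorted trace of (t, x, y) tuples.
    A stop is emitted when the trace stays within `dist_thresh` meters of
    a pivot point for at least `time_thresh` seconds; returns tuples
    (start_time, end_time, centroid_x, centroid_y)."""
    stops = []
    i = 0
    while i < len(trace):
        j = i + 1
        while j < len(trace):
            if math.hypot(trace[j][1] - trace[i][1],
                          trace[j][2] - trace[i][2]) > dist_thresh:
                break
            j += 1
        # Points i..j-1 all lie within dist_thresh of the pivot at i.
        if trace[j - 1][0] - trace[i][0] >= time_thresh:
            xs = [p[1] for p in trace[i:j]]
            ys = [p[2] for p in trace[i:j]]
            stops.append((trace[i][0], trace[j - 1][0],
                          sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j          # continue after the stop
        else:
            i += 1         # advance the pivot
    return stops

# Toy trace: 400 s near the origin, then a jump of 300 m.
trace = [(0, 0.0, 0.0), (100, 10.0, 0.0), (400, 5.0, 5.0), (500, 300.0, 0.0)]
print(stay_points(trace))
```

On the toy trace, the first three points stay within the distance threshold for long enough to form one stop, while the final distant point starts a move.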
More information: “Et Tu Alexa? When Commodity WiFi Devices Turn into Adversarial Motion Sensors,” Zhu et al., accepted for the Network and Distributed Systems Security (NDSS) symposium in February 2020. arxiv.org/abs/1810.10109