Recent studies show that the non-linearity of microphone circuits can be exploited to deliver inaudible commands to a voice-controlled system via ultrasound signals.
Ultrasonic waves don’t make a sound, but they can still activate Siri on your cellphone and have it make calls, take images or read the contents of a text to a stranger.
All without the phone owner’s knowledge.
Attacks on cell phones aren’t new, and researchers have previously shown that ultrasonic waves can be used to deliver a single command through the air.
DolphinAttack by Zhang et al. was among the first to demonstrate inaudible attacks on voice-enabled devices by injecting ultrasound signals over the air; it can be launched from up to 5 ft away from the device.
Recognizing this range limitation, LipRead extends the attack range to 25 ft by aggregating ultrasound signals from an array of speakers, which requires line of sight.
While these two attacks demonstrate the feasibility of voice command injection via inaudible ultrasound, they focus solely on over-the-air transmission. This leads to several inherent limitations rooted in the physics of ultrasound propagation in air, such as significant performance degradation when the line of sight is obstructed.
A sound wave, however, is fundamentally the transfer of acoustic energy through a medium. It can propagate wherever vibration is possible, including through water and solid materials, whose propagation characteristics differ from those of air.
New research from Washington University in St. Louis expands the scope of the vulnerability that ultrasonic waves pose to cellphone security.
These waves, the researchers found, can propagate through many solid surfaces to activate voice recognition systems and – with the addition of some cheap hardware – the person initiating the attack can also hear the phone’s response.
The results were presented Feb. 24 at the Network and Distributed System Security Symposium in San Diego.
“We want to raise awareness of such a threat,” said Ning Zhang, assistant professor of computer science and engineering at the McKelvey School of Engineering. “I want everybody in the public to know this.”
Zhang and his co-authors were able to send “voice” commands to cellphones as they sat inconspicuously on a table, next to the owner. With the addition of a stealthily placed microphone, the researchers were able to communicate back and forth with the phone, ultimately controlling it from afar.
Ultrasonic waves are sound waves at frequencies higher than humans can hear.
Cellphone microphones, however, can and do record these higher frequencies.
“If you know how to play with the signals, you can manipulate them such that when the phone interprets the incoming sound waves, it will think that you are saying a command,” Zhang said.
To test the ability of ultrasonic waves to transmit these “commands” through solid surfaces, the research team set up a host of experiments that included a phone on a table.
Attached to the bottom of the table were a microphone and a piezoelectric transducer (PZT), which is used to convert electricity into ultrasonic waves.
On the other side of the table from the phone, ostensibly hidden from the phone’s user, was a waveform generator that produced the correct signals.
The team ran two tests, one to retrieve an SMS (text) passcode and another to make a fraudulent call.
The first test relied on the common virtual assistant command “read my messages” and on the use of two-factor authentication, in which a passcode is sent to a user’s phone – from a bank, for instance – to verify the user’s identity.
The attacker first told the virtual assistant to turn the volume down to Level 3. At this volume, the victim did not notice their phone’s responses in an office setting with a moderate noise level.
Then, when a simulated message from a bank arrived, the attack device sent the “read my messages” command to the phone. The response was audible to the microphone under the table, but not to the victim.
In the second test, the attack device sent the message “call Sam with speakerphone,” initiating a call. Using the microphone under the table, the attacker was able to carry on a conversation with “Sam.”
The team tested 17 different phone models, including popular iPhones, Galaxy and Moto models. All but two were vulnerable to ultrasonic wave attacks.
Ultrasonic waves made it through metal, glass and wood
They also tested different table surfaces and phone configurations.
“We did it on metal. We did it on glass. We did it on wood,” Zhang said. They tried placing the phone in different positions, changing the orientation of the microphone.
They placed objects on the table in an attempt to dampen the strength of the waves. “It still worked,” he said. Even at distances as far as 30 feet.
Ultrasonic wave attacks also worked on plastic tables, but not as reliably.
Phone cases only slightly affected the attack success rates. Placing water on the table, potentially to absorb the waves, had no effect. Moreover, an attack wave could simultaneously affect more than one phone.
The research team also included researchers from Michigan State University, the University of Nebraska-Lincoln and the Chinese Academy of Sciences.
To demonstrate the practicality of SurfingAttack, the researchers build a prototype of the attack device using a commercial-off-the-shelf PZT transducer, which costs around $5 per piece. Using their prototype device, they conduct the following two attacks as a demonstration:
- Hacking an SMS passcode. SMS-based two-factor authentication, which often delivers one-time passwords over SMS, has been widely adopted by almost all major services. A SurfingAttack adversary can activate the victim’s device to read SMS messages in secret, thereby extracting SMS passcodes.
- Making fraudulent calls. A SurfingAttack adversary can also take control of the owner’s phone to call arbitrary numbers and conduct an interactive dialogue for phone fraud using the synthetic voice of the victim.
Researchers have tested SurfingAttack on 17 popular smartphones and 4 representative types of tables.
They successfully launch SurfingAttack on 15 smartphones and 3 types of tables. A website (https://surfingattack.github.io/) demonstrates the attacks on different phones under different scenarios, along with various new attacks such as selfie taking, SMS passcode hacking, and fraudulent phone calls.
With the growing popularity of mobile voice commerce and voice payments, they believe the demonstrated interactive hidden attack opens up new attacker capabilities that the community should be aware of. In summary, their contributions are as follows:
- Researchers present SurfingAttack, the first exploration of an attack that leverages the unique characteristics of ultrasound propagation in solid media and the non-linearity of microphone circuits to inject inaudible commands into voice assistants. They validate its effectiveness. They also show the attack is resilient against verbal conversations.
- Researchers evaluate SurfingAttack on 4 representative types of table materials. They find that SurfingAttack is most effective through 3 types of tables: aluminum/steel, glass, and medium-density fiberboard (MDF). Notably, SurfingAttack can achieve long-range attacks. They also validate the effectiveness of SurfingAttack on aluminum and glass tables with different thicknesses (up to 1.5 inch aluminum and 3/8 inch glass).
- Researchers further explore the possibility of pairing command injection with a hidden microphone to enable hidden conversations between the attacker and the victim’s voice assistant. They demonstrate several practical attacks using the prototype they build, including hacking an SMS passcode and making a ghost fraud phone call without the owner’s knowledge.
- Researchers discuss several potential defense mechanisms, including using the high-frequency components of guided waves as an indication of intrusion.
Zhang said the success of the “surfing attack,” as it’s called in the paper, highlights the less-often discussed link between the cyber and the physical.
Often, media outlets report on ways in which our devices are affecting the world we live in: Are our cellphones ruining our eyesight? Do headphones or earbuds damage our ears?
Who is to blame if a self-driving car causes an accident?
“I feel like not enough attention is being given to the physics of our computing systems,” he said. “This is going to be one of the keys in understanding attacks that propagate between these two worlds.”
The team suggested some defense mechanisms that could protect against such an attack. One idea would be the development of phone software that analyzes the received signal to discriminate between ultrasonic waves and genuine human voices, Zhang said.
Changing the layout of mobile phones, such as the placement of the microphone, to dampen or suppress ultrasound waves could also stop a surfing attack.
But Zhang said there’s a simple way to keep a phone out of harm’s way of ultrasonic waves: the interlayer-based defense, which uses a soft, woven fabric to increase the “impedance mismatch.”
In other words, put the phone on a tablecloth.
Funding: This work is supported in part by the National Science Foundation, grants CNS-1950171, CNS-1949753 and CNS-1837519.
- KEY ELEMENTS OF SurfingAttack
There are three necessary conditions for the success of SurfingAttack: (1) The ultrasonic wave in the table must be able to reach the microphone embedded in the device enclosure. (2) Even when the microphone is not in direct contact with the transmission medium, the wave must still be able to exploit the non-linearity of the microphone of the device on the tabletop to launch the inaudible command injection attack. (3) The response from the victim device must be received by the attacker via the planted device without raising the suspicion of the victim user. More specifically, the volume of the victim’s device can be turned down so that the user does not notice it, yet the response can still be recorded by a tapping device beneath the table.
- Attack Wave Mode Selection and Generation
The first condition for the attack is the capability to deliver inaudible ultrasound waves to the target device effectively. Unlike waves in air, acoustic waves propagating in solid materials exhibit acoustic dispersion, in which a sound wave separates into its component frequencies as it passes through the material. Lower dispersion indicates a better concentration of acoustic energy. This implies that a proper Lamb wave mode for SurfingAttack should feature (1) low dispersion, (2) low attenuation, (3) easy excitability, and (4) high attack signal reachability. To achieve the aforementioned features, there are three key design decisions: the signal waveform, the Lamb wave mode, and the ultrasound signal source.
First, guided wave signals can be generated via either windowed modulation or pulse signals. It has been shown that narrowband input signals are most effective in restricting wave dispersal in large and thick plates. As a result, narrowband windowed modulation signals are used to carry the attack command in SurfingAttack to minimize dispersion.
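As a rough illustration of what a narrowband windowed modulation signal could look like, the sketch below amplitude-modulates a baseband onto an ultrasonic carrier and tapers the result with a Tukey window. It is a minimal sketch under stated assumptions, not the authors' implementation: the standard AM form w(t)(1 + m v(t))cos(2π fc t), the sample rate, the modulation depth m, the cosine fraction r, and the 1 kHz stand-in tone are all hypothetical choices; only the parameter names fc, m and r and the 25.3 kHz carrier come from the text.

```python
import math


def tukey_window(n_samples, r):
    """Tukey (tapered-cosine) window; r is the cosine fraction in [0, 1]."""
    if n_samples < 2:
        return [1.0] * n_samples
    window = []
    for i in range(n_samples):
        x = i / (n_samples - 1)  # normalized position in [0, 1]
        if r > 0 and x < r / 2:
            # Rising cosine taper at the start of the signal.
            w = 0.5 * (1 + math.cos(math.pi * (2 * x / r - 1)))
        elif r > 0 and x > 1 - r / 2:
            # Falling cosine taper at the end of the signal.
            w = 0.5 * (1 + math.cos(math.pi * (2 * x / r - 2 / r + 1)))
        else:
            w = 1.0  # flat middle section
        window.append(w)
    return window


def windowed_am(baseband, fc, fs, m, r):
    """Amplitude-modulate a baseband in [-1, 1] onto carrier fc, then taper it."""
    window = tukey_window(len(baseband), r)
    return [
        w * (1.0 + m * v) * math.cos(2 * math.pi * fc * i / fs)
        for i, (w, v) in enumerate(zip(window, baseband))
    ]


# Hypothetical parameters, chosen only for illustration.
FS = 192_000  # sample rate in Hz
FC = 25_300   # carrier frequency in Hz (the value quoted for Fig. 5)
M = 1.0       # modulation depth
R = 0.2       # cosine fraction of the Tukey window

# 10 ms of a 1 kHz tone stands in for the voice command.
baseband = [math.sin(2 * math.pi * 1_000 * i / FS) for i in range(1_920)]
attack_signal = windowed_am(baseband, FC, FS, M, R)
```

The taper forces the signal to start and end at zero amplitude, which avoids the abrupt edges that would otherwise spread energy across a wide band.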
Second, different Lamb wave modes have different field distributions throughout the plate, depending on the frequency-thickness product and the material parameters, as shown in Fig. 3. Since the attack frequency range from 20 kHz to 40 kHz performs best at stimulating the microphone’s non-linearity effect, we are limited to the lower-order Lamb wave modes, i.e., the A0 or S0 mode. For the attack to succeed, the Lamb wave must spread effectively from a point on the table to the victim’s device on the tabletop. As a result, the generated Lamb waves need to produce a high out-of-plane displacement on the table surface. At low frequency-thickness products, most of the displacement of the A0 mode is out-of-plane, while most of the displacement of the S0 mode is in-plane. The A0 wave mode below the cut-off frequencies of the higher-order Lamb wave modes is therefore selected to create the ultrasound commands.
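To make the "below the cut-off frequencies" condition concrete, the following sketch computes the frequency-thickness product for the 20 kHz to 40 kHz attack band on a 3 mm aluminum plate and compares it with the lowest higher-order cut-off. It relies on two assumptions that are not in the text: the standard plate-theory result that the first higher-order mode (A1) cuts on at f·d = c_T/2, and an approximate textbook shear-wave speed for aluminum of about 3100 m/s. The function names are hypothetical.

```python
# Approximate textbook shear-wave speed for aluminum (assumption, not from the text).
SHEAR_SPEED_ALUMINUM_M_S = 3100.0


def frequency_thickness_mhz_mm(frequency_hz, thickness_m):
    """Frequency-thickness product expressed in MHz*mm."""
    return (frequency_hz / 1e6) * (thickness_m * 1e3)


def a1_cutoff_mhz_mm(shear_speed_m_s):
    """Lowest higher-order Lamb mode cut-off (A1): f * d = c_T / 2.

    c_T / 2 is in Hz*m; dividing by 1e3 converts to MHz*mm.
    """
    return shear_speed_m_s / 2 / 1e3


def only_fundamental_modes(frequency_hz, thickness_m, shear_speed_m_s):
    """True when only the A0 and S0 modes can propagate."""
    product = frequency_thickness_mhz_mm(frequency_hz, thickness_m)
    return product < a1_cutoff_mhz_mm(shear_speed_m_s)


PLATE_THICKNESS_M = 0.003  # 3 mm aluminum plate

for f_hz in (20_000, 25_300, 40_000):
    fd = frequency_thickness_mhz_mm(f_hz, PLATE_THICKNESS_M)
    ok = only_fundamental_modes(f_hz, PLATE_THICKNESS_M, SHEAR_SPEED_ALUMINUM_M_S)
    print(f"{f_hz / 1e3:.1f} kHz: f*d = {fd:.4f} MHz*mm, fundamental modes only: {ok}")
```

Under these assumptions the whole attack band on a 3 mm plate sits more than an order of magnitude below the A1 cut-off, which is consistent with restricting the choice to A0 or S0.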
Lastly, we choose to use a circular piezoelectric disc to generate the signal for its energy efficiency and omni-directivity. It applies a vertical force to the table surface, resulting in a flexural wave that propagates radially outwards and thus enables an omni-directional attack through the table.
Energy efficiency is important because the piezoelectric disc hidden under the solid material needs to produce strong waves that reach extended distances with a minimal amount of energy. Omni-directivity is crucial because the attack should work regardless of the target’s location and orientation on the medium, i.e., wherever the phone is placed on the table.
The omni-directivity of the attack is evaluated in Section VI-C. Furthermore, since objects on the table surface could change frequently, we need to make sure that the signal propagation still works regardless of whether there are objects on the table. The corresponding evaluation is presented in Section VI-G.
- Triggering Non-linearity Effect via Solid Medium
While the non-linearity has been demonstrated for ultrasound waves delivered directly to the microphone via air, it is unclear whether the same effect can be triggered when acoustic waves pass through the table material to reach the external enclosure of the phone. We conduct extensive experiments to verify whether the non-linearity effect of the voice capture hardware of a smartphone placed on the tabletop can be triggered by ultrasonic guided waves that propagate in the table. The setup for one of the initial experiments is shown in Fig. 4.
We use a low-cost radial mode vibration PZT disc (which costs only $5 per piece) with 22 mm diameter and 0.25 mm thickness to generate the ultrasonic guided wave. The disc is adhered to the underside of an aluminum plate with 3 mm thickness.
The PZT transducer is much smaller than the ultrasound speakers used in existing attacks, making the attacks stealthier and more economically accessible, as shown in Fig. 4(b). We use a chirp signal from 50 Hz to 5 kHz as the baseband signal. The baseband signal is then imported into a Keysight 33500B series waveform generator and modulated onto a carrier. The 9 V output is then supplied to the PZT transducer to excite Lamb waves. By analyzing the signal recorded by a smartphone (a Google Pixel), the non-linearity of the microphone can be evaluated.
Fig. 5 shows the spectrogram of the baseband signal and the recorded signal when the carrier frequency fc = 25.3 kHz. The ultrasonic guided wave propagates to the device microphone and any resulting sound is recorded.
The results confirm the existence of the nonlinear response of the voice capture hardware incited by ultrasonic guided waves. Fig. 5(b) shows the recorded sound signal in the time-frequency domain, in which the first harmonic component is almost identical to the original signal displayed in Fig. 5(a). This result demonstrates the feasibility of attacking voice controllable systems placed on the tabletop through ultrasonic guided waves.
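The demodulation effect can be reproduced numerically with a toy model. The sketch below is a simplified illustration, not the measurement procedure used in the experiments: it assumes the microphone behaves as a memoryless quadratic non-linearity, out = a1·s + a2·s², and uses a single 1 kHz tone instead of the chirp. The coefficients a1 and a2, the sample rate and the function names are hypothetical. For an AM input (1 + m cos ω_b t) cos ω_c t, the squared term contains a component m cos ω_b t, so the recorded signal should show a tone of amplitude a2·m at the baseband frequency even though the transmitted signal has none.

```python
import cmath
import math


def am_signal(fb, fc, fs, m, n_samples):
    """Single-tone amplitude modulation: (1 + m cos(wb t)) cos(wc t)."""
    return [
        (1 + m * math.cos(2 * math.pi * fb * i / fs))
        * math.cos(2 * math.pi * fc * i / fs)
        for i in range(n_samples)
    ]


def quadratic_microphone(signal, a1, a2):
    """Toy microphone model with a linear and a quadratic term (assumption)."""
    return [a1 * s + a2 * s * s for s in signal]


def tone_amplitude(signal, freq, fs):
    """Amplitude of the component at freq, from a single-bin DFT."""
    n = len(signal)
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq * i / fs) for i, s in enumerate(signal)
    )
    return 2 * abs(acc) / n


# Hypothetical parameters; N spans whole cycles of every tone involved.
FS = 192_000      # sample rate in Hz
FB = 1_000        # baseband tone in Hz
FC = 25_300       # carrier in Hz (the value quoted for Fig. 5)
M = 1.0           # modulation depth
A1, A2 = 1.0, 0.1 # linear and quadratic gains of the toy microphone
N = 1_920         # 10 ms of samples

transmitted = am_signal(FB, FC, FS, M, N)
recorded = quadratic_microphone(transmitted, A1, A2)

print("baseband amplitude in transmitted signal:", tone_amplitude(transmitted, FB, FS))
print("baseband amplitude in recorded signal:   ", tone_amplitude(recorded, FB, FS))
```

In a real device the carrier and its harmonics would then be removed by the low-pass filtering in the audio path, leaving only the audible-band component for the voice assistant to interpret.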
- Unnoticeable Response
An unnoticeable response from the target phone is critical for keeping the attack under the radar. Sound pressure level (SPL) quantifies the pressure of a sound relative to a reference pressure, as sensed at the eardrum or at the diaphragm of a microphone. SPL is determined by the corresponding audio voltage, while the standard reference sound pressure p0 = 20 µPa (0 dB) is the quietest sound a human can perceive.
SPL depends on the distance between the area of measurement and point-shaped sound sources in the free field. Let r1 be the distance between the tapping device and the sound source, and r2 the distance between the user and the sound source. L1 and L2 are the SPLs at the tapping device and at the user end, respectively; they are related by the inverse distance law:
$$ L_2 = L_1 - 20 \log_{10}\left(\frac{r_2}{r_1}\right) $$
Approximately, an SPL drop of 6 dB is expected for each doubling of the distance from the sound source, i.e., each doubling of the ratio r2/r1. When the SPL at the user end drops below 0 dB, the voice response becomes essentially inaudible to the user.
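The arithmetic behind the 6 dB figure can be checked with a short sketch. It assumes the free-field inverse distance law in the form L2 = L1 − 20 log10(r2/r1); the function name and the example SPL and distances are hypothetical values chosen for illustration, not measurements from the experiments.

```python
import math


def spl_at_user(spl_at_tap_db, r_tap_m, r_user_m):
    """SPL at the user, given the SPL at the tapping device (free-field assumption)."""
    return spl_at_tap_db - 20 * math.log10(r_user_m / r_tap_m)


# Hypothetical example: 50 dB measured by a tapping device 10 cm from the phone speaker.
SPL_AT_TAP_DB = 50.0
R_TAP_M = 0.10

for r_user in (0.20, 0.50, 1.00):
    level = spl_at_user(SPL_AT_TAP_DB, R_TAP_M, r_user)
    print(f"user at {r_user:.2f} m: {level:.1f} dB")

# Drop caused by doubling the distance ratio.
drop_per_doubling = SPL_AT_TAP_DB - spl_at_user(SPL_AT_TAP_DB, R_TAP_M, 2 * R_TAP_M)
print(f"drop per doubling: {drop_per_doubling:.2f} dB")
```

With these example numbers, a user ten times farther from the speaker than the tapping device would receive a level 20 dB lower, which is the margin the attacker relies on when lowering the phone volume.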
Thus, it becomes feasible to conceal SurfingAttack by adjusting the volume of the device via ultrasonic guided waves and placing a hidden tapping device close to the victim’s device underneath the table. Note that the inverse distance law is always an idealization, because it assumes that the sound pressure is exactly equal in all directions as the sound field propagates.
If there are reflective surfaces in the sound field, the reflected sounds will be added to the direct sound, resulting in a higher SPL at a field location than the inverse distance law predicts. If there are barriers between the source and the point of measurement, we may get a lower SPL.
To validate the feasibility, we evaluate the SPL of a Google Pixel phone at different volumes, the results of which are shown in Fig. 6. Here, we let the phone produce 1 kHz sinusoidal tones with low volume levels, and an A-weighting SPL meter is used to measure SPL at various distances.
The experiment is conducted in a quiet office (about 400 square feet) with an average background noise of 40.5 dB. Although the SPL stays above 0 dB, it decreases with distance, and the signal is quickly overwhelmed by environmental noise after propagating 50 to 100 cm at volume levels 1 to 3.
We also deployed a microphone as a tapping device underneath the table, which proved capable of recording the weak voice responses. The results show that it is feasible for the attacker to set the volume low enough to make the voice responses unnoticeable to the user from a moderate distance, while a hidden tapping device can still capture the sound.
To enhance the sound capturing capability, we can deploy multiple tapping devices at different positions under the table to precisely capture the weak voice responses from the device speaker as well. In an environment with larger background noise, we can adjust volume even higher without alerting the owner.
Lastly, the attacker can turn off the screen to further enhance the stealthiness of the threat. In Section V-D, we run extensive experiments to corroborate the stealthiness of SurfingAttack by measuring the responses of victim phones in different environments.
TABLE I: Experiment devices, systems, and results. The tested attacks include recording activation (record the ultrasonic commands, and then replay them to the voice assistant), direct activation (activate the voice assistant), and direct recognition (execute voice commands). fc: attack signal frequency; m: modulation depth; r: cosine fraction of Tukey window; Mean Amplitude: the average amplitude of the demodulated chirps at fc.