Researchers have disclosed GhostRace (CVE-2024-2193), a data leakage attack that affects modern CPU architectures supporting speculative execution. It is a variation of the transient execution CPU vulnerability known as Spectre v1 (CVE-2017-5753) and combines speculative execution with race conditions to expose new security weaknesses.
The researchers, from the Systems Security Research Group at IBM Research Europe and VUSec, state that all common synchronization primitives implemented with conditional branches can be bypassed microarchitecturally on speculative paths through a branch misprediction attack. This turns every architecturally race-free critical region into a Speculative Race Condition (SRC), allowing attackers to leak information from the affected system.
Speculative execution is a performance optimization adopted by virtually all modern CPUs. It has become a double-edged sword since the Spectre attacks, which exploit branch prediction and speculative execution to read privileged data from memory, eroding the isolation between applications.
Disclosed in early 2018, Spectre attacks trick a victim into speculatively executing operations that reveal confidential information to an adversary through a covert channel. These operations would not occur in strictly serialized, in-order processing. What sets GhostRace apart is that it lets an unauthenticated attacker extract arbitrary data from the processor by using race conditions to reach speculatively executed code paths, in what the researchers call a Speculative Concurrent Use-After-Free (SCUAF) attack.
Race conditions arise when multiple threads access a shared resource without adequate synchronization, leading to unpredictable outcomes that attackers can exploit. The CERT Coordination Center (CERT/CC) explains that SRC vulnerabilities resemble traditional race conditions but differ in that the exploitation happens on a transiently executed path that stems from a mis-speculated branch, as in Spectre v1.
GhostRace’s impact is broad, with potential vulnerabilities in any software, such as operating systems or hypervisors, that implement synchronization primitives through conditional branches on speculative execution-capable microarchitectures (including x86, ARM, and RISC-V).
In response to the disclosure, AMD stated that its existing guidance for mitigating Spectre remains applicable to GhostRace. The maintainers of the open-source Xen hypervisor acknowledged that all versions are affected, while downplaying the immediate security risk. Xen has nonetheless introduced hardening patches, including a new LOCK_HARDEN mechanism for x86 that is similar to the existing BRANCH_HARDEN. LOCK_HARDEN is turned off by default because of uncertainty about its performance impact and because the seriousness of the threat is still being evaluated.
These responses highlight the ongoing challenges in CPU security and the need for continued research into mitigation strategies for such attacks.
Exploiting Speculative Race Conditions in Kernel Data: A Detailed Analysis of GhostRace Attacks
A GhostRace attack aims to disclose arbitrary kernel data by exploiting a speculative race condition in a region that is architecturally free of race conditions. To see how it works, consider an example from Linux kernel version 5.15.83, identified by a gadget scanner at net/nfc/hci/core.c:78.
The example involves two threads simultaneously executing a gadget at the heart of the nfc_hci_msg_tx_work function. This function belongs to the Host Controller Interface (HCI) layer in the Near Field Communication (NFC) driver core of the Linux kernel and processes pending messages destined for the NFC device. Because the NFC hardware needed to reach the function natively was not available, an additional system call was added to make this code path reachable during the analysis.
The critical region, that is, the gadget, operates on the nfc_hci_dev hdev device and performs a series of operations. First, it acquires the msg_tx_mutex mutex, gaining exclusive access to the device and entering the critical region. It then checks whether the pending hci_msg command message, hdev->cmd_pending_msg, has an associated callback. If so, the callback is invoked through the hdev->cmd_pending_msg->cb function pointer. Next, the memory of the command message is freed and the hdev->cmd_pending_msg pointer is set to NULL. Finally, the mutex is released, which ends the critical region.
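The steps above can be sketched in user space. The following is a minimal analog, not the kernel source: struct msg, struct dev, msg_tx_work and count_cb are hypothetical stand-ins for hci_msg, nfc_hci_dev, nfc_hci_msg_tx_work and a driver callback, and a pthread mutex with free() takes the place of the kernel mutex and kfree.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel's hci_msg and nfc_hci_dev. */
struct msg {
    void (*cb)(void *ctx);   /* optional completion callback */
    void *ctx;               /* argument handed to the callback */
};

struct dev {
    pthread_mutex_t msg_tx_mutex;
    struct msg *cmd_pending_msg;
};

/* Example callback: counts how many times it was invoked. */
static void count_cb(void *ctx)
{
    ++*(int *)ctx;
}

/* Mirrors the sequence in the text: lock, Use, Free, clear pointer, unlock. */
static void msg_tx_work(struct dev *hdev)
{
    assert(hdev != NULL);
    pthread_mutex_lock(&hdev->msg_tx_mutex);        /* enter critical region */
    if (hdev->cmd_pending_msg) {
        if (hdev->cmd_pending_msg->cb)              /* Use: call through cb */
            hdev->cmd_pending_msg->cb(hdev->cmd_pending_msg->ctx);
        free(hdev->cmd_pending_msg);                /* Free */
        hdev->cmd_pending_msg = NULL;               /* clear dangling pointer */
    }
    pthread_mutex_unlock(&hdev->msg_tx_mutex);      /* leave critical region */
}
```

Calling msg_tx_work twice on a device with one pending message invokes the callback once; the second call finds the pointer already cleared. The short gap between free() and the NULL assignment is the window that matters in the rest of the discussion.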
The mutex protects the critical region from concurrent access by different user processes or threads that share the NFC device, enforcing mutual exclusion and ruling out race conditions. Without the mutex, the code would be vulnerable to a concurrent Use-After-Free (UAF). For instance, while one thread executes the Free code (shown in bold green in the figure), there is a gap between freeing the object that hdev->cmd_pending_msg points to and setting the pointer to NULL. During this interval, another thread could execute the Use code (shown in bold red) and invoke the hdev->cmd_pending_msg->cb callback of the message that was just freed. An attacker who can control memory reuse could trigger the Use with a controlled callback, leading to control-flow hijacking.
Modern operating systems like the Linux kernel provide various synchronization mechanisms, including mutexes, spin locks, and RW locks, which make architectural exploitation of such race-free code unfeasible. However, the landscape changes when we enter the speculative domain, where even architecturally race-free execution can become vulnerable to Speculative Race Conditions (SRCs).
The GhostRace attack exploits the NFC gadget to create a Speculative Concurrent Use-After-Free (SCUAF) primitive, leading to data disclosure. One thread executes its critical region architecturally (architectural Free) while another thread executes it speculatively (speculative Use). For the attack to succeed, the first thread must be interrupted right after the Free operation, creating a race window large enough for practical exploitation. The SCUAF primitive must then be executed many times to mount a complete speculative information disclosure attack.
In summary, the GhostRace attacks necessitate overcoming several challenges, as depicted in the provided diagram. These challenges include ensuring a large enough race window for exploitation and managing speculative execution to achieve the desired outcome without being thwarted by the architectural race-free guarantees provided by modern synchronization primitives.
Image: The NFC gadget (net/nfc/hci/core.c:78) found by our scanner and the three main challenges to mount an end-to-end GhostRace attack.
- (C1) Create a large, ideally unbounded, architectural UAF exploitation window between kfree and the NULL hdev->cmd_pending_msg pointer update to accommodate as many SCUAF primitive invocations as possible.
- (C2) Turn our architecturally race-free gadget into a speculative race condition, crafting a SCUAF primitive speculatively dereferencing the (dangling) hdev->cmd_pending_msg->cb function pointer.
- (C3) Use the building blocks above to mount end-to-end information disclosure attacks against the kernel.
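To illustrate the kind of branch that C2 targets, here is a minimal sketch of a lock whose acquire path ends in a conditional branch on the result of an atomic compare-and-exchange. It is not the kernel's mutex_lock: toy_lock, toy_trylock and guarded_increment are hypothetical names, and C11 atomics stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy lock: 0 means free, 1 means held. */
struct toy_lock {
    atomic_int owner;
};

/* Try to take the lock with a single atomic compare-and-exchange. */
static bool toy_trylock(struct toy_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->owner, &expected, 1);
}

/* Runs the guarded body only if the lock was acquired. */
static int guarded_increment(struct toy_lock *l, int *counter)
{
    assert(l != NULL && counter != NULL);
    if (!toy_trylock(l))         /* conditional branch on the lock outcome */
        return -1;               /* lock held by someone else: skip the body */
    ++*counter;                  /* critical region */
    atomic_store(&l->owner, 0);  /* release */
    return 0;
}
```

Architecturally, the increment runs only when the exchange succeeds. The concern raised by GhostRace is that a CPU may mispredict the marked branch and transiently execute the guarded body even while the lock is held.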
Creating an Unbounded UAF Window
Addressing Challenge 1 (C1) requires a way to pause an arbitrary thread inside the Linux kernel for an extended, ideally unlimited, time. This creates an architecturally unbounded Use-After-Free (UAF) window in the victim Thread 1 that the attacking Thread 2 can exploit. What makes this case study difficult is how short the original UAF exploitation window is: Thread 2 must use the hdev->cmd_pending_msg pointer after Thread 1 frees the memory object (at line 9) and before it sets the pointer to NULL (line 10).
Figure: From eight instructions-wide to unbounded architectural Use-After-Free exploitation window. Steps 1 and 6 run in user mode, issuing syscalls to trigger the relevant kernel code. The other steps run in kernel mode.
The narrowness of the window is best understood by examining the kfree function. As depicted, the default slab allocator (slab_free) deletes the object (list_del) and then unlocks an interrupt-safe spinlock (_raw_spin_unlock_irqrestore) before returning control to the caller. Because the spinlock is held with interrupts disabled, the original UAF window is only eight instructions wide, spanning from the spinlock release to the NULL pointer update.
Figure: Size of the UAF exploitation window vs. number of IPI storming cores targeting the victim core. The size of the exploitation window is measured in number of getpid syscalls (a standard benchmark to evaluate generic round trips to the kernel [11]) that attackers can run before the victim core handles all incoming membarrier IPIs and updates the dangling hdev->cmd_pending_msg pointer to NULL. The experiment is performed on a commodity client Intel 12th-generation i9-12900K CPU, which has 16 cores and 24 Simultaneous Multithreads (SMTs). We observe that only 15 SMTs are sufficient to obtain an unbounded UAF exploitation window. We also observe that the location of the IPI storming cores matters [23, 57]: the physically closer a storming core is to the victim core, the higher the IPI throughput due to the lower latency on the interconnect. This explains the big increase from 10 to 12 and from 12 to 15 SMTs, as the storming cores added in both experiments were physically the closest to the victim core among all available cores.
Existing interrupt-based techniques can be used to extend this short window, but they have limitations because they were not designed to support multiple UAF instances reliably. Techniques that rely on a high-priority user thread to preempt kernel execution are also not viable on standard kernels, where preemption is disabled by default (CONFIG_PREEMPT unset).
Figure: Top part: The core implementation of the mutex_lock synchronization primitive, with the conditional branch which can be abused to craft SRCs in red. Bottom part: The branch ultimately checks the outcome of the lock cmpxchgq instruction which does not serialize the execution.
Moving from a limited to an unbounded exploitation window requires combining several techniques. First, building on the timerfd-based method, high-precision hardware timers interrupt the victim thread at the right moment, which modestly enlarges the UAF window. The method is made more reliable by aiming the timer expiry at the interrupt-disabled phase of kfree. Second, a less precise Inter-Processor Interrupt (IPI) storm extends the exploitation window indefinitely, since the victim CPU stays busy handling IPIs for as long as the attacker chooses.
The attack begins with the attacker setting a high-precision hardware timer on the victim core, set to fire at a future moment with nanosecond precision. A victim thread then issues a system call that reaches the target gadget and, in turn, the kfree call. There, kfree acquires the interrupt-safe spinlock and frees the victim memory object without being interruptible. Once the spinlock is released, interruptible execution resumes, and if the timer expired during the uninterruptible phase, the victim thread is interrupted right at the start of the interruptible UAF window.
Even if the timer calibration is somewhat inaccurate, the design gives a high likelihood of interrupting the victim at the right point. The timer only needs to expire at some point during the uninterruptible phase of kfree, a target that extends beyond the original UAF window. The interrupt then stays pending until the spinlock is released and is delivered at the start of the UAF window.
Kernel execution then transitions to the timer interrupt handler, which is usually short-lived. However, registering multiple timer observers can further amplify the exploitation window. While this can disrupt the victim at an opportune moment and enhance the original UAF window, it remains insufficient for numerous SCUAF primitive executions.
The key is to use the amplified window to interrupt the timer interrupt handler with a jittery IPI from another core. To do so, the attacker runs IPI-storming threads on the remaining cores, which continually send IPIs to the victim core. The membarrier system call, specifically its MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ command, is well suited to this task because it delivers a low-latency IPI to a single targeted core.
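For reference, a single CPU-targeted membarrier call could be issued as in the sketch below. It relies on the constants in linux/membarrier.h (Linux 5.10 or later) and goes through syscall() because glibc provides no wrapper. The helper names rseq_membarrier_supported and rseq_barrier_on_cpu are hypothetical. As far as we understand the kernel's behavior, the target CPU is only interrupted when it is currently running a thread of the calling process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if the kernel advertises the RSEQ command, 0 if it does not,
 * and -1 if the query itself fails (e.g. membarrier is unavailable). */
static int rseq_membarrier_supported(void)
{
    long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);
    if (mask < 0)
        return -1;
    return (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ) ? 1 : 0;
}

/* Registers the process for the RSEQ command, then issues one barrier
 * aimed at a single CPU. Returns 0 on success, -1 on failure. */
static int rseq_barrier_on_cpu(int cpu)
{
    assert(cpu >= 0);
    if (syscall(__NR_membarrier,
                MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0) < 0)
        return -1;
    if (syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
                MEMBARRIER_CMD_FLAG_CPU, cpu) < 0)
        return -1;
    return 0;
}
```

Querying first avoids relying on an error code from kernels built without rseq support. The MEMBARRIER_CMD_FLAG_CPU flag is what restricts the barrier to one core, which matches the single-core targeting mentioned above.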
This IPI storm not only interrupts the timer interrupt handler but completely saturates the victim core. The size of the UAF exploitation window grows with the number of storming cores, and 15 SMTs were enough to saturate the victim on the tested platform. The physical proximity of the storming cores to the victim core strongly affects IPI throughput and therefore the size of the exploitation window.
Ultimately, the victim core is trapped handling an endless stream of membarrier IPIs, which creates an architecturally unbounded UAF exploitation window and allows the end-to-end attack to run for as long as needed. After the attack, the storming threads are stopped, the victim thread resumes, and the hdev->cmd_pending_msg pointer is finally set to NULL. In between, the attacker is free to run speculative execution attacks repeatedly inside the window. Establishing this unbounded window even once therefore gives the attacker the opportunity to extract large amounts of data.
Reference links:
- GhostRace Paper (PDF)
- https://www.vusec.net/projects/ghostrace/