Linux 7.0: How a Kernel Preemption Bug Crippled PostgreSQL Performance in 2026

In April 2026, the Linux Kernel 7.0 release promised evolutionary advancements, but for PostgreSQL users, it delivered a brutal, silent performance regression, abruptly halving throughput on critical production workloads without a single error message.

The Silent Killer: How Linux 7.0 Blindsided PostgreSQL

The eagerly awaited release of Linux Kernel 7.0 in early 2026 was met with the usual anticipation within the open-source community. Touted for its efficiency improvements and new hardware support, it was expected to be a solid, if not revolutionary, upgrade. Yet, for database administrators and cloud engineers managing high-performance PostgreSQL instances, it brought an unforeseen and devastating impact.

The first alarm bells rang loudly from within AWS. Salvatore Dipietro, a distinguished engineer at Amazon Web Services, was among the first to quantify the damage. His findings were stark: a massive ~50% degradation in PostgreSQL throughput on specific, highly concurrent workloads after upgrading to Linux 7.0. This wasn’t a minor hiccup; it was a catastrophic performance collapse.

Why was this particularly pronounced on Graviton4 machines? AWS’s Graviton processors, based on the Arm64 architecture, are renowned for their cost-effectiveness and performance for scale-out workloads. The very optimizations that made Graviton4 ideal for databases inadvertently positioned it as the canary in the coal mine for this system-wide issue. These machines, with their high vCPU counts and specific architecture, exposed the underlying kernel change with brutal clarity.

The insidious nature of the problem made it a nightmare to diagnose. There were no kernel panics, no application crashes, and no explicit error messages in any logs. PostgreSQL continued to run, applications connected, and queries executed, but everything was simply slower. This pervasive, inexplicable slowdown baffled even seasoned site reliability engineers and database experts, who initially suspected database-level issues.

Dissecting the Beast: Linux 7.0’s Preemption Model Overhaul

To understand this regression, we must first revisit the fundamental concept of CPU preemption. Preemption is the act of a scheduler interrupting an executing task to allow another task to run. It’s crucial for achieving fair scheduling, system responsiveness, and ensuring that no single task monopolizes the CPU.

Linux offers different preemption models, each designed to balance throughput, latency, and responsiveness (a quick way to inspect the active model follows the list):

  • PREEMPT_NONE: The least aggressive model. Kernel code is non-preemptible and reschedules only at explicit points, such as when a task blocks or returns to user space. This maximizes throughput by minimizing context switches and cache invalidations.
  • PREEMPT_VOLUNTARY: Allows for preemption of kernel-space tasks at “safe” points, offering a middle ground between responsiveness and throughput.
  • PREEMPT_FULL: The most aggressive model, where tasks can be preempted almost anywhere, ensuring maximum responsiveness, often at the cost of higher context switching overhead.
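On kernels built with CONFIG_PREEMPT_DYNAMIC (common in distribution builds throughout the 6.x series), the compiled-in models can be inspected at runtime, and on pre-7.0 kernels the active one can even be switched without a rebuild. A minimal sketch, assuming debugfs is mounted at the usual location:

# Show the compiled-in preemption models; the active one is in parentheses.
# Requires CONFIG_PREEMPT_DYNAMIC and a mounted debugfs.
cat /sys/kernel/debug/sched/preempt
# e.g. on a 6.x kernel:  none voluntary (full)

# Switch the active model on the fly (pre-7.0 only, since 7.0 no longer
# offers "none"); passing preempt=none on the kernel command line achieves
# the same at boot.
echo none | sudo tee /sys/kernel/debug/sched/preempt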

Historically, for database engines like PostgreSQL, PREEMPT_NONE was the gold standard. These highly CPU-bound applications rely on minimizing context switches to maximize CPU cache efficiency. When a database process acquires a lock and performs a critical, short operation, it ideally wants to complete that work without interruption. PREEMPT_NONE provided this “run to completion” behavior, allowing PostgreSQL’s internal scheduling to shine without kernel interference.

The controversial decision in Kernel 7.0, specifically introduced by Peter Zijlstra’s commit 7dadeaa6e851 titled “sched: Further restrict the preemption modes,” was a paradigm shift. This change restricted or, in effect, removed PREEMPT_NONE as a configurable option for modern architectures, including Arm64 (Graviton), x86_64, PowerPC, RISC-V, s390, and LoongArch. All systems were effectively pushed towards more aggressive preemption models, primarily PREEMPT_LAZY and PREEMPT_FULL.

PostgreSQL’s internal scheduling model is highly optimized, almost “self-managing”. Its worker processes, particularly in buffer management, make extensive use of user-space spinlocks for short-term protection of shared data structures. These spinlocks were designed under the implicit assumption that a thread holding a lock would run uninterrupted for a brief period and release it quickly. This laissez-faire approach from the kernel, characteristic of PREEMPT_NONE, was fundamental to PostgreSQL’s high-concurrency performance.

This is where the direct collision occurred. The new, more aggressive kernel preemption, especially under PREEMPT_LAZY, means the scheduler can now interrupt a thread even when it holds a spinlock. When a lock holder is preempted, other waiting threads are forced into busy-waiting, spinning longer for the lock to be released. This led to a dramatic increase in context switching overhead and exacerbated spinlock contention within critical PostgreSQL functions like StartReadBuffer, GetVictimBuffer, and StrategyGetBuffer. The kernel was now actively hindering PostgreSQL’s finely tuned internal concurrency mechanisms.

Unmasking the Culprit: Tracing the Preemption Impact

Identifying the root cause of such a silent regression requires deep system-level diagnostics. The first step was to confirm the active kernel preemption model. This can typically be verified by inspecting the kernel configuration used to build the running system.

You can check the preemption-related options in your running kernel’s configuration file:

# Check the preemption model configured for your current kernel
# For a typical distribution, this file is located in /boot
grep PREEMPT /boot/config-$(uname -r)

# Expected output on a system running Linux 7.0 affected by the change
# (disabled options appear as "is not set" in kernel configs):
#   CONFIG_PREEMPTION=y
#   CONFIG_PREEMPT_LAZY=y
#   CONFIG_PREEMPT_RCU=y
#   # CONFIG_PREEMPT_NONE is not set   <- the smoking gun for older database workloads
#   # CONFIG_PREEMPT_VOLUNTARY is not set

The absence of PREEMPT_NONE from the configuration (“# CONFIG_PREEMPT_NONE is not set”) on a Linux 7.0 kernel confirmed the architectural shift. This meant that the kernel was indeed forcing a more preemptible model, even if PREEMPT_LAZY was intended to be “gentle.”

Once the preemption model was confirmed, the next step was to leverage powerful Linux profiling tools like perf for deep-dive diagnostics. perf allowed engineers to precisely analyze where CPU time was being spent, identify excessive context switching, and pinpoint cache-related inefficiencies.

Salvatore Dipietro’s initial perf analysis was unambiguous. It revealed that an astounding 55% of CPU time on the affected Graviton4 PostgreSQL instances was being spent within PostgreSQL’s internal spinlock functions and their callers.
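Output like the excerpt below can be gathered with perf attached to the running backends. One way to do it, sketched here on the assumption that the backend processes are named postgres (options and privileges vary by setup):

# Live view of the hottest functions across all postgres backends,
# with call-graph accumulation so callers show cumulative cost.
sudo perf top -g -p "$(pgrep -d, -x postgres)"

# Or record for offline analysis and summarize afterwards:
sudo perf record -F 99 -g -p "$(pgrep -d, -x postgres)" -- sleep 30
sudo perf report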

# Example perf top output revealing high s_lock contention
# This output highlights the CPU percentage spent in various functions
# under a high-contention PostgreSQL workload on Linux 7.0.
# The 's_lock' function indicates contention for PostgreSQL's internal spinlocks.

56.03% - StartReadBuffer   # PostgreSQL function to initiate reading a buffer
55.93% - GetVictimBuffer   # Part of buffer management, finding a buffer to evict
55.93% - StrategyGetBuffer # Main strategy for acquiring a buffer
55.60% - s_lock            # PostgreSQL's spinlock primitive - this is the core bottleneck
...

This output directly implicated the kernel’s preemption changes. When s_lock consumes this much CPU, processes are spending most of their time spinning on locks, often waiting on a holder that has been unexpectedly preempted mid-critical-section. This drives up context-switch rates (perf stat -e 'sched:sched_switch') and contributes to higher CPU cache miss rates, as processes are frequently swapped in and out and their cache lines invalidated.
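To put numbers on the switching itself, perf stat can count scheduler events over a fixed interval; a sketch, with the target PID left as a placeholder:

# Count context switches, migrations, and cache misses for one busy backend;
# run the same interval on 6.x and 7.0 kernels and compare.
perf stat -e context-switches,cpu-migrations,cache-misses \
          -p <postgres_backend_pid> -- sleep 10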

To illustrate the stark performance delta, pgbench, PostgreSQL’s standard benchmarking tool, was invaluable. Dipietro’s tests provided concrete numbers, demonstrating the direct impact of the kernel change on transaction throughput:

  • Linux 6.x (with PREEMPT_NONE enabled): 98,565 transactions per second
  • Linux 7.0 (with PREEMPT_LAZY/FULL): 50,751 transactions per second

This almost perfect halving of transactions per second (TPS) under an identical, high-parallelism workload (96 threads, 1024 clients, scale factor 8,470) on Graviton4 hardware was the irrefutable evidence. The pgbench command used for such a test might look like this:

# Example pgbench commands for a high-concurrency, update-heavy workload
# This simulates 1024 clients across 96 parallel threads performing simple updates
# against a large database (scale factor 8470, ~847 million rows).
pgbench -i -s 8470 -h localhost -p 5432 -U pgbench_user pgbench_db  # initialize the benchmark tables
pgbench -b simple-update -c 1024 -j 96 -T 300 -P 5 -h localhost -p 5432 -U pgbench_user pgbench_db
# -b simple-update: run the built-in 'simple-update' script (update-heavy, high contention)
# -c 1024: 1024 total clients connecting to the database
# -j 96: 96 worker threads driving those clients
# -T 300: Run the benchmark for 300 seconds (5 minutes)
# -P 5: Report progress every 5 seconds
# Other parameters: -h (host), -p (port), -U (user); the last argument is the database name

Beyond perf and pgbench, BPF (eBPF) tools offered even finer-grained insights. Tools like runqlat and offcputime from the BCC collection, or custom bpftrace programs, can observe scheduler events directly in real time. This allowed engineers to see the unexpected frequency of process migrations, voluntary and involuntary context switches, and preemption events, directly correlating them with PostgreSQL’s spinlock contention. These insights helped confirm that the kernel scheduler was indeed preempting tasks holding critical resources, leading to cascading delays.
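As one illustration, a minimal bpftrace sketch (assuming the backends are named postgres) that counts how often a still-runnable backend is kicked off the CPU, i.e. genuinely preempted rather than blocked:

# prev_state == 0 means the task was still runnable when switched out,
# so the switch was involuntary (a preemption, not a block on I/O or a lock).
sudo bpftrace -e '
tracepoint:sched:sched_switch
/str(args->prev_comm) == "postgres" && args->prev_state == 0/
{
    @involuntary_preemptions = count();
}
interval:s:10 { exit(); }'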

Even seemingly simple tools like top and htop offered subtle clues that, in hindsight, pointed directly to scheduler contention. A high CPU utilization that doesn’t translate into proportional application throughput, combined with elevated “context switches” counters, especially for highly concurrent workloads, became a tell-tale sign. The wa (wait I/O) metric might appear low, but the sy (system) time could be elevated, indicating kernel overhead related to scheduling.
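pidstat from the sysstat package makes those counters visible per process and even distinguishes why a switch happened; a quick sketch (the 5-second interval is arbitrary):

# Per-process context-switch rates, sampled every 5 seconds.
#   cswch/s   = voluntary switches (the task blocked and yielded the CPU)
#   nvcswch/s = involuntary switches (the scheduler preempted the task)
pidstat -w -C postgres 5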

The “silent killer” nature of this regression led many engineers down unproductive paths. The most common pitfall was the ‘It must be PostgreSQL’ trap. When a database instance performs poorly, the immediate inclination is to scrutinize database tuning parameters, locking contention within PostgreSQL (e.g., pg_locks), or analyze application queries for inefficiencies. Teams spent days, even weeks, re-tuning shared buffers, increasing max_connections, optimizing work_mem, and indexing tables – all to no avail, as the problem lay squarely beneath the database layer.
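Part of what made this trap so sticky is structural: PostgreSQL spinlocks are far too short-lived to appear in pg_locks, so the view only rules out heavyweight lock waits. A check of that kind, assuming the pgbench_db database from the earlier examples, would have come back clean:

# Heavyweight lock waits show up here; s_lock spinlock contention never will.
psql -d pgbench_db -c "
    SELECT locktype, mode, count(*)
    FROM pg_locks
    WHERE NOT granted
    GROUP BY locktype, mode
    ORDER BY count(*) DESC;"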

There were many false leads. Engineers might misinterpret I/O statistics, believing storage was the bottleneck, even though iostat or vmstat showed healthy I/O operations. Network latency could also be suspected, as application response times worsened, obscuring the fact that the CPU contention was the upstream cause. The absence of traditional I/O wait often misled teams away from a system-level CPU issue.

The fundamental challenge was the ‘zero error’ regression. When logs are clean, but performance is dire, standard debugging playbooks fall short. There were no OOM errors, no critical PostgreSQL log messages, no kernel tracebacks. This lack of explicit failure made it difficult to escalate or even pinpoint the domain of the problem, leading to frustration and burnout among engineering teams.

Furthermore, vendor-specific optimizations or default kernel configurations from cloud providers might have complicated the issue. While AWS Graviton4 was a prominent victim, other cloud providers or on-premise systems might have had custom kernel patches or different default configurations that either exacerbated the problem or, conversely, masked it for a period, making cross-environment comparisons difficult. This created a fragmented understanding of the regression’s true scope.

Finally, once the problem was identified, the ‘rolling back is hard’ reality set in. Reverting critical production systems to older kernel versions is no trivial task. It involves careful planning, change control, potential downtime, and risk assessment, especially in highly regulated or continuously deployed environments. The operational overhead and inherent risks of kernel rollbacks meant that even with a clear culprit, immediate mitigation was often a complex, multi-day endeavor. This delay further compounded the business impact of the performance degradation.

Beyond the Bug: Lessons Learned and Future-Proofing Your Stack

The Linux Kernel 7.0 preemption regression serves as a harsh, undeniable reminder: fundamental operating system changes, even well-intentioned ones, can silently shatter the performance contracts expected by specialized applications like databases. The kernel developers’ move to streamline preemption models was a deliberate architectural decision aimed at improving general-purpose system behavior and security for modern hardware. However, it had an unforeseen and devastating side effect on applications meticulously optimized for the previous model.

This incident underscores the imperative of cultivating ‘kernel-awareness’ within DevOps and DBA teams. Relying solely on application-level observability is no longer sufficient. Engineers need to move beyond simple CPU utilization metrics and embrace deep system-level profiling and a nuanced understanding of kernel development cycles. This includes monitoring mailing lists, understanding key kernel commits, and knowing how fundamental OS components like the scheduler interact with critical applications.

A critical lesson is the absolute necessity of dedicated performance testing with every OS kernel upgrade, not just major application releases. This means rigorous, production-like benchmarks that simulate real-world loads, not just synthetic checks. These benchmarks must be run in staging environments that mirror production as closely as possible, encompassing both current and future hardware architectures. Had such testing been universally applied before the 7.0 rollout, the issue could have been caught pre-deployment.
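As a sketch of what such a gate could look like in a pipeline, the script below runs the same pgbench workload on the candidate kernel and fails on a significant TPS regression. The baseline figure, threshold, and connection details are all hypothetical placeholders:

#!/usr/bin/env bash
# Fail a kernel-upgrade pipeline if database throughput regresses.
set -euo pipefail

BASELINE_TPS=95000   # measured on the known-good kernel
THRESHOLD=90         # fail if we drop below 90% of baseline

echo "Benchmarking on kernel $(uname -r)..."
TPS=$(pgbench -b simple-update -c 1024 -j 96 -T 300 \
        -h localhost -U pgbench_user pgbench_db \
      | awk '/^tps/ {print int($3); exit}')

MIN_TPS=$(( BASELINE_TPS * THRESHOLD / 100 ))
if (( TPS < MIN_TPS )); then
    echo "FAIL: ${TPS} tps is below ${MIN_TPS} tps (${THRESHOLD}% of baseline)" >&2
    exit 1
fi
echo "PASS: ${TPS} tps"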

The regression also highlights the need for advocating for closer collaboration between kernel developers and application maintainers. The gap between those building the core OS and those optimizing complex applications atop it can lead to such blind spots. Proactive engagement, shared performance test suites, and transparent communication channels (like the Linux kernel mailing list where Dipietro posted his findings) are essential to anticipate and mitigate such regressions before they hit production. This isn’t just a “bug” in the traditional sense; it’s a design mismatch, and bridging that gap prevents future clashes.

Finally, we must consider architecting for resilience against future kernel-induced performance risks. This includes strategies like maintaining diversified environments (e.g., specific kernel versions for critical databases), leveraging advanced telemetry that correlates application performance with low-level OS metrics, and building robust rollback capabilities into deployment pipelines. Always assume that the underlying OS can change in ways that impact your application, and design your infrastructure to detect and respond to those changes quickly.
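As one concrete sketch of that last point, on a Debian/Ubuntu-style host (package and boot-entry names below are placeholders, and grub-reboot assumes GRUB_DEFAULT=saved in /etc/default/grub):

# Keep a known-good 6.x kernel installed and pinned so it cannot be auto-removed.
sudo apt-mark hold linux-image-6.12.0-generic    # exact package name varies

# List the available boot entries, then arm a one-time boot into the held kernel.
grep -E "menuentry '" /boot/grub/grub.cfg
sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.12.0-generic"
sudo reboot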

The Verdict: The Linux Kernel 7.0 preemption changes were not a bug in the traditional sense, but a deliberate architectural shift that significantly impacted applications like PostgreSQL. For users on high-vCPU ARM64 (Graviton4) and potentially other architectures running highly concurrent, spinlock-heavy PostgreSQL workloads, you absolutely must avoid Linux Kernel 7.0 unless explicit mitigations or a new patch from the PostgreSQL community or kernel maintainers has been released. If you are already running 7.0, immediately profile your PostgreSQL instances with perf to check for s_lock contention and be prepared to roll back to a stable Linux 6.x kernel. For those planning upgrades, make rigorous kernel-level performance benchmarking a non-negotiable part of your deployment process. The future of high-performance database operations demands constant vigilance at all layers of the stack.