
How to Supercharge Your Linux Per-Core I/O Performance by 60%: A Step-by-Step Guide Inspired by Jens Axboe's Latest Patches

Last updated: 2026-05-11 02:06:24 · Linux & DevOps

Introduction

At the recent Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM) in Croatia, a presentation highlighted the I/O overhead of Linux compared to the Storage Performance Development Kit (SPDK). This prompted Jens Axboe, the lead io_uring developer and Linux block layer maintainer, to look for optimizations. His resulting patches delivered a roughly 60% increase in per-core I/O performance. This guide walks you through the process, from understanding the problem to applying and testing those patches on your own system.

What You Need

  • A Linux development machine (preferably with a recent kernel source, e.g., 6.x)
  • Basic familiarity with Linux kernel compilation and command-line tools
  • The necessary development packages installed: build-essential, bc, libncurses-dev, libelf-dev, bison, flex, libssl-dev, and git
  • Access to the latest kernel source code (clone from git.kernel.org or download a tarball)
  • Benchmarking tool: fio (Flexible I/O Tester) for measuring per-core performance
  • Knowledge of io_uring and the block layer (helpful but not strictly required)
  • Patience and a test environment (do not apply unfinished patches on production machines)

Step-by-Step Guide

Step 1: Identify the I/O Overhead Bottleneck

Before optimizing, understand where the overhead lies. Comparisons of Linux I/O performance with SPDK, such as the LSFMM presentation, are a good starting point. Common bottlenecks include lock contention, syscall overhead, and inefficient memory management. Axboe’s work focused on reducing per-I/O overhead in the block layer and io_uring paths. For your own analysis, use tools like perf and trace-cmd to capture kernel profiles and traces during heavy I/O workloads.
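As a starting point for that analysis, here is a minimal sketch of a system-wide perf capture taken while an I/O workload runs in another terminal. The helper names run and profile_io, the 30-second default, and the DRY_RUN switch are illustrative assumptions, not part of Axboe's workflow.

```shell
# Hypothetical helpers; perf must be installed and usually needs root.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: profile_io [seconds] [output-file]
profile_io() {
    seconds="${1:-30}"
    out="${2:-perf.data}"
    # -a samples all CPUs, -g records call graphs; sleep only bounds the capture window.
    run sudo perf record -a -g -o "$out" -- sleep "$seconds"
    # Print a plain-text report of the recorded samples.
    run sudo perf report -i "$out" --stdio
}
```

Start the workload first, then call profile_io 30. Functions from the block layer or io_uring near the top of the report are candidates for the per-I/O overhead described above.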

Step 2: Set Up Your Development Environment

  1. Clone the Linux kernel source tree from the official repository:
    git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  2. Install required build dependencies. For Debian/Ubuntu:
    sudo apt-get install build-essential bc libncurses-dev libelf-dev bison flex libssl-dev git
  3. Configure the kernel. Start with a baseline configuration (e.g., make defconfig) and ensure io_uring support is enabled (CONFIG_IO_URING=y).
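The configuration check in the last item can be scripted. This is a small sketch that assumes the generated .config sits in the current kernel tree; kconfig_enabled is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if OPTION is set to y or m in the given config file.
# Usage: kconfig_enabled <config-file> <OPTION>
kconfig_enabled() {
    grep -Eq "^$2=(y|m)\$" "$1"
}
```

From the top of the kernel tree, kconfig_enabled .config CONFIG_IO_URING reports whether the option is set. If it is not, scripts/config --enable IO_URING followed by make olddefconfig should turn it on and resolve any dependent options.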

Step 3: Find and Apply the Performance Patches

Axboe’s patches are typically posted to the Linux Kernel Mailing List (LKML) or published in the io_uring development branch. To replicate the 60% gain, look for a series with a title along the lines of “per-core IO improvements”. The steps are:

  • Search LKML archives or the maintainer’s git tree.
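  • Download the patch series, for example by exporting it with git format-patch from a branch that already contains it, or by fetching it from the lore.kernel.org archive with b4 am <message-id>.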
  • Apply patches on top of your kernel source: git am *.patch.
  • Resolve any conflicts manually if they occur.
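The apply step can be wrapped so that a failed series does not leave the tree half-patched. The sketch below is one way to do it. The function name apply_series and the default branch name io-perf-test are assumptions chosen for illustration, and it aborts on conflict rather than resolving it.

```shell
# Hypothetical helper: apply every *.patch file in a directory on a fresh branch.
# Usage: apply_series <patch-dir> [branch-name]
apply_series() {
    patch_dir="$1"
    branch="${2:-io-perf-test}"
    # Work on a dedicated branch so the baseline stays untouched.
    git checkout -b "$branch" || return 1
    if ! git am "$patch_dir"/*.patch; then
        # Roll back the partially applied series; the branch stays at the baseline commit.
        git am --abort
        return 1
    fi
}
```

If you would rather resolve conflicts by hand, skip the abort: fix the files git am reports, stage them with git add, and run git am --continue.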

Step 4: Compile and Install the Custom Kernel

  1. Build the kernel and modules: make -j$(nproc)
  2. Install modules: sudo make modules_install
  3. Install the kernel image: sudo make install
  4. Update bootloader (e.g., update-grub) and reboot into the new kernel.
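After the reboot, it is worth confirming that the machine actually booted the kernel you built before running any benchmarks. A minimal sketch follows; running_kernel_is is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the running kernel release matches the expected string.
# Usage: running_kernel_is <expected-release>
running_kernel_is() {
    [ "$(uname -r)" = "$1" ]
}
```

From the kernel source tree, running_kernel_is "$(make -s kernelrelease)" compares the booted kernel against the release string of the tree you just built. A mismatch usually means the bootloader picked an older entry.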

Step 5: Benchmark Per-Core I/O Performance

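Use fio to measure single-core I/O performance. The example below issues 4 KiB random reads through io_uring; --direct=1 bypasses the page cache so that the I/O path itself is measured, and --cpus_allowed=0 pins the job to a single core. The queue depth of 32 is an illustrative value, not one taken from Axboe’s tests:

fio --name=test --ioengine=io_uring --rw=randread --bs=4k --direct=1 --iodepth=32 --numjobs=1 --cpus_allowed=0 --size=1G --runtime=30 --time_based --group_reporting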

Run the same benchmark on the baseline kernel (without patches) and the patched kernel. Compare the IOPS (I/O operations per second) and latency percentiles.
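To turn the two IOPS figures into a percentage you can compare against the ~60% claim, a short helper is enough. This is a sketch; pct_change is a hypothetical name, and the IOPS values are whatever fio reported on each kernel.

```shell
# Hypothetical helper: percentage change from a baseline value to a new value.
# Usage: pct_change <baseline-iops> <patched-iops>
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        if (base <= 0) {
            print "baseline must be positive" > "/dev/stderr"
            exit 1
        }
        # Relative change, expressed in percent with one decimal place.
        printf "%.1f\n", (new - base) / base * 100
    }'
}
```

For example, pct_change 1000000 1600000 prints 60.0. If you add --output-format=json to the fio command, the read IOPS for the first job can be extracted with a JSON tool such as jq (for instance .jobs[0].read.iops), assuming jq is installed.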

Step 6: Analyze and Iterate

If your results don’t show a ~60% improvement, investigate:

  • Check kernel config differences (ensure no debugging options that slow down I/O).
  • Use perf top while running fio to identify remaining hot spots.
  • Try different patch versions or additional optimizations from Axboe or other developers.
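The first check in the list above can be partly automated by scanning a kernel config for debugging options that are known to add overhead. The option list in this sketch is illustrative rather than exhaustive, and debug_options_enabled is a hypothetical helper name.

```shell
# Hypothetical helper: print debug-heavy options that are enabled in a config file.
# Exits non-zero when none of the listed options is enabled.
# Usage: debug_options_enabled <config-file>
debug_options_enabled() {
    grep -E '^CONFIG_(KASAN|KCSAN|UBSAN|LOCKDEP|PROVE_LOCKING|DEBUG_PAGEALLOC|DEBUG_OBJECTS|SLUB_DEBUG_ON|DEBUG_KMEMLEAK)=y$' "$1"
}
```

Run it against the .config of both the baseline and the patched kernel. Any option it prints for one kernel but not the other makes the before/after comparison unreliable.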

Tips for Success

  • Test on a non-critical system – these patches are cutting-edge and may have stability issues.
  • Use the exact same hardware and workload for before/after comparisons to avoid variables.
  • Watch LKML and the io_uring mailing list (io-uring@vger.kernel.org) for revised patches, as Axboe often posts updated versions.
  • Consider enabling kernel debug options initially to catch any regressions, then disable them for performance runs.
  • Document each patch and its effect to contribute back to the community if you build on the work.
  • Understand the trade-offs – the patches may increase per-core performance at the cost of slightly higher memory usage or complexity.

Conclusion

By following these steps, you can harness the same optimizations that Jens Axboe developed to boost per-core I/O performance by up to 60%. Remember that kernel development is iterative; your mileage may vary depending on your hardware and workload. Stay engaged with the open-source community to get the latest improvements and contribute your findings.