Hardware capabilities have advanced dramatically: PCIe bandwidth doubles roughly every three years, reaching 32 GB/s per lane in PCIe 7.0; high-bandwidth memory delivers hundreds of GB/s; and modern CPUs feature wider SIMD units that process dozens of bytes per instruction. Yet many software tasks, including JSON parsing, remain CPU-bound and run far slower than these interfaces allow. This presentation explores how SIMD instructions enable gigabyte-per-second throughput in real-world data processing. Focusing on the simdjson library, we examine its design for fast structural scanning, on-demand parsing, and minification, along with recent optimizations that use C++26 compile-time reflection for efficient serialization and vectorized string escaping. We extend the discussion to related challenges: Unicode validation and correction (as deployed in browsers) and high-speed Base64 encoding and decoding in upcoming JavaScript standards. Through benchmarks on several platforms, we demonstrate how these techniques harness modern hardware to deliver orders-of-magnitude speedups, powering systems from Node.js and ClickHouse to web browsers worldwide.
Daniel Lemire is a computer science professor at the University of Quebec (TELUQ). He is among the 1000 most followed programmers in the world on GitHub. His work is found in many standard libraries (.NET, Rust, GCC/libstdc++, LLVM/libc, Go, Node.js, etc.) and in the major Web browsers (Safari, Chrome, etc.). His research interests include high-performance programming. He is @lemire on X, and he blogs weekly at https://lemire.me/blog