I just spent a very entertaining debugging session tracking down why my Rust program was taking 10+ minutes to do something that Python does in 0.15 seconds. The culprit? A single missing BufReader wrapper. This is a well-known footgun in the Rust ecosystem, but experiencing it firsthand really drives home how brutal it is.
I've been hacking on agent-history, a TUI for searching through AI coding session history across Codex, Claude Code, and OpenCode. Part of what it does is index OpenCode sessions, which are stored as a tree of small JSON files on disk: one session file, then a directory of message files, then for each message a directory of part files. A single session with a decent conversation might have 100+ messages and 400+ parts, each a small JSON file.
The indexing code looked like this:
let message: OpenCodeMessage = serde_json::from_reader(File::open(&message_file)?)?;
And for part files:
let part: OpenCodePart = serde_json::from_reader(File::open(&part_file)?)?;
Perfectly idiomatic-looking Rust. Compiles without warnings. Works correctly. And is catastrophically slow.
Indexing 565 OpenCode sessions (containing ~65,000 message files and ~220,000 part files) was taking 3 minutes wall time, burning 1,355 seconds of system CPU across 10 cores. The events telemetry I added showed individual sessions taking 100+ seconds. One session with just 113 messages and 461 parts (30MB total) took 103 seconds.
Meanwhile Python could open, read, and parse every one of those 574 JSON files in 0.15 seconds.
That's not a 2x difference. Not 10x. It's roughly 700x slower than Python. In Rust. A language we choose specifically for performance.
serde_json::from_reader() reads from any type implementing std::io::Read. It pulls bytes from the reader one at a time to feed its streaming JSON tokenizer. This is by design -- it's a streaming parser that doesn't need to buffer the entire input.
The problem: File implements Read, but each .read() call on a raw File is a kernel syscall. When serde asks for one byte, the OS dutifully context-switches into the kernel, copies a single byte out of the page cache, and context-switches back.
So for a 30MB session's worth of JSON files, we were making roughly 30 million read(fd, buf, 1) syscalls. Each one costs a few microseconds of overhead. At 3 microseconds per syscall, 30 million of them takes 90 seconds. Mystery solved.
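You can watch this happen without strace. Here's a std-only sketch (no serde involved): a hypothetical CountingReader wrapper counts .read() calls on the underlying source, standing in for syscalls on a raw File, while a loop pulls one byte at a time the way serde_json's streaming tokenizer does.

```rust
use std::io::{Cursor, Read};

// Counts how many times .read() is called on the inner reader.
// On a raw File, each of these calls would be one read() syscall.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() {
    // 1 MiB of data, standing in for a JSON file on disk.
    let data = vec![b'x'; 1 << 20];
    let mut reader = CountingReader { inner: Cursor::new(data), calls: 0 };

    // Pull one byte at a time, like a streaming tokenizer does.
    let mut byte = [0u8; 1];
    let mut total = 0usize;
    while reader.read(&mut byte).unwrap() == 1 {
        total += 1;
    }

    // One read() call per byte, plus one final zero-length EOF read:
    // 1,048,577 calls for 1 MiB.
    println!("bytes read: {total}, read() calls: {}", reader.calls);
}
```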
One line change:
// Before: ~30 million syscalls per session
let message: OpenCodeMessage = serde_json::from_reader(File::open(&path)?)?;
// After: ~4,000 syscalls per session
let message: OpenCodeMessage = serde_json::from_reader(BufReader::new(File::open(&path)?))?;
BufReader wraps the File and maintains an 8KB in-memory buffer. When serde asks for one byte, BufReader serves it from its buffer. When the buffer is empty, it refills with a single read(fd, buf, 8192) syscall. So instead of 30 million syscalls, you get maybe 4,000. The overhead vanishes.
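The same counting sketch from above shows the difference. Wrap the counting reader in a BufReader and the calls on the underlying source collapse from one per byte to one per 8KB refill:

```rust
use std::io::{BufReader, Cursor, Read};

// Same wrapper as before: counts .read() calls on the inner reader,
// i.e. what would be syscalls on a raw File.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

fn main() {
    // 1 MiB of data behind a counting reader.
    let counter = CountingReader { inner: Cursor::new(vec![b'x'; 1 << 20]), calls: 0 };
    // BufReader's default buffer is 8 KiB.
    let mut buffered = BufReader::new(counter);

    // Still reading one byte at a time, tokenizer-style...
    let mut byte = [0u8; 1];
    while buffered.read(&mut byte).unwrap() == 1 {}

    // ...but the source only sees 8 KiB refills:
    // 1 MiB / 8 KiB = 128 refills, plus one final EOF read = 129 calls,
    // instead of the 1,048,577 calls the unbuffered version makes.
    println!("read() calls on the source: {}", buffered.get_ref().calls);
}
```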
What makes this footgun so nasty is how well it hides:

It compiles clean. No warnings, no clippy lints (at least not by default). File implements Read, from_reader accepts Read, everything type-checks.
It produces correct results. You will never get a wrong answer. Your tests will pass. The only symptom is that your program is inexplicably slow.
It looks idiomatic. serde_json::from_reader(File::open(&path)?) reads like perfectly reasonable Rust. You'd have to already know about this footgun to spot it in code review.
The performance hit scales with data volume, not code complexity. For small files or a handful of them, you'll never notice. It only becomes catastrophic when you're opening thousands of files, which is exactly when you're probably not looking at individual file-reading lines anymore.
Profiling points at the kernel, not your code. The time shows up as system CPU, not user CPU. If you're looking at flamegraphs, you'll see a wall of read syscalls and might conclude "I/O is the bottleneck" rather than "I'm doing I/O wrong."
To be fair, the serde_json::from_reader documentation says:
Performance
When reading from a source against which short reads are not efficient, such as a
File, you will want to apply your own buffering...
But let's be honest. You see from_reader, you have a reader, you pass it in. You don't read the performance notes on every function in a library you've used a hundred times. And the API happily accepts the unbuffered reader without complaint.
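Until the ecosystem catches this, one codebase-level defense is to never hand out a raw File at all. This is a convention of my own, not something from the post's codebase: a hypothetical open_buffered helper that is the only way files get opened, so the unbuffered reader can't reach a byte-at-a-time consumer.

```rust
use std::fs::File;
use std::io::{BufReader, Read};
use std::path::Path;

// The only file-opening function the codebase exposes: callers can't
// forget the BufReader because they never see the raw File.
fn open_buffered(path: &Path) -> std::io::Result<BufReader<File>> {
    Ok(BufReader::new(File::open(path)?))
}

fn main() -> std::io::Result<()> {
    // Demo with a throwaway temp file.
    let path = std::env::temp_dir().join("open_buffered_demo.json");
    std::fs::write(&path, br#"{"ok": true}"#)?;

    let mut contents = String::new();
    open_buffered(&path)?.read_to_string(&mut contents)?;
    println!("{contents}");

    std::fs::remove_file(&path)?;
    Ok(())
}
```

Then serde calls become serde_json::from_reader(open_buffered(&path)?)?, and the footgun is spelled out of existence rather than remembered out of existence.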
The Rust ecosystem could address this at several levels:
- A clippy lint. Flagging serde_json::from_reader(File::open(...)) without BufReader would catch the most common case. This pattern is greppable and unambiguous.
- Buffering inside from_reader. An internal buffer would eliminate the footgun for the vast majority of callers while preserving the streaming semantics. The memory cost is negligible.
- Detecting byte-at-a-time reads on a raw File. This is a broader issue than just serde -- any code doing byte-at-a-time reads on a raw File is almost certainly a bug.

What made this fun to track down was the layers of misdirection. The initial symptom was "indexing is slow." The first theory was that some OpenCode session files were huge (and some were -- we found a 467MB JSON file with full file snapshots stored inline, which was a separate bug). After sanitizing those, it was still slow. The telemetry showed 285,000 file opens taking minutes, but the total data was only ~265MB -- well within what an SSD can serve in a second or two.
The smoking gun was timing the exact same workload in Python. When a Python script that opens 574 files, parses them all as JSON, and finishes in 0.15 seconds is 700x faster than your Rust program doing the same thing, you know the problem isn't algorithmic. It's mechanical. Something in the plumbing is spectacularly wrong.
And it was. Four missing BufReader::new() wrappers across the codebase. That's it.