When Does Compressed Protobuf Actually Beat Compressed JSON?

"Protobuf is smaller than JSON" is one of those facts everyone knows. It shows up in design docs as a one-liner: switch from JSON to protobuf, save bytes on the wire. The benchmarks behind it are real. Protobuf payloads are routinely 50 to 90 percent smaller than the equivalent JSON.

The thing is, almost no one ships raw JSON. They ship gzipped JSON. Or zstd. Or brotli. And the moment you put a general-purpose compressor in front of either format, the picture changes. JSON has predictable, repetitive structure (the same keys appear over and over), which is exactly what compressors are good at squeezing out. So the interesting question is not "is protobuf smaller than JSON" but "is compressed protobuf smaller than compressed JSON, and when?"

This post walks through a benchmark that pokes at that question across 19 payload shapes, three compressors, and the spread of cases (tiny records, sparse optionals, long lists, packed ints, UUIDs, binary blobs) where the answer flips.


The Problem

If you are designing a wire format for an internal service, the standard bake-off looks something like:

  1. Pick a representative payload.
  2. Serialize it as JSON. Run gzip on it. Measure.
  3. Serialize it as protobuf. Run gzip on it. Measure.
  4. Pick whichever is smaller.
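The four steps can be sketched in a few lines of Python. The `payload` dict and `msg` protobuf message are hypothetical stand-ins for your real data:

```python
import gzip
import json

def bake_off(payload: dict, msg) -> str:
    """Run the standard bake-off for one payload: serialize both ways,
    gzip both, and report which representation came out smaller."""
    json_bytes = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    proto_bytes = msg.SerializeToString()
    json_gz = len(gzip.compress(json_bytes, compresslevel=9))
    proto_gz = len(gzip.compress(proto_bytes, compresslevel=9))
    return "protobuf" if proto_gz < json_gz else "json"
```

For tiny inputs the gzip container overhead (roughly 18 bytes of header and trailer) dominates both sides, which is worth keeping in mind when reading small-payload numbers.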

Step 4 is where it gets interesting, because the answer is not consistent. A protobuf message that is 60 percent smaller than its JSON twin in raw bytes might come out only 1 percent smaller after gzip, or even larger. The savings you measured at the schema level can mostly evaporate at the wire level.

What I wanted was a single picture: across the kinds of payloads that real APIs actually carry, where does compressed protobuf still win, and where does compressed JSON catch up? The answer turns out to be neither "always protobuf" nor "always JSON" but a clean set of patterns you can predict from the shape of the data.


Prerequisites

To follow along you'll want:

  • Python 3.10+
  • protobuf, brotli, zstandard from PyPI
  • protoc to regenerate schema_pb2.py if you change schema.proto

The code lives at github.com/gsarmaonline/proto-vs-json-compression. You can run the whole thing in a few seconds with:

.venv/bin/python bench.py

Technical Decisions

A benchmark like this lives or dies on the choices you make at the edges. Here are the ones that mattered.

Comparing apples to apples means generating the data twice

The most common mistake in proto-vs-JSON benchmarks is to build the protobuf message, then call MessageToJson to produce the JSON. That is not a fair comparison. MessageToJson makes choices (camelCase field names, omitting unset fields, base64 for bytes) that are protobuf's defaults, not JSON's defaults. Real JSON APIs frequently differ.

So in this benchmark, every scenario builds the protobuf message and the JSON dict from the same source data, independently. That lets us probe a real-world question: when JSON keeps null for unset fields versus when it omits them, how does the comparison change? Spoiler: a lot. More on that below.
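A minimal sketch of that pattern (the builder names here are hypothetical; the real ones live in bench.py):

```python
def build_json(src: dict, keep_nulls: bool) -> dict:
    """Build the JSON dict straight from the source data, choosing
    explicitly whether unset (None) fields become nulls or vanish."""
    if keep_nulls:
        return dict(src)
    return {k: v for k, v in src.items() if v is not None}

def build_proto(src: dict, msg):
    """Build the protobuf message from the same source data,
    setting only the fields that are actually present."""
    for field, value in src.items():
        if value is not None:
            setattr(msg, field, value)
    return msg
```

Because both builders consume the same `src`, the null-vs-omit choice becomes an explicit knob of the benchmark rather than an accident of whichever serializer ran first.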

Including raw bytes and three compressors

The columns in the result table are raw, gzip (level 9), zstd (level 19), and brotli (quality 11). Raw bytes show the schema-level savings (which are huge and uncontroversial). The three compressors at high settings show what you'd actually ship.

I picked aggressive settings on purpose. The "compressed JSON catches up" effect is strongest when the compressor has the most room to work, so this is the steel-man case for JSON. If protobuf still wins at zstd-19, it's a real win.

Deterministic data

Every random draw uses a seeded RNG (random.Random(42)), so the numbers in the table are reproducible. This matters because some scenarios (UUID lists, binary blobs) are sensitive to entropy, and you don't want to chase a 0.3 percent flutter that's just a different RNG seed.
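For example, a UUID-list generator built this way is fully reproducible (a sketch; the function name is illustrative):

```python
import random

def make_uuids(n: int, seed: int = 42) -> list[str]:
    """Deterministic UUID-like hex strings: same seed, same bytes,
    same compressed sizes on every run."""
    rng = random.Random(seed)
    return ["%032x" % rng.getrandbits(128) for _ in range(n)]
```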


Implementation

The schema

schema.proto defines 11 message types, each picked to stress a different axis of the proto-versus-JSON comparison:

// Numeric-heavy — many ints/doubles, no repetition.
// Protobuf varints + fixed-width doubles vs JSON text numbers.
message NumericHeavy {
  uint64 id = 1;
  uint64 ts_ms = 2;
  double cpu = 3;
  // ... 12 more numeric fields
}

// Sparse optionals — many fields, most unset.
message SparseOptionals {
  uint64 id = 1;
  optional string a = 2;
  optional string b = 3;
  // ... 19 more optional fields
}

// Packed int array — protobuf packs into a single tag+length.
message IntArray {
  repeated uint32 values = 1;     // packed by default in proto3
}

The full schema covers small flat objects, numeric-heavy records, long natural-language strings, sparse optionals, deeply nested trees, lists of homogeneous records, low-cardinality enum-like strings, boolean bags, packed integer arrays, UUID lists, and binary blobs.

The runner

The runner is straightforward: for each scenario, serialize as JSON (with json.dumps(..., separators=(',', ':')) for compact output) and as protobuf, then run each through three compressors and record sizes.

import gzip, json
import brotli, zstandard

# Compression helpers at the settings described above.
def gzip_size(b: bytes) -> int:
    return len(gzip.compress(b, compresslevel=9))

def zstd_size(b: bytes) -> int:
    return len(zstandard.ZstdCompressor(level=19).compress(b))

def brotli_size(b: bytes) -> int:
    return len(brotli.compress(b, quality=11))

# Row is a small dataclass with one field per size column.
def measure(name: str, msg, d) -> Row:
    js = json.dumps(d, separators=(",", ":")).encode("utf-8")
    pr = msg.SerializeToString()
    return Row(
        name=name,
        json_raw=len(js),
        json_gz=gzip_size(js),
        json_zstd=zstd_size(js),
        json_br=brotli_size(js),
        proto_raw=len(pr),
        proto_gz=gzip_size(pr),
        proto_zstd=zstd_size(pr),
        proto_br=brotli_size(pr),
    )

The delta column is (proto - json) / json. Negative means protobuf is smaller. That sign convention takes a second to internalize, but it's the right one once you start scanning columns.
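Spelled out, with the sign convention made explicit:

```python
def delta(proto_size: int, json_size: int) -> float:
    """(proto - json) / json: negative means protobuf is smaller,
    positive means protobuf is larger."""
    return (proto_size - json_size) / json_size

# e.g. gzipped sizes of 31 (proto) vs 104 (json) -> about -0.70,
# i.e. protobuf is roughly 70 percent smaller.
```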

Sparse optionals: the most lopsided case

The single most dramatic result is sparse_optionals_dense_json. The scenario: a 21-field message where only 3 fields are set. Two JSON variants: one that emits null for unset keys (common in untyped or auto-generated APIs), and one that omits them entirely.

# Variant 1: dense — emit all keys with nulls.
d_dense = {"id": 1, "a": "hello", "b": None, "c": None, ...}

# Variant 2: sparse — drop unset keys.
d_sparse = {"id": 1, "a": "hello", "x": 42}

Protobuf is identical in both cases (it just doesn't write the unset fields), at 11 raw bytes. Dense JSON is 213 raw bytes. After gzip, dense JSON is 104 bytes, protobuf is 31. That's a 70 percent win even after compression, on a payload where you might naively think compression should crush all the repetition. It doesn't, because nullness is structurally encoded in JSON keys, and short distinct keys don't compress well.

Sparse JSON cleans this up nicely (47 bytes raw, 47 bytes after gzip, since gzip can't help such a short input) but you only get sparse JSON if your serializer is configured to omit unset fields, which not every codebase does.
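The effect is easy to reproduce with nothing but the stdlib (a sketch, with 20 hypothetical single-letter keys standing in for the real field names):

```python
import gzip
import json
import string

KEYS = list(string.ascii_lowercase[:20])  # 20 optional fields; "id" makes 21

def json_sizes(d: dict) -> tuple[int, int]:
    raw = json.dumps(d, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw, compresslevel=9))

dense = {"id": 1, **{k: None for k in KEYS}}
dense["a"], dense["b"] = "hello", "world"          # only 3 of 21 fields set
sparse = {k: v for k, v in dense.items() if v is not None}

dense_raw, dense_gz = json_sizes(dense)
sparse_raw, sparse_gz = json_sizes(sparse)
# The dense variant stays substantially larger even after gzip:
# the distinct keys are structure, and structure doesn't deduplicate.
```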

Lists: where JSON catches up and sometimes wins

The other end of the spectrum is repeated_orders. A list of homogeneous records. Each record has six fields. JSON repeats the field names per element. In raw bytes, protobuf is roughly 63 percent smaller across all sizes. After gzip:

Scenario                 jsonGZ   protGZ     ΔGZ
repeated_orders×10          293      267   −8.9%
repeated_orders×100        1639     1690   +3.1%
repeated_orders×1000      14533    14426   −0.7%

At 100 elements, gzipped JSON is smaller than gzipped protobuf. Why? Because gzip is exceptionally good at finding the repeated key sequences ("order_id":, "user_id":, etc.) and replacing them with backreferences. Once you have enough repetitions, the keys cost almost nothing. Protobuf's per-element overhead (tags, lengths) doesn't compress nearly as well, because it's already information-dense.

At 10 elements there's not enough repetition for gzip to win. At 1000 elements, the long tail of varint-encoded fields (which compress modestly) tips it back to protobuf. The crossover is in the middle, and it's narrow.
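The catch-up effect is visible without protobuf at all: gzipped JSON grows far more slowly than raw JSON as the record count climbs. A sketch with a hypothetical six-field order shape:

```python
import gzip
import json
import random

def orders_json(n: int, seed: int = 42) -> bytes:
    """n homogeneous six-field records, keys repeated per element."""
    rng = random.Random(seed)
    recs = [{
        "order_id": rng.randrange(10**9),
        "user_id": rng.randrange(10**6),
        "sku": "SKU-%04d" % rng.randrange(10000),
        "qty": rng.randrange(1, 10),
        "price_cents": rng.randrange(100, 100000),
        "ts_ms": 1_700_000_000_000 + rng.randrange(10**9),
    } for _ in range(n)]
    return json.dumps(recs, separators=(",", ":")).encode("utf-8")

gz10 = len(gzip.compress(orders_json(10), compresslevel=9))
gz100 = len(gzip.compress(orders_json(100), compresslevel=9))
# 10x the records costs far less than 10x the gzipped bytes, because
# the repeated keys become cheap backreferences after the first element.
```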

Packed int arrays: protobuf's home turf

Compare to int_array×1000 with small values:

Format      Raw   Gzip
JSON       3142   1374
Protobuf   1003    929

Protobuf packs uint32 values into a tightly varint-encoded sequence with a single tag and length prefix. Each small value takes 1 byte. JSON spells each integer in decimal with a comma separator, which is verbose to begin with and only partly redeemed by gzip (numbers are not as repetitive as field keys). Protobuf wins by 32 percent after gzip, which is one of the larger compressed wins in the whole benchmark.
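You can see where the roughly 1003 protobuf bytes come from without running protoc, using a hand-rolled varint encoder (a sketch of the wire encoding for a packed repeated field at field number 1):

```python
import json
import random

def varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

rng = random.Random(42)
values = [rng.randrange(128) for _ in range(1000)]   # all one-byte varints

packed = b"".join(varint(v) for v in values)
# field = tag byte (field 1, wire type 2) + length varint + packed body
field = b"\x0a" + varint(len(packed)) + packed       # 1 + 2 + 1000 = 1003

json_bytes = json.dumps(values, separators=(",", ":")).encode("utf-8")
```

With every value under 128, the packed body is exactly one byte per value, while the JSON text averages a bit over three bytes per value before compression.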

Where neither wins meaningfully

string_heavy (one long natural-language body) and uuid_list (high-entropy strings) are nearly tied. The framing overhead is negligible compared to the payload content, so both formats compress to roughly the same size. If your data is mostly text content or random IDs, the format choice barely matters for size.


How It All Fits Together

After running all 19 scenarios, the picture sorts cleanly into three buckets.

Compressed protobuf still wins (often substantially) when:

  • The payload is small (per-message overhead dominates, and protobuf has almost none).
  • Most fields are unset and the JSON serializer keeps null placeholders.
  • You have packed numeric arrays (protobuf's varint + packed encoding is genuinely tighter than text numbers).
  • You have many booleans (two bytes each in protobuf, tag plus value, vs "key":true, in JSON).
  • You have low-cardinality enum-like strings that protobuf encodes as small ints.

Compressed JSON catches up or wins when:

  • You have long lists of homogeneous records and the per-record key repetition compresses to almost nothing.
  • The payload is dominated by opaque bytes (JSON's ~33 percent base64 overhead is mostly squeezed back out by the compressor's entropy coding, and the framing protobuf saves is a rounding error next to the blob itself, so they roughly tie).
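The base64 arithmetic in that last point is mechanical and worth seeing once (stdlib only):

```python
import base64
import gzip
import random

blob = random.Random(42).randbytes(3000)     # high-entropy binary payload
encoded = base64.b64encode(blob)             # 3 bytes in -> 4 chars out

gz = gzip.compress(encoded, compresslevel=9)
# Huffman coding packs the 64-symbol base64 alphabet back toward
# 6 bits per character, so the gzipped size lands near the original
# binary size -- but never below it, since the data is random.
```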

Neither wins meaningfully when:

  • The payload is dominated by natural-language text or high-entropy strings (UUIDs). The framing is a rounding error.

The mental model: protobuf wins where structure is the dominant cost. JSON catches up where structure is repetitive (compressors love repetition) or where structure is a small fraction of the payload (the data dominates).


Lessons Learned

A few things surprised me, even after staring at the numbers for a while.

The "compressed JSON beats compressed protobuf" case is real, but narrow. It happens around 100 repeated homogeneous records. Below that, not enough repetition. Above that, varint efficiency pulls protobuf back ahead. If you knew you'd always be in that band, you could ship gzipped JSON and feel fine. Most APIs are not that uniform.

JSON's worst case is not what people think. The dramatic 70+ percent wins for protobuf don't come from nested structures or numeric efficiency. They come from sparse optionals with null placeholders. If your team uses a JSON serializer that emits nulls for missing fields, that single choice is doing more damage to your wire size than the format itself.

zstd and brotli rarely change the qualitative answer. They consistently produce smaller outputs than gzip for both formats, but the relative comparison stays roughly the same. If gzipped protobuf wins, zstd protobuf wins by a similar margin. The exception is repeated_orders×100, where zstd gives JSON a bigger lead than gzip does.

The case for protobuf is rarely "wire size" alone. Once you put a compressor in the path, the bytes-on-the-wire savings shrink to roughly 0 to 35 percent for typical payloads. That's not nothing, but it's not the order-of-magnitude win the raw-bytes column suggests. The stronger arguments for protobuf are schema enforcement, generated client code, faster parsing, and the absence of weird edge cases (no NaN-vs-null debate, no integer-precision-loss-in-JavaScript). Wire size is a tiebreaker, not the headline.


What's Next

A few directions worth poking at:

  • Streaming. All measurements here are one-shot. For long-lived streams (e.g., NDJSON-style logs), the compressor's dictionary state across records changes the math, and protobuf's per-record overhead becomes more visible.
  • Schema evolution costs. Protobuf's reserved field numbers and unknown-field handling have on-the-wire costs that this benchmark doesn't capture. JSON's "just add a key" approach has its own, in the form of unbounded growth.
  • Trained dictionaries. zstd supports trained dictionaries that dramatically improve compression on small payloads. JSON benefits more than protobuf from this, because the keys are exactly what the dictionary captures. That could shift the small-message numbers significantly.
