Cloud NAC performance test: 10,000 RADIUS auths/min on production hardware

We ramped a single Arbiter tenant from 1,000 to 10,000 RADIUS authentications per minute on the production VM pair, then sat at 2,000/min for four hours. Everything held. Arbiter's target customers are SMEs and the MSPs supporting them: a 2,000-endpoint tenant's worst minute lands around 500/min, so 10,000/min is a deliberate overshoot of roughly 20x. Here's what we measured.

Two tests this week on the production VM pair. First we ramped a single tenant from 1,000 to 10,000 RADIUS authentications per minute. Then we sat at 2,000/min for four hours to see if anything drifted. Two tenants ran concurrently throughout, both on real per-tenant PKI, both seeing the traffic mix a customer generates: EAP-TLS, MAC-Auth-Bypass, accounting on every permit, reject paths for bad certs and unknown MACs.

Arbiter is built for SMEs and the MSPs that support them: tenants of 50 to 2,000 endpoints whose busiest minute of the year tends to land around 300 to 500 authentications. We tested at 20x the top of that range on purpose: a headroom claim is only useful when there's enough gap that bad days disappear into the noise.

10,000 auths/min sustained, 100% policy accuracy. p99 latency 1,394 ms at the ceiling (well under the 2-second pass threshold). Zero queue growth, flat FreeRADIUS memory, no leaks. Two tenants concurrent on the same hardware with no resource contention.

What we're not claiming

Three caveats up front, before the numbers, because credibility on a performance post is built on what you don't claim:

Test one: ramp to the ceiling

Five 30-second tiers at constant rate: 1,000, 2,000, 5,000, 7,500 and 10,000 auths/min. Pass criteria: at least 95% verdict accuracy per class, p99 latency under 2 seconds, no internal queue overflow. All five tiers passed with 100% accuracy: every permit permitted, every reject rejected, every Acct-Start acknowledged. Zero unexpected accepts on the bad-cert path.
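The three pass criteria can be written down as a small check. This is a sketch rather than the harness's own code: the function name, its parameters and the traffic-class labels are hypothetical, and only the thresholds come from the text above.

```python
# Thresholds taken from the post; names and structure are illustrative.
MIN_ACCURACY = 0.95   # at least 95% correct verdicts per traffic class
MAX_P99_MS = 2000.0   # p99 latency must stay under 2 seconds


def tier_passes(correct_by_class, total_by_class, p99_ms, queue_overflowed):
    """Return True only if all three pass criteria hold for one tier."""
    for name, total in total_by_class.items():
        if total == 0:
            return False  # a class with no traffic cannot be judged
        accuracy = correct_by_class.get(name, 0) / total
        if accuracy < MIN_ACCURACY:
            return False
    return p99_ms < MAX_P99_MS and not queue_overflowed
```

A tier with every verdict correct, a p99 of 1,394 ms and no queue overflow passes; a tier where any one class drops below 95% fails even if the others are perfect.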

Harness specifics

Sustained authentication rate by tier. Target shown as dashed line, actual sustained rate as the bar. Every tier passed.

FreeRADIUS deliberately sleeps for one second on every Access-Reject as a credential-stuffing brake. The ~1.1-second floor in the chart below is that pause, not server processing time. Subtract a second to read real per-request work, which is in the low-millisecond range.

p99 latency climbed from 1.06 s at the bottom tier to 1.39 s at the top, well under the 2-second threshold. Server-side, every internal queue stayed under 1% of capacity through the whole ramp and FreeRADIUS resident memory finished where it started.
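For readers unfamiliar with the metric, here is one common way to compute a p99 from raw latency samples. The post does not say which percentile method the harness uses, so the nearest-rank definition below is an assumption, and the function name is hypothetical.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

With this definition a p99 of 1,394 ms means that 99% of requests in the tier completed in 1,394 ms or less.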

p99 latency by tier. The 2,000 ms ceiling is our pass threshold; the line that matters is well under it.

What this looks like at SME scale

The chart below puts realistic SME tenants next to the tested ceiling. Loads are peak-burst (the boot-storm Monday or post-policy-push reauth wave), not steady-state. Average daily load sits in fractions of a percent.

Worst-minute (boot-storm / mass re-auth) load for realistic SME tenants against the 10,000 auth/minute tested ceiling. Not average daily load: the busiest 60 seconds of the year.

A 2,000-endpoint SME at peak burst sits at about 5% of capacity. The 500-endpoint case is under 2%. That's per tenant, and the two concurrent tenants in this run shared the FreeRADIUS process, the database pool and the accounting writer without competing for any of them.
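The percentages are simple ratios against the tested ceiling. A minimal sketch of the arithmetic, with a hypothetical function name and the 500 auths/min worst-minute figure taken from the text:

```python
TESTED_CEILING_PER_MIN = 10_000  # sustained rate reached in the ramp test


def share_of_ceiling(peak_auths_per_min, ceiling=TESTED_CEILING_PER_MIN):
    """Return a peak load as a percentage of the tested ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return 100.0 * peak_auths_per_min / ceiling


# 2,000-endpoint tenant, worst minute around 500 auths/min
print(f"{share_of_ceiling(500):.1f}% of tested ceiling")  # prints: 5.0% of tested ceiling
```

By the same ratio, "under 2%" corresponds to a worst minute below 200 authentications.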

Test two: four hours at the wheel

Ramping is one thing. Sitting at a number for hours catches the bugs the ramp can't: memory leaks, slow queue drift, caches that grow and don't evict. We set the rate to 2,000 auths/min (20% of the proven ceiling), cleared the tables and let it run for four hours: twenty-four 10-minute windows, same mix as the ramp, no warm restart.
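The soak parameters fix the nominal request counts and the send spacing. The sketch below derives them; the function and key names are hypothetical, and it assumes requests are spaced evenly, which the post does not state.

```python
def soak_plan(rate_per_min, window_min, windows):
    """Derive nominal counts and send spacing for a constant-rate soak."""
    if rate_per_min <= 0 or window_min <= 0 or windows <= 0:
        raise ValueError("all parameters must be positive")
    per_window = rate_per_min * window_min
    return {
        "duration_min": window_min * windows,
        "requests_per_window": per_window,
        "total_requests": per_window * windows,
        # Evenly spaced sends: one request every this many milliseconds.
        "send_interval_ms": 60_000 / rate_per_min,
    }


plan = soak_plan(rate_per_min=2_000, window_min=10, windows=24)
```

For this run that works out to 240 minutes, a nominal 20,000 requests per window, 480,000 in total, and one request every 30 ms.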

Result: 465,367 authentications and 185,369 accounting writes. Every window held 100% policy accuracy. p99 latency stayed between 1,098 and 1,124 ms across the whole run, a drift ratio of 0.98x first-to-last (the last window was fractionally faster than the first). FreeRADIUS memory finished flat.
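The drift ratio is read here as last-window p99 divided by first-window p99, which matches the note that a value below 1.0 means the run ended slightly faster than it began. That reading is an assumption about the harness, and the function name is hypothetical.

```python
def drift_ratio(window_p99_ms):
    """Last-window p99 divided by first-window p99.

    Values near 1.0 mean latency did not creep over the run;
    values well above 1.0 would suggest a leak or a growing queue.
    """
    if len(window_p99_ms) < 2:
        raise ValueError("need at least two windows")
    first, last = window_p99_ms[0], window_p99_ms[-1]
    if first <= 0:
        raise ValueError("first-window p99 must be positive")
    return last / first
```

Only the first and last windows enter the ratio, so it is best read alongside the per-window range rather than on its own.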

Two windows about twenty minutes in did trip the harness's pass criteria with 52% and 21% timeout rates. We dug in: FreeRADIUS received only 11,000 and 14,000 requests in those windows instead of 20,000, and the journal had no entries at all for that period. The packets never arrived. The cause was packet loss on the test client's own domestic uplink, not anything on the Arbiter side. The detail that matters is what happened next: when traffic resumed, p99 was 1,111 ms, identical to the first window. No backlog, no recovery curve, no drift. We kept the FAIL in the record because the more useful result is what the server did when its load dropped out and came back: nothing.
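The totals can be cross-checked against this explanation using only figures quoted in the post. The sketch assumes the 11,000 and 14,000 counts are rounded and that every other window received its nominal 20,000 requests.

```python
NOMINAL_PER_WINDOW = 20_000                   # 2,000 auths/min x 10-minute window
WINDOWS = 24
received_in_lossy_windows = [11_000, 14_000]  # rounded figures from the post
reported_total = 465_367                      # authentications reported for the run

nominal_total = NOMINAL_PER_WINDOW * WINDOWS
shortfall = sum(NOMINAL_PER_WINDOW - r for r in received_in_lossy_windows)
expected_total = nominal_total - shortfall
gap = reported_total - expected_total

print(nominal_total, shortfall, expected_total, gap)  # 480000 15000 465000 367
```

The nominal 480,000 less a 15,000 shortfall gives 465,000, within a few hundred of the reported total, which is consistent with the missing traffic being confined to those two windows.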

p99 latency across twenty-four 10-minute windows of the four-hour soak. The 2,000 ms pass threshold sits near the top of the chart; the actual line barely moves. The shaded band marks the two windows where the test client's internet uplink dropped packets (see text): server-side latency did not budge.

What the database did

The engineering that made it possible

Four small changes landed in the two days before the test. Each one shifted the curve more than its diff size would suggest.

How to reproduce

The harness is a multi-process Python driver using pyrad for plain RADIUS and eapol_test for EAP-TLS, fanning out across simulated NAS-Identifier values against any Arbiter tenant. We'll share the harness, the cert helpers and a short README with serious evaluators on request: email support@arbiter.ie.
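The post does not describe how the driver divides load between processes. As one possible approach, the sketch below splits a target rate evenly across workers and gives each a simulated NAS-Identifier. The function name, the `sim-nas` prefix and the even split are all assumptions, and the actual pyrad and eapol_test calls are deliberately left out.

```python
def plan_workers(rate_per_min, workers, nas_prefix="sim-nas"):
    """Split a target rate across worker processes, one simulated
    NAS-Identifier per worker. Any remainder goes to the first workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    if rate_per_min < 0:
        raise ValueError("rate_per_min must not be negative")
    base, extra = divmod(rate_per_min, workers)
    plan = []
    for i in range(workers):
        share = base + (1 if i < extra else 0)
        plan.append({
            "nas_identifier": f"{nas_prefix}-{i:02d}",
            "rate_per_min": share,
        })
    return plan


# Hypothetical example: the 10,000/min tier spread over 8 worker processes.
workers = plan_workers(10_000, 8)
```

Each entry would then be handed to its own process, which sends at its share of the rate so that the shares sum to the tier's target.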

What's next

The single-tenant ceiling is on the record. The interesting unanswered question is the shared layers: database pool, accounting writer, audit pipeline. So the next test is a fan-out: ten concurrent tenants at 5,000 auths/min each for thirty minutes (50,000/min aggregate, well past any plausible MSP-managed load), each on its own per-tenant virtual server, PKI and policy chain. Same pass criteria. We'll pair it with an EAP-TLS-heavy mix on at least one tenant, since the crypto path deserves its own honest number. Results in the next devlog.

See it for yourself

Spin up a free trial tenant on app.arbiter.ie and point any RADIUS test client at it. Want our reference stress-test script (the same one that produced the numbers above)? Email support@arbiter.ie and we will send you the code, the cert helpers and a short README. No smoke and mirrors.
