Is Tor relay performance linearly scalable with more CPU cores?

No. Tor's main event loop is single-threaded, so a single Tor instance cannot deliver 8x the throughput on 8 cores. In practice, 2-4 cores give near-maximum benefit for one instance, and the marginal gain from each additional core drops off sharply beyond that. Multiple Tor instances scale better: 8 instances across 8 cores can provide approximately 8x the circuit-setup throughput of a single instance.

What is the maximum throughput of a single Tor relay instance?

The practical ceiling for a single Tor process with current code is approximately 1-3 Gbit/s on a single high-frequency core. Real-world high-bandwidth relay servers have reached 4-8 Gbit/s sustained, but those figures come from high-core-count machines with modern CPUs, where multiple instances or other parallel setups exceed the single-process ceiling. Many VPS plans come with 1-10 Gbit/s ports, so a single Tor instance can saturate allocations at the lower end of that range but not at the upper end.

Should I run multiple Tor instances or a single instance on a multi-core VPS?

For a VPS with 2-4 CPU cores, a single instance is fine and simpler to manage. For 8+ CPU cores, multiple instances make better use of the hardware. The break-even point depends on traffic load: if the relay is consistently CPU-saturated, multiple instances help; if it is not CPU-bound, a single instance is sufficient and simpler.

Does Tor's Conflux protocol help multi-core utilization?

Only indirectly. Conflux is Tor's circuit-level traffic-splitting protocol: it lets a client spread one stream's traffic across multiple circuits that end at the same exit. It is not an extension of Multipath TCP, although the idea is similar. As of 2026, Conflux is deployed in the Tor network, but it primarily benefits clients by letting them use multiple circuits simultaneously. On the relay side, it does not parallelize any work inside a relay; it only means more circuits, and therefore more circuit-setup load that can be distributed across cpuworker threads.

How do I benchmark my relay's CPU performance?

Use tor --list-fingerprint to verify that Tor starts, then test crypto performance with openssl speed -evp aes-256-ctr (the -evp flag selects the code path that uses AES-NI) and openssl speed ecdhx25519 (measures the X25519 key exchange used in circuit setup). Tor's relay cell encryption uses AES-128 in counter mode, so openssl speed -evp aes-128-ctr is the closer proxy if you want a second data point. Compare against expected performance: AES-CTR with AES-NI should reach roughly 2-10 Gbit/s, and X25519 should complete thousands of operations per second per core. Results well below these figures suggest that AES-NI is not active or that the CPU is throttling.
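
The throughput figures that openssl speed prints are in thousands of bytes per second (shown with a trailing k), while the expectations above are stated in Gbit/s. A minimal Python sketch of the conversion follows. The function names are illustrative, the 2 Gbit/s default is simply the lower bound quoted above, and the sample figure is made up rather than measured.

```python
def openssl_k_to_gbit(figure: str) -> float:
    """Convert an `openssl speed` figure such as '1250000.00k' to Gbit/s.

    openssl speed reports thousands of bytes per second with a trailing 'k'.
    """
    kilobytes_per_second = float(figure.strip().rstrip("k"))
    return kilobytes_per_second * 1000 * 8 / 1e9


def looks_accelerated(figure: str, minimum_gbit: float = 2.0) -> bool:
    """Hypothetical check: True if throughput meets the assumed lower bound."""
    return openssl_k_to_gbit(figure) >= minimum_gbit


if __name__ == "__main__":
    sample = "1250000.00k"  # made-up figure, not a measurement
    print(f"{openssl_k_to_gbit(sample):.1f} Gbit/s")
    print("meets lower bound:", looks_accelerated(sample))
```

Pick the column for the block size you care about before converting, since openssl speed prints one figure per block size and small blocks are much slower than large ones.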

Multi-Core Tor Relay Configuration: Scaling Across CPU Cores

Tor is historically single-threaded for core routing operations, but modern Tor versions (0.4.5+) make better use of multiple cores by running cryptographic work in parallel worker threads. The Conflux multi-path protocol is often mentioned in this context, but it mainly benefits clients and affects relay load only indirectly. Understanding how Tor uses multiple CPU cores, and how to configure a relay accordingly, lets operators get full value from modern multi-core VPS and server hardware. This guide covers Tor's threading model, the NumCPUs configuration option, cryptographic worker threads, and complementary strategies for multi-core relay performance.

Tor's Threading Architecture

Tor's main event loop (handling circuit operations, scheduling, and routing decisions) runs in a single thread for correctness and simplicity. The CPU-intensive public-key work of circuit setup, however, is offloaded to worker threads. The cpuworker subsystem handles ntor handshake computation (X25519 key exchange plus hashing), create cell processing (the relay's side of the circuit key exchange), and TAP handshakes (legacy and increasingly rare). These operations are parallelized: each cpuworker thread works independently, so multiple circuit setups can proceed at the same time.

At moderate bandwidth the main thread is not the bottleneck, because relaying data on already-established circuits is cheap per cell. At very high bandwidth the picture changes: the AES encryption of relayed cells runs in the main event loop and is not parallelized in current Tor versions, so that single thread becomes the limit.
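
The effect of a serial main thread on scaling can be illustrated with Amdahl's law, which is a general model and not something Tor itself computes. In the sketch below, parallel_fraction is a purely illustrative assumption (the share of CPU time spent in cpuworker-style parallel work); it is not a measured value for any relay.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    assumed_parallel_fraction = 0.5  # illustrative assumption, not a measurement
    for cores in (1, 2, 4, 8, 16):
        speedup = amdahl_speedup(assumed_parallel_fraction, cores)
        print(f"{cores:2d} cores: at most {speedup:.2f}x")
```

With any serial fraction above zero, the bound flattens quickly as cores are added, which is why a single instance stops benefiting from extra cores long before a set of independent instances does.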

NumCPUs Configuration

The NumCPUs configuration option in torrc controls the number of cpuworker threads Tor spawns. NumCPUs 0 (the default) auto-detects the number of CPU cores and uses all of them. NumCPUs 4 uses exactly 4 cpuworker threads.

For most deployments, NumCPUs 0 is correct. Set an explicit value only when auto-detection does not match the cores the relay can actually use: check the core count with nproc and set NumCPUs to match. Over-provisioning (setting NumCPUs higher than the available cores) can cause context-switching overhead. Under-provisioning (setting it lower than the available cores) leaves CPU capacity unused.

For hyperthreaded CPUs, set NumCPUs to the number of physical cores rather than logical threads if performance testing shows that hyperthreading does not improve throughput. ARM cores such as the Cortex-A53 have no hyperthreading, so the logical and physical counts are the same and setting NumCPUs to the core count is appropriate.
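
The selection logic above can be sketched in Python. This is a minimal illustration, not a tool shipped with Tor: usable_cores and numcpus_line are hypothetical helper names, the physical core count has to be supplied by the operator, and os.sched_getaffinity is only available on Linux, so the sketch falls back to os.cpu_count elsewhere.

```python
import os
from typing import Optional


def usable_cores() -> int:
    """Cores this process may run on (respects CPU pinning on Linux)."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # platform without sched_getaffinity
        return os.cpu_count() or 1


def numcpus_line(logical: int, physical: Optional[int] = None,
                 smt_helps: bool = True) -> str:
    """Build a torrc NumCPUs line from operator-supplied core counts."""
    if logical < 1:
        raise ValueError("logical core count must be at least 1")
    count = logical
    # Fall back to physical cores only when testing showed SMT does not help.
    if physical is not None and not smt_helps:
        count = min(logical, physical)
    return f"NumCPUs {count}"


if __name__ == "__main__":
    print(numcpus_line(usable_cores()))
```

The affinity-based count does not reflect cgroup CPU quotas, so on a container with a fractional CPU limit the printed value may still be too high.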

Measuring Multi-Core Utilization

Monitor multi-core CPU usage while the relay is running: mpstat -P ALL 1 prints per-core CPU utilization once per second. If all cores show near-100% utilization, the relay is CPU-saturated and needs more cores or a lower bandwidth limit. If every core except the busiest one is lightly loaded (below 50%), then either the relay is not heavily loaded, parallelism is limited by the single-threaded event loop, or NumCPUs should be adjusted.

For a relay with 4 CPU cores under high traffic, expect one core at 50-80% running the main event loop and the remaining three at varying levels that track the circuit setup rate. The main thread is not pinned, so the scheduler may move it between cores and the busy core is not necessarily core 0. If the cores running cpuworkers sit near 100% while the main-loop core has headroom, circuit setup rate is the bottleneck: raise NumCPUs if it is set below the core count, or split the load across multiple instances.
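
The interpretation rules above can be written down as a small Python classifier. This is a sketch under stated assumptions: it takes per-core busy percentages that you have already read off mpstat (it does not parse mpstat output), the 50% threshold comes from the text, and the 95% cut-off for "near 100%" is an arbitrary choice for illustration.

```python
from typing import Sequence


def classify_load(busy_percent: Sequence[float],
                  high: float = 95.0, low: float = 50.0) -> str:
    """Classify per-core busy percentages into a rough load pattern."""
    if not busy_percent:
        raise ValueError("need at least one core reading")
    ordered = sorted(busy_percent, reverse=True)
    hottest, others = ordered[0], ordered[1:]
    if all(value >= high for value in ordered):
        return "saturated"            # every core is near 100%
    if hottest >= high and all(value < low for value in others):
        return "single-thread-bound"  # one hot core, the rest mostly idle
    if all(value < low for value in ordered):
        return "lightly-loaded"       # nothing is working hard
    return "mixed"                    # needs a closer look per thread


if __name__ == "__main__":
    sample = [97.0, 20.0, 15.0, 10.0]  # made-up readings, not measurements
    print(classify_load(sample))
```

A "mixed" result is where per-thread tools such as top -H are more informative than per-core numbers, because core-level readings alone cannot tell which core is running the main event loop.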

Multiple Tor Instances for High-Core-Count Servers

For servers with 8+ CPU cores, running multiple Tor relay instances on the same server can use the available cores more effectively than a single instance, because each instance has its own single-threaded event loop. Each instance has its own ORPort, fingerprint, and bandwidth configuration. Configure them in separate torrc files, such as /etc/tor/instances/relay1/torrc and /etc/tor/instances/relay2/torrc, and use systemd template units for clean management. Each instance must have a different ORPort (9001, 9002, and so on) and its own DataDirectory.

The Tor network counts each instance separately, so running 4 instances provides 4 relays' worth of network contribution from one server. Directory authorities cap how many relays on a single IP address are listed in the consensus, so check the current limit before planning many instances on one address. Path selection treats the instances as related: declare all of them in each instance's MyFamily setting, and note that clients also avoid using two relays from the same /16 network in one circuit, so no two of your instances will appear in the same circuit for the same user.
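
A minimal Python sketch of generating per-instance torrc text with unique ORPort and DataDirectory values follows. The function name build_instances, the relayN nicknames, and the /var/lib/tor-instances data root are assumptions made for illustration; only the /etc/tor/instances/relayN/torrc layout and the 9001, 9002 port sequence come from the text above. The sketch returns the file contents instead of writing them, so nothing on disk is touched.

```python
from typing import Dict


def build_instances(count: int, base_port: int = 9001,
                    config_root: str = "/etc/tor/instances",
                    data_root: str = "/var/lib/tor-instances") -> Dict[str, str]:
    """Map each instance's torrc path to its contents, one entry per instance."""
    if count < 1:
        raise ValueError("count must be at least 1")
    configs = {}
    for index in range(1, count + 1):
        name = f"relay{index}"  # hypothetical nickname scheme
        lines = [
            f"Nickname {name}",
            f"ORPort {base_port + index - 1}",       # unique port per instance
            f"DataDirectory {data_root}/{name}",     # unique state directory
            "SocksPort 0",                           # relay only, no local proxy
        ]
        configs[f"{config_root}/{name}/torrc"] = "\n".join(lines) + "\n"
    return configs


if __name__ == "__main__":
    for path, text in build_instances(2).items():
        print(f"# {path}")
        print(text)
```

A real deployment would also need ContactInfo, bandwidth limits, and a MyFamily line listing the fingerprints of all instances, which only exist after each instance has generated its keys.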

Kernel-Level Optimizations for Multi-Core Relays

The Linux kernel's network stack can be a bottleneck for high-bandwidth multi-core Tor relays. Key optimizations:

RSS (Receive-Side Scaling) - distributes network interrupts across multiple CPU cores in hardware. On a 4-core system, set 4 combined queues: ethtool -L eth0 combined 4.

RPS (Receive Packet Steering) - software-level packet distribution for when hardware RSS is unavailable. To steer an rx queue to all 4 cores: echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus. The value is a hexadecimal CPU bitmask, and f selects CPUs 0-3.

NAPI polling - most modern NIC drivers already use NAPI, which reduces interrupt overhead under load.

XDP (eXpress Data Path) - runs packet-processing programs at the NIC driver level, before the kernel network stack. Tor itself still uses ordinary TCP sockets, so on a relay XDP is mainly useful for cheaply dropping unwanted traffic on the highest-bandwidth machines, not for speeding up Tor's own data path.

TCP socket buffer tuning for high-bandwidth connections: sysctl -w net.core.rmem_max=134217728 and sysctl -w net.core.wmem_max=134217728 (128 MiB each).
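
The value written to rps_cpus is a hexadecimal bitmask with one bit per CPU. A short Python sketch of how the mask for the first N cores is derived follows; the helper names are illustrative, and the sketch is limited to 32 cores because larger masks use a comma-separated format that is not handled here.

```python
def rps_cpu_mask(cores: int) -> str:
    """Hex bitmask selecting CPUs 0..cores-1, as accepted by rps_cpus."""
    if not 1 <= cores <= 32:
        raise ValueError("this sketch supports 1 to 32 cores")
    return format((1 << cores) - 1, "x")


def rps_path(interface: str, rx_queue: int) -> str:
    """sysfs file that holds the RPS mask for one receive queue."""
    return f"/sys/class/net/{interface}/queues/rx-{rx_queue}/rps_cpus"


if __name__ == "__main__":
    # Print the command instead of writing to sysfs, which needs root.
    print(f"echo {rps_cpu_mask(4)} > {rps_path('eth0', 0)}")
```

The interface name eth0 is carried over from the commands above; substitute the actual interface name reported by ip link on your system.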
