What is a realistic response time target for a .onion service?

Given Tor's inherent latency (200-500ms of circuit overhead per request, plus 3-5 seconds to build the initial circuit), the application layer should add less than 100ms for most requests. Users then see roughly 300-700ms end to end for requests on an established circuit, and 3-8 seconds for the first request, which includes circuit building. At the low end of Tor's overhead (200ms), cutting application time from 500ms to 50ms reduces the total from 700ms to 250ms, a significant improvement. Tor's circuit overhead itself is outside operator control.
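
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration using the ranges quoted above; the function and constant names are hypothetical, and real values vary with circuit and relay load.

```python
# Figures taken from the text above; treat them as rough ranges, not guarantees.
TOR_OVERHEAD_MS = (200, 500)     # per-request overhead on an established circuit
CIRCUIT_BUILD_MS = (3000, 5000)  # one-time cost paid by the first request


def end_to_end_ms(app_ms: int, tor_overhead_ms: int, circuit_build_ms: int = 0) -> int:
    """Total latency a user sees for one request, in milliseconds."""
    return circuit_build_ms + tor_overhead_ms + app_ms


# Effect of optimizing the application at the low end of Tor's overhead.
before = end_to_end_ms(app_ms=500, tor_overhead_ms=TOR_OVERHEAD_MS[0])
after = end_to_end_ms(app_ms=50, tor_overhead_ms=TOR_OVERHEAD_MS[0])
print(f"slow app: {before} ms, fast app: {after} ms")  # slow app: 700 ms, fast app: 250 ms

# Worst case for a first request with a fast application.
first_request = end_to_end_ms(50, TOR_OVERHEAD_MS[1], CIRCUIT_BUILD_MS[1])
print(f"first request, worst case: {first_request} ms")  # first request, worst case: 5550 ms
```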

How do I load test a .onion service?

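Load testing tools must be configured to use Tor's SOCKS5 proxy (127.0.0.1:9050 by default) for traffic to actually route through Tor. locust: its HTTP client is built on requests, so install SOCKS support (pip install requests[socks]) and set the session's proxies to a socks5h:// URL; the socks5h scheme makes the proxy resolve the hostname, which .onion addresses require. k6: the --http-debug flag only prints request and response details and does not configure a proxy. k6 reads the standard HTTP_PROXY and HTTPS_PROXY environment variables, so verify that your version accepts a SOCKS5 URL there, or place an HTTP-to-SOCKS bridge such as Privoxy in front of Tor. wrk and wrk2: neither supports SOCKS5 natively, so point them at a local forwarder that tunnels to the service through Tor's SOCKS port (socat's SOCKS4A address type is one option). Alternatively, load test the application backend directly (bypassing Tor) to isolate application performance from Tor network latency, then separately measure Tor's contribution to total latency.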
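
A minimal sketch of the two-step approach, measuring the backend directly and then through Tor, might look like this. It assumes Tor's SOCKS listener is on its default 127.0.0.1:9050 and that requests[socks] is installed; all function names are hypothetical helpers, not part of any library.

```python
import statistics
import time
from typing import Callable, List

# Assumption: Tor's SOCKS listener runs on its default address. The socks5h
# scheme makes the proxy resolve hostnames, which .onion names require.
TOR_SOCKS_URL = "socks5h://127.0.0.1:9050"


def make_tor_session():
    """Return a requests.Session that sends all traffic through Tor."""
    import requests  # imported lazily; needs: pip install "requests[socks]"

    session = requests.Session()
    session.proxies = {"http": TOR_SOCKS_URL, "https": TOR_SOCKS_URL}
    return session


def time_calls(fetch: Callable[[], object], count: int) -> List[float]:
    """Call fetch() count times and return each duration in milliseconds."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def tor_overhead_ms(direct: List[float], via_tor: List[float]) -> float:
    """Estimate Tor's contribution as the difference between median latencies."""
    return statistics.median(via_tor) - statistics.median(direct)
```

In use, one would pass time_calls a callable that fetches the backend's local address for the direct samples, and a callable that fetches the service's .onion URL with make_tor_session().get for the Tor samples. Locust's HttpUser client is built on requests.Session, so the same proxies mapping can typically be assigned to self.client.proxies in on_start.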

Should I enable HTTP/2 for a .onion service?

HTTP/2 provides performance benefits for clearnet sites: multiplexing (multiple concurrent requests on one connection), header compression, and server push (which major browsers have since removed or disabled). For .onion services, Tor already multiplexes streams over a circuit, which reduces the benefit of HTTP/2 multiplexing. HPACK header compression still saves bandwidth for request-heavy applications. Browsers only use HTTP/2 over TLS, so enable it if the .onion service is served over HTTPS; for plain-HTTP .onion services, HTTP/1.1 with keepalive is the practical option.

Does CDN caching work for .onion services?

Traditional CDN services (Cloudflare, Fastly) serve clearnet only and cannot sit in front of a .onion service. Self-hosted caching proxies (Varnish Cache, Nginx proxy_cache) can fill the same role when deployed between the Tor process and the application backend, where they act as in-path caches. Configuration: Tor -> Nginx (with proxy_cache for static assets and cacheable responses) -> application server. This reduces application server load for cacheable content without a third-party CDN.

What causes the 'first request is slow' phenomenon on .onion sites?

The first request to a .onion service includes three setup costs: hidden service descriptor lookup (Tor fetching the service's introduction points from HSDir relays: 1-3 seconds on first visit), circuit establishment (Tor building the 6-hop circuit: 3-5 seconds), and rendezvous negotiation. Subsequent requests on the same circuit avoid these costs and are much faster. This is inherent to Tor's protocol. Tell users to expect a slow initial connection, and show a loading indicator in the application UI.

Performance Profiling for Tor Hidden Services: 2026 Guide

A slow .onion service frustrates users and increases bounce rates. Profiling identifies where time is actually spent - often it is not where operators assume. This guide covers systematic performance profiling for Tor hidden service applications.

Understanding the Latency Stack

Every request to a .onion hidden service passes through multiple layers, each adding latency:

Tor circuit establishment: 3-5 seconds for the initial circuit build on the first visit.

Tor network overhead: 200-500ms for each subsequent request on the same circuit. A hidden service connection uses a 6-hop rendezvous circuit, which adds roughly twice the latency of ordinary 3-hop Tor browsing.

Application server processing: depends entirely on the application. Database queries, external API calls, and computation all add to total latency.

Network within the server: application to Nginx (Unix socket: microseconds) and Nginx to Tor (local TCP: under 1ms).

On a well-configured server, the application layer (database, computation) is typically where optimization yields the most improvement. Tor circuit latency is outside the operator's control, apart from choosing well-connected, fast hosting infrastructure.
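
To see how much of the total belongs to the application layer, one option is to have the application report its own processing time. The sketch below is a hypothetical WSGI middleware that adds a Server-Timing header; the class name and demo application are illustrative, and it measures only the time until the application starts its response.

```python
import time
from wsgiref.util import setup_testing_defaults


class AppTimingMiddleware:
    """Wrap a WSGI app and expose its processing time via Server-Timing."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        start = time.perf_counter()

        def timed_start_response(status, headers, exc_info=None):
            # Time from request arrival until the app starts its response;
            # a body streamed afterwards is not included.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            timing = ("Server-Timing", f"app;dur={elapsed_ms:.1f}")
            return start_response(status, list(headers) + [timing], exc_info)

        return self.app(environ, timed_start_response)


def demo_app(environ, start_response):
    """Hypothetical application; the sleep stands in for database work."""
    time.sleep(0.02)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# Exercise the middleware without a server, using a fake start_response.
captured = {}


def record_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
body = AppTimingMiddleware(demo_app)(environ, record_response)
print(captured["headers"]["Server-Timing"])
```

Subtracting the reported duration from the latency measured by a Tor client gives a rough estimate of the share contributed by Tor and the local proxy hops.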

Application-Level Profiling Tools

Python applications (Django, Flask): cProfile (the built-in profiler), django-silk (request-level profiling for Django), py-spy (a sampling profiler that attaches to a running process without code changes), and Pyroscope (a continuous profiling server).

Node.js applications: the built-in --inspect flag (enables Chrome DevTools profiling), clinic.js (detailed performance analysis), and 0x (flame graph generation).

PHP applications: Xdebug profiling, Blackfire.io (a commercial profiler with strong PHP support), and Tideways (a PHP profiling service).

Go applications: the built-in pprof profiler, exposed over HTTP at /debug/pprof when net/http/pprof is imported.

Whatever the tool, focus on slow functions (CPU), memory allocation hot spots (GC pressure), blocking I/O (database waits, external API calls), and lock contention in multi-threaded applications.
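
As a small illustration of the Python case, the sketch below runs a function under cProfile and prints the entries with the highest cumulative time. The handler and the deliberately slow function are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately inefficient stand-in for a hot code path."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def handle_request() -> int:
    """Hypothetical request handler that calls the slow function."""
    return slow_sum(20_000)


def profile_call(func) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_call(handle_request)
print(report)
```

Sorting by cumulative time surfaces the call path that dominates a request; sorting by "tottime" instead highlights the individual functions where the CPU time is actually spent.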

Database Query Profiling

Database queries are the most common bottleneck in web applications.

PostgreSQL slow query log: set log_min_duration_statement = 100 in postgresql.conf to log every query that takes longer than 100ms.

EXPLAIN ANALYZE: run it on the slowest queries to see the execution plan alongside actual timings and row counts.

Missing indexes: a Seq Scan in EXPLAIN output on a large table, where an index lookup was expected, usually points to a missing index. Add indexes on frequently queried columns.

N+1 query problem: applications that load a list (1 query) and then load related data for each item (N queries) create N+1 database round trips. Use a JOIN or ORM eager loading to collapse them into 1-2 queries.

Monitoring in production: the pg_stat_statements extension tracks execution statistics (calls, total time, mean time) across all queries. Query it periodically to identify the statements that accumulate the most time.
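
The N+1 pattern and its JOIN-based fix can be demonstrated with a small runnable sketch. Note the assumption: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the schema and function names are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the schema is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author ON posts(author_id);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, (i % 10) + 1, f"post{i}") for i in range(1, 31)])
conn.commit()

executed = []
conn.set_trace_callback(executed.append)  # record every SQL statement issued


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,)
        ).fetchall()
        result[name] = sorted(title for (title,) in rows)
    return result


def titles_joined():
    """The same data fetched with a single JOIN."""
    result = {}
    query = ("SELECT a.name, p.title FROM authors a "
             "LEFT JOIN posts p ON p.author_id = a.id")
    for name, title in conn.execute(query):
        titles = result.setdefault(name, [])
        if title is not None:
            titles.append(title)
    return {name: sorted(titles) for name, titles in result.items()}


executed.clear()
slow = titles_n_plus_one()
n_plus_one_count = len(executed)

executed.clear()
fast = titles_joined()
joined_count = len(executed)

print(f"N+1 approach: {n_plus_one_count} queries, JOIN approach: {joined_count} query")
```

Each extra query costs a round trip to the database, so the gap between the two approaches grows with the length of the list.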

Nginx and Reverse Proxy Profiling

Nginx is typically not a bottleneck for hidden services: it handles thousands of requests per second at minimal CPU cost. Its configuration can still add unnecessary latency. Check buffer sizes (proxy_buffer_size and proxy_buffers set too small cause extra syscalls), upstream keepalive (the keepalive directive in the upstream block reuses connections to the application backend and avoids a TCP handshake per request), sendfile and tcp_nopush for static file serving, and gzip compression (smaller text responses at the cost of some CPU). To enable gzip: gzip on; gzip_types text/plain text/css application/json application/javascript; gzip_comp_level 6;. For hidden services, compression is valuable: Tor bandwidth is limited, and the transfer time saved by smaller responses usually outweighs the added server CPU time.
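
A rough way to estimate what compression buys is to compress a representative response and compare transfer times. The sketch below uses Python's gzip module at level 6, the same zlib level as gzip_comp_level 6; the payload is synthetic and the throughput figure is an illustrative assumption, not a measured Tor value.

```python
import gzip
import json

# Synthetic JSON payload standing in for a typical text response.
payload = json.dumps(
    [{"id": i, "title": f"Item {i}", "status": "available"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload, compresslevel=6)

# Assumption: illustrative circuit throughput; measure your own service.
ASSUMED_BYTES_PER_SECOND = 100_000


def transfer_seconds(size_bytes: int) -> float:
    """Time to move size_bytes at the assumed throughput."""
    return size_bytes / ASSUMED_BYTES_PER_SECOND


ratio = len(compressed) / len(payload)
print(f"original: {len(payload)} bytes, gzip: {len(compressed)} bytes ({ratio:.0%})")
print(f"transfer: {transfer_seconds(len(payload)):.3f}s -> "
      f"{transfer_seconds(len(compressed)):.3f}s")
```

Repetitive text such as JSON and HTML compresses well; already-compressed formats (images, archives) gain little, which is why gzip_types lists only text types.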

Systematic Performance Optimization Process

Optimization process:

(1) Baseline measurement: establish current performance metrics (response time p50/p95/p99, requests per second, error rate) with a load testing tool such as locust or k6, routed through Tor's SOCKS5 proxy to simulate real Tor clients.

(2) Profile: identify the three slowest components using profiling tools.

(3) Fix: address the top bottleneck.

(4) Measure again: verify that the fix improved the metric and did not regress others.

(5) Repeat.

Common findings, in order of frequency: slow database queries without indexes, N+1 ORM queries, missing Redis caching for expensive repeated computations, template rendering overhead for complex pages, and external API calls in the request path.

Configuration checklist: application in production mode (not debug), opcode caches enabled (PHP opcache; Python caches .pyc bytecode automatically), static assets served by Nginx rather than the application, and database connection pooling.
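
Steps (1) and (4) come down to summarizing latency samples and comparing two runs. A minimal sketch, assuming latencies are collected as a list of milliseconds; the helper names and the 5% tolerance are illustrative choices rather than a standard.

```python
from typing import Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[max(rank, 1) - 1]


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return the p50/p95/p99 latencies for one load test run."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def regressions(before: Dict[str, float], after: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Names of metrics that got worse by more than the given fraction."""
    return [name for name in before if after[name] > before[name] * (1 + tolerance)]


baseline = summarize([float(ms) for ms in range(1, 101)])
print(baseline)  # {'p50': 50.0, 'p95': 95.0, 'p99': 99.0}
```

Comparing the summary from the baseline run with the summary after a fix shows whether the median improved while the tail (p95/p99) stayed stable, which an average alone would hide.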
