Error Handling and Resilience for Tor Hidden Service Production Deployments
Production Tor hidden services face failure modes that do not exist for clearnet services: Tor circuit establishment failures, introduction point expiration, rendezvous point timeouts, and the Tor daemon itself crashing or losing connectivity. Applications built on top of Tor hidden services must handle these failures gracefully - retrying operations with appropriate backoff, distinguishing Tor-layer failures from application-layer errors, and providing useful feedback to users when the service is temporarily unreachable. This guide covers implementing resilient hidden service deployments and writing application code that handles Tor-specific failure modes correctly.
Hidden services experience failure modes beyond those of standard web applications. Four are specific to Tor:
(1) Introduction point failure. When a Tor relay used as an introduction point goes offline, the hidden service must establish a new one. Tor handles this automatically, but for a window of seconds to minutes new connections may fail while the replacement is being established.
(2) Rendezvous point failure. If the chosen rendezvous point becomes unavailable during circuit establishment, the circuit fails and the client must retry.
(3) Descriptor not found. If the service descriptor has not yet been published to the HSDirs (for example, just after a restart), or the responsible HSDirs are temporarily overloaded, clients cannot look the service up. The resulting error looks like the service is down, but it is closer to a failed DNS lookup: the service may be running, and only its directory entry is unavailable.
(4) Tor daemon crash. If the Tor daemon itself crashes, the .onion address is completely unreachable until Tor restarts and re-publishes its descriptor.
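One way to separate these Tor-layer failures from application errors on the client side is to inspect the SOCKS5 reply code returned by the local Tor proxy. The sketch below assumes the SocksPort is configured with the ExtendedErrors flag, which makes Tor return onion-service-specific reply codes in the 0xF0 to 0xF7 range. The names `Failure` and `classify_failure` are hypothetical, and the choice of which codes count as retryable is a judgment call for illustration, not something Tor prescribes.

```python
from typing import NamedTuple, Optional

# Extended SOCKS5 reply codes Tor can return when the SocksPort
# has the ExtendedErrors flag set.
ONION_ERRORS = {
    0xF0: "descriptor not found",
    0xF1: "descriptor invalid",
    0xF2: "introduction failed",
    0xF3: "rendezvous failed",
    0xF4: "missing client authorization",
    0xF5: "wrong client authorization",
    0xF6: "invalid onion address",
    0xF7: "introduction timed out",
}

# Assumption: transient network conditions are worth retrying;
# authorization and address errors are not.
RETRYABLE_ONION_ERRORS = {0xF0, 0xF2, 0xF3, 0xF7}


class Failure(NamedTuple):
    layer: str       # "tor" or "application"
    reason: str
    retryable: bool


def classify_failure(socks_reply: int,
                     http_status: Optional[int] = None) -> Optional[Failure]:
    """Map a SOCKS5 reply byte (and optional HTTP status) to a failure layer.

    Returns None when neither layer reported a failure.
    """
    if socks_reply in ONION_ERRORS:
        return Failure("tor", ONION_ERRORS[socks_reply],
                       socks_reply in RETRYABLE_ONION_ERRORS)
    if socks_reply != 0x00:
        # Any other non-zero SOCKS5 reply: the proxy could not connect.
        return Failure("tor", f"SOCKS failure 0x{socks_reply:02X}", True)
    if http_status is not None and http_status >= 500:
        # The Tor circuit worked; the service itself answered with an error.
        return Failure("application", f"HTTP {http_status}",
                       http_status in (502, 503, 504))
    return None
```

A client could call `classify_failure(0xF0)` after a failed connection and show "service temporarily unreachable" for retryable Tor-layer failures, while reporting application errors as-is.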
Application-Layer Retry Logic for Tor Clients
Applications connecting to .onion services should implement Tor-aware retry logic, because standard HTTP retry policies do not account for Tor circuit establishment time. A recommended client retry pattern:
First attempt: use a 30-second timeout for connection establishment. If it fails, wait 5 seconds, request fresh circuits through the control port (SIGNAL NEWNYM), and retry.
Second attempt: if it fails, wait 30 seconds before retrying, which gives Tor time to rebuild introduction points.
Third attempt: if it fails, report the service as unreachable to the user.
To request fresh circuits, connect to the Tor control port (127.0.0.1:9051 when ControlPort 9051 is set in torrc), send AUTHENTICATE "password" with the password in double quotes, then send SIGNAL NEWNYM. Each command should be answered with 250 OK. NEWNYM tells Tor to use clean circuits for new connections, which helps bypass a congested or failed rendezvous point. Tor rate-limits NEWNYM, so signals sent in quick succession may be delayed. Avoid overly aggressive retries (for example, every second), as they flood the Tor network with circuit establishment attempts.
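The retry schedule described above can be sketched as follows. This is a minimal illustration using only the standard library, not a definitive client: `signal_newnym` and `connect_with_retry` are hypothetical names, password authentication on the control port is assumed, and `attempt` stands for whatever function the application uses to open its connection through Tor (expected to raise `OSError` on failure).

```python
import socket
import time


def signal_newnym(password, host="127.0.0.1", port=9051, timeout=10.0):
    """Ask the local Tor daemon to use fresh circuits for new connections."""
    # The control protocol expects the password as a quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        for command in (f'AUTHENTICATE "{escaped}"', "SIGNAL NEWNYM"):
            sock.sendall((command + "\r\n").encode("utf-8"))
            reply = reader.readline().strip()
            if not reply.startswith("250"):
                # Report only the command name so the password is not leaked.
                name = command.split()[0]
                raise RuntimeError(f"Tor control port rejected {name}: {reply}")


def connect_with_retry(attempt, new_circuit, waits=(5.0, 30.0), sleep=time.sleep):
    """Call attempt() up to len(waits) + 1 times.

    After the first failure: wait, request a fresh circuit, retry.
    After later failures: wait, retry. After the last: give up.
    """
    last_error = None
    for index in range(len(waits) + 1):
        try:
            return attempt()
        except OSError as exc:
            last_error = exc
        if index == len(waits):
            break
        sleep(waits[index])
        if index == 0:
            try:
                new_circuit()
            except (OSError, RuntimeError):
                pass  # a failed NEWNYM should not abort the retry itself
    raise ConnectionError("service unreachable after retries") from last_error
```

A caller might use `connect_with_retry(open_onion_connection, lambda: signal_newnym("secret"))`, where `open_onion_connection` is the application's own connection function. The `sleep` parameter is injectable so the schedule can be tested without real delays.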
Systemd Service Configuration for Automatic Recovery
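Configure the Tor daemon as a systemd service with automatic restart. In /etc/systemd/system/tor.service.d/override.conf, set Restart=always and RestartSec=30 under [Service], and StartLimitIntervalSec=300 and StartLimitBurst=5 under [Unit]. (Older systemd versions spell the interval StartLimitInterval= and read it from [Service]; current versions still accept that form for compatibility.) On Debian-based systems the running daemon is typically the tor@default.service instance, so the override may need to go in tor@default.service.d instead. This configuration restarts Tor 30 seconds after it exits and limits restarts to 5 within a 5-minute window, which prevents a restart storm. After Tor restarts, the hidden service automatically re-publishes its descriptor, which takes 30-120 seconds, so clients experience an outage of 1-3 minutes after a Tor restart. Configure the application server (Nginx, the application process) with a similar systemd restart configuration. A health check can be added with ExecStartPost, but systemd does not interpret shell operators such as && itself, so the command has to be wrapped in a shell: ExecStartPost=/bin/sh -c 'sleep 30 && curl -sf http://127.0.0.1:80/'. If the check fails, systemd treats the start as failed and applies the restart policy. Note that this check cannot hold back descriptor publication: Tor publishes the descriptor on its own schedule once it is running, so the application should be started and verified before Tor if the descriptor must never point at an unhealthy backend.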
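Where a single curl call is too coarse, the health check can be a small script that polls the application a few times before giving up. The following is a sketch under stated assumptions: the function names, the attempt count, and the delay are illustrative, and the URL is the local application address used in the text.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed reply.
        return False


def wait_until_healthy(url, attempts=6, delay=5.0,
                       probe=is_healthy, sleep=time.sleep):
    """Probe the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A wrapper script could call `wait_until_healthy("http://127.0.0.1:80/")` and exit with status 0 or 1 accordingly, which lets systemd use it as the ExecStartPost command in place of the curl one-liner.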
OnionBalance for High-Availability .onion Services
OnionBalance distributes a single .onion identity across multiple backend servers, providing load balancing and high availability. The OnionBalance manager holds the .onion private key and publishes descriptors that aggregate introduction points from multiple backend instances. If one backend instance fails, the remaining introduction points in the descriptor continue to work. Deployment: run OnionBalance on a separate management server, configure multiple backend servers that each run their own Tor instance and hidden service, and list each backend's own .onion address in the OnionBalance configuration. OnionBalance then fetches the backends' descriptors and merges their introduction points into the descriptor it publishes for the public address. The single .onion address serves traffic from multiple backends, and clients cannot tell which backend they connect to. For maximum availability, place backends in different availability zones or data centers, monitor for backends whose introduction points have expired, and make sure the descriptor is refreshed when backends are added or removed.
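The aggregation idea can be illustrated with a small conceptual model. This is not OnionBalance's actual selection algorithm; the function name, the round-robin policy, and the `max_points` limit are assumptions chosen to show why a failed backend simply drops out of the published descriptor.

```python
from itertools import zip_longest


def aggregate_intro_points(backends, max_points):
    """Merge introduction points from live backends into one list.

    `backends` maps a backend name to its current introduction points;
    a backend that is down contributes an empty list. Points are taken
    round-robin so every live backend is represented before any backend
    gets a second slot.
    """
    live = [points for points in backends.values() if points]
    merged = []
    for group in zip_longest(*live):
        for point in group:
            if point is not None and len(merged) < max_points:
                merged.append(point)
    return merged


# Hypothetical backends: "c" is down and contributes nothing.
example = {
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2"],
    "c": [],
}
print(aggregate_intro_points(example, max_points=4))
```

In this model, losing backend "c" removes its introduction points from the merged list while "a" and "b" keep serving, which mirrors the failover behaviour described above.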
Graceful Degradation and Error Communication
When a .onion service is partially available (backend down but Tor layer up), communicate errors gracefully. Have Nginx serve an error page when the backend application is unavailable: error_page 502 503 /maintenance.html; location /maintenance.html { root /var/www/error-pages; internal; }. The maintenance page should say something like: 'Service temporarily unavailable. This .onion service is experiencing issues. Please try again in a few minutes.' Avoid revealing technical details (specific error codes, internal server names) in error pages. For API services, return a structured error as valid JSON, { "error": "service_unavailable", "retry_after": 60 }, with HTTP status 503 and a Retry-After header. Implement the circuit breaker pattern in the application: after N consecutive failures, stop attempting backend connections for T seconds and serve cached or error responses instead, then re-check backend availability once T seconds have passed.
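A minimal sketch of the circuit breaker described above, assuming a single-threaded caller. The class name, the default values for N and T, and the `backend` and `fallback` callables are illustrative and not tied to any particular framework.

```python
import time


class CircuitBreaker:
    """Stop calling a failing backend for a cooling-off period."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # N consecutive failures
        self.reset_timeout = reset_timeout          # T seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def call(self, backend, fallback):
        """Return backend() if allowed and successful, otherwise fallback()."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()  # open: skip the backend entirely
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = backend()
        except Exception:
            self._failures += 1
            if (self._opened_at is not None
                    or self._failures >= self.failure_threshold):
                # Open the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            return fallback()
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def unavailable():
    """Fallback payload matching the structured 503 error in the text."""
    return {"error": "service_unavailable", "retry_after": 60}
```

A request handler could wrap each backend call as `breaker.call(fetch_from_backend, unavailable)`, where `fetch_from_backend` is the application's own function, and translate the fallback payload into an HTTP 503 response with a Retry-After header.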