Is building a .onion search engine legal?

Crawling and indexing publicly reachable web pages is generally lawful in the US and EU, although the details vary by jurisdiction. The legal risk comes from content: indexing CSAM or facilitating access to illegal services creates liability. Maintain strict content filtering and a robust removal process, and document your filtering methodology so you can demonstrate good-faith compliance.

How do I get a comprehensive initial seed list of .onion URLs?

Ahmia publishes a subset of its indexed URLs. dark.fail maintains a curated list of verified .onion services. Community submissions (letting users submit .onion URLs for indexing) are another source. Relay traffic is not a usable source: onion-service connections never pass through exit relays, and the Tor Project's published metrics are aggregate counts, not lists of addresses. Each source has different quality characteristics: Ahmia's list is filtered, while community submissions need moderation.

What languages do .onion search users typically search in?

Tor usage statistics are reported by country rather than by language, so the language mix has to be inferred from them. Likely major languages include Russian, English, Arabic, Farsi, Turkish, and Chinese. Building multilingual search (language-aware text analysis in Elasticsearch, language detection for indexed pages) significantly improves search quality for non-English users.
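
As a minimal sketch of the language-aware analysis mentioned above, the helper below maps a detected language code to one of Elasticsearch's built-in language analyzers. The function name analyzer_for and the choice of ISO 639-1 codes as input are assumptions for illustration; the language detector that produces the code is not shown.

```python
from typing import Optional

# ISO 639-1 code -> Elasticsearch built-in language analyzer name.
# "cjk" is the built-in analyzer commonly used for Chinese text.
ANALYZER_BY_LANG = {
    "ru": "russian",
    "en": "english",
    "ar": "arabic",
    "fa": "persian",
    "tr": "turkish",
    "zh": "cjk",
}


def analyzer_for(lang_code: Optional[str]) -> str:
    """Return an analyzer name for a detected language code.

    Falls back to the "standard" analyzer when the language is
    missing or not in the table.
    """
    if not lang_code:
        return "standard"
    # Accept regional variants such as "zh-CN" or "en_US".
    primary = lang_code.strip().lower().replace("_", "-").split("-")[0]
    return ANALYZER_BY_LANG.get(primary, "standard")
```

A crawler could store the detected code in the language field and use the analyzer name when choosing which text field or index to write to.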

How does a .onion search engine handle onion addresses that change?

A v3 .onion address (56 characters) is derived from the service's public key, so it stays the same for as long as the operator keeps that key, even if the service moves to a different server. The address changes only when the service is given a new key. In that case the search engine will eventually find the old address unreachable and decrement its freshness score, and the new address must be rediscovered through crawling. There is no forwarding mechanism for .onion address changes.
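
Because the address is derived from the key, a crawler can reject malformed addresses before spending a Tor circuit on them. The sketch below follows the v3 encoding (32-byte public key, 2-byte SHA3-256 checksum, version byte 3, base32-encoded). The helper names are hypothetical, and encode_onion_v3 exists only to illustrate the layout: it does not check that the 32 bytes are a usable Ed25519 key.

```python
import base64
import binascii
import hashlib

ONION_VERSION = b"\x03"


def _checksum(pubkey: bytes) -> bytes:
    # First two bytes of SHA3-256(".onion checksum" || pubkey || version).
    digest = hashlib.sha3_256(b".onion checksum" + pubkey + ONION_VERSION)
    return digest.digest()[:2]


def encode_onion_v3(pubkey: bytes) -> str:
    """Encode a 32-byte public key as a v3 .onion hostname."""
    if len(pubkey) != 32:
        raise ValueError("v3 public keys are 32 bytes")
    raw = pubkey + _checksum(pubkey) + ONION_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def is_valid_onion_v3(address: str) -> bool:
    """Check length, base32 alphabet, version byte, and checksum."""
    address = address.strip().lower()
    if not address.endswith(".onion"):
        return False
    label = address[: -len(".onion")]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except (binascii.Error, ValueError):
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    return version == ONION_VERSION and checksum == _checksum(pubkey)
```

A regex match alone accepts any 56 base32 characters; the checksum test filters out typos and truncated links that would otherwise sit in the crawl queue until they time out.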

What is the typical performance of a .onion search engine query?

Elasticsearch query response time is typically 10-100ms. The Tor circuit between the user's browser and the service adds 300ms-2s. Total perceived query time is roughly 500ms-3s, which is acceptable for search. For suggestion/autocomplete features, Tor circuit latency makes real-time suggestions impractical: implement client-side suggestions from a pre-downloaded suggestion index rather than server-side live queries.
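
One way to produce the pre-downloaded suggestion index is to build a small prefix table on the server and ship it as a static JSON file. This is a sketch under assumptions: build_suggestion_index, its parameters, and the term_counts input (for example, counts of frequent title terms in the index) are hypothetical and not part of any library.

```python
import json
from collections import defaultdict


def build_suggestion_index(term_counts, max_prefix_len=4, top_k=5):
    """Map each short prefix to its top_k most frequent terms."""
    buckets = defaultdict(list)
    for term, count in term_counts.items():
        normalized = term.strip().lower()
        # Register the term under every prefix up to max_prefix_len.
        for n in range(1, min(len(normalized), max_prefix_len) + 1):
            buckets[normalized[:n]].append((count, normalized))
    index = {}
    for prefix, entries in buckets.items():
        # Highest count first; ties broken alphabetically.
        entries.sort(key=lambda e: (-e[0], e[1]))
        index[prefix] = [term for _, term in entries[:top_k]]
    return index


def to_static_json(index) -> str:
    """Serialize the table for download alongside the search page."""
    return json.dumps(index, ensure_ascii=False, sort_keys=True)
```

The browser then looks up the typed prefix locally, so no keystroke crosses the Tor circuit until the user submits the query.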

Building a .onion Search Engine: Indexing Dark Web Content

The dark web lacks the comprehensive search infrastructure that makes the clearnet navigable. Existing .onion search engines (Ahmia, Torch, Not Evil) index a fraction of available .onion content and often have significant downtime. Building a custom .onion search engine - whether for a specific topic vertical, a specific community, or as a general dark web search tool - requires solving unique challenges: crawling .onion addresses requires Tor routing, index infrastructure must run on .onion-accessible servers, and content filtering must prevent the search engine from indexing and surfacing illegal material. This guide covers the technical architecture for a .onion search engine using Python-based crawlers, Elasticsearch for indexing, and a Flask or FastAPI search interface served through a Tor hidden service.

Crawler Architecture for .onion Content

A .onion crawler routes all HTTP requests through Tor's SOCKS5 proxy (127.0.0.1:9050). Hostnames must be resolved by the proxy rather than locally (the socks5h scheme in clients that distinguish the two), because .onion names do not exist in DNS. Python with Scrapy and the scrapy-socks-proxy middleware provides an efficient crawler framework.

Crawler design: start with a seed list of known .onion URLs (Ahmia's public database, dark.fail listings, and manually curated starting points). For each URL, extract all .onion links from the page (use a regex such as [a-z2-7]{56}\.onion to find v3 addresses). Add discovered .onion URLs to the crawl queue. Use a Bloom filter or a Redis set for visited-URL deduplication to avoid re-crawling.

Politeness: rotate Tor circuits every 10-50 requests to spread load across relays (onion-service connections never use exit relays, and the guard stays fixed). Add 2-5 second delays between requests to avoid overwhelming small .onion services, and enforce per-domain rate limiting as a hard ceiling: no more than 1 request per second to any single .onion domain.
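
The link extraction, deduplication, and per-domain rate limit described above can be sketched without any networking code. This is an illustration under assumptions: the Frontier class and its method names are hypothetical, and a plain in-memory set stands in for the Bloom filter or Redis set. Fetching through the SOCKS proxy is left out.

```python
import re
import time
from urllib.parse import urlsplit

# v3 onion hostnames: 56 base32 characters followed by ".onion".
ONION_HOST_RE = re.compile(r"\b([a-z2-7]{56}\.onion)\b")


def extract_onion_hosts(html: str) -> set:
    """Return the distinct v3 .onion hostnames found in a page."""
    return set(ONION_HOST_RE.findall(html.lower()))


class Frontier:
    """Crawl queue with URL deduplication and a per-host rate limit."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between hits per host
        self.clock = clock                # injectable for testing
        self.seen = set()                 # stand-in for Bloom filter / Redis
        self.queue = []
        self.next_allowed = {}            # host -> earliest next fetch time

    def add(self, url: str) -> bool:
        """Queue a URL once; return False if it was already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append(url)
        return True

    def pop_ready(self):
        """Return the first queued URL whose host may be fetched now."""
        now = self.clock()
        for i, url in enumerate(self.queue):
            host = urlsplit(url).hostname
            if self.next_allowed.get(host, 0.0) <= now:
                del self.queue[i]
                self.next_allowed[host] = now + self.min_interval
                return url
        return None  # every queued host is still cooling down
```

A worker loop would call pop_ready, fetch the URL through Tor, pass the body to extract_onion_hosts, and add the results back to the frontier. When pop_ready returns None the worker sleeps instead of hammering a single service.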

Elasticsearch Index Design for Dark Web Content

Elasticsearch provides full-text search with relevance ranking suitable for .onion content.

Index schema: document fields include url (keyword, exact match), title (text, analyzed for search), description (text from the meta description or first paragraph), content (text, full page text), onion_address (keyword), indexed_at (date), language (keyword, for language-based filtering), and category (keyword). Create the index with appropriate analyzers: a custom analyzer for .onion URL extraction, and language-aware text analysis for multi-language content (dark web content is in many languages).

Shard configuration: for a single-node deployment, 1-5 primary shards are sufficient.

Elasticsearch should run on localhost only: do not expose its REST API to external networks. Instead, expose a search API (Flask/FastAPI) that queries Elasticsearch and returns results, and serve that search API through a .onion hidden service.
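
The schema above can be written down as an index-creation body. The sketch below builds it as a plain dictionary; build_index_body is a hypothetical helper, and setting replicas to 0 is an added assumption for the single-node case, since a single node has nowhere to place replica shards.

```python
def build_index_body(primary_shards: int = 1) -> dict:
    """Return settings and mappings for the onion content index."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            # Assumption: single node, so replicas could never be assigned.
            "number_of_replicas": 0,
        },
        "mappings": {
            "properties": {
                "url": {"type": "keyword"},            # exact match
                "title": {"type": "text"},             # analyzed for search
                "description": {"type": "text"},       # meta description
                "content": {"type": "text"},           # full page text
                "onion_address": {"type": "keyword"},
                "indexed_at": {"type": "date"},
                "language": {"type": "keyword"},       # filter by language
                "category": {"type": "keyword"},
            }
        },
    }
```

The dictionary is the JSON body of a create-index request. How it is passed to the Python client differs between client major versions, so check the version in use before wiring it in.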

Content Filtering and CSAM Prevention

A .onion search engine that indexes illegal content creates serious legal and ethical problems. Use multi-layer content filtering:

(1) URL blocklist: maintain a list of known illegal .onion addresses and skip them during crawling. Sources: NCMEC reports, law enforcement bulletins (where publicly shared), and community reports.
(2) Content hash matching: before indexing, compare hashes of any downloaded images against known-CSAM hash sets such as PhotoDNA (a perceptual hash, so it matches resized or re-encoded copies; access is restricted to vetted organizations).
(3) Text classification: train or use a text classifier to identify content categories. Exclude from indexing: content matching CSAM indicators, content facilitating real-world violence, and content with explicit illegal service offerings.
(4) Human review queue: content that the classifier labels with confidence below a threshold is queued for manual review before indexing.
(5) Reporting mechanism: users of the search engine can report indexed content for review and removal.
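
The ordering of layers (1), (3), and (4) can be sketched as a single decision function. Everything here is illustrative: the label names, the 0.9 threshold, and the Classification type are assumptions, and the classifier that produces them is not shown. The sketch also assumes an excluded label is rejected at any confidence, so only low-confidence benign labels go to human review.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SKIP = "skip"        # blocklisted address: do not crawl or index
    REJECT = "reject"    # excluded category: never index
    REVIEW = "review"    # uncertain: hold for manual review
    INDEX = "index"      # safe to index


# Hypothetical label names for the excluded categories.
EXCLUDED_LABELS = frozenset({"csam", "violence", "illegal_services"})


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float  # 0.0 to 1.0, as reported by the classifier


def decide(onion_address, classification, blocklist, review_threshold=0.9):
    """Apply blocklist, category exclusion, then the confidence gate."""
    if onion_address in blocklist:
        return Decision.SKIP
    if classification.label in EXCLUDED_LABELS:
        # Assumption: fail closed, regardless of classifier confidence.
        return Decision.REJECT
    if classification.confidence < review_threshold:
        return Decision.REVIEW
    return Decision.INDEX
```

Checking the blocklist first means known-bad addresses are dropped before any page content is fetched or classified.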

Search UI Served Through a .onion Hidden Service

The search interface is a web application served via a Tor hidden service.

Flask implementation: from flask import Flask, request, jsonify, render_template. Create a /search endpoint accepting q (query) and page (pagination) parameters. Query Elasticsearch: result = es.search(index='onion_content', body={'query': {'multi_match': {'query': q, 'fields': ['title^3', 'description^2', 'content']}}}). Render results with pagination.

Deployment: run Flask on 127.0.0.1:5000, put Nginx in front of it listening on 127.0.0.1:80 and proxying to Flask, and expose Nginx through Tor with HiddenServicePort 80 127.0.0.1:80.

UI features: a basic search box, and results with title, URL, snippet, and cached summary. Important: do not cache or serve full pages through the search engine (only excerpts), to avoid serving illegal content through the search UI even if it appears in the index.
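
The request body for the /search endpoint can be built by a small pure function that the Flask view calls. This is a sketch under assumptions: build_search_request, the page-size and page-count limits, and the 200-character query cap are hypothetical choices, not part of Flask or Elasticsearch. It returns only short fields plus a highlighted excerpt, in line with the excerpts-only rule above.

```python
def build_search_request(q, page=1, page_size=10, max_pages=50):
    """Build an Elasticsearch search body from user input."""
    # Collapse whitespace and cap the length of the user's query.
    query = " ".join(str(q).split())[:200]
    if not query:
        raise ValueError("empty query")
    try:
        page = int(page)
    except (TypeError, ValueError):
        page = 1
    # Clamp pagination so deep paging cannot be abused.
    page = min(max(page, 1), max_pages)
    return {
        "from": (page - 1) * page_size,
        "size": page_size,
        # Never return the full page text to the UI.
        "_source": ["url", "title", "description"],
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["title^3", "description^2", "content"],
            }
        },
        # One short excerpt from the content field per hit.
        "highlight": {
            "fields": {
                "content": {"fragment_size": 160, "number_of_fragments": 1}
            }
        },
    }
```

In the view, the values of request.args.get('q') and request.args.get('page') would be passed in, with a ValueError turned into an empty results page.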

Operational Challenges and Maintenance

Running a public .onion search engine involves significant ongoing maintenance.

Availability: .onion services are volatile and go down frequently. Build a freshness score that decrements on each failed crawl attempt, and remove index entries for addresses that have been unreachable for 30+ days.

Crawl rate management: the Tor network has bandwidth constraints, and an aggressive crawler degrades Tor performance for everyone. Configure the crawler to run at moderate rates (a few hundred pages per hour at most) and schedule bulk crawls during off-peak hours.

Storage: a moderately sized dark web index (100,000 pages) requires 10-50 GB of Elasticsearch storage. Plan for index growth and implement retention policies that delete pages unreachable for extended periods.

Community governance: a public dark web search engine will face content removal requests, abuse, and manipulation attempts. Establish clear policies and moderation processes before public launch.
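
A minimal sketch of the freshness score and the 30-day removal rule follows. The CrawlState type, the 0.1 decrement, and the reset-to-1.0 behavior on success are assumptions made for illustration; the source only specifies that the score decrements on failure and that entries unreachable for 30+ days are removed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class CrawlState:
    freshness: float = 1.0
    # Start of the current run of failed crawls, if any.
    first_failure_at: Optional[datetime] = None


def record_crawl(state, success, now, decrement=0.1):
    """Return the new state after one crawl attempt."""
    if success:
        # Assumption: one successful crawl fully restores the score.
        return CrawlState(freshness=1.0, first_failure_at=None)
    lowered = max(0.0, round(state.freshness - decrement, 6))
    return CrawlState(
        freshness=lowered,
        first_failure_at=state.first_failure_at or now,
    )


def should_remove(state, now, max_unreachable=timedelta(days=30)):
    """True once the address has been failing for max_unreachable."""
    if state.first_failure_at is None:
        return False
    return now - state.first_failure_at >= max_unreachable
```

The freshness value can also feed ranking, so results from services that are currently failing sink before they are removed outright.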
