Intent Engine Technical Research Notes

Likhith

Abstract

Intent Engine – A self‑improving search platform that turns free‑form queries into structured intents, crawls and indexes relevant content, and delivers faster, smarter results over time. Features include federated search, vector retrieval, multi‑layer caching, language‑aware crawling, and full observability for production deployments.

These notes distill the current state of the Intent Engine as of the March 2026 documentation set, focusing on architecture, performance characteristics, language filtering, configuration, and future work.

1. System goals and problem statement

The Intent Engine is designed to turn ambiguous, free‑form user queries into structured intents and then execute those intents across multiple backends to deliver results that match what the user is actually trying to do.

Primary goals include:

  • Reducing end‑to‑end search latency from multi‑second to sub‑second ranges for cached queries while keeping uncached queries within a few seconds P95.
  • Building a self‑improving search loop where every user query seeds new URLs and topics into a crawler‑driven pipeline.
  • Providing observability, configurability, and deployment practices suitable for production environments.

2. High‑level architecture

The architecture is explicitly polyglot, with Python orchestrating APIs and business logic while Go handles crawling and indexing for performance.

Major components and their roles:

  • Intent Engine API (FastAPI, Python, port 8000):
    • Hosts endpoints such as /health, /extract-intent, /search, /rank-urls, /rank-results, and seed‑discovery utilities.
    • Implements intent extraction, federated search orchestration, ranking, and ad matching.
  • Unified Search API (Go, ports 8081/8082):
    • Queries Go indexes, SearXNG, and Qdrant in parallel and aggregates results.
    • Exposes a single interface to Python for high‑throughput search.
  • Go crawler and indexer:
    • Maintains a prioritized Redis‑backed crawl queue, visits pages, extracts content and metadata, and builds intent‑aware indexes using Bleve or related libraries.
    • Stores crawled content in PostgreSQL with language and intent annotations.
  • SearXNG (port 8080): Privacy‑focused meta‑search used as both a search backend and a URL discovery mechanism for the self‑improving loop.
  • Qdrant (port 6333): Vector database that stores embeddings created by the Python embedding service for semantic search and intent‑aligned retrieval.
  • Redis (port 6379): Dual role as a search response cache and as a sorted‑set‑based URL crawl queue with over one million URLs in realistic runs.
  • PostgreSQL (port 5432): Primary relational store for crawled pages, intent metadata, sessions, and ad‑related data.
  • Prometheus (port 9090) and Grafana (port 3000): Provide metrics and dashboards for latency, throughput, error rates, and crawler health.

A single Docker Compose configuration can bring up all services with health checks and dependencies wired together, allowing the system to be run locally, in CI, or in production‑like environments.
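As a rough illustration of how that Compose file might wire health checks and dependencies together, consider the fragment below. Service names, images, and build paths are assumptions for illustration, not the project's actual configuration:

```yaml
# Hypothetical excerpt: shows the health-check/dependency pattern described
# above, with the API waiting for a healthy Redis before starting.
services:
  redis:
    image: redis:7
    ports: ["6379:6379"]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
  intent-api:
    build: ./api          # assumed path
    ports: ["8000:8000"]
    depends_on:
      redis:
        condition: service_healthy
```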

3. Self‑improving search loop

The self‑improving loop is central to the v2 architecture and later releases.

3.1 Loop stages

The documented loop proceeds as follows:

  1. User query: A user issues a search such as "golang tutorial".
  2. Intent extraction: The API extracts a structured intent object containing fields like goal (learn, comparison, troubleshooting, purchase, etc.), useCases, resultType, complexity, skillLevel, and temporal dimensions.
  3. Meta‑search: SearXNG queries multiple external engines (Google, Brave, DuckDuckGo, Bing) and returns up to tens of thousands of candidate results.
  4. URL selection: The system selects the top unique URLs (e.g., the top 30) and scores them for crawling priority.
  5. Queueing: URLs are pushed into a Redis sorted set (crawlqueue) with priorities reflecting intent alignment and other heuristics; in production snapshots this queue holds around 1.28 million URLs.
  6. Crawling: The Go crawler processes the queue under constraints such as maximum pages, depth, and concurrency, while applying language filters.
  7. Indexing: The Go indexer enriches documents with intent metadata and indexes them, updating both full‑text and vector indexes.
  8. Vector storage: Embeddings are generated by a Python service using a sentence‑transformer model and stored in Qdrant with cosine similarity search.
  9. Improved retrieval: Subsequent searches over related topics benefit from the new content and refined intent signals.

In one documented run, three user searches were enough to inject roughly 634,000 URLs into the crawler, demonstrating the amplification effect of the self‑improving design.
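Stage 5 above treats the crawl queue as a priority map from URL to score. The real implementation uses a Redis sorted set; the in-memory stand‑in below is a minimal stdlib sketch of the same push/pop-highest behavior, with de‑duplication (the scoring heuristic itself is not shown here):

```python
import heapq

class CrawlQueue:
    """In-memory stand-in for the Redis sorted-set crawl queue (sketch)."""
    def __init__(self):
        self._heap = []          # (negative priority, url) pairs
        self._queued = set()     # de-duplicates URLs before queueing

    def push(self, url: str, priority: float) -> None:
        if url not in self._queued:
            self._queued.add(url)
            heapq.heappush(self._heap, (-priority, url))

    def pop(self) -> str:
        # Highest-priority URL first, mirroring a sorted-set pop-max.
        _, url = heapq.heappop(self._heap)
        self._queued.discard(url)
        return url

queue = CrawlQueue()
queue.push("https://go.dev/tour", 0.9)
queue.push("https://example.com/golang-tutorial", 0.7)
queue.push("https://go.dev/tour", 0.9)   # duplicate, ignored
```

In the production system the same pattern maps onto Redis sorted-set operations, which gives the queue persistence and lets multiple crawler instances share it.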

3.2 Seed discovery and topic expansion

Beyond directly seeding URLs from live searches, a seed discovery subsystem expands topics based on observed queries.

Key mechanisms:

  • Redis data structures hold discovery topics, query history, and keyword frequency.
  • The system extracts keywords from user queries (removing stop words) and categorizes them via pattern matching (e.g., mapping "rust" and "golang" to a language‑related topic category).
  • New topics are generated from trending keywords, such as "rust tutorial" or "rust best practices".
  • SearXNG is periodically queried for each topic, and discovered URLs are injected into the crawl queue with associated topics and priorities.

Configuration is controlled via environment variables like SEED_DISCOVERY_INTERVAL_HOURS, TOPIC_EXPANSION_INTERVAL_HOURS, and MAX_URLS_PER_RUN, with default runs scheduled every few hours.
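The keyword-extraction and topic-expansion steps above can be sketched as follows. The stop-word list, pattern table, and topic templates are illustrative assumptions; only the "tutorial" / "best practices" expansions are taken from the documented examples:

```python
import re
from collections import Counter

# Illustrative stop words and patterns; the real tables are not documented.
STOP_WORDS = {"how", "to", "the", "a", "for", "in", "best"}

def extract_keywords(query: str) -> list:
    """Tokenize a query and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def expand_topics(queries: list, top_n: int = 2) -> list:
    """Turn trending keywords across recent queries into discovery topics."""
    counts = Counter(k for q in queries for k in extract_keywords(q))
    topics = []
    for keyword, _ in counts.most_common(top_n):
        # Trending keywords seed new SearXNG discovery topics.
        topics.append(f"{keyword} tutorial")
        topics.append(f"{keyword} best practices")
    return topics
```

Each generated topic would then be queried against SearXNG on the configured interval, and the discovered URLs pushed into the crawl queue.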

4. Language filtering research and implementation

Language filtering for the Go crawler replaces brittle domain‑based blocking with per‑page detection.

4.1 Research findings

The implementation is guided by several documented findings:

  • HTML lang attributes should use BCP 47 tags (e.g., en, en-GB, zh-CN), and WCAG 3.1.1 requires that the primary language be specified.
  • The lingua-go library achieves around 74 percent accuracy on single short words, 89 percent on word pairs, and up to 96 percent on full sentences, with an overall mean of roughly 86 percent.
  • Reliable detection requires at least around 100 characters of content, with 500 characters considered optimal for confidence scoring.

4.2 Multi‑layer detection strategy

The production strategy uses three layers:

  1. HTML lang attribute: Fast extraction of the declared language from the root html element.
  2. Content analysis: lingua-go runs on the first 500 characters of text, using a reduced set of languages to optimize memory (around 500 MB instead of the full 1.8 GB high‑accuracy model).
  3. Decision logic: Combines attribute and content signals, preferring valid lang attributes but allowing content to override missing or suspicious declarations. Pages whose final inferred language is not English are skipped and logged.

The design explicitly aims to:

  • Crawl English pages found on non‑English TLDs.
  • Skip non‑English content even on .com domains.
  • Avoid manual blocklists in favor of a library‑backed detector.

Trade‑offs include additional per‑page latency (about 500 ms for content analysis), substantial memory use for language models, reduced accuracy on very short content, and difficulty handling mixed‑language pages.

5. Search latency optimization

Initial deployments observed uncached P95 latencies between 7,000 and 17,000 ms and cold starts of around 108,000 ms.

5.1 Multi‑level caching

Two major cache layers were added:

  • L1 in‑process LRU cache: Stores intent extraction results keyed by normalized query, with a capacity around 2,000 entries and ~1 ms access times.
  • L2 Redis cache: Stores full unified search responses with a TTL of 1 hour and target hit rate of 60–80 percent. Cached responses are roughly 11× faster than recomputing the query end‑to‑end.
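The L1 layer is essentially a bounded LRU map keyed by normalized query. A minimal stdlib sketch, assuming a simple get/put interface (the actual class and method names are not documented):

```python
from collections import OrderedDict

class IntentCache:
    """In-process LRU cache for intent-extraction results (sketch)."""
    def __init__(self, capacity: int = 2000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, query: str):
        if query not in self._store:
            return None
        self._store.move_to_end(query)        # mark as recently used
        return self._store[query]

    def put(self, query: str, intent: dict) -> None:
        self._store[query] = intent
        self._store.move_to_end(query)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict least recently used
```

Because the cache lives in-process, hits avoid any network hop, which is what keeps access times near 1 ms.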

Query normalization (e.g., stripping punctuation, unifying word order for some patterns) increases effective cache coverage by mapping semantically equivalent queries to the same cache key.
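A normalization function in this spirit might look like the sketch below. The documentation only mentions punctuation stripping and word-order unification "for some patterns"; lowercasing and full token sorting are illustrative choices, not the confirmed rules:

```python
import re

def normalize_query(query: str) -> str:
    """Map semantically equivalent queries to a single cache key (sketch)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    # Sorting tokens makes "tutorial golang" and "golang tutorial" collide
    # on the same key; the real system is likely more selective than this.
    return " ".join(sorted(tokens))
```

With this scheme, "Golang tutorial!" and "tutorial golang" produce the same key, so either query can serve the other's cached response.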

5.2 Timeouts, ranking, and parallelism

Additional improvements include:

  • Per‑stage timeouts: Intent extraction and backend searches are wrapped with timeouts, with fallback paths that prefer partial but timely results over timeouts.
  • Top‑K ranking: ML‑based ranking is restricted to the top 40 candidates, with the remainder kept in their original order, cutting ranking time roughly fivefold.
  • HTTP connection pooling and HTTP/2: A shared httpx.AsyncClient with connection limits and HTTP/2 support reduces per‑request overhead by tens of milliseconds.
  • Parallel backend queries: Go index search, SearXNG, and vector search are invoked concurrently instead of sequentially.

After these changes, documented targets are:

  • Cached queries: around 100 ms P95.
  • Uncached queries: around 3,000 ms P95.
  • Cold start: around 5,000 ms, primarily to warm models and caches.

6. Configuration and deployment practices

Configuration is centralized via environment variables and Pydantic settings, with historical changes recorded for database, cache, and CORS behavior.

Notable practices:

  • PostgreSQL:
    • Uses connection pooling with configurable POOL_SIZE, MAX_OVERFLOW, POOL_TIMEOUT, and POOL_RECYCLE values (e.g., 10/20/30/1800 defaults).
    • Encourages external managed PostgreSQL in production and proper backup strategies.
  • Redis:
    • Exposed via environment variables for host, port, DB, connection limits, and timeouts.
    • Recommended to run as an external, authenticated service with persistence enabled.
  • CORS:
    • Dynamically configured via comma‑separated origin, method, and header lists, with an ENABLE_CORS toggle and logging of the active configuration.
  • Rate limiting:
    • Optional SlowAPI integration backed by Redis supports per‑IP request limits.

Deployment guidance includes production Docker Compose snippets, health checks, log aggregation, SSL/TLS recommendations, and a detailed checklist for moving from development to production.
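The environment-driven configuration described above can be sketched with plain dataclasses. The documentation names Pydantic settings and the POOL_* variables; the class and helper below are simplified stand-ins using the documented 10/20/30/1800 defaults:

```python
import os
from dataclasses import dataclass

@dataclass
class PoolSettings:
    """Env-driven PostgreSQL pool settings (sketch, not the real class)."""
    pool_size: int = int(os.getenv("POOL_SIZE", "10"))
    max_overflow: int = int(os.getenv("MAX_OVERFLOW", "20"))
    pool_timeout: int = int(os.getenv("POOL_TIMEOUT", "30"))
    pool_recycle: int = int(os.getenv("POOL_RECYCLE", "1800"))

def parse_origins(raw: str) -> list:
    """Split a comma-separated CORS origin list, dropping empty entries."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

In the real system, Pydantic's settings machinery performs this env-var parsing and validation automatically; the sketch just makes the mapping explicit.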

7. Monitoring, metrics, and dashboards

Prometheus scrapes metrics from Python, Go, and infrastructure components, while Grafana uses provisioned dashboards to present a unified view of system behavior.

Key metrics include:

  • Request counts and latency histograms for unified search and intent extraction.
  • Cache hit/miss counters for Redis.
  • Crawler activity logs such as visiting URLs, skipping non‑English pages, and crawl queue size.
  • Database connection usage, memory, and error rates.

Dashboards are provisioned via YAML and JSON files so that a fresh deployment automatically gains curated visualizations for latency, errors, cache behavior, and crawler health.

8. Limitations and open questions

Despite substantial progress, the documentation highlights several limitations:

  • Language detection adds latency and memory overhead and is less reliable on very short or mixed‑language content.
  • Cache efficiency depends heavily on query normalization quality and traffic patterns.
  • Index quality and coverage are currently limited by crawl depth, concurrency, and how aggressively seed discovery expands topics.
  • Embedding quality is bounded by the chosen sentence‑transformer model and lack of domain‑specific fine‑tuning.

Open research directions include:

  • ML‑based topic categorization and automatic category merging.
  • Multi‑language support beyond English.
  • Incorporating user feedback and behavioral signals into ranking and topic expansion.
  • More sophisticated policies for crawl prioritization and de‑duplication at scale.

9. Future work and roadmap hints

The March 2026 notes and changelogs outline a roadmap that emphasizes stability, performance, and richer self‑improvement.

Highlighted next steps:

  • Let crawlers run for extended periods to fully process large queues.
  • Enable higher‑quality embedding models as hardware budgets allow.
  • Scale Go crawler instances horizontally and tune Redis queue parameters.
  • Tighten integration between monitoring and operations, including alerting on latency, error rates, and crawl backlog.

For engineers and researchers, the Intent Engine serves as a concrete case study in how to unify intent understanding, federated search, web crawling, and vector retrieval into a single self‑improving system that can be deployed and observed in realistic environments.