A question goes in. Cited, themed analysis comes out. Here is what happens in between.
Scrapers pull customer feedback from public sources. No API keys, no permissions needed. Each review, post, and comment becomes a searchable document.
Trustpilot reviews are extracted from __NEXT_DATA__ JSON embedded in server-rendered pages. Reddit posts and comments come via the old.reddit.com JSON API. Each document gets a full-text search index (tsvector) automatically on insert, weighting titles higher than body text.
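The weighted index can be sketched as a generated column plus a GIN index. Table and column names (`documents`, `title`, `body`, `search`) are assumptions; the source only specifies that titles are weighted above body text.

```typescript
// DDL sketch for the search index, assuming a `documents` table with
// `title` and `body` columns (names are illustrative, not confirmed).
const createSearchIndexSql = `
  ALTER TABLE documents ADD COLUMN search tsvector
    GENERATED ALWAYS AS (
      -- 'A' outranks 'B' in ts_rank_cd, so title hits score higher
      setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
      setweight(to_tsvector('english', coalesce(body,  '')), 'B')
    ) STORED;

  CREATE INDEX documents_search_idx ON documents USING gin (search);
`;
```

Because the column is `GENERATED ... STORED`, every insert gets its tsvector automatically with no trigger code.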
Huel demo: 200 Trustpilot reviews + 1,016 Reddit posts and comments = 1,216 documents
A product manager types a plain English question. "What do customers say about the taste?"
The question hits the /api/ask endpoint. No preprocessing, no keyword extraction on the client side. The raw question goes straight to the agent brain.
Claude Haiku breaks the question into 3-5 Postgres search queries, adding synonyms and related terms the user would never think to search for.
Haiku receives the original question and generates websearch-compatible queries. It knows to add synonyms ("taste" becomes taste OR flavor OR texture OR disgusting OR delicious), filter by source when relevant, and use phrase matching for specific concepts.
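The expansion step can be sketched as a prompt plus a parser for the model's reply. The prompt wording and the one-query-per-line response format are assumptions; the source only says Haiku produces 3-5 websearch-compatible queries.

```typescript
// Hypothetical prompt asking Haiku for one Postgres websearch query per line.
const expansionPrompt = (question: string) => `
Rewrite the question below as 3-5 Postgres websearch queries.
Add synonyms and related terms a customer might use.
One query per line, no numbering.

Question: ${question}`.trim();

// Parse the model's plain-text reply into individual search queries.
function parseQueries(reply: string): string[] {
  return reply
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .slice(0, 5); // cap at 5, matching the pipeline's 3-5 range
}
```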
"What do customers say about taste?" becomes:
- taste OR flavor OR texture
- "tastes good" OR "tastes bad" OR disgusting
- flavor comparison OR "better than"
- chocolate OR vanilla OR berry OR banana

Each expanded query hits the Postgres full-text search index. Results are deduplicated and ranked by relevance. Top 40 documents pass through.
Supabase runs websearch_to_tsquery against the tsvector index on every document. Title matches rank higher than body matches. Results from all queries are merged, deduplicated by document ID, and sorted by ts_rank_cd score. The top 40 go to synthesis.
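The retrieval and merge steps can be sketched as a per-query SQL statement plus a dedupe-and-rank function. Row shape and field names are assumptions based on the description above.

```typescript
// Per-query search, assuming the weighted `search` tsvector column exists.
const searchSql = `
  SELECT id, ts_rank_cd(search, websearch_to_tsquery('english', $1)) AS score
  FROM documents
  WHERE search @@ websearch_to_tsquery('english', $1)
  ORDER BY score DESC
  LIMIT 40;
`;

interface SearchHit {
  id: string;
  score: number; // ts_rank_cd relevance from Postgres
}

// Merge hits from all expanded queries, keep the best score per document ID,
// sort by score, and pass the top N on to synthesis.
function mergeHits(resultSets: SearchHit[][], topN = 40): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const seen = best.get(hit.id);
      if (!seen || hit.score > seen.score) best.set(hit.id, hit);
    }
  }
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```

A document matched by several expanded queries appears once, ranked by its best score.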
Claude Sonnet reads all 40 documents, identifies themes, pulls direct quotes, and writes a structured answer with numbered citations linking back to original sources.
Sonnet receives the original question plus all retrieved evidence with metadata (source type, rating, author, date, URL). It groups findings into themes, surfaces patterns, shows both sides when views are mixed, and never invents. Every claim is backed by a numbered citation.
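The evidence packaging for Sonnet can be sketched as numbering each document so the model cites it as [1], [2], ... and the UI can link citations back to sources. Field names (`source`, `rating`, `author`, `url`, `text`) are assumptions drawn from the metadata listed above.

```typescript
// Hypothetical shape of a retrieved document's metadata.
interface Evidence {
  source: string; // e.g. "trustpilot" or "reddit"
  rating?: number; // star rating, absent for Reddit posts
  author: string;
  date: string;
  url: string;
  text: string;
}

// Number each document so citations in the answer map back to sources.
function formatEvidence(docs: Evidence[]): string {
  return docs
    .map(
      (d, i) =>
        `[${i + 1}] (${d.source}${d.rating ? `, ${d.rating}★` : ""}, ` +
        `${d.author}, ${d.date}) ${d.text}\nURL: ${d.url}`
    )
    .join("\n\n");
}
```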
The output includes:
- Themed sections with headers
- Direct customer quotes
- Source citations [1], [2], [3]
- Clickable links to original reviews/posts
- Analysis duration timestamp
The cited answer appears in the chat. Every source is expandable with links to the original review or post. The query is logged for analytics.
The full pipeline replaces what would take a research team 8 days of manual review reading. The answer is only as good as the evidence: more documents mean better answers.