Blog · AI Search Scraping · 13 min read

Perplexity API alternatives — build your own AI search backend.

The Perplexity API is convenient and expensive. The actual workload — search the web, fetch top sources, summarize with an LLM — is buildable with off-the-shelf components in 200 lines. Mobile proxies for search, Claude/GPT for the LLM step. Cheaper, more flexible, no vendor lock-in.

tl;dr

Perplexity = SERP scrape + content fetch + LLM summarize. Replace each step: Pool Gateway mobile proxies → Google SERP → top N URLs → fetch each via proxies → Claude/GPT summarizes. Cost runs roughly $0.005-0.02 per query depending on LLM model and average source-page size. More flexible than the Perplexity API: you control which sources, which model, which language.

april 2026

Perplexity Sonar API pricing — current state

Honest framing first: Perplexity has expanded substantially since launch. As of April 2026 they ship four Sonar SKUs plus separate Reasoning, Deep Research, Search, and Agentic Research APIs.

| Model | Input · per 1M tokens | Output · per 1M tokens | Best for |
|---|---|---|---|
| Sonar Small Online | $0.20 | $0.20 | High-volume cheap queries |
| Sonar Large Online | $1.00 | $1.00 | Balanced quality/cost |
| Sonar Huge Online | $5.00 | $5.00 | High-quality answers |
| Sonar Pro | $3.00 | $15.00 | Flagship · best reasoning |

Plus per-request fees for search context (varies by depth) on Sonar / Sonar Pro / Sonar Reasoning Pro. Good news for 2026: citation tokens are no longer billed for standard Sonar / Sonar Pro (they used to count against your token budget — meaningful saving for citation-heavy workloads).
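As a sanity check, the token math behind the table above is straightforward to compute yourself (per-request search-context fees excluded):

```python
def sonar_token_cost(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Token cost in dollars; prices are quoted per 1M tokens.
    Per-request search-context fees are NOT included."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# Typical research query on Sonar Pro: ~5K tokens in, ~500 out
print(sonar_token_cost(5_000, 500, 3.00, 15.00))  # → 0.0225
```

That $0.0225 is before the per-request search fee, which is why the per-query comparison later in this post lists Sonar Pro at "~$0.022 plus fees".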

Beyond the Sonar models, Perplexity now exposes:

  • Search API: raw web results without LLM synthesis at $5 per 1,000 requests flat. Good if you want to plug Perplexity's search into your own LLM pipeline.
  • Agentic Research API: third-party model access (OpenAI, Anthropic, Google, xAI) at provider direct rates with web-search tool calls billed at $0.005/invocation and URL fetches at $0.0005/invocation.
  • Sonar Reasoning Pro: for chain-of-thought research workloads.
  • Sonar Deep Research: longer-horizon multi-step research tasks.

motivation

Why self-host anyway?

With pricing this concrete, when does building your own pipeline still make sense? Three structural reasons:

  • Volume math. Sonar Small at $0.40 per 1M tokens (input + output combined) is already cheap. But for 100K+ queries/month with reasoning models or large search context, costs add up. A custom pipeline (Pool Gateway proxies + Claude Sonnet/Haiku for the LLM step) can land in the $0.005-0.025 range per query.
  • Model choice. Sonar wraps Perplexity's opinionated stack. The Agentic Research API gives you third-party model access but with their search-tool fees layered on. Self-hosting lets you swap LLM providers per query type freely.
  • Source control. Sonar decides which sources to consider. For domain-specific research (legal, medical, technical), you may want to constrain search to particular sites or sources — easy with a custom scraper, harder via Sonar.
  • Vendor independence. Your prompts, fetch logic, and evaluation harness stay yours. Perplexity changes pricing or model behavior, you're not locked in.

how it works

Architecture

text
User query
  │
  ▼
┌─────────────────────────────────────────┐
│ 1. Search step                          │
│    Scrape Google / Bing / etc. for      │
│    top N URLs related to the query      │
│    (via Pool Gateway mobile proxies)    │
└─────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────┐
│ 2. Content fetch                        │
│    For each URL, fetch the page         │
│    (also via mobile proxies — many      │
│    sites block datacenter scrapers)     │
│    Extract main content with Readability│
└─────────────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────────────┐
│ 3. LLM summarize                        │
│    Send query + sources to Claude/GPT   │
│    Get back: summary + per-citation     │
│    inline references                    │
└─────────────────────────────────────────┘
  │
  ▼
Final response (matches Perplexity output shape)

step by step

Implementation

bash
pip install httpx playwright readability-lxml anthropic
playwright install chromium

Step 1: search via Google SERP scraper

python
import asyncio, os
from urllib.parse import quote_plus
from playwright.async_api import async_playwright

PSX_USER = os.environ['PSX_USERNAME']
PSX_PASS = os.environ['PSX_PASSWORD']

async def search_google(query: str, top_n: int = 5) -> list[str]:
    """Get top N URLs for a query."""
    url = f"https://www.google.com/search?q={quote_plus(query)}&hl=en&num=20"
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy={"server": "http://gw.proxies.sx:7000", "username": PSX_USER, "password": PSX_PASS},
        )
        page = await (await browser.new_context()).new_page()
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await page.wait_for_timeout(1500)

        urls = await page.evaluate("""
() => [...document.querySelectorAll('h3')]
  .map(h => h.closest('a'))
  .filter(a => a && a.href.startsWith('http'))
  .map(a => a.href)
  .filter(u => !u.includes('google.com'))
        """)
        await browser.close()
        return list(dict.fromkeys(urls))[:top_n]  # dedup, slice

Step 2: fetch and clean content

python
import httpx
from readability import Document

async def fetch_clean(url: str) -> dict:
    """Fetch URL through mobile proxy, extract main content."""
    proxy_url = f"http://{PSX_USER}:{PSX_PASS}@gw.proxies.sx:7000"
    async with httpx.AsyncClient(proxy=proxy_url, timeout=30, follow_redirects=True) as client:  # httpx ≥0.26 uses proxy=, not proxies=
        try:
            r = await client.get(url, headers={"User-Agent": "Mozilla/5.0 ..."})
            doc = Document(r.text)
            return {
                "url": url,
                "title": doc.title(),
                "content": doc.summary(html_partial=True)[:8000],  # cap context
            }
        except Exception as e:
            return {"url": url, "error": str(e)}

Step 3: summarize with Claude

python
from anthropic import Anthropic

anthropic = Anthropic()

async def ai_search(query: str, top_n: int = 5) -> dict:
    urls = await search_google(query, top_n)
    sources = await asyncio.gather(*[fetch_clean(u) for u in urls])
    sources = [s for s in sources if "content" in s]

    # Build a prompt with numbered citations
    citations_block = "\n\n".join(
        f"[{i+1}] {s['title']}\nURL: {s['url']}\n{s['content']}"
        for i, s in enumerate(sources)
    )

    msg = anthropic.messages.create(
        model="claude-opus-4-7",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Answer the user's question using the provided sources. Cite sources inline as [1], [2], etc.

QUESTION: {query}

SOURCES:
{citations_block}

ANSWER (with inline citations):"""
        }]
    )
    return {
        "query": query,
        "answer": msg.content[0].text,
        "sources": [{"index": i+1, "url": s["url"], "title": s["title"]} for i, s in enumerate(sources)],
    }

# Use it
result = asyncio.run(ai_search("how does CGNAT work"))
print(result)

economics

Cost comparison vs Sonar Pro

Approximate per-query cost for a typical research query (5 source pages, ~5K tokens of context, ~500 tokens out):

| Stack | Per-query cost | Notes |
|---|---|---|
| Sonar Small Online | ~$0.0022 | Cheapest Sonar SKU · weakest answers |
| Sonar Large Online | ~$0.011 | Decent quality |
| Sonar Pro | ~$0.022 | Plus per-request fee for search context |
| DIY (Pool Gateway + Haiku) | ~$0.003-0.008 | Mobile proxies + Anthropic Claude Haiku |
| DIY (Pool Gateway + Sonnet) | ~$0.012-0.025 | Mobile proxies + Claude Sonnet |

DIY math: ~5MB of mobile-proxy traffic per query (SERP scrape + 5 fetches) at shared-tier per-GB pricing is <1¢. The LLM call dominates the cost. Pick Haiku for high-volume cost-sensitive workloads, Sonnet for balanced quality.
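The same back-of-envelope math as code. The per-GB proxy rate and token prices here are placeholders — plug in your actual plan and model rates:

```python
def diy_query_cost(traffic_mb: float, proxy_usd_per_gb: float,
                   input_tokens: int, output_tokens: int,
                   in_price: float, out_price: float) -> dict:
    """Rough per-query cost split for the DIY stack (token prices per 1M tokens)."""
    proxy = traffic_mb / 1024 * proxy_usd_per_gb
    llm = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    return {"proxy": round(proxy, 5), "llm": round(llm, 5), "total": round(proxy + llm, 5)}

# Assumed numbers: 5MB traffic at a hypothetical $1.50/GB shared tier,
# 5K tokens in / 500 out at hypothetical $1.00 / $5.00 per-1M rates
print(diy_query_cost(5, 1.50, 5_000, 500, 1.00, 5.00))
```

Whatever rates you plug in, the split stays lopsided: proxy traffic lands under a cent and the LLM call dominates.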

The verdict: Sonar Small is cheap but quality-limited. Sonar Pro's pricing is close to a DIY-Sonnet stack while being more constrained. For prototypes, Sonar wins on speed-of-setup. For production at moderate-to-high volume, DIY wins on cost + flexibility.

be honest

Tradeoffs

Self-hosting is great when you need flexibility and cost control. It's worse when you need:

  • Latency < 2 seconds. Perplexity has aggressive caching + indexing. Custom scrape-then-summarize is typically 5-15 seconds end-to-end depending on source count and LLM.
  • Zero-ops onboarding. You maintain the scraper, handle SERP changes, monitor proxy health. Perplexity does this for you.
  • Realtime news. Perplexity has freshness signals tuned. Your custom pipeline depends on what Google indexes.

For most B2B AI search use cases — research dashboards, internal knowledge tools, monitoring agents — self-hosting wins on cost and flexibility. For consumer-facing latency-sensitive apps, stick with the commercial API.

FAQ

Why mobile proxies for the search step?
Google heavily blocks datacenter IPs. A SERP scraper using AWS or DigitalOcean IPs hits CAPTCHAs within 5-10 queries. Real mobile carrier IPs survive 50-200+. The cost difference is small relative to total query cost.
Can I use this commercially?
Public web data scraping for analysis is generally legal under US precedent (Van Buren, hiQ v LinkedIn). Don't scrape behind login walls, attribute sources, respect robots.txt where it makes sense. The same rules apply to your scraper as apply to Perplexity's scraper.
What about Bing instead of Google?
Same architecture, different scraper. Bing is somewhat easier to scrape (less aggressive anti-bot). Quality is comparable for most queries. Run both and merge for best coverage.
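"Run both and merge" can be as simple as a round-robin interleave that dedups while preserving each engine's ranking. A sketch:

```python
from itertools import zip_longest

def merge_results(*ranked_lists: list[str], top_n: int = 10) -> list[str]:
    """Interleave ranked URL lists round-robin, dropping duplicates,
    so every engine contributes to the final top N."""
    seen: set[str] = set()
    merged: list[str] = []
    for rank_group in zip_longest(*ranked_lists):   # i-th result from each engine
        for url in rank_group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:top_n]

# merge_results(["a", "b", "c"], ["b", "d"]) → ["a", "b", "d", "c"]
```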
Does this work with GPT-4 or Gemini instead of Claude?
Yes, swap the LLM call. The architecture is model-agnostic. Cost varies by provider — at moderate input sizes Anthropic, OpenAI, and Google price competitively for the relevant models.
How do I scale this past hundreds of queries per second?
Run multiple Playwright workers in parallel, distribute across proxy strings (different country/carrier in the username DSL). Cache aggressively — many queries are repeats. Use httpx connection pooling, not new clients per query.

Get mobile proxies + start replacing Perplexity API.