Does OpenAI offer a search API?

OpenAI exposes web search to API customers as a model tool, but the API returns the model's answer with URL citations rather than a raw, queryable result list, and that answer is not guaranteed to match what the ChatGPT Search UI shows. To track what ChatGPT Search shows users, you need browser-rendered scraping.

Do I need a ChatGPT account to scrape ChatGPT Search?

For logged-out search you can hit chatgpt.com/search anonymously. For longer queries or follow-ups you may need an authenticated session — manage that with cookies in Playwright.

Why mobile proxies and not residential?

OpenAI uses Cloudflare, which aggressively blocks datacenter ranges and many residential proxy ranges. Real mobile carrier IPs tend to survive these checks: behind CGNAT, many real subscribers share each address, so blocking one IP risks blocking legitimate users.

Scrape ChatGPT Search Results in 2026 (Python + Mobile Proxies)

why this matters

Why track ChatGPT Search at all?

ChatGPT Search is now the second-largest AI search surface (after Google AI Overviews). When a brand gets cited as a source in a ChatGPT Search response, that's the new top-of-funnel — users see the brand inside the AI answer and click through. Marketing teams that tracked Google SERPs religiously now need to track ChatGPT Search the same way.

The 2026 landscape complication: OpenAI launched ChatGPT Atlas in late 2025 — their own agent-capable browser available on macOS (Windows / iOS / Android coming). Atlas users get an "agent mode" that lets ChatGPT directly browse and complete tasks. This means more queries route through Atlas's in-browser search rather than chatgpt.com/search, fragmenting the visibility surface. For tracking purposes you still target chatgpt.com/search (the largest surface), but be aware Atlas-specific behaviors are emerging.

step 1

Setup

bash

pip install playwright httpx
playwright install chromium

export PSX_USERNAME="psx_YOUR_ID-mbl-us-rot-auto10"
export PSX_PASSWORD="YOUR_PROXY_PASSWORD"

step 2

Basic ChatGPT Search scraper

python

import asyncio
import os
from urllib.parse import quote_plus
from playwright.async_api import async_playwright

async def scrape_chatgpt_search(query: str) -> dict:
    url = f"https://chatgpt.com/search?q={quote_plus(query)}"

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://gw.proxies.sx:7000",
                "username": os.environ['PSX_USERNAME'],
                "password": os.environ['PSX_PASSWORD'],
            },
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        )
        page = await context.new_page()
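        # The page keeps streaming connections open, so "networkidle" may never
        # settle; load the DOM, then wait for the assistant turn explicitly.
        await page.goto(url, wait_until="domcontentloaded", timeout=45000)

        # Wait for the assistant's turn to appear (selector may change with UI revisions)
        await page.wait_for_selector('[data-testid="conversation-turn-2"]', timeout=30000)
        await page.wait_for_timeout(3500)  # fixed pause so streaming can finish and citations populate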

        result = await page.evaluate("""
() => {
  // Main answer text
  const answerEl = document.querySelector('[data-message-author-role="assistant"]');
  const answer = answerEl ? answerEl.innerText : null;

  // Citation cards (right-side rail in ChatGPT Search)
  const citations = [...document.querySelectorAll('a[href*="://"][target="_blank"]')]
    .filter(a => a.closest('[class*="citation"], [class*="source"]'))
    .map(a => ({
      url: a.href,
      title: a.textContent.trim().slice(0, 200),
      domain: new URL(a.href).hostname,
    }));

  return { answer: answer ? answer.slice(0, 6000) : null, citations };
}
        """)

        await browser.close()
        return { "query": query, **result }

if __name__ == "__main__":
    print(asyncio.run(scrape_chatgpt_search("best mobile proxy provider 2026")))

step 3

Parsing the response

ChatGPT Search changes its DOM structure roughly monthly, so prefer attributes over class names:

→Use data-message-author-role="assistant" for the answer body; it has held up across UI revisions
→Filter a[target="_blank"] for citations; class names rotate but the target attribute holds. The class-substring filter in the scraper above is the part most likely to need updating
→The right-side citation rail is conditionally rendered, so wait for it explicitly with page.wait_for_selector
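
Because class-based filters can stop matching when the markup changes, it can help to collect links broadly and clean them up in Python afterwards. The sketch below is one way to do that, not part of the original scraper: `clean_citations` and `IGNORED_DOMAINS` are hypothetical names, and it assumes citation dicts with a `url` key, as returned by `scrape_chatgpt_search`.

```python
from urllib.parse import urlparse

# Hypothetical ignore list: first-party links that are not real citations.
IGNORED_DOMAINS = {"chatgpt.com", "openai.com"}

def clean_citations(citations: list[dict]) -> list[dict]:
    """Normalize domains, drop first-party links, and de-duplicate citations."""
    seen = set()
    cleaned = []
    for c in citations:
        parsed = urlparse(c.get("url", ""))
        # Skip anything that is not a normal web link (e.g. javascript: URLs).
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            continue
        domain = parsed.hostname.lower().removeprefix("www.")
        if any(domain == d or domain.endswith("." + d) for d in IGNORED_DOMAINS):
            continue
        # Ignore query string and fragment (often tracking parameters) when de-duplicating.
        key = (domain, parsed.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**c, "domain": domain})
    return cleaned
```

Calling `clean_citations(result["citations"])` on a scrape result keeps the first occurrence of each page and rewrites `domain` without a leading `www.`, which makes per-domain counting simpler.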

anti-bot

Cloudflare + OpenAI bot detection

OpenAI fronts chatgpt.com with Cloudflare. Cloudflare in 2026 fingerprints TLS (JA4), profiles browsing behavior, and aggressively blocks datacenter ranges. Practical defenses:

→Mobile carrier IPs only. Datacenter IPs hit a Cloudflare challenge within 1-3 queries. Real mobile IPs survive 50-100+ queries.
→Realistic user agent. The default headless Chromium UA contains "HeadlessChrome" and is easily flagged. Use a current Chrome 131+ UA matching macOS or Windows.
→IP rotation in username. Use -rot-auto10 in your proxy username to get a fresh IP every 10 minutes. Build the right string with the Username Builder.
→Random delays between queries. 6-15 seconds with jitter. Faster or evenly spaced requests are easy to classify as bot traffic.
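
The jittered delay from the list above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: `next_delay` and `run_with_delays` are hypothetical helpers, and `scrape` stands for any coroutine that takes a query string, such as `scrape_chatgpt_search`.

```python
import asyncio
import random

def next_delay(min_s: float = 6.0, max_s: float = 15.0) -> float:
    """Pick a uniformly random pause in seconds within [min_s, max_s]."""
    return random.uniform(min_s, max_s)

async def run_with_delays(queries, scrape, min_s: float = 6.0, max_s: float = 15.0):
    """Run scrape(query) sequentially, sleeping a random interval between calls."""
    results = []
    for i, query in enumerate(queries):
        if i:  # no pause before the first query
            await asyncio.sleep(next_delay(min_s, max_s))
        results.append(await scrape(query))
    return results
```

For example, `asyncio.run(run_with_delays(["query one", "query two"], scrape_chatgpt_search))` runs the two queries with a single 6-15 second pause between them.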

why this matters

Use cases

→Brand monitoring — daily check whether ChatGPT cites your domain for relevant queries
→Competitive research — see which competitors get cited and for what topics
→Content gap analysis — query topics where you should rank but don't
→SEO + GEO (Generative Engine Optimization) measurement
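
For the brand-monitoring case, the check itself is small once a scrape result is in hand. The sketch below assumes the result shape returned by `scrape_chatgpt_search` (a dict with a `citations` list of `{"url": ...}` entries); `is_cited` is a hypothetical helper name.

```python
from urllib.parse import urlparse

def is_cited(result: dict, brand_domain: str) -> bool:
    """Return True if any citation URL is on brand_domain or one of its subdomains."""
    brand = brand_domain.lower().removeprefix("www.")
    for citation in result.get("citations") or []:
        host = (urlparse(citation.get("url", "")).hostname or "").lower()
        # Match the exact host or a true subdomain, not a mere suffix like "notexample.com".
        if host == brand or host.endswith("." + brand):
            return True
    return False
```

Running this daily over a fixed query list and storing the boolean per query gives a simple citation-share time series.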

FAQ

Will OpenAI sue me for scraping?

Scraping publicly accessible pages is generally treated as outside the Computer Fraud and Abuse Act under US precedent (Van Buren v. United States, hiQ v. LinkedIn). OpenAI's ToS prohibits automated access, but that is a contract-law matter, not a criminal one. Don't scrape behind a login, don't republish answers verbatim, attribute your sources, and you're in normal scraping territory.

Can I use this with the OpenAI API instead?

The OpenAI API exposes web search as a model tool, but it returns the model's answer with URL citations rather than the result set the ChatGPT Search UI renders. To track what users see in ChatGPT Search, you have to scrape the user-facing UI.

How many queries per day can I run?

With proper IP rotation (mobile carrier IPs, -rot-auto10 in the username, 6-15s delays), 200-500 queries per day per account is sustainable. For more, run multiple Playwright workers with different proxy strings.
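
Running several workers with different proxy strings could look roughly like the sketch below. It is an illustration under assumptions: `run_workers` is a hypothetical helper, and `scrape(query, username)` is a hypothetical coroutine signature (the scraper shown earlier reads the username from the environment and would need a parameter added to fit this shape).

```python
import asyncio

async def run_workers(queries, proxy_usernames, scrape):
    """Spread queries across one worker per proxy username."""
    queue = asyncio.Queue()
    for query in queries:
        queue.put_nowait(query)
    results = []

    async def worker(username):
        while True:
            try:
                query = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            try:
                results.append(await scrape(query, username))
            except Exception as exc:  # record the failure and keep the worker alive
                results.append({"query": query, "error": repr(exc)})

    await asyncio.gather(*(worker(u) for u in proxy_usernames))
    return results
```

Each worker pulls the next pending query, so a slow proxy does not hold up the others. The per-query delays still apply inside each worker and are left out here for brevity.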

What about Bing Copilot or Perplexity?

Same pattern, different DOM. Bing Copilot is easier (less aggressive bot detection). Perplexity has its own quirks — see /blog/perplexity-api-alternatives-self-hosted-2026 for that one.

Does ChatGPT Atlas change the scraping picture?

Atlas is OpenAI's own browser, launched late 2025 on macOS. It has an agent mode that lets ChatGPT browse on the user's behalf. For tracking purposes, you still scrape chatgpt.com/search — that's where the bulk of brand-visibility queries land. Atlas-specific scraping is its own topic; we'll cover it as adoption grows.

Does Bright Data have a ChatGPT Scraper API?

Yes — they ship a managed ChatGPT Scraper API that returns structured search results, citations, and answers. Useful if you want to skip the Playwright plumbing entirely. Tradeoff: less control, per-query pricing instead of per-GB. For high-volume tracking, the DIY mobile-proxy approach is cheaper.

Get mobile proxies, run the scraper, see what ChatGPT cites for your money queries.
