Why track ChatGPT Search at all?
ChatGPT Search is now the second-largest AI search surface (after Google AI Overviews). When a brand gets cited as a source in a ChatGPT Search response, that's the new top-of-funnel — users see the brand inside the AI answer and click through. Marketing teams that tracked Google SERPs religiously now need to track ChatGPT Search the same way.
The 2026 landscape complication: OpenAI launched ChatGPT Atlas in late 2025, its own agent-capable browser, available on macOS (Windows / iOS / Android coming). Atlas users get an "agent mode" that lets ChatGPT browse and complete tasks directly. This means more queries route through Atlas's in-browser search rather than chatgpt.com/search, fragmenting the visibility surface. For tracking purposes you still target chatgpt.com/search (the largest surface), but be aware that Atlas-specific behaviors are emerging.
Setup
pip install playwright httpx
playwright install chromium
export PSX_USERNAME="psx_YOUR_ID-mbl-us-rot-auto10"
export PSX_PASSWORD="YOUR_PROXY_PASSWORD"
Basic ChatGPT Search scraper
import asyncio
import os
from urllib.parse import quote_plus

from playwright.async_api import async_playwright


async def scrape_chatgpt_search(query: str) -> dict:
    url = f"https://chatgpt.com/search?q={quote_plus(query)}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://gw.proxies.sx:7000",
                "username": os.environ["PSX_USERNAME"],
                "password": os.environ["PSX_PASSWORD"],
            },
        )
        try:
            context = await browser.new_context(
                user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                           "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            )
            page = await context.new_page()
            await page.goto(url, wait_until="networkidle", timeout=45000)
            # Wait for the assistant turn container to appear in the conversation
            await page.wait_for_selector('[data-testid="conversation-turn-2"]', timeout=30000)
            # Fixed pause: gives streaming and the citation rail time to finish
            await page.wait_for_timeout(3500)
            result = await page.evaluate("""
                () => {
                    // Main answer text
                    const answerEl = document.querySelector('[data-message-author-role="assistant"]');
                    const answer = answerEl ? answerEl.innerText : null;
                    // Citation cards (right-side rail in ChatGPT Search)
                    const citations = [...document.querySelectorAll('a[href*="://"][target="_blank"]')]
                        .filter(a => a.closest('[class*="citation"], [class*="source"]'))
                        .map(a => ({
                            url: a.href,
                            title: a.textContent.trim().slice(0, 200),
                            domain: new URL(a.href).hostname,
                        }));
                    return { answer: answer ? answer.slice(0, 6000) : null, citations };
                }
            """)
        finally:
            # Close the browser even if navigation or a selector wait fails
            await browser.close()
    return {"query": query, **result}


if __name__ == "__main__":
    print(asyncio.run(scrape_chatgpt_search("best mobile proxy provider 2026")))
Parsing the response
ChatGPT Search's DOM structure changes roughly monthly, so selectors tied to class names break often. A more robust selector strategy:
- →Use data-message-author-role="assistant" for the answer body — stable across UI revisions
- →Filter a[target="_blank"] for citations; class names rotate but the target attribute holds
- →The right-side citation rail is conditionally rendered — wait for it explicitly with page.wait_for_selector
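Because a broad a[target="_blank"] filter tends to pick up duplicate and non-citation links, some post-processing of the scraped list usually helps. The following is a minimal sketch, assuming the citation dicts have the url / title shape returned by the scraper above. The names clean_citations, normalize_host, and INTERNAL_HOSTS are hypothetical, and the exclusion list is an assumption to adjust for your own data.

```python
from urllib.parse import urlsplit

# Assumption: links to these hosts are product chrome, not cited sources.
INTERNAL_HOSTS = {"chatgpt.com", "openai.com"}


def normalize_host(url: str) -> str:
    """Return the lowercase hostname without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def clean_citations(citations: list[dict]) -> list[dict]:
    """Drop non-http and internal links, then dedupe by (host, path)."""
    seen = set()
    cleaned = []
    for c in citations:
        parts = urlsplit(c.get("url", ""))
        if parts.scheme not in ("http", "https"):
            continue
        host = normalize_host(c["url"])
        internal = host in INTERNAL_HOSTS or any(
            host.endswith("." + h) for h in INTERNAL_HOSTS
        )
        if not host or internal:
            continue
        # Query string and fragment are ignored so tracking parameters
        # do not produce duplicate entries for the same page.
        key = (host, parts.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(
            {"url": c["url"], "title": c.get("title", "").strip(), "domain": host}
        )
    return cleaned
```

Ignoring the query string when deduplicating is a trade-off: it collapses tracking-parameter variants, but it would also merge pages that differ only by query parameters. If your target sites use such URLs, keep the query in the key.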
Cloudflare + OpenAI bot detection
OpenAI fronts chatgpt.com with Cloudflare. Cloudflare in 2026 fingerprints TLS (JA4), profiles browsing behavior, and aggressively blocks datacenter ranges. Practical defenses:
- →Mobile carrier IPs only. Datacenter IPs hit a Cloudflare challenge within 1-3 queries. Real mobile IPs survive 50-100+ queries.
- →Realistic user agent. Default Playwright UA is flagged. Use a current Chrome 131+ UA matching macOS or Windows.
- →IP rotation in username. Use -rot-auto10 in your proxy username to get a fresh IP every 10 minutes. Build the right string with the Username Builder.
- →Random delays between queries. 6-15 seconds with jitter. Faster or perfectly regular patterns read as clearly automated.
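The pacing advice above can be sketched as a small sequential runner. This is an illustration under stated assumptions, not a tuned implementation: run_paced and jittered_delay are hypothetical names, fetch stands for any async function that takes a query string, and the 6-15 second range is simply the one quoted above.

```python
import asyncio
import random


def jittered_delay(low: float = 6.0, high: float = 15.0) -> float:
    """Pick a pause length uniformly from [low, high] seconds."""
    return random.uniform(low, high)


async def run_paced(queries, fetch, low=6.0, high=15.0, sleep=asyncio.sleep):
    """Run fetch(query) one at a time, pausing a random interval between calls.

    `sleep` is injectable so the pacing logic can be exercised
    without actually waiting.
    """
    results = []
    for i, query in enumerate(queries):
        if i > 0:  # no pause before the first query
            await sleep(jittered_delay(low, high))
        results.append(await fetch(query))
    return results
```

With the scraper defined earlier, a batch could be run as asyncio.run(run_paced(queries, scrape_chatgpt_search)). Queries run strictly one after another here, since parallel requests from one IP would defeat the purpose of the delays.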
Use cases
- →Brand monitoring — daily check whether ChatGPT cites your domain for relevant queries
- →Competitive research — see which competitors get cited and for what topics
- →Content gap analysis — query topics where you should rank but don't
- →SEO + GEO (Generative Engine Optimization) measurement
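For the monitoring use cases above, the scraped results can be reduced to a simple per-brand report. The sketch below assumes each result is a dict with query and citations keys, as returned by the scraper earlier in this article. The function names and the report fields are hypothetical and meant only as a starting point.

```python
from collections import Counter


def is_cited(result: dict, brand_domain: str) -> bool:
    """True if any citation's domain is the brand domain or a subdomain of it."""
    brand = brand_domain.lower()
    for c in result.get("citations", []):
        host = c.get("domain", "").lower()
        if host == brand or host.endswith("." + brand):
            return True
    return False


def visibility_report(results: list[dict], brand_domain: str) -> dict:
    """Summarize which queries cite the brand and which other domains appear."""
    cited = [r["query"] for r in results if is_cited(r, brand_domain)]
    missed = [r["query"] for r in results if not is_cited(r, brand_domain)]
    brand = brand_domain.lower()
    others = Counter()
    for r in results:
        # Count each domain at most once per query
        domains = {c.get("domain", "").lower() for c in r.get("citations", [])}
        for d in domains:
            if d and d != brand and not d.endswith("." + brand):
                others[d] += 1
    return {
        "cited_queries": cited,      # brand monitoring
        "missed_queries": missed,    # candidates for content gap analysis
        "share": len(cited) / len(results) if results else 0.0,
        "top_other_domains": others.most_common(10),  # competitive research
    }
```

Running this daily over a fixed query list and storing the share value gives a basic trend line. The missed_queries list is the natural input for content gap work.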