
Python Web Scraping with Mobile Proxies: Playwright, Requests, Scrapy Guide

Production-ready code for Python's top three scraping frameworks. Integrate real 4G/5G mobile proxies with rotation strategies, exponential backoff, and anti-detection patterns that actually work at scale.

  • 3 frameworks covered
  • 92% average success rate
  • 12+ code examples
  • 1M+ requests/day scale

Why Mobile Proxies for Python Scraping

The single biggest factor determining whether your scraper succeeds or fails is your IP address. Anti-bot systems classify traffic primarily by IP reputation, and that classification happens before your request even reaches the target server. Mobile proxies fundamentally change the equation because they use real carrier IPs from providers like T-Mobile, Verizon, and Vodafone.

The secret is CGNAT (Carrier-Grade NAT). Mobile carriers assign a single public IP to thousands of subscribers simultaneously. When your scraper uses a 4G/5G proxy IP, the target site sees the same IP that hundreds of real humans are browsing from at that exact moment. Blocking that IP would block legitimate mobile users—something no website wants to do.

Success Rate Comparison by Proxy Type

Proxy Type      | Success Rate | IP Trust Level | Block Risk
Datacenter      | 15-30%       | Very Low       | High
Residential     | 60-75%       | Medium         | Medium
Mobile 4G/5G    | 88-95%       | Very High      | Very Low

Bottom line: if you are scraping any site with meaningful anti-bot protection—e-commerce platforms, social media, search engines, or ticketing sites—mobile proxies are not optional. They are the baseline requirement for reliable data collection in 2026.

Setup: Getting Proxy Credentials from Proxies.sx

Before writing any code, you need your proxy credentials. Here is the quick path from signup to working proxy URL:

  1. Create an account at client.proxies.sx — free trial gives you 1GB bandwidth + 2 ports
  2. Purchase bandwidth (starting at $6/GB) and ports ($25/port/month) from the pricing page
  3. Copy your credentials from the dashboard — username, password, gateway host, and port number

Your proxy URL follows this format across all examples in this guide:

python
# Proxy URL format
PROXY_URL = "http://username:password@gate.proxies.sx:10001"

# Example with real credentials (replace with yours)
PROXY_URL = "http://user_abc123:securePass456@gate.proxies.sx:10001"

# For SOCKS5 (if needed)
SOCKS5_URL = "socks5://user_abc123:securePass456@gate.proxies.sx:10001"

Bandwidth is shared across all ports

You buy GB of bandwidth as a pool, and port subscriptions separately. Every port draws from the same bandwidth pool, so you only pay once for data regardless of how many ports you use.
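
Before wiring the proxy into a framework, it is worth a quick sanity check that your credentials resolve to a carrier IP. A minimal sketch using requests and httpbin.org (replace the placeholder credentials with your own):

python
import requests

# Placeholder credentials - substitute the values from your dashboard
PROXY_URL = "http://username:password@gate.proxies.sx:10001"

# The returned origin should be a carrier IP, not your own machine's address
resp = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": PROXY_URL, "https": PROXY_URL},
    timeout=15,
)
print(resp.json()["origin"])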

Python Requests + Mobile Proxy

The requests library is the simplest starting point. It handles static pages, REST APIs, and any endpoint that does not require JavaScript rendering. Here is a complete working example with session management, custom headers, and error handling:

python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Proxy configuration
PROXY_HOST = "gate.proxies.sx"
PROXY_PORT = "10001"
PROXY_USER = "your_username"
PROXY_PASS = "your_password"

proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {
    "http": proxy_url,
    "https": proxy_url,
}

# Create a session with retry logic
session = requests.Session()
session.proxies.update(proxies)

# Configure retries on the transport layer
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "POST"],
)
adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Set realistic headers
session.headers.update({
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                  "Version/17.0 Mobile/15E148 Safari/604.1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
})

# Make requests through the mobile proxy
try:
    response = session.get("https://httpbin.org/ip", timeout=30)
    response.raise_for_status()
    print(f"Your mobile proxy IP: {response.json()['origin']}")

    # Scrape a target page
    response = session.get("https://example.com/products", timeout=30)
    response.raise_for_status()
    print(f"Status: {response.status_code}, Length: {len(response.text)}")

except requests.exceptions.ProxyError as e:
    print(f"Proxy connection failed: {e}")
except requests.exceptions.Timeout:
    print("Request timed out - try increasing timeout or rotating IP")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e.response.status_code}")
finally:
    session.close()

Key points: always use a Session object for connection pooling, set a mobile User-Agent to match the proxy IP type, and include proper timeout values. The Retry adapter handles transient failures automatically at the transport level.

Playwright + Mobile Proxy (Async, Anti-Detection)

When sites render content with JavaScript or deploy sophisticated fingerprinting, you need a real browser. Playwright's async API combined with mobile proxies gives you a stealth browser session that looks like a genuine mobile user. Install it first:

bash
pip install playwright
playwright install chromium

Here is a full async Playwright scraper with proxy configuration, stealth settings, viewport emulation, and screenshot capabilities:

python
import asyncio
from playwright.async_api import async_playwright

PROXY_CONFIG = {
    "server": "http://gate.proxies.sx:10001",
    "username": "your_username",
    "password": "your_password",
}

async def scrape_with_mobile_proxy(url: str) -> dict:
    async with async_playwright() as p:
        # Launch browser with proxy and stealth settings
        browser = await p.chromium.launch(
            proxy=PROXY_CONFIG,
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-features=IsolateOrigins,site-per-process",
                "--disable-infobars",
            ],
        )

        # Create context with mobile device emulation
        context = await browser.new_context(
            viewport={"width": 390, "height": 844},
            user_agent=(
                "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                "Version/17.0 Mobile/15E148 Safari/604.1"
            ),
            locale="en-US",
            timezone_id="America/New_York",
            geolocation={"latitude": 40.7128, "longitude": -74.0060},
            permissions=["geolocation"],
        )

        # Remove automation indicators
        await context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
            Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
            Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});
            window.chrome = { runtime: {} };
        """)

        page = await context.new_page()

        try:
            # Navigate with realistic wait behavior
            await page.goto(url, wait_until="networkidle", timeout=60000)

            # Wait for dynamic content to load
            await page.wait_for_timeout(2000)

            # Extract data
            title = await page.title()
            content = await page.content()

            # Take a screenshot for debugging
            await page.screenshot(path="debug_screenshot.png", full_page=True)

            # Example: extract all product cards
            products = await page.evaluate("""
                () => Array.from(document.querySelectorAll('.product-card')).map(el => ({
                    name: el.querySelector('.title')?.textContent?.trim(),
                    price: el.querySelector('.price')?.textContent?.trim(),
                    url: el.querySelector('a')?.href,
                }))
            """)

            return {
                "title": title,
                "products": products,
                "html_length": len(content),
                "status": "success",
            }

        except Exception as e:
            await page.screenshot(path="error_screenshot.png")
            return {"status": "error", "message": str(e)}

        finally:
            await browser.close()


# Run the scraper
if __name__ == "__main__":
    result = asyncio.run(scrape_with_mobile_proxy("https://example.com/products"))
    print(f"Scraped {len(result.get('products', []))} products")
    print(f"Status: {result['status']}")

Anti-detection tip

The add_init_script call removes common automation fingerprints. Combined with a real mobile proxy IP, this makes your Playwright session virtually indistinguishable from a real iPhone user.

Scrapy + Mobile Proxy (Middleware Setup)

Scrapy is the framework of choice for large-scale crawling. Its middleware architecture makes proxy integration clean and maintainable. Here is a custom proxy middleware that routes all requests through your PROXIES.SX mobile proxy with automatic mobile User-Agent rotation:

middlewares.py

python
import random
from scrapy import signals

class MobileProxyMiddleware:
    """Routes all Scrapy requests through PROXIES.SX mobile proxy."""

    PROXY_URL = "http://your_username:your_password@gate.proxies.sx:10001"

    MOBILE_USER_AGENTS = [
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
        "Mozilla/5.0 (Linux; Android 14; Pixel 8 Pro) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
        "Mozilla/5.0 (Linux; Android 14; SM-S928B) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    ]

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls()
        crawler.signals.connect(middleware.spider_opened, signal=signals.spider_opened)
        return middleware

    def spider_opened(self, spider):
        spider.logger.info(f"MobileProxyMiddleware enabled for {spider.name}")

    def process_request(self, request, spider):
        request.meta["proxy"] = self.PROXY_URL
        request.headers["User-Agent"] = random.choice(self.MOBILE_USER_AGENTS)
        request.headers["Accept-Language"] = "en-US,en;q=0.9"
        return None

    def process_exception(self, request, exception, spider):
        spider.logger.error(f"Proxy error on {request.url}: {exception}")
        return None

settings.py

python
# Enable the custom proxy middleware
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.MobileProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}

# Concurrency settings (be respectful)
CONCURRENT_REQUESTS = 8
CONCURRENT_REQUESTS_PER_DOMAIN = 4
DOWNLOAD_DELAY = 1.5
RANDOMIZE_DOWNLOAD_DELAY = True

# Retry configuration
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [403, 429, 500, 502, 503, 504]

# Timeout
DOWNLOAD_TIMEOUT = 30

# Respect robots.txt (toggle based on your use case)
ROBOTSTXT_OBEY = True

# Logging
LOG_LEVEL = "INFO"

spiders/products_spider.py

python
import scrapy

class ProductsSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products?page=1"]

    def parse(self, response):
        # Extract product data
        for product in response.css("div.product-card"):
            yield {
                "name": product.css("h3.title::text").get("").strip(),
                "price": product.css("span.price::text").get("").strip(),
                "url": response.urljoin(product.css("a::attr(href)").get("")),
                "rating": product.css("span.rating::text").get(""),
            }

        # Follow pagination
        next_page = response.css("a.next-page::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

Run the spider with scrapy crawl products -o results.json. Every request automatically goes through the mobile proxy with a randomized mobile User-Agent.
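
If you would rather launch the crawl from a Python script than the CLI (inside a scheduler, for example), Scrapy's CrawlerProcess does the same job. A minimal sketch, assuming the myproject layout used above:

python
# run.py - start the spider programmatically instead of "scrapy crawl products"
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders.products_spider import ProductsSpider

# get_project_settings() reads settings.py, so the proxy middleware stays active
process = CrawlerProcess(get_project_settings())
process.crawl(ProductsSpider)
process.start()  # blocks until the crawl finishes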

Rotation Strategies: API vs Auto-Rotation

IP rotation is critical for high-volume scraping. PROXIES.SX supports two rotation methods, and choosing the right one depends on your scraping pattern.

Auto-Rotation (Timed)

Configure in your dashboard. The proxy automatically rotates the IP every N minutes (e.g., every 5, 10, or 30 minutes). Zero code changes required.

  • Best for continuous crawling
  • No extra API calls needed
  • Consistent rotation interval
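
Auto-rotation requires no code, but you can confirm your configured interval is actually taking effect by polling the exit IP and logging changes. A minimal sketch (the 60-second poll is illustrative; align it with your rotation interval):

python
import time
import requests

proxy_url = "http://your_username:your_password@gate.proxies.sx:10001"
proxies = {"http": proxy_url, "https": proxy_url}

last_ip = None
while True:
    try:
        # httpbin.org echoes back the IP the request arrived from
        ip = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=15).json()["origin"]
        if ip != last_ip:
            print(f"{time.strftime('%H:%M:%S')} exit IP changed: {last_ip} -> {ip}")
            last_ip = ip
    except requests.exceptions.RequestException as e:
        print(f"IP check failed: {e}")
    time.sleep(60)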

On-Demand Rotation (API)

Call the rotation API endpoint to force an immediate IP change. Gives you precise control over when rotation happens.

  • Best for targeted scraping
  • Rotate after blocks/CAPTCHAs
  • Programmatic control

Here is how to implement on-demand rotation in your Python scraper. Call this function between request batches or after encountering blocks:

python
import requests
import time

class ProxyRotator:
    """Manages IP rotation via the PROXIES.SX API."""

    def __init__(self, api_key: str, port_id: str):
        self.api_key = api_key
        self.port_id = port_id
        self.base_url = "https://client.proxies.sx/api"
        self.last_rotation = 0
        self.min_rotation_interval = 10  # seconds between rotations

    def rotate_ip(self) -> bool:
        """Force an IP rotation. Returns True on success."""
        now = time.time()
        if now - self.last_rotation < self.min_rotation_interval:
            wait = self.min_rotation_interval - (now - self.last_rotation)
            print(f"Rate limit: waiting {wait:.1f}s before rotation")
            time.sleep(wait)

        try:
            response = requests.post(
                f"{self.base_url}/ports/{self.port_id}/rotate",
                headers={"Authorization": f"Bearer {self.api_key}"},
                timeout=10,
            )
            response.raise_for_status()
            self.last_rotation = time.time()
            print(f"IP rotated successfully at {time.strftime('%H:%M:%S')}")
            return True
        except requests.exceptions.RequestException as e:
            print(f"Rotation failed: {e}")
            return False

    def get_current_ip(self, proxy_url: str) -> str:
        """Check the current IP assigned to the proxy."""
        try:
            resp = requests.get(
                "https://httpbin.org/ip",
                proxies={"http": proxy_url, "https": proxy_url},
                timeout=15,
            )
            return resp.json().get("origin", "unknown")
        except Exception:
            return "unknown"


# Usage in a scraping loop
rotator = ProxyRotator(api_key="your_api_key", port_id="port_12345")
proxy_url = "http://your_username:your_password@gate.proxies.sx:10001"

urls_to_scrape = ["https://example.com/page/1", "https://example.com/page/2", ...]

for i, url in enumerate(urls_to_scrape):
    # Rotate every 10 requests
    if i > 0 and i % 10 == 0:
        rotator.rotate_ip()
        time.sleep(3)  # Wait for new IP to propagate

    response = requests.get(
        url,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=30,
    )
    print(f"[{i+1}] {url} -> {response.status_code}")

Error Handling & Retry Patterns

Production scrapers encounter 403s, 429s, timeouts, and connection resets. The difference between amateur and professional scraping is how you handle these failures. Here is a battle-tested retry decorator with exponential backoff, jitter, and automatic IP rotation on persistent failures:

python
import time
import random
import functools
import requests
from typing import Optional, Callable

def retry_with_backoff(
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    rotate_on_403: bool = True,
    rotator: Optional[object] = None,
):
    """Decorator: retries failed requests with exponential backoff and jitter."""

    def decorator(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None

            for attempt in range(max_retries):
                try:
                    response = func(*args, **kwargs)

                    # Success
                    if response.status_code == 200:
                        return response

                    # Rate limited - respect Retry-After header
                    if response.status_code == 429:
                        retry_after = int(response.headers.get("Retry-After", 30))
                        print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
                        time.sleep(retry_after)
                        continue

                    # Forbidden - likely IP blocked
                    if response.status_code == 403:
                        print(f"403 Forbidden on attempt {attempt + 1}")
                        if rotate_on_403 and rotator:
                            rotator.rotate_ip()
                            time.sleep(5)
                        continue

                    # Server errors - retry with backoff
                    if response.status_code >= 500:
                        delay = min(base_delay * (2 ** attempt), max_delay)
                        jitter = random.uniform(0, delay * 0.5)
                        wait = delay + jitter
                        print(f"Server error {response.status_code}. Retrying in {wait:.1f}s")
                        time.sleep(wait)
                        continue

                    # Other status codes - return as-is
                    return response

                except requests.exceptions.ConnectionError as e:
                    last_exception = e
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    jitter = random.uniform(0, delay * 0.3)
                    print(f"Connection error. Retrying in {delay + jitter:.1f}s")
                    time.sleep(delay + jitter)

                except requests.exceptions.Timeout as e:
                    last_exception = e
                    print(f"Timeout on attempt {attempt + 1}/{max_retries}")
                    time.sleep(base_delay * (attempt + 1))

            raise Exception(f"All {max_retries} retries exhausted. Last error: {last_exception}")

        return wrapper
    return decorator


# Usage
@retry_with_backoff(max_retries=5, base_delay=2.0, rotate_on_403=True, rotator=rotator)
def fetch_page(url: str, session: requests.Session) -> requests.Response:
    return session.get(url, timeout=30)

# Now every call automatically retries with smart backoff
response = fetch_page("https://example.com/data", session)

Why jitter matters

Without jitter, all your retry attempts across concurrent workers hit the server at the exact same time (the "thundering herd" problem). Adding randomized jitter spreads retries across a window, dramatically improving success rates.
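
A quick way to see the effect is to compare the delays a fleet of workers would produce with and without jitter (illustrative numbers only):

python
import random

BASE, CAP = 1.0, 60.0

# Without jitter, every worker retries at exactly 1s, 2s, 4s, 8s, 16s after a failure
deterministic = [min(BASE * 2 ** attempt, CAP) for attempt in range(5)]

# "Full jitter" picks a random delay inside the same window, so concurrent
# workers spread out instead of retrying in lockstep
jittered = [round(random.uniform(0, min(BASE * 2 ** attempt, CAP)), 1) for attempt in range(5)]

print(deterministic)  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(jittered)       # e.g. [0.4, 1.1, 2.9, 6.3, 10.7]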

Production Architecture: Scaling to 1M+ Requests

Single-threaded scripts top out around 5,000-10,000 requests per day. To reach 1M+ daily requests, you need a queue-based architecture with distributed workers. Here is the production pattern we recommend for teams using PROXIES.SX mobile proxies:

Production Scraping Architecture


  +-------------------+       +------------------+       +-------------------+
  |   URL Scheduler   | ----> |   Redis Queue    | ----> |   Worker Pool     |
  |   (Priority +     |       |   (Bull / Celery)|       |   (N workers x    |
  |    Dedup)         |       |                  |       |    M ports)       |
  +-------------------+       +------------------+       +-------------------+
                                                               |
                                                               v
                                                    +-------------------+
                                                    |  PROXIES.SX       |
                                                    |  Mobile Gateway   |
                                                    |  (10-50 ports)    |
                                                    +-------------------+
                                                               |
                                                               v
                                                    +-------------------+
                                                    |  Target Sites     |
                                                    +-------------------+
                                                               |
                                                               v
                              +------------------+       +-------------------+
                              |   PostgreSQL /   | <---- |   Data Pipeline   |
                              |   S3 Storage     |       |   (Parse + Clean) |
                              +------------------+       +-------------------+
                                      |
                                      v
                              +------------------+
                              |   Monitoring     |
                              |   (Grafana /     |
                              |    Prometheus)   |
                              +------------------+
python
# worker.py - Celery worker with mobile proxy pool
import celery
import requests
import random

app = celery.Celery("scraper", broker="redis://localhost:6379/0")

# Proxy pool - each port is a separate mobile proxy endpoint
PROXY_POOL = [
    "http://user:pass@gate.proxies.sx:10001",
    "http://user:pass@gate.proxies.sx:10002",
    "http://user:pass@gate.proxies.sx:10003",
    "http://user:pass@gate.proxies.sx:10004",
    "http://user:pass@gate.proxies.sx:10005",
]

@app.task(bind=True, max_retries=3, default_retry_delay=10)
def scrape_url(self, url: str) -> dict:
    """Scrape a single URL through a random mobile proxy port."""
    proxy = random.choice(PROXY_POOL)

    try:
        response = requests.get(
            url,
            proxies={"http": proxy, "https": proxy},
            headers={
                "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                              "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                              "Version/17.0 Mobile/15E148 Safari/604.1"
            },
            timeout=30,
        )
        response.raise_for_status()

        return {
            "url": url,
            "status": response.status_code,
            "content_length": len(response.text),
            "proxy_port": proxy.split(":")[-1],
        }

    except requests.exceptions.RequestException as exc:
        raise self.retry(exc=exc)


# Dispatch URLs
def enqueue_urls(urls: list):
    """Send URLs to the task queue for distributed scraping."""
    for url in urls:
        scrape_url.delay(url)
    print(f"Enqueued {len(urls)} URLs for scraping")

  • 10-50 proxy ports
  • 50-200 concurrent workers
  • 1M+ requests per day

The key insight: each PROXIES.SX port is an independent mobile proxy endpoint. By distributing requests across multiple ports, you multiply your effective throughput while maintaining the high trust score of mobile IPs. Bandwidth is shared across all ports, so you pay per GB transferred regardless of how many ports you use. Check the pricing page for volume discounts on ports and bandwidth.
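
If a full Celery deployment is more than you need, the same per-port discipline can be approximated in a single process with a thread pool and one semaphore per port. A minimal sketch (the port list, cap, and URLs are illustrative):

python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

import requests

# Illustrative pool - one entry per PROXIES.SX port
PROXY_POOL = [
    "http://user:pass@gate.proxies.sx:10001",
    "http://user:pass@gate.proxies.sx:10002",
]
MAX_PER_PORT = 5  # keep 5-10 concurrent requests per port

# One semaphore per port caps how many threads use it at the same time
port_limits = {p: threading.BoundedSemaphore(MAX_PER_PORT) for p in PROXY_POOL}

def fetch(url: str) -> int:
    proxy = random.choice(PROXY_POOL)
    with port_limits[proxy]:
        resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
        return resp.status_code

urls = [f"https://example.com/page/{i}" for i in range(1, 101)]
with ThreadPoolExecutor(max_workers=len(PROXY_POOL) * MAX_PER_PORT) as pool:
    for url, status in zip(urls, pool.map(fetch, urls)):
        print(f"{url} -> {status}")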

Frequently Asked Questions

What Python library is best for web scraping with mobile proxies?

It depends on your use case. Use Requests for simple API calls and static HTML pages - it is fast and lightweight. Use Playwright when sites require JavaScript rendering or have sophisticated fingerprinting detection. Use Scrapy for large-scale crawls with hundreds of thousands of pages, where its async architecture, middleware pipeline, and built-in scheduling shine. All three integrate seamlessly with PROXIES.SX mobile proxies.

How do I configure a mobile proxy in Python Requests?

Create a proxies dictionary with your PROXIES.SX credentials: proxies = {"http": "http://user:pass@gate.proxies.sx:10001", "https": "http://user:pass@gate.proxies.sx:10001"}. Pass it to any request call with requests.get(url, proxies=proxies), or set it on a Session object for persistent connection pooling.

Why are mobile proxies better than datacenter proxies for scraping?

Mobile proxies use real 4G/5G carrier IPs that are shared by thousands of legitimate users through CGNAT (Carrier-Grade NAT). Websites cannot block these IPs without also blocking real mobile users. Datacenter IPs come from known hosting ranges and are trivially identified and blocked by anti-bot systems. In our testing, mobile proxies achieve 88-95% success rates vs 15-30% for datacenter IPs on protected sites.

How do I rotate IPs with mobile proxies in Python?

PROXIES.SX offers two methods. Auto-rotation is configured in your dashboard - the IP changes every N minutes automatically with zero code changes. On-demand rotation uses an API endpoint to force immediate IP changes, giving you programmatic control. Most production scrapers use auto-rotation for baseline diversity and trigger on-demand rotation when they encounter blocks.

Can I use async Playwright with mobile proxies?

Yes. Pass the proxy configuration when launching the browser: browser = await playwright.chromium.launch(proxy={"server": "http://gate.proxies.sx:10001", "username": "user", "password": "pass"}). All browser traffic routes through the mobile proxy, and you get the full benefit of carrier-grade mobile IPs combined with real browser fingerprinting.

How do I handle 403 and 429 errors when scraping with proxies?

Implement exponential backoff with jitter. On 403 (Forbidden), rotate your IP via the API and retry with a delay of 5-10 seconds. On 429 (Too Many Requests), check the Retry-After header and wait accordingly. If no header is present, use exponential backoff starting at 2 seconds. With mobile proxies, these errors are significantly less frequent than with datacenter or residential proxies.

What is the best Scrapy middleware setup for mobile proxies?

Create a custom ProxyMiddleware class that sets request.meta["proxy"] in its process_request method. Enable it in DOWNLOADER_MIDDLEWARES at priority 350, disable the default UserAgentMiddleware, and rotate randomized mobile User-Agents in the same middleware. Combine with a DOWNLOAD_DELAY of 1-2 seconds and a per-domain concurrency of 4-8 (CONCURRENT_REQUESTS_PER_DOMAIN) for reliable results.

How many concurrent requests can I run through mobile proxies?

Each PROXIES.SX port supports multiple concurrent connections. For production workloads, we recommend 5-10 concurrent requests per port to maintain reliability. With 10 ports, that gives you 50-100 concurrent requests. With 50 ports, you can run 250-500+ concurrent requests. Bandwidth is pooled across all ports, so scaling ports does not multiply your bandwidth cost.

Start Scraping with Mobile Proxies Today

Get 1GB free trial bandwidth + 2 ports. Every code example in this guide works out of the box with PROXIES.SX credentials. Setup takes less than 60 seconds.