Why Node.js for Web Scraping
Node.js has become the dominant runtime for web scraping in 2026, and for good reason. Its event-driven, non-blocking I/O model is purpose-built for the kind of concurrent network operations that scraping demands. While a synchronous Python scraper waits for each proxy response in turn, a Node.js scraper fires off hundreds of requests simultaneously and processes them as they arrive.
The deeper advantage is ecosystem. Puppeteer and Playwright were built for Node.js first—their Node APIs are the most mature, best documented, and first to receive new features. When you combine native browser automation with real 4G/5G mobile proxy IPs from PROXIES.SX, you get a scraping stack that handles JavaScript-heavy SPAs, anti-bot detection, and high concurrency without breaking a sweat.
Event-Driven I/O
Non-blocking async handles thousands of concurrent proxy connections
Native Browser Control
Puppeteer and Playwright were built for Node.js first
JSON-Native
Parse API responses and structured data without serialization overhead
NPM Ecosystem
2M+ packages including scraping, parsing, and proxy utilities
JSON is the native data format of JavaScript, so parsing API responses and structured data is zero-friction. Combined with TypeScript for type safety on scraped data structures, Node.js gives you the fastest path from "idea" to "production scraper" of any runtime available today.
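To make that concurrency claim concrete, here is a minimal sketch of the pattern using plain Node 18+ fetch (no proxy yet; proxy configuration is covered in the next section). The URLs are illustrative:
// Fire a batch of requests concurrently and process each one as it settles
async function fetchBatch(urls: string[]) {
  const results = await Promise.allSettled(
    urls.map(async (url) => {
      const res = await fetch(url); // native fetch, Node 18+
      return { url, status: res.status, bytes: (await res.text()).length };
    })
  );
  for (const r of results) {
    if (r.status === "fulfilled") console.log(r.value.url, r.value.status, r.value.bytes);
    else console.error("Request failed:", r.reason);
  }
}
fetchBatch([
  "https://example.com/page/1",
  "https://example.com/page/2",
  "https://example.com/page/3",
]).catch(console.error);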
Setup: Proxies.sx Mobile Proxy Credentials
Before writing any code, you need your proxy credentials. Here is the quick path from signup to working proxy URL:
1. Sign up for a PROXIES.SX account
2. Purchase bandwidth (starting at $6/GB) and ports ($25/port/month) from the pricing page
3. Copy your credentials from the dashboard — username, password, gateway host, and port number
Your proxy URL follows this format across all examples in this guide:
// Proxy URL format for all Node.js examples
const PROXY_HOST = "gate.proxies.sx";
const PROXY_PORT = 10001;
const PROXY_USER = "your_username";
const PROXY_PASS = "your_password";
const PROXY_URL = `http://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOST}:${PROXY_PORT}`;
// => http://your_username:your_password@gate.proxies.sx:10001
// For SOCKS5 (if needed)
const SOCKS5_URL = `socks5://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOST}:${PROXY_PORT}`;
Bandwidth is shared across all ports
You buy GB of bandwidth as a pool, and port subscriptions separately. Every port draws from the same bandwidth pool, so you only pay once for data regardless of how many ports you use.
Cheerio + Axios + Mobile Proxy
For static HTML pages and server-rendered content, Cheerio + Axios is the fastest option in Node.js. No browser overhead, no Chromium download—just raw HTTP requests with jQuery-like DOM parsing. This combination is ideal for scraping product listings, news articles, and any page that does not require JavaScript execution.
npm install axios cheerio https-proxy-agent
Here is a complete working scraper with proxy agent configuration, Cheerio parsing, and proper error handling:
import axios, { AxiosError } from "axios";
import * as cheerio from "cheerio";
import { HttpsProxyAgent } from "https-proxy-agent";
// Proxy configuration
const PROXY_URL = "http://your_username:your_password@gate.proxies.sx:10001";
const proxyAgent = new HttpsProxyAgent(PROXY_URL);
// Create an Axios instance with proxy and realistic headers
const client = axios.create({
httpAgent: proxyAgent,
httpsAgent: proxyAgent,
proxy: false, // disable axios's built-in/env proxy handling so the agent is the only route
timeout: 30_000,
headers: {
"User-Agent":
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
});
interface Product {
name: string;
price: string;
url: string;
rating: string | null;
}
async function scrapeProducts(url: string): Promise<Product[]> {
try {
const { data: html, status } = await client.get<string>(url);
console.log(`Fetched ${url} - Status: ${status}, Length: ${html.length}`);
const $ = cheerio.load(html);
const products: Product[] = [];
$("div.product-card").each((_, el) => {
products.push({
name: $(el).find("h3.title").text().trim(),
price: $(el).find("span.price").text().trim(),
url: $(el).find("a").attr("href") ?? "",
rating: $(el).find("span.rating").text().trim() || null,
});
});
return products;
} catch (error) {
if (error instanceof AxiosError) {
if (error.code === "ECONNREFUSED") {
console.error("Proxy connection refused - check credentials");
} else if (error.response?.status === 403) {
console.error("403 Forbidden - IP may be blocked, rotate and retry");
} else if (error.response?.status === 429) {
console.error("429 Rate limited - back off and retry");
} else {
console.error(`HTTP error: ${error.response?.status ?? error.message}`);
}
}
throw error;
}
}
// Verify proxy IP first, then scrape
async function main() {
const { data } = await client.get("https://httpbin.org/ip");
console.log("Mobile proxy IP:", data.origin);
const products = await scrapeProducts("https://example.com/products");
console.log(`Scraped ${products.length} products`);
console.log(JSON.stringify(products, null, 2));
}
main().catch(console.error);
Key points: the HttpsProxyAgent handles both HTTP and HTTPS traffic through your mobile proxy. Use a shared Axios instance for connection pooling, set a mobile User-Agent that matches your proxy IP type, and always include a timeout. Cheerio's jQuery-like API makes DOM traversal intuitive for anyone with frontend experience.
Puppeteer + Mobile Proxy
When sites render content with JavaScript or deploy fingerprinting, you need a real browser. Puppeteer drives headless Chrome with the --proxy-server flag to route all traffic through your mobile proxy. Combined with page.authenticate(), this creates a browser session that looks identical to a real mobile user.
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Here is a full Puppeteer scraper with proxy authentication, stealth plugins, and screenshot verification:
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";
// Enable stealth mode to avoid detection
puppeteer.use(StealthPlugin());
const PROXY_HOST = "gate.proxies.sx";
const PROXY_PORT = 10001;
const PROXY_USER = "your_username";
const PROXY_PASS = "your_password";
interface ScrapeResult {
title: string;
products: Array<{ name: string; price: string; url: string }>;
screenshotPath: string;
status: "success" | "error";
error?: string;
}
async function scrapeWithPuppeteer(url: string): Promise<ScrapeResult> {
const browser = await puppeteer.launch({
headless: true,
args: [
`--proxy-server=http://${PROXY_HOST}:${PROXY_PORT}`,
"--disable-blink-features=AutomationControlled",
"--disable-features=IsolateOrigins,site-per-process",
"--disable-infobars",
"--no-sandbox",
"--disable-setuid-sandbox",
"--window-size=390,844",
],
});
const page = await browser.newPage();
try {
// Authenticate with the proxy
await page.authenticate({
username: PROXY_USER,
password: PROXY_PASS,
});
// Set mobile viewport and user agent
await page.setViewport({ width: 390, height: 844, isMobile: true });
await page.setUserAgent(
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1"
);
// Remove automation indicators
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, "webdriver", { get: () => undefined });
Object.defineProperty(navigator, "languages", { get: () => ["en-US", "en"] });
Object.defineProperty(navigator, "plugins", { get: () => [1, 2, 3, 4, 5] });
// @ts-expect-error - Chrome runtime mock
window.chrome = { runtime: {} };
});
// Navigate with realistic wait behavior
await page.goto(url, { waitUntil: "networkidle2", timeout: 60_000 });
// Wait for dynamic content
await page.waitForSelector(".product-card", { timeout: 10_000 }).catch(() => {
console.log("No .product-card found, continuing with page content");
});
const title = await page.title();
// Extract product data from the page
const products = await page.evaluate(() =>
Array.from(document.querySelectorAll(".product-card")).map((el) => ({
name: el.querySelector(".title")?.textContent?.trim() ?? "",
price: el.querySelector(".price")?.textContent?.trim() ?? "",
url: (el.querySelector("a") as HTMLAnchorElement)?.href ?? "",
}))
);
// Take a screenshot for verification
const screenshotPath = "puppeteer_result.png";
await page.screenshot({ path: screenshotPath, fullPage: true });
return { title, products, screenshotPath, status: "success" };
} catch (error) {
await page.screenshot({ path: "puppeteer_error.png" }).catch(() => {});
return {
title: "",
products: [],
screenshotPath: "puppeteer_error.png",
status: "error",
error: error instanceof Error ? error.message : String(error),
};
} finally {
await browser.close();
}
}
// Run the scraper
async function main() {
const result = await scrapeWithPuppeteer("https://example.com/products");
console.log(`Status: ${result.status}`);
console.log(`Title: ${result.title}`);
console.log(`Products found: ${result.products.length}`);
console.log(`Screenshot: ${result.screenshotPath}`);
}
main().catch(console.error);
Anti-detection tip
The puppeteer-extra-plugin-stealth package patches over a dozen browser fingerprint leaks automatically. Combined with a real mobile proxy IP, your Puppeteer session is virtually indistinguishable from a real iPhone user browsing the web.
Playwright (Node.js) + Mobile Proxy
Playwright offers built-in proxy configuration at the browser level, cross-browser support (Chromium, Firefox, WebKit), and a more modern async API than Puppeteer. Its context isolation makes it ideal for running multiple scraping sessions with different proxy configurations simultaneously.
npm install playwright
import { chromium, type Browser, type BrowserContext } from "playwright";
const PROXY_CONFIG = {
server: "http://gate.proxies.sx:10001",
username: "your_username",
password: "your_password",
};
interface ScrapedPage {
url: string;
title: string;
content: string;
links: string[];
status: "success" | "error";
}
async function scrapeWithPlaywright(urls: string[]): Promise<ScrapedPage[]> {
// Launch browser with proxy configuration
const browser: Browser = await chromium.launch({
headless: true,
proxy: PROXY_CONFIG,
});
// Create a context with mobile device emulation
const context: BrowserContext = await browser.newContext({
viewport: { width: 390, height: 844 },
userAgent:
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
locale: "en-US",
timezoneId: "America/New_York",
geolocation: { latitude: 40.7128, longitude: -74.006 },
permissions: ["geolocation"],
isMobile: true,
hasTouch: true,
});
// Remove automation fingerprints
await context.addInitScript(() => {
Object.defineProperty(navigator, "webdriver", { get: () => undefined });
Object.defineProperty(navigator, "languages", { get: () => ["en-US", "en"] });
Object.defineProperty(navigator, "plugins", { get: () => [1, 2, 3, 4, 5] });
});
const results: ScrapedPage[] = [];
for (const url of urls) {
const page = await context.newPage();
try {
await page.goto(url, { waitUntil: "networkidle", timeout: 60_000 });
// Wait for dynamic content
await page.waitForTimeout(1500);
const title = await page.title();
const content = await page.content();
// Extract all links on the page
const links = await page.evaluate(() =>
Array.from(document.querySelectorAll("a[href]"))
.map((a) => (a as HTMLAnchorElement).href)
.filter((href) => href.startsWith("http"))
);
results.push({ url, title, content, links, status: "success" });
console.log(`[OK] ${url} - ${title} (${links.length} links)`);
} catch (error) {
results.push({
url,
title: "",
content: "",
links: [],
status: "error",
});
console.error(`[FAIL] ${url} - ${error instanceof Error ? error.message : error}`);
} finally {
await page.close();
}
}
await context.close();
await browser.close();
return results;
}
// Scrape multiple pages through the mobile proxy
async function main() {
// Verify proxy IP first
const browser = await chromium.launch({ proxy: PROXY_CONFIG });
const page = await browser.newPage();
await page.goto("https://httpbin.org/ip");
const ip = await page.textContent("body");
console.log("Mobile proxy IP:", ip);
await browser.close();
// Scrape target pages
const results = await scrapeWithPlaywright([
"https://example.com/products?page=1",
"https://example.com/products?page=2",
"https://example.com/products?page=3",
]);
console.log(`\nScraped ${results.length} pages`);
console.log(`Success: ${results.filter((r) => r.status === "success").length}`);
console.log(`Failed: ${results.filter((r) => r.status === "error").length}`);
}
main().catch(console.error);
Playwright vs Puppeteer
Playwright supports Chromium, Firefox, and WebKit from a single API. Its browser contexts provide true isolation—each context gets its own cookies, localStorage, and proxy config. This makes it easy to run multiple scraping sessions with different mobile proxy ports in parallel.
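A minimal sketch of that pattern, assuming you have several PROXIES.SX ports (the port numbers and credentials are placeholders). Note that on some Playwright versions, Chromium only honors per-context proxies reliably when the browser itself is launched with a proxy as well:
import { chromium } from "playwright";
async function main() {
  const browser = await chromium.launch({ headless: true });
  // One isolated context per proxy port: separate cookies, storage, and IP
  const contexts = await Promise.all(
    [10001, 10002, 10003].map((port) =>
      browser.newContext({
        proxy: {
          server: `http://gate.proxies.sx:${port}`,
          username: "your_username",
          password: "your_password",
        },
      })
    )
  );
  await Promise.all(
    contexts.map(async (context, i) => {
      const page = await context.newPage();
      await page.goto("https://httpbin.org/ip");
      console.log(`Context ${i}:`, await page.textContent("body"));
      await context.close();
    })
  );
  await browser.close();
}
main().catch(console.error);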
Crawlee + Mobile Proxy
Crawlee is the Node.js equivalent of Scrapy—a full-featured crawling framework with built-in proxy rotation, request queuing, auto-scaling, and storage. It wraps Puppeteer, Playwright, or plain HTTP under a unified API, letting you switch between headless browser and lightweight HTTP crawling without rewriting your scraper logic.
npm install crawlee playwright
import {
PlaywrightCrawler,
ProxyConfiguration,
Dataset,
type PlaywrightCrawlingContext,
} from "crawlee";
// Configure mobile proxy pool with multiple PROXIES.SX ports
const proxyConfiguration = new ProxyConfiguration({
proxyUrls: [
"http://your_username:your_password@gate.proxies.sx:10001",
"http://your_username:your_password@gate.proxies.sx:10002",
"http://your_username:your_password@gate.proxies.sx:10003",
],
});
// Create a Playwright-based crawler with mobile proxy
const crawler = new PlaywrightCrawler({
proxyConfiguration,
// Concurrency and rate limiting
maxConcurrency: 5,
maxRequestRetries: 3,
requestHandlerTimeoutSecs: 60,
// Launch options for stealth
launchContext: {
launchOptions: {
args: [
"--disable-blink-features=AutomationControlled",
"--disable-infobars",
],
},
},
// Browser context with mobile emulation
browserPoolOptions: {
useFingerprints: true,
fingerprintOptions: {
fingerprintGeneratorOptions: {
devices: ["mobile"],
operatingSystems: ["ios"],
},
},
},
// Main request handler
async requestHandler({ request, page, enqueueLinks, log }: PlaywrightCrawlingContext) {
log.info(`Scraping: ${request.url}`);
// Wait for page content to load
await page.waitForSelector(".product-card", { timeout: 10_000 }).catch(() => {
log.warning("No product cards found, parsing available content");
});
// Extract product data
const products = await page.evaluate(() =>
Array.from(document.querySelectorAll(".product-card")).map((el) => ({
name: el.querySelector(".title")?.textContent?.trim() ?? "",
price: el.querySelector(".price")?.textContent?.trim() ?? "",
url: (el.querySelector("a") as HTMLAnchorElement)?.href ?? "",
}))
);
// Store results in the default dataset
await Dataset.pushData({
url: request.url,
products,
scrapedAt: new Date().toISOString(),
});
log.info(`Found ${products.length} products on ${request.url}`);
// Follow pagination links automatically
await enqueueLinks({
selector: "a.next-page",
label: "PRODUCT_PAGE",
});
},
// Handle failures
async failedRequestHandler({ request, log }) {
log.error(`Request failed after retries: ${request.url}`);
await Dataset.pushData({
url: request.url,
error: request.errorMessages.join("; "),
scrapedAt: new Date().toISOString(),
});
},
});
// Start the crawler with seed URLs
async function main() {
await crawler.addRequests([
"https://example.com/products?page=1",
"https://example.com/products?page=2",
"https://example.com/products?page=3",
]);
await crawler.run();
// Export results
const dataset = await Dataset.open();
const { items } = await dataset.getData();
console.log(`\nCrawl complete. ${items.length} pages processed.`);
}
main().catch(console.error);
Crawlee's ProxyConfiguration automatically rotates through your PROXIES.SX proxy URLs, distributing requests across ports. The built-in RequestQueue deduplicates URLs, and Dataset provides structured storage with JSON, CSV, and Excel export. For teams scaling beyond simple scripts, Crawlee removes the need to build your own scheduling and retry infrastructure.
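If you want the stored results as files, Crawlee's Dataset export helpers can be called once the run finishes. A short sketch, assuming Crawlee 3.x where exportToJSON and exportToCSV write into the default key-value store:
import { Dataset } from "crawlee";
// Export the default dataset after crawler.run() has finished
async function exportResults() {
  const dataset = await Dataset.open();
  await dataset.exportToJSON("products"); // storage/key_value_stores/default/products.json
  await dataset.exportToCSV("products");  // storage/key_value_stores/default/products.csv
}
exportResults().catch(console.error);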
IP Rotation & Session Management
IP rotation is critical for high-volume scraping. PROXIES.SX supports two rotation methods, and choosing the right one depends on your scraping pattern.
Rotating Sessions
Each request gets a different IP. Configure auto-rotation in your dashboard so the IP changes every N minutes with zero code changes.
- Best for scraping search results
- Minimizes IP-based rate limiting
- No session state needed
Sticky Sessions
Same IP persists for a set duration. Essential when you need to maintain login state or navigate multi-page flows through the proxy (see the sketch after this list).
- Best for authenticated scraping
- Maintains cookies and sessions
- Multi-page navigation flows
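Sticky sessions need no special client code: you simply keep routing the whole flow through one port whose IP stays pinned for the session duration. A minimal sketch of an authenticated multi-page flow (the login endpoint, form fields, and account page are illustrative):
import axios from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
// One dedicated port = one sticky IP for the entire flow
const stickyAgent = new HttpsProxyAgent("http://your_username:your_password@gate.proxies.sx:10001");
const session = axios.create({
  httpAgent: stickyAgent,
  httpsAgent: stickyAgent,
  proxy: false,
  timeout: 30_000,
});
async function authenticatedFlow() {
  // Step 1: log in; the returned session cookie is tied to this IP
  const login = await session.post("https://example.com/login", {
    username: "account_user",
    password: "account_pass",
  });
  const cookie = login.headers["set-cookie"]?.join("; ") ?? "";
  // Step 2: reuse the same IP and cookie for the protected pages
  const orders = await session.get("https://example.com/account/orders", {
    headers: { Cookie: cookie },
  });
  console.log("Orders page length:", orders.data.length);
}
authenticatedFlow().catch(console.error);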
Here is how to implement on-demand rotation and connection pooling in Node.js. Call this between request batches or after encountering blocks:
import axios from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
interface ProxyPort {
url: string;
agent: HttpsProxyAgent<string>;
lastUsed: number;
requestCount: number;
}
class ProxyPool {
private ports: ProxyPort[];
private apiKey: string;
private apiBase = "https://client.proxies.sx/api";
private roundRobinIndex = 0;
constructor(proxyUrls: string[], apiKey: string) {
this.apiKey = apiKey;
this.ports = proxyUrls.map((url) => ({
url,
agent: new HttpsProxyAgent(url),
lastUsed: 0,
requestCount: 0,
}));
}
/** Get the next proxy agent using round-robin selection */
getNext(): { agent: HttpsProxyAgent<string>; portIndex: number } {
const port = this.ports[this.roundRobinIndex];
port.lastUsed = Date.now();
port.requestCount++;
const portIndex = this.roundRobinIndex;
this.roundRobinIndex = (this.roundRobinIndex + 1) % this.ports.length;
return { agent: port.agent, portIndex };
}
/** Force IP rotation on a specific port */
async rotateIp(portId: string): Promise<boolean> {
try {
await axios.post(
`${this.apiBase}/ports/${portId}/rotate`,
{},
{
headers: { Authorization: `Bearer ${this.apiKey}` },
timeout: 10_000,
}
);
console.log(`IP rotated on port ${portId} at ${new Date().toISOString()}`);
return true;
} catch (error) {
console.error(`Rotation failed on port ${portId}:`, error);
return false;
}
}
/** Get current IP for a specific port */
async getCurrentIp(portIndex: number): Promise<string> {
try {
const { data } = await axios.get("https://httpbin.org/ip", {
httpAgent: this.ports[portIndex].agent,
httpsAgent: this.ports[portIndex].agent,
timeout: 15_000,
});
return data.origin;
} catch {
return "unknown";
}
}
/** Get pool statistics */
getStats() {
return this.ports.map((p, i) => ({
index: i,
requestCount: p.requestCount,
lastUsed: new Date(p.lastUsed).toISOString(),
}));
}
}
// Usage
const pool = new ProxyPool(
[
"http://user:pass@gate.proxies.sx:10001",
"http://user:pass@gate.proxies.sx:10002",
"http://user:pass@gate.proxies.sx:10003",
],
"your_api_key"
);
async function scrapeWithPool(urls: string[]) {
for (const url of urls) {
const { agent, portIndex } = pool.getNext();
const { data } = await axios.get(url, {
httpAgent: agent,
httpsAgent: agent,
timeout: 30_000,
});
console.log(`[${portIndex}] ${url} -> ${data.length} bytes`);
}
console.log("Pool stats:", pool.getStats());
}
scrapeWithPool(["https://example.com/1", "https://example.com/2"]).catch(console.error);
Error Handling & Production Patterns
Production scrapers encounter 403s, 429s, timeouts, and connection resets. The difference between a weekend project and a production system is how you handle these failures. Here is a battle-tested retry utility with exponential backoff, jitter, and automatic IP rotation on persistent failures:
import axios, { AxiosError, type AxiosInstance, type AxiosResponse } from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
interface RetryConfig {
maxRetries: number;
baseDelay: number;
maxDelay: number;
rotateOnBlock: boolean;
onRotate?: () => Promise<void>;
}
const DEFAULT_RETRY_CONFIG: RetryConfig = {
maxRetries: 5,
baseDelay: 1000,
maxDelay: 60_000,
rotateOnBlock: true,
};
/** Sleep with optional jitter */
function sleep(ms: number, jitter = true): Promise<void> {
const actual = jitter ? ms + Math.random() * ms * 0.5 : ms;
return new Promise((resolve) => setTimeout(resolve, actual));
}
/** Retry wrapper with exponential backoff and proxy rotation */
async function fetchWithRetry(
client: AxiosInstance,
url: string,
config: Partial<RetryConfig> = {}
): Promise<AxiosResponse> {
const opts = { ...DEFAULT_RETRY_CONFIG, ...config };
let lastError: Error | null = null;
for (let attempt = 0; attempt < opts.maxRetries; attempt++) {
try {
// Accept every status code so 403/429/5xx reach the handlers below instead of throwing
const response = await client.get(url, { validateStatus: () => true });
// Success
if (response.status === 200) return response;
// Rate limited
if (response.status === 429) {
const retryAfter = parseInt(response.headers["retry-after"] ?? "30", 10);
console.warn(`[429] Rate limited on attempt ${attempt + 1}. Waiting ${retryAfter}s`);
await sleep(retryAfter * 1000, false);
continue;
}
// Forbidden - likely IP blocked
if (response.status === 403) {
console.warn(`[403] Blocked on attempt ${attempt + 1}`);
if (opts.rotateOnBlock && opts.onRotate) {
await opts.onRotate();
await sleep(5000);
}
continue;
}
// Server errors
if (response.status >= 500) {
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.warn(`[${response.status}] Server error. Retrying in ${delay}ms`);
await sleep(delay);
continue;
}
return response;
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error));
if (error instanceof AxiosError) {
if (error.code === "ECONNRESET" || error.code === "ETIMEDOUT") {
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.warn(`[${error.code}] Connection error. Retrying in ${delay}ms`);
await sleep(delay);
continue;
}
}
// Unknown error - backoff and retry
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.error(`[ERROR] Attempt ${attempt + 1}: ${lastError.message}. Retrying in ${delay}ms`);
await sleep(delay);
}
}
throw new Error(`All ${opts.maxRetries} retries exhausted. Last error: ${lastError?.message}`);
}
/** Proxy health check utility */
async function checkProxyHealth(proxyUrl: string): Promise<{
healthy: boolean;
ip: string;
latencyMs: number;
}> {
const agent = new HttpsProxyAgent(proxyUrl);
const start = Date.now();
try {
const { data } = await axios.get("https://httpbin.org/ip", {
httpAgent: agent,
httpsAgent: agent,
timeout: 15_000,
});
return {
healthy: true,
ip: data.origin,
latencyMs: Date.now() - start,
};
} catch {
return {
healthy: false,
ip: "unknown",
latencyMs: Date.now() - start,
};
}
}
// Usage example
async function main() {
const proxyUrl = "http://user:pass@gate.proxies.sx:10001";
const agent = new HttpsProxyAgent(proxyUrl);
// Health check first
const health = await checkProxyHealth(proxyUrl);
console.log("Proxy health:", health);
if (!health.healthy) {
console.error("Proxy is not responding. Check credentials.");
process.exit(1);
}
// Create client with proxy
const client = axios.create({
httpAgent: agent,
httpsAgent: agent,
timeout: 30_000,
headers: {
"User-Agent":
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
},
});
// Scrape with automatic retries and backoff
const response = await fetchWithRetry(client, "https://example.com/data", {
maxRetries: 5,
baseDelay: 2000,
rotateOnBlock: true,
onRotate: async () => {
console.log("Rotating IP via API...");
// Call PROXIES.SX rotation API here
},
});
console.log(`Success: ${response.status}, ${response.data.length} bytes`);
}
main().catch(console.error);
Why jitter matters
Without jitter, all your retry attempts across concurrent workers hit the server at the exact same time (the "thundering herd" problem). Adding randomized jitter spreads retries across a window, dramatically improving success rates. The sleep function above adds 0-50% random jitter by default.
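If you want even wider spread, you can swap the additive jitter for a full-jitter strategy, where the whole delay is drawn uniformly between zero and the exponential cap. A small sketch (fullJitterDelay is a hypothetical helper, not part of the code above):
// Full jitter: random delay in [0, min(cap, base * 2^attempt))
function fullJitterDelay(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}
// Usage inside the retry loop above:
// await sleep(fullJitterDelay(attempt), false);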
Frequently Asked Questions
What is the best Node.js library for web scraping with mobile proxies?
It depends on your target site. Use Cheerio + Axios for static HTML pages where speed and low resource usage matter - no browser overhead, just raw HTTP. Use Puppeteer when you need JavaScript rendering and Google Chrome compatibility. Use Playwright for cross-browser support and advanced context isolation. Use Crawlee for large-scale crawling with built-in proxy rotation, retry logic, and data storage. All four integrate seamlessly with PROXIES.SX mobile proxies.
How do I configure a mobile proxy in Puppeteer?
Launch Puppeteer with the --proxy-server argument: puppeteer.launch({ args: ["--proxy-server=http://gate.proxies.sx:10001"] }). Then call page.authenticate({ username: "user", password: "pass" }) before navigating. All browser traffic automatically routes through the mobile proxy. Add puppeteer-extra-plugin-stealth for anti-detection.
How do I use a proxy with Axios in Node.js?
Install the https-proxy-agent package. Create an agent: const agent = new HttpsProxyAgent("http://user:pass@gate.proxies.sx:10001"). Pass it as both httpAgent and httpsAgent in your Axios config. For persistent connections, create a shared Axios instance with axios.create({ httpAgent: agent, httpsAgent: agent }).
Can I use Playwright with mobile proxies in Node.js?
Yes. Playwright has built-in proxy support. Pass the proxy configuration when launching: chromium.launch({ proxy: { server: "http://gate.proxies.sx:10001", username: "user", password: "pass" } }). Playwright supports HTTP, HTTPS, and SOCKS5 protocols. Each browser context can have its own proxy configuration for parallel scraping with different IPs.
How do I rotate IPs in Node.js scrapers?
PROXIES.SX supports two methods. Auto-rotation changes your IP on a timer configured in the dashboard with zero code changes. On-demand rotation uses a POST request to the rotation API endpoint to force immediate IP changes. In Node.js, use fetch or axios to call the endpoint between request batches or after encountering 403/429 errors.
What is Crawlee and why should I use it?
Crawlee is a full-featured web scraping framework for Node.js built by the team at Apify. It provides ProxyConfiguration for automatic proxy pool management, RequestQueue for URL deduplication and scheduling, Dataset for structured data storage, and built-in exponential backoff. It wraps Puppeteer, Playwright, or plain HTTP, so you can switch between browser and lightweight scraping without rewriting code.
How do I handle errors and retries in Node.js scrapers?
Implement exponential backoff with jitter using async/await. On 403, rotate your IP via the PROXIES.SX API and retry after 5 seconds. On 429, read the Retry-After header and wait accordingly. For connection errors (ECONNRESET, ETIMEDOUT), use increasing delays starting at 1-2 seconds. Wrap your scraping logic in a reusable retry function with configurable max retries and delay multiplier.
Is TypeScript recommended for Node.js web scraping?
Strongly recommended. TypeScript provides type safety for scraped data structures, ensuring your parser correctly handles missing fields and edge cases. All major scraping libraries (Puppeteer, Playwright, Cheerio, Crawlee) ship with TypeScript types. The compiler catches null reference errors, incorrect property access, and type mismatches before your scraper runs, saving hours of debugging.
Start Scraping with Mobile Proxies Today
Get 1GB free trial bandwidth + 2 ports. Every code example in this guide works out of the box with PROXIES.SX credentials. Setup takes less than 60 seconds.