Why Node.js for Web Scraping
Node.js has become the dominant runtime for web scraping in 2026, and for good reason. Its event-driven, non-blocking I/O model is purpose-built for the kind of concurrent network operations that scraping demands. While a synchronous Python scraper waits for each proxy response in turn, a Node.js scraper fires off hundreds of requests simultaneously and processes them as they arrive.
The deeper advantage is ecosystem. Puppeteer and Playwright were built for Node.js first—their Node APIs are the most mature, best documented, and first to receive new features. When you combine native browser automation with real 4G/5G mobile proxy IPs from PROXIES.SX, you get a scraping stack that handles JavaScript-heavy SPAs, anti-bot detection, and high concurrency without breaking a sweat.
Event-Driven I/O
Non-blocking async handles thousands of concurrent proxy connections
Native Browser Control
Puppeteer and Playwright were built for Node.js first
JSON-Native
Parse API responses and structured data without serialization overhead
NPM Ecosystem
2M+ packages including scraping, parsing, and proxy utilities
JSON is the native data format of JavaScript, so parsing API responses and structured data is zero-friction. Combined with TypeScript for type safety on scraped data structures, Node.js gives you the fastest path from "idea" to "production scraper" of any runtime available today.
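To make that concurrency claim concrete, here is a minimal sketch of the pattern using plain Node 18+ fetch (no proxy yet; proxy configuration is covered in the next section). The URLs are illustrative:
// Fire a batch of requests concurrently and process each one as it settles
async function fetchBatch(urls: string[]) {
  const results = await Promise.allSettled(
    urls.map(async (url) => {
      const res = await fetch(url); // native fetch, Node 18+
      return { url, status: res.status, bytes: (await res.text()).length };
    })
  );
  for (const r of results) {
    if (r.status === "fulfilled") console.log(r.value.url, r.value.status, r.value.bytes);
    else console.error("Request failed:", r.reason);
  }
}
fetchBatch([
  "https://example.com/page/1",
  "https://example.com/page/2",
  "https://example.com/page/3",
]).catch(console.error);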
Setup: Proxies.sx Mobile Proxy Credentials
Before writing any code, you need your proxy credentials. Here is the quick path from signup to working proxy URL:
1. Sign up for a PROXIES.SX account
2. Purchase bandwidth (starting at $6/GB) and ports ($25/port/month) from the pricing page
3. Copy your credentials from the dashboard — username, password, gateway host, and port number
Your proxy URL follows this format across all examples in this guide:
// Proxy URL format for all Node.js examples
const PROXY_HOST = "gate.proxies.sx";
const PROXY_PORT = 10001;
const PROXY_USER = "your_username";
const PROXY_PASS = "your_password";
const PROXY_URL = `http://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOST}:${PROXY_PORT}`;
// => http://your_username:your_password@gate.proxies.sx:10001
// For SOCKS5 (if needed)
const SOCKS5_URL = `socks5://${PROXY_USER}:${PROXY_PASS}@${PROXY_HOST}:${PROXY_PORT}`;
Bandwidth is shared across all ports
You buy GB of bandwidth as a pool, and port subscriptions separately. Every port draws from the same bandwidth pool, so you only pay once for data regardless of how many ports you use.
Cheerio + Axios + Mobile Proxy
For static HTML pages and server-rendered content, Cheerio + Axios is the fastest option in Node.js. No browser overhead, no Chromium download—just raw HTTP requests with jQuery-like DOM parsing. This combination is ideal for scraping product listings, news articles, and any page that does not require JavaScript execution.
npm install axios cheerio https-proxy-agent
Here is a complete working scraper with proxy agent configuration, Cheerio parsing, and proper error handling:
import axios, { AxiosError } from "axios";
import * as cheerio from "cheerio";
import { HttpsProxyAgent } from "https-proxy-agent";
// Proxy configuration
const PROXY_URL = "http://your_username:your_password@gate.proxies.sx:10001";
const proxyAgent = new HttpsProxyAgent(PROXY_URL);
// Create an Axios instance with proxy and realistic headers
const client = axios.create({
httpAgent: proxyAgent,
httpsAgent: proxyAgent,
proxy: false, // disable axios's built-in/env proxy handling so the agent is the only route
timeout: 30_000,
headers: {
"User-Agent":
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
});
interface Product {
name: string;
price: string;
url: string;
rating: string | null;
}
async function scrapeProducts(url: string): Promise<Product[]> {
try {
const { data: html, status } = await client.get<string>(url);
console.log(`Fetched ${url} - Status: ${status}, Length: ${html.length}`);
const $ = cheerio.load(html);
const products: Product[] = [];
$("div.product-card").each((_, el) => {
products.push({
name: $(el).find("h3.title").text().trim(),
price: $(el).find("span.price").text().trim(),
url: $(el).find("a").attr("href") ?? "",
rating: $(el).find("span.rating").text().trim() || null,
});
});
return products;
} catch (error) {
if (error instanceof AxiosError) {
if (error.code === "ECONNREFUSED") {
console.error("Proxy connection refused - check credentials");
} else if (error.response?.status === 403) {
console.error("403 Forbidden - IP may be blocked, rotate and retry");
} else if (error.response?.status === 429) {
console.error("429 Rate limited - back off and retry");
} else {
console.error(`HTTP error: ${error.response?.status ?? error.message}`);
}
}
throw error;
}
}
// Verify proxy IP first, then scrape
async function main() {
const { data } = await client.get("https://httpbin.org/ip");
console.log("Mobile proxy IP:", data.origin);
const products = await scrapeProducts("https://example.com/products");
console.log(`Scraped ${products.length} products`);
console.log(JSON.stringify(products, null, 2));
}
main().catch(console.error);
Key points: the HttpsProxyAgent handles both HTTP and HTTPS traffic through your mobile proxy. Use a shared Axios instance for connection pooling, set a mobile User-Agent that matches your proxy IP type, and always include a timeout. Cheerio's jQuery-like API makes DOM traversal intuitive for anyone with frontend experience.
Puppeteer + Mobile Proxy
When sites render content with JavaScript or deploy fingerprinting, you need a real browser. Puppeteer drives headless Chrome with the --proxy-server flag to route all traffic through your mobile proxy. Combined with page.authenticate(), this creates a browser session that looks identical to a real mobile user.
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Here is a full Puppeteer scraper with proxy authentication, stealth plugins, and screenshot verification:
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";
// Enable stealth mode to avoid detection
puppeteer.use(StealthPlugin());
const PROXY_HOST = "gate.proxies.sx";
const PROXY_PORT = 10001;
const PROXY_USER = "your_username";
const PROXY_PASS = "your_password";
interface ScrapeResult {
title: string;
products: Array<{ name: string; price: string; url: string }>;
screenshotPath: string;
status: "success" | "error";
error?: string;
}
async function scrapeWithPuppeteer(url: string): Promise<ScrapeResult> {
const browser = await puppeteer.launch({
headless: true,
args: [
`--proxy-server=http://${PROXY_HOST}:${PROXY_PORT}`,
"--disable-blink-features=AutomationControlled",
"--disable-features=IsolateOrigins,site-per-process",
"--disable-infobars",
"--no-sandbox",
"--disable-setuid-sandbox",
"--window-size=390,844",
],
});
const page = await browser.newPage();
try {
// Authenticate with the proxy
await page.authenticate({
username: PROXY_USER,
password: PROXY_PASS,
});
// Set mobile viewport and user agent
await page.setViewport({ width: 390, height: 844, isMobile: true });
await page.setUserAgent(
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1"
);
// Remove automation indicators
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, "webdriver", { get: () => undefined });
Object.defineProperty(navigator, "languages", { get: () => ["en-US", "en"] });
Object.defineProperty(navigator, "plugins", { get: () => [1, 2, 3, 4, 5] });
// @ts-expect-error - Chrome runtime mock
window.chrome = { runtime: {} };
});
// Navigate with realistic wait behavior
await page.goto(url, { waitUntil: "networkidle2", timeout: 60_000 });
// Wait for dynamic content
await page.waitForSelector(".product-card", { timeout: 10_000 }).catch(() => {
console.log("No .product-card found, continuing with page content");
});
const title = await page.title();
// Extract product data from the page
const products = await page.evaluate(() =>
Array.from(document.querySelectorAll(".product-card")).map((el) => ({
name: el.querySelector(".title")?.textContent?.trim() ?? "",
price: el.querySelector(".price")?.textContent?.trim() ?? "",
url: (el.querySelector("a") as HTMLAnchorElement)?.href ?? "",
}))
);
// Take a screenshot for verification
const screenshotPath = "puppeteer_result.png";
await page.screenshot({ path: screenshotPath, fullPage: true });
return { title, products, screenshotPath, status: "success" };
} catch (error) {
await page.screenshot({ path: "puppeteer_error.png" }).catch(() => {});
return {
title: "",
products: [],
screenshotPath: "puppeteer_error.png",
status: "error",
error: error instanceof Error ? error.message : String(error),
};
} finally {
await browser.close();
}
}
// Run the scraper
async function main() {
const result = await scrapeWithPuppeteer("https://example.com/products");
console.log(`Status: ${result.status}`);
console.log(`Title: ${result.title}`);
console.log(`Products found: ${result.products.length}`);
console.log(`Screenshot: ${result.screenshotPath}`);
}
main().catch(console.error);
Anti-detection tip
The puppeteer-extra-plugin-stealth package patches over a dozen browser fingerprint leaks automatically. Combined with a real mobile proxy IP, your Puppeteer session is virtually indistinguishable from a real iPhone user browsing the web.
Playwright (Node.js) + Mobile Proxy
Playwright offers built-in proxy configuration at the browser level, cross-browser support (Chromium, Firefox, WebKit), and a more modern async API than Puppeteer. Its context isolation makes it ideal for running multiple scraping sessions with different proxy configurations simultaneously.
npm install playwright
import { chromium, type Browser, type BrowserContext } from "playwright";
const PROXY_CONFIG = {
server: "http://gate.proxies.sx:10001",
username: "your_username",
password: "your_password",
};
interface ScrapedPage {
url: string;
title: string;
content: string;
links: string[];
status: "success" | "error";
}
async function scrapeWithPlaywright(urls: string[]): Promise<ScrapedPage[]> {
// Launch browser with proxy configuration
const browser: Browser = await chromium.launch({
headless: true,
proxy: PROXY_CONFIG,
});
// Create a context with mobile device emulation
const context: BrowserContext = await browser.newContext({
viewport: { width: 390, height: 844 },
userAgent:
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
locale: "en-US",
timezoneId: "America/New_York",
geolocation: { latitude: 40.7128, longitude: -74.006 },
permissions: ["geolocation"],
isMobile: true,
hasTouch: true,
});
// Remove automation fingerprints
await context.addInitScript(() => {
Object.defineProperty(navigator, "webdriver", { get: () => undefined });
Object.defineProperty(navigator, "languages", { get: () => ["en-US", "en"] });
Object.defineProperty(navigator, "plugins", { get: () => [1, 2, 3, 4, 5] });
});
const results: ScrapedPage[] = [];
for (const url of urls) {
const page = await context.newPage();
try {
await page.goto(url, { waitUntil: "networkidle", timeout: 60_000 });
// Wait for dynamic content
await page.waitForTimeout(1500);
const title = await page.title();
const content = await page.content();
// Extract all links on the page
const links = await page.evaluate(() =>
Array.from(document.querySelectorAll("a[href]"))
.map((a) => (a as HTMLAnchorElement).href)
.filter((href) => href.startsWith("http"))
);
results.push({ url, title, content, links, status: "success" });
console.log(`[OK] ${url} - ${title} (${links.length} links)`);
} catch (error) {
results.push({
url,
title: "",
content: "",
links: [],
status: "error",
});
console.error(`[FAIL] ${url} - ${error instanceof Error ? error.message : error}`);
} finally {
await page.close();
}
}
await context.close();
await browser.close();
return results;
}
// Scrape multiple pages through the mobile proxy
async function main() {
// Verify proxy IP first
const browser = await chromium.launch({ proxy: PROXY_CONFIG });
const page = await browser.newPage();
await page.goto("https://httpbin.org/ip");
const ip = await page.textContent("body");
console.log("Mobile proxy IP:", ip);
await browser.close();
// Scrape target pages
const results = await scrapeWithPlaywright([
"https://example.com/products?page=1",
"https://example.com/products?page=2",
"https://example.com/products?page=3",
]);
console.log(`\nScraped ${results.length} pages`);
console.log(`Success: ${results.filter((r) => r.status === "success").length}`);
console.log(`Failed: ${results.filter((r) => r.status === "error").length}`);
}
main().catch(console.error);
Playwright vs Puppeteer
Playwright supports Chromium, Firefox, and WebKit from a single API. Its browser contexts provide true isolation—each context gets its own cookies, localStorage, and proxy config. This makes it easy to run multiple scraping sessions with different mobile proxy ports in parallel.
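A minimal sketch of that pattern, assuming you have several PROXIES.SX ports (the port numbers and credentials are placeholders). Note that on some Playwright versions, Chromium only honors per-context proxies reliably when the browser itself is launched with a proxy as well:
import { chromium } from "playwright";
async function main() {
  const browser = await chromium.launch({ headless: true });
  // One isolated context per proxy port: separate cookies, storage, and IP
  const contexts = await Promise.all(
    [10001, 10002, 10003].map((port) =>
      browser.newContext({
        proxy: {
          server: `http://gate.proxies.sx:${port}`,
          username: "your_username",
          password: "your_password",
        },
      })
    )
  );
  await Promise.all(
    contexts.map(async (context, i) => {
      const page = await context.newPage();
      await page.goto("https://httpbin.org/ip");
      console.log(`Context ${i}:`, await page.textContent("body"));
      await context.close();
    })
  );
  await browser.close();
}
main().catch(console.error);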
Crawlee + Mobile Proxy
Crawlee is the Node.js equivalent of Scrapy—a full-featured crawling framework with built-in proxy rotation, request queuing, auto-scaling, and storage. It wraps Puppeteer, Playwright, or plain HTTP under a unified API, letting you switch between headless browser and lightweight HTTP crawling without rewriting your scraper logic.
npm install crawlee playwright
import {
PlaywrightCrawler,
ProxyConfiguration,
Dataset,
type PlaywrightCrawlingContext,
} from "crawlee";
// Configure mobile proxy pool with multiple PROXIES.SX ports
const proxyConfiguration = new ProxyConfiguration({
proxyUrls: [
"http://your_username:your_password@gate.proxies.sx:10001",
"http://your_username:your_password@gate.proxies.sx:10002",
"http://your_username:your_password@gate.proxies.sx:10003",
],
});
// Create a Playwright-based crawler with mobile proxy
const crawler = new PlaywrightCrawler({
proxyConfiguration,
// Concurrency and rate limiting
maxConcurrency: 5,
maxRequestRetries: 3,
requestHandlerTimeoutSecs: 60,
// Launch options for stealth
launchContext: {
launchOptions: {
args: [
"--disable-blink-features=AutomationControlled",
"--disable-infobars",
],
},
},
// Browser context with mobile emulation
browserPoolOptions: {
useFingerprints: true,
fingerprintOptions: {
fingerprintGeneratorOptions: {
devices: ["mobile"],
operatingSystems: ["ios"],
},
},
},
// Main request handler
async requestHandler({ request, page, enqueueLinks, log }: PlaywrightCrawlingContext) {
log.info(`Scraping: ${request.url}`);
// Wait for page content to load
await page.waitForSelector(".product-card", { timeout: 10_000 }).catch(() => {
log.warning("No product cards found, parsing available content");
});
// Extract product data
const products = await page.evaluate(() =>
Array.from(document.querySelectorAll(".product-card")).map((el) => ({
name: el.querySelector(".title")?.textContent?.trim() ?? "",
price: el.querySelector(".price")?.textContent?.trim() ?? "",
url: (el.querySelector("a") as HTMLAnchorElement)?.href ?? "",
}))
);
// Store results in the default dataset
await Dataset.pushData({
url: request.url,
products,
scrapedAt: new Date().toISOString(),
});
log.info(`Found ${products.length} products on ${request.url}`);
// Follow pagination links automatically
await enqueueLinks({
selector: "a.next-page",
label: "PRODUCT_PAGE",
});
},
// Handle failures
async failedRequestHandler({ request, log }) {
log.error(`Request failed after retries: ${request.url}`);
await Dataset.pushData({
url: request.url,
error: request.errorMessages.join("; "),
scrapedAt: new Date().toISOString(),
});
},
});
// Start the crawler with seed URLs
async function main() {
await crawler.addRequests([
"https://example.com/products?page=1",
"https://example.com/products?page=2",
"https://example.com/products?page=3",
]);
await crawler.run();
// Export results
const dataset = await Dataset.open();
const { items } = await dataset.getData();
console.log(`\nCrawl complete. ${items.length} pages processed.`);
}
main().catch(console.error);
Crawlee's ProxyConfiguration automatically rotates through your PROXIES.SX proxy URLs, distributing requests across ports. The built-in RequestQueue deduplicates URLs, and Dataset provides structured storage with JSON, CSV, and Excel export. For teams scaling beyond simple scripts, Crawlee removes the need to build your own scheduling and retry infrastructure.
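If you want the stored results as files, Crawlee's Dataset export helpers can be called once the run finishes. A short sketch, assuming Crawlee 3.x where exportToJSON and exportToCSV write into the default key-value store:
import { Dataset } from "crawlee";
// Export the default dataset after crawler.run() has finished
async function exportResults() {
  const dataset = await Dataset.open();
  await dataset.exportToJSON("products"); // storage/key_value_stores/default/products.json
  await dataset.exportToCSV("products");  // storage/key_value_stores/default/products.csv
}
exportResults().catch(console.error);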
IP Rotation & Session Management
IP rotation is critical for high-volume scraping. PROXIES.SX supports two rotation methods, and choosing the right one depends on your scraping pattern.
Rotating Sessions
Each request gets a different IP. Configure auto-rotation in your dashboard so the IP changes every N minutes with zero code changes.
- Best for scraping search results
- Minimizes IP-based rate limiting
- No session state needed
Sticky Sessions
Same IP persists for a set duration. Essential when you need to maintain login state or navigate multi-page flows through the proxy (see the sketch after this list).
- Best for authenticated scraping
- Maintains cookies and sessions
- Multi-page navigation flows
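Sticky sessions need no special client code: you simply keep routing the whole flow through one port whose IP stays pinned for the session duration. A minimal sketch of an authenticated multi-page flow (the login endpoint, form fields, and account page are illustrative):
import axios from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
// One dedicated port = one sticky IP for the entire flow
const stickyAgent = new HttpsProxyAgent("http://your_username:your_password@gate.proxies.sx:10001");
const session = axios.create({
  httpAgent: stickyAgent,
  httpsAgent: stickyAgent,
  proxy: false,
  timeout: 30_000,
});
async function authenticatedFlow() {
  // Step 1: log in; the returned session cookie is tied to this IP
  const login = await session.post("https://example.com/login", {
    username: "account_user",
    password: "account_pass",
  });
  const cookie = login.headers["set-cookie"]?.join("; ") ?? "";
  // Step 2: reuse the same IP and cookie for the protected pages
  const orders = await session.get("https://example.com/account/orders", {
    headers: { Cookie: cookie },
  });
  console.log("Orders page length:", orders.data.length);
}
authenticatedFlow().catch(console.error);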
Here is how to implement on-demand rotation and connection pooling in Node.js. Call this between request batches or after encountering blocks:
import axios from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
interface ProxyPort {
url: string;
agent: HttpsProxyAgent<string>;
lastUsed: number;
requestCount: number;
}
class ProxyPool {
private ports: ProxyPort[];
private apiKey: string;
private apiBase = "https://client.proxies.sx/api";
private roundRobinIndex = 0;
constructor(proxyUrls: string[], apiKey: string) {
this.apiKey = apiKey;
this.ports = proxyUrls.map((url) => ({
url,
agent: new HttpsProxyAgent(url),
lastUsed: 0,
requestCount: 0,
}));
}
/** Get the next proxy agent using round-robin selection */
getNext(): { agent: HttpsProxyAgent<string>; portIndex: number } {
const port = this.ports[this.roundRobinIndex];
port.lastUsed = Date.now();
port.requestCount++;
const portIndex = this.roundRobinIndex;
this.roundRobinIndex = (this.roundRobinIndex + 1) % this.ports.length;
return { agent: port.agent, portIndex };
}
/** Force IP rotation on a specific port */
async rotateIp(portId: string): Promise<boolean> {
try {
await axios.post(
`${this.apiBase}/ports/${portId}/rotate`,
{},
{
headers: { Authorization: `Bearer ${this.apiKey}` },
timeout: 10_000,
}
);
console.log(`IP rotated on port ${portId} at ${new Date().toISOString()}`);
return true;
} catch (error) {
console.error(`Rotation failed on port ${portId}:`, error);
return false;
}
}
/** Get current IP for a specific port */
async getCurrentIp(portIndex: number): Promise<string> {
try {
const { data } = await axios.get("https://httpbin.org/ip", {
httpAgent: this.ports[portIndex].agent,
httpsAgent: this.ports[portIndex].agent,
timeout: 15_000,
});
return data.origin;
} catch {
return "unknown";
}
}
/** Get pool statistics */
getStats() {
return this.ports.map((p, i) => ({
index: i,
requestCount: p.requestCount,
lastUsed: new Date(p.lastUsed).toISOString(),
}));
}
}
// Usage
const pool = new ProxyPool(
[
"http://user:pass@gate.proxies.sx:10001",
"http://user:pass@gate.proxies.sx:10002",
"http://user:pass@gate.proxies.sx:10003",
],
"your_api_key"
);
async function scrapeWithPool(urls: string[]) {
for (const url of urls) {
const { agent, portIndex } = pool.getNext();
const { data } = await axios.get(url, {
httpAgent: agent,
httpsAgent: agent,
timeout: 30_000,
});
console.log(`[${portIndex}] ${url} -> ${data.length} bytes`);
}
console.log("Pool stats:", pool.getStats());
}
scrapeWithPool(["https://example.com/1", "https://example.com/2"]).catch(console.error);
Error Handling & Production Patterns
Production scrapers encounter 403s, 429s, timeouts, and connection resets. The difference between a weekend project and a production system is how you handle these failures. Here is a battle-tested retry utility with exponential backoff, jitter, and automatic IP rotation on persistent failures:
import axios, { AxiosError, type AxiosInstance, type AxiosResponse } from "axios";
import { HttpsProxyAgent } from "https-proxy-agent";
interface RetryConfig {
maxRetries: number;
baseDelay: number;
maxDelay: number;
rotateOnBlock: boolean;
onRotate?: () => Promise<void>;
}
const DEFAULT_RETRY_CONFIG: RetryConfig = {
maxRetries: 5,
baseDelay: 1000,
maxDelay: 60_000,
rotateOnBlock: true,
};
/** Sleep with optional jitter */
function sleep(ms: number, jitter = true): Promise<void> {
const actual = jitter ? ms + Math.random() * ms * 0.5 : ms;
return new Promise((resolve) => setTimeout(resolve, actual));
}
/** Retry wrapper with exponential backoff and proxy rotation */
async function fetchWithRetry(
client: AxiosInstance,
url: string,
config: Partial<RetryConfig> = {}
): Promise<AxiosResponse> {
const opts = { ...DEFAULT_RETRY_CONFIG, ...config };
let lastError: Error | null = null;
for (let attempt = 0; attempt < opts.maxRetries; attempt++) {
try {
// Accept every status code so 403/429/5xx reach the handlers below instead of throwing
const response = await client.get(url, { validateStatus: () => true });
// Success
if (response.status === 200) return response;
// Rate limited
if (response.status === 429) {
const retryAfter = parseInt(response.headers["retry-after"] ?? "30", 10);
console.warn(`[429] Rate limited on attempt ${attempt + 1}. Waiting ${retryAfter}s`);
await sleep(retryAfter * 1000, false);
continue;
}
// Forbidden - likely IP blocked
if (response.status === 403) {
console.warn(`[403] Blocked on attempt ${attempt + 1}`);
if (opts.rotateOnBlock && opts.onRotate) {
await opts.onRotate();
await sleep(5000);
}
continue;
}
// Server errors
if (response.status >= 500) {
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.warn(`[${response.status}] Server error. Retrying in ${delay}ms`);
await sleep(delay);
continue;
}
return response;
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error));
if (error instanceof AxiosError) {
if (error.code === "ECONNRESET" || error.code === "ETIMEDOUT") {
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.warn(`[${error.code}] Connection error. Retrying in ${delay}ms`);
await sleep(delay);
continue;
}
}
// Unknown error - backoff and retry
const delay = Math.min(opts.baseDelay * 2 ** attempt, opts.maxDelay);
console.error(`[ERROR] Attempt ${attempt + 1}: ${lastError.message}. Retrying in ${delay}ms`);
await sleep(delay);
}
}
throw new Error(`All ${opts.maxRetries} retries exhausted. Last error: ${lastError?.message}`);
}
/** Proxy health check utility */
async function checkProxyHealth(proxyUrl: string): Promise<{
healthy: boolean;
ip: string;
latencyMs: number;
}> {
const agent = new HttpsProxyAgent(proxyUrl);
const start = Date.now();
try {
const { data } = await axios.get("https://httpbin.org/ip", {
httpAgent: agent,
httpsAgent: agent,
timeout: 15_000,
});
return {
healthy: true,
ip: data.origin,
latencyMs: Date.now() - start,
};
} catch {
return {
healthy: false,
ip: "unknown",
latencyMs: Date.now() - start,
};
}
}
// Usage example
async function main() {
const proxyUrl = "http://user:pass@gate.proxies.sx:10001";
const agent = new HttpsProxyAgent(proxyUrl);
// Health check first
const health = await checkProxyHealth(proxyUrl);
console.log("Proxy health:", health);
if (!health.healthy) {
console.error("Proxy is not responding. Check credentials.");
process.exit(1);
}
// Create client with proxy
const client = axios.create({
httpAgent: agent,
httpsAgent: agent,
timeout: 30_000,
headers: {
"User-Agent":
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) " +
"AppleWebKit/605.1.15 (KHTML, like Gecko) " +
"Version/17.0 Mobile/15E148 Safari/604.1",
},
});
// Scrape with automatic retries and backoff
const response = await fetchWithRetry(client, "https://example.com/data", {
maxRetries: 5,
baseDelay: 2000,
rotateOnBlock: true,
onRotate: async () => {
console.log("Rotating IP via API...");
// Call PROXIES.SX rotation API here
},
});
console.log(`Success: ${response.status}, ${response.data.length} bytes`);
}
main().catch(console.error);
Why jitter matters
Without jitter, all your retry attempts across concurrent workers hit the server at the exact same time (the "thundering herd" problem). Adding randomized jitter spreads retries across a window, dramatically improving success rates. The sleep function above adds 0-50% random jitter by default.
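If you want even wider spread, you can swap the additive jitter for a full-jitter strategy, where the whole delay is drawn uniformly between zero and the exponential cap. A small sketch (fullJitterDelay is a hypothetical helper, not part of the code above):
// Full jitter: random delay in [0, min(cap, base * 2^attempt))
function fullJitterDelay(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}
// Usage inside the retry loop above:
// await sleep(fullJitterDelay(attempt), false);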
Frequently Asked Questions
What is the best Node.js library for web scraping with mobile proxies?
It depends on your target site. Use Cheerio + Axios for static HTML pages where speed and low resource usage matter - no browser overhead, just raw HTTP. Use Puppeteer when you need JavaScript rendering and Google Chrome compatibility. Use Playwright for cross-browser support and advanced context isolation. Use Crawlee for large-scale crawling with built-in proxy rotation, retry logic, and data storage. All four integrate seamlessly with PROXIES.SX mobile proxies.
How do I configure a mobile proxy in Puppeteer?
Launch Puppeteer with the --proxy-server argument: puppeteer.launch({ args: ["--proxy-server=http://gate.proxies.sx:10001"] }). Then call page.authenticate({ username: "user", password: "pass" }) before navigating. All browser traffic automatically routes through the mobile proxy. Add puppeteer-extra-plugin-stealth for anti-detection.
How do I use a proxy with Axios in Node.js?
Install the https-proxy-agent package. Create an agent: const agent = new HttpsProxyAgent("http://user:pass@gate.proxies.sx:10001"). Pass it as both httpAgent and httpsAgent in your Axios config. For persistent connections, create a shared Axios instance with axios.create({ httpAgent: agent, httpsAgent: agent }).
Can I use Playwright with mobile proxies in Node.js?
Yes. Playwright has built-in proxy support. Pass the proxy configuration when launching: chromium.launch({ proxy: { server: "http://gate.proxies.sx:10001", username: "user", password: "pass" } }). Playwright supports HTTP, HTTPS, and SOCKS5 protocols. Each browser context can have its own proxy configuration for parallel scraping with different IPs.
How do I rotate IPs in Node.js scrapers?
PROXIES.SX supports two methods. Auto-rotation changes your IP on a timer configured in the dashboard with zero code changes. On-demand rotation uses a POST request to the rotation API endpoint to force immediate IP changes. In Node.js, use fetch or axios to call the endpoint between request batches or after encountering 403/429 errors.
What is Crawlee and why should I use it?
Crawlee is a full-featured web scraping framework for Node.js built by the team at Apify. It provides ProxyConfiguration for automatic proxy pool management, RequestQueue for URL deduplication and scheduling, Dataset for structured data storage, and built-in exponential backoff. It wraps Puppeteer, Playwright, or plain HTTP, so you can switch between browser and lightweight scraping without rewriting code.
How do I handle errors and retries in Node.js scrapers?
Implement exponential backoff with jitter using async/await. On 403, rotate your IP via the PROXIES.SX API and retry after 5 seconds. On 429, read the Retry-After header and wait accordingly. For connection errors (ECONNRESET, ETIMEDOUT), use increasing delays starting at 1-2 seconds. Wrap your scraping logic in a reusable retry function with configurable max retries and delay multiplier.
Is TypeScript recommended for Node.js web scraping?
Strongly recommended. TypeScript provides type safety for scraped data structures, ensuring your parser correctly handles missing fields and edge cases. All major scraping libraries (Puppeteer, Playwright, Cheerio, Crawlee) ship with TypeScript types. The compiler catches null reference errors, incorrect property access, and type mismatches before your scraper runs, saving hours of debugging.
Start Scraping with Mobile Proxies Today
Get 1GB free trial bandwidth + 2 ports. Every code example in this guide works out of the box with PROXIES.SX credentials. Setup takes less than 60 seconds.