Providers
justcrawl.io routes scraping requests across multiple providers. You bring your own API keys.
Supported providers
Section titled “Supported providers”| Provider | Auth | Auto-route by URL | Async webhook | Vendor docs |
|---|---|---|---|---|
| Bright Data | API Key + Zone | No (single zone per account) | No | docs.brightdata.com |
| Oxylabs | Username + Password | Yes (33 verified source templates) | No (TODO 2) | developers.oxylabs.io |
| Nimble Way | Bearer token | No (driver-based stealth) | No | docs.nimbleway.com |
| Zyte | API Key | No (auto-extract is TODO 3) | No | docs.zyte.com |
| Decodo | API Key (formerly Smartproxy) | Yes (17 verified target templates) | No (TODO 2) | help.decodo.com |
Connecting a provider
Section titled “Connecting a provider”- Go to Settings > Provider Accounts
- Click Add Provider
- Enter your credentials
- Click Test Connection to verify
- Save
Credentials are encrypted at rest with AES-256-GCM and never returned in API responses.
Provider pricing
Section titled “Provider pricing”In Settings > Vendor Pricing, enter what you pay per 1,000 requests for each provider. This enables:
- Cost strategy: ranks providers by cheapest-first
- Cost estimates: shows expected cost per workflow on the dashboard
- Premium domains: optionally set higher rates for specific domains (e.g., social media sites)
Multiple providers required
Section titled “Multiple providers required”Smart optimization (benchmarking, strategy selection, auto-tuning) requires at least 2 connected providers. With a single provider, justcrawl acts as a simple proxy with scheduling and URL management.
Per-node settings
Section titled “Per-node settings”Every service node in a workflow accepts a shared set of settings (Country, JS rendering, Wait selector, Parse, Screenshot, Markdown, Custom Headers, Custom Cookies, Sessions, Device, Advanced JSON) plus vendor-specific extras. See Service node settings for the full reference.
The capability matrix below shows which providers support which features. Cells marked ✓ are supported and exposed in the workflow editor.
| Feature | Bright Data | Oxylabs | Nimble Way | Zyte | Decodo |
|---|---|---|---|---|---|
| JS rendering | ✓ (always) | ✓ | ✓ | ✓ | ✓ |
| Geo-targeting | ✓ | ✓ | ✓ | ✓ | ✓ |
| Wait for selector | ✓ | ✓ | ✓ | ✓ | ✓ |
| Parse (vendor-side structured) | — | ✓ | — | TODO 3 | ✓ |
| Screenshot (PNG) | ✓ | ✓ | ✓ | ✓ | ✓ |
| Markdown output | ✓ | — | ✓ | — | ✓ |
| Custom headers | ✓ | ✓ | ✓ | ✓ | ✓ |
| Custom cookies | — | ✓ | ✓ | ✓ | ✓ |
| Sessions (sticky IP) | — | ✓ | — | ✓ | ✓ |
| Device emulation | — | ✓ | ✓ | ✓ | ✓ |
| Auto-route by URL | — | ✓ | — | — | ✓ |
| Advanced (JSON) passthrough | ✓ | ✓ | ✓ | ✓ | ✓ |
Bright Data
Section titled “Bright Data”Bright Data Web Unlocker is a synchronous proxy service that automatically renders JavaScript and handles anti-bot detection. The adapter targets the Web Unlocker product specifically. Async datasets (Web Scraper API) and SERP API are out of scope (TODO 1).
Required configuration
Section titled “Required configuration”- API Key — sent as
Authorization: Bearer <key>. - Zone — the Web Unlocker zone name from your Bright Data dashboard (e.g.,
web_unlocker1). Must be Web Unlocker type, not Web Scraper, SERP, or Browser. Auto-detected from your account at connection time; pick from the dropdown.
Capabilities
Section titled “Capabilities”| Feature | Notes |
|---|---|
| JS rendering | Always on. Web Unlocker can’t be disabled. |
| Geo-targeting | Lowercase ISO-2 (us, gb, de). Passed as-is. |
| Wait for selector | Encoded in x-unblock-expect header. Requires “Manual expect elements” enabled on zone. |
| Screenshot | Set screenshot: true → data_format: "screenshot". Body is base64 PNG, contentType: image/png. |
| Markdown | Set markdown: true → data_format: "markdown". Body is markdown text. Screenshot wins if both set. |
| Custom headers | Merged into request body headers field (preserves x-unblock-expect). Requires “Custom Headers & Cookies” enabled on zone. |
| Sessions / cookies / device emulation | Not exposed by Web Unlocker. |
Response format
Section titled “Response format”format: "json"(default):{status_code, body, headers}envelope.bodycontains the rendered HTML or base64 PNG / markdown if adata_formatis set.format: "raw": response is the body directly, no envelope.
Gotchas
Section titled “Gotchas”Zone-type mismatch. If your zone is not Web Unlocker, the response shape is different and
bodyis missing. The adapter detects this and returnserrorType: 'validation'with the messageBright Data response missing 'body' field — likely zone-type mismatch (zone must be Web Unlocker, not Web Scraper / SERP / Browser).
Premium domains. Walmart, Target, Instacart, and similar require a separate premium zone in your Bright Data account. Standard zones fail.
Feature gates.
customHeadersandwaitForSelectoronly work if the corresponding feature is enabled on the zone in the Bright Data UI. Without the feature, the field is silently ignored.
Vendor docs
Section titled “Vendor docs”- Web Unlocker overview
- Configuration
- Quickstart
- Features
- API reference
- SERP API (separate product, not wired)
Zone routing
Section titled “Zone routing”When you save Bright Data credentials, ScrapeRoute calls
GET /zone/get_active_zones and stores the zones (with their type) in
org_provider_resources. A periodic job (sync_provider_resources, every 15
min) keeps that table in sync with the dashboard. At scrape time, the resolver
picks a zone in this order:
- Per-node
zoneoverride (workflow editor → service node config). Non-empty value wins over everything below. - Host detector — known SERP hosts route through your SERP-typed zone:
google.*,bing.com,duckduckgo.com,yandex.*, andyahoo.com/search. If your account doesn’t have a SERP zone, the resolver falls through. credentials.zone— your account’s default Web Unlocker zone.
The workflow editor’s zone dropdown includes a Use account default option
(empty value) that lets a node opt out of an explicit pick and rely on the
detector + credentials default.
What’s not supported
Section titled “What’s not supported”- Web Scraper API async (datasets/v3/trigger with snapshot polling) — TODO 1.
Oxylabs
Section titled “Oxylabs”Oxylabs Realtime Web Scraper API. The adapter exposes 33 verified source templates (amazon_product, google_search, walmart, ebay, etsy, kroger, target, bestbuy, …) — every entry verified against developers.oxylabs.io — plus a universal fallback for any URL.
Auto-routing by URL (headline feature)
Section titled “Auto-routing by URL (headline feature)”Leave the Source field empty and the adapter auto-detects the right Oxylabs source and parameter shape from your URL:
| URL | Routes to | With params |
|---|---|---|
amazon.com/dp/B08Y72CH1F | amazon_product | {query: "B08Y72CH1F", domain: "com"} |
amazon.co.uk/dp/B0XXXX | amazon_product | {query: "B0XXXX", domain: "co.uk"} |
amazon.com/s?k=laptop | amazon_search | {query: "laptop", domain: "com"} |
walmart.com/ip/widget/123456 | walmart_product | {product_id: "123456", domain: "com"} |
walmart.com/search?q=tv | walmart_search | {query: "tv"} |
google.com/search?q=test | google_search | {query: "test", domain: "com"} |
ebay.com/itm/... | ebay | {url} |
| any other host | universal | {url} |
International TLDs are preserved (amazon.com.br → domain: "com.br"). Detector source: packages/providers/src/oxylabs-source-detector.ts.
To override, set Source explicitly in the node config — that wins over auto-detection. See the legacy-row guard below if you do.
Required configuration
Section titled “Required configuration”- Username + Password — HTTP Basic auth.
Capabilities
Section titled “Capabilities”| Feature | Notes |
|---|---|
| JS rendering | renderJs: true → render: "html". |
| Geo-targeting | ISO-2 lowercase converted to full country name (us → "United States"). |
| Wait for selector | browser_instructions[0] with CSS wait. Requires JS rendering. Default timeout 10s. |
| Parse | parse: true → Oxylabs returns structured JSON, contentType: application/json. |
| Screenshot | screenshot: true → render: "png". Base64 PNG. Wins over renderJs. |
| Custom headers | Forwarded via context: [{key: "headers", value: {...}}]. |
| Custom cookies | Flat Record<string,string> → context: [{key: "cookies", value: {...}}]. |
| Sessions | sessionId → context: [{key: "session_id", value: ...}]. |
| Device | desktop / mobile / tablet → user_agent_type (tablet maps to mobile). |
| HTTP POST | httpMethod: "POST" + httpBody → context: [{key: "http_method", value: "post"}, {key: "content", value: "<base64>"}]. |
Legacy-row guard
Section titled “Legacy-row guard”Oxylabs’s per-source identifier-field contract is heterogeneous, not uniform. Sending the wrong field name returns HTTP 400 with a vendor message like [product_id]: This field is missing. [query]: This field was not expected.
| Identifier field | Sources | Param shape |
|---|---|---|
url | universal, amazon, walmart, ebay, google, … | {url} |
query (+ optional domain) | amazon_product, amazon_search, walmart_search, google_search | {query, domain?} |
product_id (+ optional domain) | walmart_product | {product_id, domain?} |
If you set Source explicitly without supplying the matching field in sourceParams, the adapter returns:
errorType: 'validation'error: "Oxylabs source 'walmart_product' requires 'product_id' in sourceParams. See https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets/walmart/product. Or clear the source field to use auto-detection from the URL."Workaround: clear Source to use auto-detection, or populate sourceParams with the field that matches your chosen source ({query, domain} for search-style, {product_id, domain} for product-style) in Advanced (JSON).
Gotchas
Section titled “Gotchas”Render timeout cap. The adapter’s
maxTimeoutMsis 30s; Oxylabs’s upstream cap is 180s. Use Advanced (JSON) to extend (e.g.,{"timeout": 120000}) up to the upstream cap.
Parsed = JSON, not HTML. When
parse: true, the body isJSON.stringify-ed parsed data withcontentType: application/json. Downstream HTML validators skip this content type automatically.
Per-source identifier reference
Section titled “Per-source identifier reference”Oxylabs’s per-source contract is heterogeneous: URL-based sources take url, search-style sources take query (+ optional domain), product-style sources take product_id (+ optional domain). Sending the wrong field name fails with HTTP 400 and a vendor message like [product_id]: This field is missing. [query]: This field was not expected. The detector picks the right field for you when you leave Source on Auto-detect from URL; the table below is the contract for explicit-source workflows.
Identifier-field legend:
url— pass the full URL insourceParams.url(or leave the URL field unchanged for auto).query— pass a search term insourceParams.query.product_id— pass the vendor’s product id (e.g., Walmart item id, Amazon ASIN) insourceParams.product_id.
Generated from
SOURCE_REQUIRED_PARAMinpackages/providers/src/oxylabs-source-detector.ts. Runpnpm exec tsx scripts/generate-vendor-doc-tables.tsto refresh.
| Source | Group | Identifier field | Vendor docs |
|---|---|---|---|
universal | General | url | link |
amazon | Amazon | url | link |
amazon_product | Amazon | query | link |
amazon_search | Amazon | query | link |
amazon_pricing | Amazon | query | link |
amazon_bestsellers | Amazon | query | link |
amazon_sellers | Amazon | query | link |
google | url | link | |
google_search | query | link | |
google_ads | query | link | |
google_travel_hotels | query | link | |
google_lens | query | link | |
google_shopping_search | query | link | |
google_shopping_product | query | link | |
google_trends_explore | query | link | |
bing | Bing | url | link |
bing_search | Bing | query | link |
walmart | Walmart | url | link |
walmart_product | Walmart | product_id | link |
walmart_search | Walmart | query | link |
ebay | eBay | url | link |
ebay_search | eBay | query | link |
etsy | E-commerce | url | link |
kroger | E-commerce | url | link |
target | E-commerce | url | link |
bestbuy | E-commerce | url | link |
lowes | E-commerce | url | link |
costco | E-commerce | url | link |
aliexpress | E-commerce | url | link |
alibaba | E-commerce | url | link |
rakuten | E-commerce | url | link |
flipkart | E-commerce | url | link |
mercadolibre | E-commerce | url | link |
Vendor docs
Section titled “Vendor docs”- Web Scraper API overview
- Targets index (full source list)
- Custom Parser (use Advanced (JSON) with
parsing_instructions) - JavaScript rendering
- Integration methods (Realtime / Proxy / Push-Pull)
- Forming requests
What’s not supported
Section titled “What’s not supported”- Push-Pull async webhook — TODO 2. Realtime endpoint only today.
parsing_instructionsUI — available via Advanced (JSON) only.
Nimble Way
Section titled “Nimble Way”Nimble Way Web API. URL-uniform — there is no auto-routing by URL. Site-aware behavior comes from explicit driver selection (Nimble’s stealth profiles).
Required configuration
Section titled “Required configuration”- API Token — sent as
Authorization: Bearer <token>.
Driver-based stealth
Section titled “Driver-based stealth”Nimble doesn’t have site-specific templates. Instead, drivers tune scraping behavior:
| Driver | Use case |
|---|---|
vx6 | Fast, no JavaScript — plain HTML only |
vx8 | Headless + JS rendering (default when renderJs: true or waitForSelector set) |
vx8-pro | Enhanced headless variant |
vx10 | Full-page rendering variant |
vx10-pro | Stealth headful — auto-selected from device: 'mobile' or device: 'tablet' |
Auto (default) | Inferred per request: device hint → JS flag → vx6 fallback |
When Driver is empty, the adapter infers:
device: 'mobile' | 'tablet'→vx10-pro(with render enabled)renderJs: trueorwaitForSelector→vx8- otherwise → backend default
Capabilities
Section titled “Capabilities”| Feature | Notes |
|---|---|
| JS rendering | render: true + driver. Auto-enabled with renderJs, waitForSelector, or device: 'mobile'/'tablet'. |
| Geo-targeting | Uppercase ISO-2 (US, GB, FR). Sub-country via Advanced (JSON). |
| Wait for selector | browser_actions array with timeout (default 10s). Forces vx8. |
| Screenshot | formats: ['html', 'screenshot']. Base64 PNG, contentType: image/png. |
| Markdown | formats: ['html', 'markdown']. contentType: text/markdown. |
| Custom headers | Pass-through object → body.headers. |
| Custom cookies | Flat map → array of {key, value, domain}. Each cookie’s domain defaults to the request URL hostname if not specified. |
| Device | desktop / mobile / tablet → driver mapping above. |
Gotchas
Section titled “Gotchas”
device: 'desktop'alone is a no-op. Selecting Desktop without also enablingrenderJsorwaitForSelectordoes not trigger rendering — Nimble requiresrender: trueto be paired with a driver. Combine withRender JSor a wait selector. Fixed in v0.3.4.
Auto driver fix in v0.3.4. Empty driver string previously short-circuited inference. Now
device: 'mobile'still resolves tovx10-proandrenderJs: truestill falls back tovx8when Driver = Auto.
Cookies must include domain. The adapter auto-fills it from the request URL hostname; only override via Advanced (JSON) if you need a different domain (e.g.,
.example.com).
Vendor docs
Section titled “Vendor docs”What’s not supported
Section titled “What’s not supported”- Endpoint migration — adapter currently uses
sdk.nimbleway.com/v1/extract. The canonical endpoint per docs isapi.webit.live/api/v1/realtime/web. Migration is on the roadmap; both currently route the same. - First-class sessions — no session ID; identity is per-request.
parsingschema UI — available via Advanced (JSON) with a CSS-selector schema.
Zyte API at /v1/extract. Two execution modes — browser and HTTP — with a strict mutual-exclusion guard at the adapter boundary.
Required configuration
Section titled “Required configuration”- API Key — HTTP Basic with key as username, empty password.
Two execution modes
Section titled “Two execution modes”Browser mode — full headless Chrome. Triggered when you set renderJs, screenshot, or waitForSelector.
- Sends
browserHtml: trueto Zyte; supports actions array (waitForSelector with timeout). - For headers: only
Refereris honored on browser mode (other custom headers are silently dropped server-side).
HTTP mode — pure fetch. Default when no browser flag is set.
- Sends
httpResponseBody: trueto Zyte. Adapter base64-decodes the body to UTF-8. - Supports POST body and full custom-headers passthrough.
- Required for
httpMethod: "POST"and device emulation.
Mutual exclusion. Combining browser flags (
renderJs,screenshot,waitForSelector) with HTTP-mode options (httpMethod: "POST"orhttpBody) returnserrorType: 'validation'before reaching Zyte’s API. Pick one mode.
Capabilities
Section titled “Capabilities”| Feature | Notes |
|---|---|
| JS rendering | Browser mode (browserHtml: true). |
| Geo-targeting | Uppercase ISO-2 → geolocation. 20 countries free; 50+ extended tier costs extra. |
| Wait for selector | actions[0] with timeout — adapter converts ms → seconds. |
| Screenshot | screenshot: true → base64 PNG. Adds extra cost. Browser mode only. |
| Custom headers | HTTP mode: customHttpRequestHeaders (full passthrough). Browser mode: requestHeaders (only Referer honored). |
| Custom cookies | requestCookies array. Each cookie’s domain defaults to the request URL hostname per eng-review A2. |
| Sessions | session.id (RFC 4122 v4 UUID). Free-form strings are deterministically hashed to a stable v4 UUID. |
| Device | HTTP mode only. tablet maps to mobile. Silently omitted in browser mode. |
| HTTP POST | HTTP mode only. UTF-8 body via httpRequestText. |
ipType | Vendor-specific node field: datacenter (default, free) or residential (KYC + per-request surcharge). |
Gotchas
Section titled “Gotchas”Selector timeout unit. justcrawl uses milliseconds; Zyte uses seconds. The adapter converts (
Math.round(ms / 1000)). Default 10000 ms →timeout: 10seconds.
Screenshot is independent of
renderJs. Settingscreenshot: truetriggers browser mode on its own without a redundantbrowserHtml: true. Set both if you want HTML and a screenshot in the same request.
Residential IPs require KYC. Zyte requires identity verification before allowing residential routing. Per-request surcharge applies. Defaults to datacenter.
Sessions UUID coercion. Zyte requires
session.idto be a v4 UUID. The adapter SHA-1-hashes any free-form string (e.g.,"user-123-cart") to a deterministic v4 UUID — same input always produces the same Zyte session.
Vendor docs
Section titled “Vendor docs”- API reference (top-level params, mutual exclusion)
- Browser mode (actions, viewport, device)
- HTTP mode
- Auto-extraction (product / article / jobPosting — TODO 3)
- Features (geolocation, ipType, sessions, cookies)
- Proxy mode
What’s not supported
Section titled “What’s not supported”- Auto-extraction routing (
product: true,article: true,jobPosting: true) — TODO 3. The flags can be set via Advanced (JSON), but the response routing for structured JSON is not wired into the validation pipeline yet. Use only if you control the downstream consumer.
Decodo
Section titled “Decodo”Decodo Web Scraping API real-time mode (formerly Smartproxy). 17 verified target templates (every entry checked against help.decodo.com’s llms.txt index) with auto-routing similar to Oxylabs. URLs that don’t match a verified target route through universal.
Auto-routing by URL
Section titled “Auto-routing by URL”Leave Target empty for automatic routing. Same shape as Oxylabs:
| URL | Routes to | With params |
|---|---|---|
amazon.com/dp/B08Y72CH1F | amazon_product | {query: "B08Y72CH1F", domain: "com"} |
amazon.com/s?k=laptop | amazon_search | {query: "laptop", domain: "com"} |
| amazon other paths | amazon | {url} |
google.com/search?q=test | google_search | {query: "test", domain: "com"} |
| google other paths | universal | {url} |
walmart.com/ip/widget/12345 | walmart_product | {product_id: "12345"} |
walmart.com/search?q=tv | walmart_search | {query: "tv"} |
| walmart other | walmart | {url} |
ebay.* | universal | {url} (Decodo ships no per-target eBay doc page; URL passthrough) |
| any other host | universal | {url} |
Detector source: packages/providers/src/decodo-target-detector.ts. Explicit Target wins over auto-detection.
Required configuration
Section titled “Required configuration”- API Key (token) — sent as HTTP Basic username with empty password.
Capabilities
Section titled “Capabilities”| Feature | Notes |
|---|---|
| JS rendering | renderJs: true → headless: "html". |
| Geo-targeting | ISO-2 → Title Case country name (e.g., us → "United States"). |
| Wait for selector | browser_actions[0] with default 10s timeout. |
| Parse | parse: true → vendor-side parsed JSON (e.g., product details), contentType: application/json. |
| Screenshot | screenshot: true → headless: "png". Base64 PNG. Wins over markdown / renderJs. |
| Markdown | markdown: true → markdown text + auto-enables headless rendering. |
| Custom headers | body.headers object passthrough. |
| Custom cookies | body.cookies object passthrough. |
| Sessions | session_id for sticky sessions. |
| Device | device_type: desktop / mobile / tablet. |
| HTTP POST | http_method: "POST" + payload (base64). |
Decodo-specific node fields
Section titled “Decodo-specific node fields”| Field | Notes |
|---|---|
| Target | Explicit override — 17 verified options grouped by site. Empty = auto-detect. |
| Proxy Pool | standard (default) or premium. |
| Page From / Page Count | Only meaningful for paginated targets like google_search, amazon_search. |
Legacy-row guard
Section titled “Legacy-row guard”Decodo mirrors Oxylabs’s heterogeneous identifier-field contract. Sending the wrong field name returns HTTP 400.
| Identifier field | Targets | Param shape |
|---|---|---|
url | universal, amazon, walmart, google | {url} |
query (+ optional domain) | amazon_product, amazon_search, walmart_search, google_search | {query, domain?} |
product_id | walmart_product | {product_id} |
If you set Target explicitly without supplying the matching field in targetParams:
errorType: 'validation'error: "Decodo target 'walmart_product' requires 'product_id' in targetParams. See https://help.decodo.com/docs/web-scraping-api-walmart-product. Or clear the target field to use auto-detection from the URL."Gotchas
Section titled “Gotchas”150s server timeout. Decodo’s real-time hard cap is 150s. The adapter uses
maxTimeoutMs: 30000as a conservative client guard.
Screenshot wins over markdown. If you set both, you get the PNG.
Markdown forces headless. The adapter auto-enables
headless: "html"whenmarkdown: true.
Per-target identifier reference
Section titled “Per-target identifier reference”Decodo’s target contract mirrors Oxylabs’s: URL-based targets take url, search-style take query (+ optional domain), product-style take product_id (+ optional domain). The detector picks the right field for you when you leave Target on Auto-detect from URL; the table below is the contract for explicit-target workflows. Decodo’s documented surface is narrower than Oxylabs’s — only targets with a verified vendor doc page are listed; URLs that don’t match any of these go through universal.
Generated from
TARGET_REQUIRED_PARAMinpackages/providers/src/decodo-target-detector.ts. Runpnpm exec tsx scripts/generate-vendor-doc-tables.tsto refresh.
| Target | Group | Identifier field | Vendor docs |
|---|---|---|---|
universal | General | url | link |
amazon | Amazon | url | link |
amazon_product | Amazon | query | link |
amazon_search | Amazon | query | link |
amazon_pricing | Amazon | query | link |
amazon_bestsellers | Amazon | query | link |
amazon_sellers | Amazon | query | link |
google | url | link | |
google_search | query | link | |
google_ads | query | link | |
google_travel_hotels | query | link | |
google_shopping_search | query | link | |
google_shopping_product | query | link | |
bing_search | Bing | query | link |
walmart | Walmart | url | link |
walmart_product | Walmart | product_id | link |
walmart_search | Walmart | query | link |
Vendor docs
Section titled “Vendor docs”- Web Scraping API introduction
- Quick start
- Real-time requests
- Asynchronous requests (v3/task) — not wired (TODO 2)
- API parameters
- Browser actions
What’s not supported
Section titled “What’s not supported”- v3/task async API — TODO 2. Real-time only today.