Skip to content

Providers

justcrawl.io routes scraping requests across multiple providers. You bring your own API keys.

ProviderAuthAuto-route by URLAsync webhookVendor docs
Bright DataAPI Key + ZoneNo (single zone per account)Nodocs.brightdata.com
OxylabsUsername + PasswordYes (33 verified source templates)No (TODO 2)developers.oxylabs.io
Nimble WayBearer tokenNo (driver-based stealth)Nodocs.nimbleway.com
ZyteAPI KeyNo (auto-extract is TODO 3)Nodocs.zyte.com
DecodoAPI Key (formerly Smartproxy)Yes (17 verified target templates)No (TODO 2)help.decodo.com
  1. Go to Settings > Provider Accounts
  2. Click Add Provider
  3. Enter your credentials
  4. Click Test Connection to verify
  5. Save

Credentials are encrypted at rest with AES-256-GCM and never returned in API responses.

In Settings > Vendor Pricing, enter what you pay per 1,000 requests for each provider. This enables:

  • Cost strategy: ranks providers by cheapest-first
  • Cost estimates: shows expected cost per workflow on the dashboard
  • Premium domains: optionally set higher rates for specific domains (e.g., social media sites)

Smart optimization (benchmarking, strategy selection, auto-tuning) requires at least 2 connected providers. With a single provider, justcrawl acts as a simple proxy with scheduling and URL management.


Every service node in a workflow accepts a shared set of settings (Country, JS rendering, Wait selector, Parse, Screenshot, Markdown, Custom Headers, Custom Cookies, Sessions, Device, Advanced JSON) plus vendor-specific extras. See Service node settings for the full reference.

The capability matrix below shows which providers support which features. Cells marked ✓ are supported and exposed in the workflow editor.

FeatureBright DataOxylabsNimble WayZyteDecodo
JS rendering✓ (always)
Geo-targeting
Wait for selector
Parse (vendor-side structured)TODO 3
Screenshot (PNG)
Markdown output
Custom headers
Custom cookies
Sessions (sticky IP)
Device emulation
Auto-route by URL
Advanced (JSON) passthrough

Bright Data Web Unlocker is a synchronous proxy service that automatically renders JavaScript and handles anti-bot detection. The adapter targets the Web Unlocker product specifically. Async datasets (Web Scraper API) and SERP API are out of scope (TODO 1).

  • API Key — sent as Authorization: Bearer <key>.
  • Zone — the Web Unlocker zone name from your Bright Data dashboard (e.g., web_unlocker1). Must be Web Unlocker type, not Web Scraper, SERP, or Browser. Auto-detected from your account at connection time; pick from the dropdown.
FeatureNotes
JS renderingAlways on. Web Unlocker can’t be disabled.
Geo-targetingLowercase ISO-2 (us, gb, de). Passed as-is.
Wait for selectorEncoded in x-unblock-expect header. Requires “Manual expect elements” enabled on zone.
ScreenshotSet screenshot: truedata_format: "screenshot". Body is base64 PNG, contentType: image/png.
MarkdownSet markdown: truedata_format: "markdown". Body is markdown text. Screenshot wins if both set.
Custom headersMerged into request body headers field (preserves x-unblock-expect). Requires “Custom Headers & Cookies” enabled on zone.
Sessions / cookies / device emulationNot exposed by Web Unlocker.
  • format: "json" (default): {status_code, body, headers} envelope. body contains the rendered HTML or base64 PNG / markdown if a data_format is set.
  • format: "raw": response is the body directly, no envelope.

Zone-type mismatch. If your zone is not Web Unlocker, the response shape is different and body is missing. The adapter detects this and returns errorType: 'validation' with the message Bright Data response missing 'body' field — likely zone-type mismatch (zone must be Web Unlocker, not Web Scraper / SERP / Browser).

Premium domains. Walmart, Target, Instacart, and similar require a separate premium zone in your Bright Data account. Standard zones fail.

Feature gates. customHeaders and waitForSelector only work if the corresponding feature is enabled on the zone in the Bright Data UI. Without the feature, the field is silently ignored.

When you save Bright Data credentials, ScrapeRoute calls GET /zone/get_active_zones and stores the zones (with their type) in org_provider_resources. A periodic job (sync_provider_resources, every 15 min) keeps that table in sync with the dashboard. At scrape time, the resolver picks a zone in this order:

  1. Per-node zone override (workflow editor → service node config). Non-empty value wins over everything below.
  2. Host detector — known SERP hosts route through your SERP-typed zone: google.*, bing.com, duckduckgo.com, yandex.*, and yahoo.com/search. If your account doesn’t have a SERP zone, the resolver falls through.
  3. credentials.zone — your account’s default Web Unlocker zone.

The workflow editor’s zone dropdown includes a Use account default option (empty value) that lets a node opt out of an explicit pick and rely on the detector + credentials default.

  • Web Scraper API async (datasets/v3/trigger with snapshot polling) — TODO 1.

Oxylabs Realtime Web Scraper API. The adapter exposes 33 verified source templates (amazon_product, google_search, walmart, ebay, etsy, kroger, target, bestbuy, …) — every entry verified against developers.oxylabs.io — plus a universal fallback for any URL.

Leave the Source field empty and the adapter auto-detects the right Oxylabs source and parameter shape from your URL:

URLRoutes toWith params
amazon.com/dp/B08Y72CH1Famazon_product{query: "B08Y72CH1F", domain: "com"}
amazon.co.uk/dp/B0XXXXamazon_product{query: "B0XXXX", domain: "co.uk"}
amazon.com/s?k=laptopamazon_search{query: "laptop", domain: "com"}
walmart.com/ip/widget/123456walmart_product{product_id: "123456", domain: "com"}
walmart.com/search?q=tvwalmart_search{query: "tv"}
google.com/search?q=testgoogle_search{query: "test", domain: "com"}
ebay.com/itm/...ebay{url}
any other hostuniversal{url}

International TLDs are preserved (amazon.com.brdomain: "com.br"). Detector source: packages/providers/src/oxylabs-source-detector.ts.

To override, set Source explicitly in the node config — that wins over auto-detection. See the legacy-row guard below if you do.

  • Username + Password — HTTP Basic auth.
FeatureNotes
JS renderingrenderJs: truerender: "html".
Geo-targetingISO-2 lowercase converted to full country name (us"United States").
Wait for selectorbrowser_instructions[0] with CSS wait. Requires JS rendering. Default timeout 10s.
Parseparse: true → Oxylabs returns structured JSON, contentType: application/json.
Screenshotscreenshot: truerender: "png". Base64 PNG. Wins over renderJs.
Custom headersForwarded via context: [{key: "headers", value: {...}}].
Custom cookiesFlat Record<string,string>context: [{key: "cookies", value: {...}}].
SessionssessionIdcontext: [{key: "session_id", value: ...}].
Devicedesktop / mobile / tabletuser_agent_type (tablet maps to mobile).
HTTP POSThttpMethod: "POST" + httpBodycontext: [{key: "http_method", value: "post"}, {key: "content", value: "<base64>"}].

Oxylabs’s per-source identifier-field contract is heterogeneous, not uniform. Sending the wrong field name returns HTTP 400 with a vendor message like [product_id]: This field is missing. [query]: This field was not expected.

Identifier fieldSourcesParam shape
urluniversal, amazon, walmart, ebay, google, …{url}
query (+ optional domain)amazon_product, amazon_search, walmart_search, google_search{query, domain?}
product_id (+ optional domain)walmart_product{product_id, domain?}

If you set Source explicitly without supplying the matching field in sourceParams, the adapter returns:

errorType: 'validation'
error: "Oxylabs source 'walmart_product' requires 'product_id' in sourceParams. See https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets/walmart/product. Or clear the source field to use auto-detection from the URL."

Workaround: clear Source to use auto-detection, or populate sourceParams with the field that matches your chosen source ({query, domain} for search-style, {product_id, domain} for product-style) in Advanced (JSON).

Render timeout cap. The adapter’s maxTimeoutMs is 30s; Oxylabs’s upstream cap is 180s. Use Advanced (JSON) to extend (e.g., {"timeout": 120000}) up to the upstream cap.

Parsed = JSON, not HTML. When parse: true, the body is JSON.stringify-ed parsed data with contentType: application/json. Downstream HTML validators skip this content type automatically.

Oxylabs’s per-source contract is heterogeneous: URL-based sources take url, search-style sources take query (+ optional domain), product-style sources take product_id (+ optional domain). Sending the wrong field name fails with HTTP 400 and a vendor message like [product_id]: This field is missing. [query]: This field was not expected. The detector picks the right field for you when you leave Source on Auto-detect from URL; the table below is the contract for explicit-source workflows.

Identifier-field legend:

  • url — pass the full URL in sourceParams.url (or leave the URL field unchanged for auto).
  • query — pass a search term in sourceParams.query.
  • product_id — pass the vendor’s product id (e.g., Walmart item id, Amazon ASIN) in sourceParams.product_id.

Generated from SOURCE_REQUIRED_PARAM in packages/providers/src/oxylabs-source-detector.ts. Run pnpm exec tsx scripts/generate-vendor-doc-tables.ts to refresh.

SourceGroupIdentifier fieldVendor docs
universalGeneralurllink
amazonAmazonurllink
amazon_productAmazonquerylink
amazon_searchAmazonquerylink
amazon_pricingAmazonquerylink
amazon_bestsellersAmazonquerylink
amazon_sellersAmazonquerylink
googleGoogleurllink
google_searchGooglequerylink
google_adsGooglequerylink
google_travel_hotelsGooglequerylink
google_lensGooglequerylink
google_shopping_searchGooglequerylink
google_shopping_productGooglequerylink
google_trends_exploreGooglequerylink
bingBingurllink
bing_searchBingquerylink
walmartWalmarturllink
walmart_productWalmartproduct_idlink
walmart_searchWalmartquerylink
ebayeBayurllink
ebay_searcheBayquerylink
etsyE-commerceurllink
krogerE-commerceurllink
targetE-commerceurllink
bestbuyE-commerceurllink
lowesE-commerceurllink
costcoE-commerceurllink
aliexpressE-commerceurllink
alibabaE-commerceurllink
rakutenE-commerceurllink
flipkartE-commerceurllink
mercadolibreE-commerceurllink
  • Push-Pull async webhook — TODO 2. Realtime endpoint only today.
  • parsing_instructions UI — available via Advanced (JSON) only.

Nimble Way Web API. URL-uniform — there is no auto-routing by URL. Site-aware behavior comes from explicit driver selection (Nimble’s stealth profiles).

  • API Token — sent as Authorization: Bearer <token>.

Nimble doesn’t have site-specific templates. Instead, drivers tune scraping behavior:

DriverUse case
vx6Fast, no JavaScript — plain HTML only
vx8Headless + JS rendering (default when renderJs: true or waitForSelector set)
vx8-proEnhanced headless variant
vx10Full-page rendering variant
vx10-proStealth headful — auto-selected from device: 'mobile' or device: 'tablet'
Auto (default)Inferred per request: device hint → JS flag → vx6 fallback

When Driver is empty, the adapter infers:

  • device: 'mobile' | 'tablet'vx10-pro (with render enabled)
  • renderJs: true or waitForSelectorvx8
  • otherwise → backend default
FeatureNotes
JS renderingrender: true + driver. Auto-enabled with renderJs, waitForSelector, or device: 'mobile'/'tablet'.
Geo-targetingUppercase ISO-2 (US, GB, FR). Sub-country via Advanced (JSON).
Wait for selectorbrowser_actions array with timeout (default 10s). Forces vx8.
Screenshotformats: ['html', 'screenshot']. Base64 PNG, contentType: image/png.
Markdownformats: ['html', 'markdown']. contentType: text/markdown.
Custom headersPass-through object → body.headers.
Custom cookiesFlat map → array of {key, value, domain}. Each cookie’s domain defaults to the request URL hostname if not specified.
Devicedesktop / mobile / tablet → driver mapping above.

device: 'desktop' alone is a no-op. Selecting Desktop without also enabling renderJs or waitForSelector does not trigger rendering — Nimble requires render: true to be paired with a driver. Combine with Render JS or a wait selector. Fixed in v0.3.4.

Auto driver fix in v0.3.4. Empty driver string previously short-circuited inference. Now device: 'mobile' still resolves to vx10-pro and renderJs: true still falls back to vx8 when Driver = Auto.

Cookies must include domain. The adapter auto-fills it from the request URL hostname; only override via Advanced (JSON) if you need a different domain (e.g., .example.com).

  • Endpoint migration — adapter currently uses sdk.nimbleway.com/v1/extract. The canonical endpoint per docs is api.webit.live/api/v1/realtime/web. Migration is on the roadmap; both currently route the same.
  • First-class sessions — no session ID; identity is per-request.
  • parsing schema UI — available via Advanced (JSON) with a CSS-selector schema.

Zyte API at /v1/extract. Two execution modes — browser and HTTP — with a strict mutual-exclusion guard at the adapter boundary.

  • API Key — HTTP Basic with key as username, empty password.

Browser mode — full headless Chrome. Triggered when you set renderJs, screenshot, or waitForSelector.

  • Sends browserHtml: true to Zyte; supports actions array (waitForSelector with timeout).
  • For headers: only Referer is honored on browser mode (other custom headers are silently dropped server-side).

HTTP mode — pure fetch. Default when no browser flag is set.

  • Sends httpResponseBody: true to Zyte. Adapter base64-decodes the body to UTF-8.
  • Supports POST body and full custom-headers passthrough.
  • Required for httpMethod: "POST" and device emulation.

Mutual exclusion. Combining browser flags (renderJs, screenshot, waitForSelector) with HTTP-mode options (httpMethod: "POST" or httpBody) returns errorType: 'validation' before reaching Zyte’s API. Pick one mode.

FeatureNotes
JS renderingBrowser mode (browserHtml: true).
Geo-targetingUppercase ISO-2 → geolocation. 20 countries free; 50+ extended tier costs extra.
Wait for selectoractions[0] with timeout — adapter converts ms → seconds.
Screenshotscreenshot: true → base64 PNG. Adds extra cost. Browser mode only.
Custom headersHTTP mode: customHttpRequestHeaders (full passthrough). Browser mode: requestHeaders (only Referer honored).
Custom cookiesrequestCookies array. Each cookie’s domain defaults to the request URL hostname per eng-review A2.
Sessionssession.id (RFC 4122 v4 UUID). Free-form strings are deterministically hashed to a stable v4 UUID.
DeviceHTTP mode only. tablet maps to mobile. Silently omitted in browser mode.
HTTP POSTHTTP mode only. UTF-8 body via httpRequestText.
ipTypeVendor-specific node field: datacenter (default, free) or residential (KYC + per-request surcharge).

Selector timeout unit. justcrawl uses milliseconds; Zyte uses seconds. The adapter converts (Math.round(ms / 1000)). Default 10000 ms → timeout: 10 seconds.

Screenshot is independent of renderJs. Setting screenshot: true triggers browser mode on its own without a redundant browserHtml: true. Set both if you want HTML and a screenshot in the same request.

Residential IPs require KYC. Zyte requires identity verification before allowing residential routing. Per-request surcharge applies. Defaults to datacenter.

Sessions UUID coercion. Zyte requires session.id to be a v4 UUID. The adapter SHA-1-hashes any free-form string (e.g., "user-123-cart") to a deterministic v4 UUID — same input always produces the same Zyte session.

  • Auto-extraction routing (product: true, article: true, jobPosting: true) — TODO 3. The flags can be set via Advanced (JSON), but the response routing for structured JSON is not wired into the validation pipeline yet. Use only if you control the downstream consumer.

Decodo Web Scraping API real-time mode (formerly Smartproxy). 17 verified target templates (every entry checked against help.decodo.com’s llms.txt index) with auto-routing similar to Oxylabs. URLs that don’t match a verified target route through universal.

Leave Target empty for automatic routing. Same shape as Oxylabs:

URLRoutes toWith params
amazon.com/dp/B08Y72CH1Famazon_product{query: "B08Y72CH1F", domain: "com"}
amazon.com/s?k=laptopamazon_search{query: "laptop", domain: "com"}
amazon other pathsamazon{url}
google.com/search?q=testgoogle_search{query: "test", domain: "com"}
google other pathsuniversal{url}
walmart.com/ip/widget/12345walmart_product{product_id: "12345"}
walmart.com/search?q=tvwalmart_search{query: "tv"}
walmart otherwalmart{url}
ebay.*universal{url} (Decodo ships no per-target eBay doc page; URL passthrough)
any other hostuniversal{url}

Detector source: packages/providers/src/decodo-target-detector.ts. Explicit Target wins over auto-detection.

  • API Key (token) — sent as HTTP Basic username with empty password.
FeatureNotes
JS renderingrenderJs: trueheadless: "html".
Geo-targetingISO-2 → Title Case country name (e.g., us"United States").
Wait for selectorbrowser_actions[0] with default 10s timeout.
Parseparse: true → vendor-side parsed JSON (e.g., product details), contentType: application/json.
Screenshotscreenshot: trueheadless: "png". Base64 PNG. Wins over markdown / renderJs.
Markdownmarkdown: true → markdown text + auto-enables headless rendering.
Custom headersbody.headers object passthrough.
Custom cookiesbody.cookies object passthrough.
Sessionssession_id for sticky sessions.
Devicedevice_type: desktop / mobile / tablet.
HTTP POSThttp_method: "POST" + payload (base64).
FieldNotes
TargetExplicit override — 17 verified options grouped by site. Empty = auto-detect.
Proxy Poolstandard (default) or premium.
Page From / Page CountOnly meaningful for paginated targets like google_search, amazon_search.

Decodo mirrors Oxylabs’s heterogeneous identifier-field contract. Sending the wrong field name returns HTTP 400.

Identifier fieldTargetsParam shape
urluniversal, amazon, walmart, google{url}
query (+ optional domain)amazon_product, amazon_search, walmart_search, google_search{query, domain?}
product_idwalmart_product{product_id}

If you set Target explicitly without supplying the matching field in targetParams:

errorType: 'validation'
error: "Decodo target 'walmart_product' requires 'product_id' in targetParams. See https://help.decodo.com/docs/web-scraping-api-walmart-product. Or clear the target field to use auto-detection from the URL."

150s server timeout. Decodo’s real-time hard cap is 150s. The adapter uses maxTimeoutMs: 30000 as a conservative client guard.

Screenshot wins over markdown. If you set both, you get the PNG.

Markdown forces headless. The adapter auto-enables headless: "html" when markdown: true.

Decodo’s target contract mirrors Oxylabs’s: URL-based targets take url, search-style take query (+ optional domain), product-style take product_id (+ optional domain). The detector picks the right field for you when you leave Target on Auto-detect from URL; the table below is the contract for explicit-target workflows. Decodo’s documented surface is narrower than Oxylabs’s — only targets with a verified vendor doc page are listed; URLs that don’t match any of these go through universal.

Generated from TARGET_REQUIRED_PARAM in packages/providers/src/decodo-target-detector.ts. Run pnpm exec tsx scripts/generate-vendor-doc-tables.ts to refresh.

TargetGroupIdentifier fieldVendor docs
universalGeneralurllink
amazonAmazonurllink
amazon_productAmazonquerylink
amazon_searchAmazonquerylink
amazon_pricingAmazonquerylink
amazon_bestsellersAmazonquerylink
amazon_sellersAmazonquerylink
googleGoogleurllink
google_searchGooglequerylink
google_adsGooglequerylink
google_travel_hotelsGooglequerylink
google_shopping_searchGooglequerylink
google_shopping_productGooglequerylink
bing_searchBingquerylink
walmartWalmarturllink
walmart_productWalmartproduct_idlink
walmart_searchWalmartquerylink
  • v3/task async API — TODO 2. Real-time only today.