Skip to content

Integrations

Integrations connect justcrawl to your existing infrastructure for real-time URL ingestion and result delivery.

Push URLs to your own SQS queue. justcrawl polls it every few seconds and creates jobs automatically.

Message format:

{"url": "https://example.com/product/123"}

Optionally include a workflowId to override domain-based routing:

{"url": "https://example.com/product/123", "workflowId": "workflow-uuid"}

If no workflowId is provided, justcrawl resolves the workflow by domain (same as scheduled jobs).

Setup: Settings > Integrations > Add Input Source, or during onboarding. Provide your AWS Access Key ID, Secret Access Key, Region, and Queue URL.

justcrawl generates a unique webhook URL. POST an array of URLs to it:

Terminal window
curl -X POST https://dashboard.justcrawl.io/api/v1/webhooks/YOUR_TOKEN/ingest \
-H "Content-Type: application/json" \
-d '["https://example.com/a", "https://example.com/b"]'

Or with explicit workflow assignment:

[{"url": "https://example.com/a", "workflowId": "workflow-uuid"}]

Daily URL cap applies per input config (default: 10,000/day).

Scraping results are uploaded to your S3 bucket as HTML files:

s3://your-bucket/prefix/JOB_ID.html

Setup: Provide AWS credentials, bucket name, and optional prefix.

Results are POSTed to your endpoint as JSON:

{
"jobId": "job-uuid",
"url": "https://example.com",
"statusCode": 200,
"providerId": "brightdata",
"latencyMs": 1250,
"body": "<html>...</html>",
"deliveredAt": "2026-04-11T00:00:00Z"
}

HMAC signing: Every delivery includes an X-JustCrawl-Signature header with an HMAC-SHA256 signature using your secret. Verify this to authenticate the request.

Retry: Failed deliveries are retried up to 3 times with exponential backoff.

All integrations are managed in Settings > Integrations. Each config can be enabled/disabled independently. Status indicators show the last successful poll/delivery time and any errors.