URLs
URLs are the targets you want to scrape. justcrawl manages them with tagging, domain auto-detection, and batch operations.
Adding URLs
Single URL: Dashboard > URLs > Add URL, or via API:

curl -X POST https://dashboard.justcrawl.io/api/v1/urls \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/product/123"}'

Batch: JSON array or CSV upload (up to 10,000 per batch).

curl -X POST https://dashboard.justcrawl.io/api/v1/urls/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"urls": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]}'

CSV upload: Upload a CSV file with a url column. Optional columns: priority, tags.
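As an illustration of the CSV layout described above, the following sketch builds a file with the url, priority, and tags columns. The sample rows and the build_csv helper are hypothetical, and the separator used to pack several tags into one cell (";" here) is an assumption, since the format of the tags column is not specified in this section.

```python
import csv
import io

# Hypothetical sample rows; only "url" is required, priority and tags are optional.
rows = [
    {"url": "https://example.com/a", "priority": 50, "tags": ["electronics", "high-priority"]},
    {"url": "https://example.com/b"},
]

# Assumption: how multiple tags are encoded in one CSV cell is not documented here.
TAG_SEPARATOR = ";"


def build_csv(rows):
    """Return CSV text with the header url,priority,tags and one line per URL."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["url", "priority", "tags"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "url": row["url"],
            "priority": row.get("priority", 0),  # default priority is 0
            "tags": TAG_SEPARATOR.join(row.get("tags", [])),
        })
    return buffer.getvalue()


csv_text = build_csv(rows)
print(csv_text)
```

The resulting text can be saved to a file and uploaded through the dashboard. A list longer than the batch limit would need to be split into several files first.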
Tags
Tags organize URLs for filtering. Two types:
- User tags: You assign these (e.g., electronics, high-priority)
- Domain tags: Auto-generated from the URL domain (e.g., domain:amazon.com)
Domain tags are used for workflow routing. When you add a URL for amazon.com, it automatically gets the tag domain:amazon.com, which matches the workflow route domain:amazon.com.
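The routing idea above can be sketched as follows. This is not justcrawl's implementation: the domain_tag helper, the routes table, and the workflow name are hypothetical, and stripping a leading "www." is an assumption about how the domain is normalized.

```python
from urllib.parse import urlsplit


def domain_tag(url):
    """Derive a tag of the form domain:<host> from a URL."""
    host = urlsplit(url).hostname or ""  # hostname is returned lowercased
    # Assumption: a leading "www." is dropped; the real rule is not documented here.
    if host.startswith("www."):
        host = host[len("www."):]
    return f"domain:{host}"


# Hypothetical route table mapping a domain tag to a workflow name.
routes = {"domain:amazon.com": "amazon-product-workflow"}

tag = domain_tag("https://www.amazon.com/dp/B000")
workflow = routes.get(tag)  # None when no route matches the tag
print(tag, workflow)
```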
Priority
Each URL has a priority from 0 to 100 (default 0). Higher-priority URLs are processed first within a schedule run.
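A minimal sketch of this ordering, assuming URLs are plain dictionaries and that ties keep their original order. The tie-breaking rule is an assumption, and processing_order is a hypothetical helper rather than part of the justcrawl API.

```python
# Hypothetical URL records; a missing priority falls back to the default of 0.
urls = [
    {"url": "https://example.com/a", "priority": 10},
    {"url": "https://example.com/b"},
    {"url": "https://example.com/c", "priority": 90},
]


def processing_order(urls):
    """Return URLs sorted from highest to lowest priority."""
    # sorted() is stable, so URLs with equal priority keep their input order.
    return sorted(urls, key=lambda u: u.get("priority", 0), reverse=True)


for record in processing_order(urls):
    print(record["url"], record.get("priority", 0))
```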
Deduplication
URLs are unique per organization. Adding a duplicate URL is silently ignored (no error, no duplicate created).
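The behavior described above can be modeled with a set keyed on organization and URL. This is only a sketch: UrlStore and its add method are hypothetical, and comparing URLs as exact strings is an assumption, since this section does not say whether URLs are normalized before comparison.

```python
class UrlStore:
    """Toy in-memory model of per-organization URL uniqueness."""

    def __init__(self):
        self._seen = set()

    def add(self, org_id, url):
        """Store the URL; return False (without raising) if it already exists."""
        key = (org_id, url)  # Assumption: exact string match, no normalization.
        if key in self._seen:
            return False  # duplicate is ignored silently
        self._seen.add(key)
        return True


store = UrlStore()
print(store.add("org-1", "https://example.com/a"))  # first insert is stored
print(store.add("org-1", "https://example.com/a"))  # duplicate in the same organization
print(store.add("org-2", "https://example.com/a"))  # same URL, different organization
```

Because uniqueness is scoped to the organization, the same URL can exist once in each organization.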
Schedules and URLs
Schedules filter URLs by tags. A schedule with tagFilters: ["electronics"] only processes URLs tagged electronics. A schedule with no tag filters processes all URLs.
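The filtering rule can be sketched as a small predicate. The matches_schedule helper is hypothetical, and treating several tag filters as "match any" is an assumption, because the text above only shows a single filter.

```python
def matches_schedule(url_tags, tag_filters):
    """Return True if a URL with url_tags is selected by a schedule's tagFilters."""
    if not tag_filters:
        return True  # no tag filters: the schedule processes all URLs
    # Assumption: a URL matches when it carries at least one of the filter tags.
    return any(tag in url_tags for tag in tag_filters)


# Hypothetical URLs with their tags.
urls = {
    "https://example.com/tv": ["electronics", "domain:example.com"],
    "https://example.com/sofa": ["furniture", "domain:example.com"],
}

selected = [u for u, tags in urls.items() if matches_schedule(tags, ["electronics"])]
print(selected)
```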