Async mode lets you submit a scrape and get a job_id back immediately — no open connection required. The scrape runs in the background, is automatically retried up to 3 times on failure, and credits are restored if every attempt fails. Use async when:
  • You’re processing 10+ prompts and don’t need instant results
  • Your prompts may take a long time and you don’t want to hold a connection open
  • You’re running a background pipeline or cron job
Use sync when:
  • You need the result immediately in the same request cycle
  • You’re processing a single prompt in response to a user action

How it works

POST /scrapers/{scraper}/jobs   →  { job_id, status: "pending" }   (HTTP 202)
GET  /jobs/{job_id}             →  { status: "pending" }            (poll)
GET  /jobs/{job_id}             →  { status: "done", result: {...} } (done)

Step 1 — Submit the job

POST https://api.scrapellm.com/scrapers/{scraper}/jobs

Replace {scraper} with one of: chatgpt, perplexity, grok, copilot, gemini, google_ai_mode, amazon_rufus. Pass the same query parameters as the sync endpoint.
curl -X POST "https://api.scrapellm.com/scrapers/chatgpt/jobs" \
  -H "X-API-Key: YOUR_API_KEY" \
  -G \
  --data-urlencode "prompt=What brands do marketers recommend for email automation?" \
  --data-urlencode "country=US"
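
The same request can be sketched in Python. This snippet only builds the submit URL with the standard library, so you can inspect it without sending anything. `build_submit_url` and the `SCRAPERS` constant are illustrative names, not part of a ScrapeLLM SDK.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.scrapellm.com"
# Scraper names as listed in this guide.
SCRAPERS = {"chatgpt", "perplexity", "grok", "copilot", "gemini",
            "google_ai_mode", "amazon_rufus"}

def build_submit_url(scraper, prompt, country="US"):
    """Return the POST URL for submitting an async job (hypothetical helper)."""
    if scraper not in SCRAPERS:
        raise ValueError(f"Unknown scraper: {scraper!r}")
    # Parameters travel in the query string, as in the curl example.
    query = urlencode({"prompt": prompt, "country": country})
    return f"{BASE_URL}/scrapers/{scraper}/jobs?{query}"

url = build_submit_url("chatgpt", "Best CRM for small business?")
print(url)
```

To send it, POST to that URL with the X-API-Key header (for example `requests.post(url, headers={"X-API-Key": API_KEY})`) and read job_id from the JSON body.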

Step 2 — Poll for the result

GET https://api.scrapellm.com/jobs/{job_id} (no authentication required)

Poll every few seconds until status is "done" or "failed". Most scrapes complete in 5–30 seconds.
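import time
import requests

def wait_for_job(job_id, interval=3):
    """Poll GET /jobs/{job_id} until the job is done or failed."""
    while True:
        resp = requests.get(f"https://api.scrapellm.com/jobs/{job_id}")
        resp.raise_for_status()  # e.g. 404 once the job is no longer retained
        job = resp.json()

        if job["status"] == "done":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job.get('error')}")

        time.sleep(interval)

# job_id is the value returned by the submit request in Step 1
result = wait_for_job(job_id)
print(result["result"])          # plain text response
print(result["search_queries"])  # query fan-out (ChatGPT)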
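
A polling loop with no upper bound waits forever if a job never leaves "pending". The sketch below adds an overall deadline. The `max_wait` default of 120 seconds and the injectable `fetch`, `sleep`, and `clock` parameters are assumptions made for illustration and testing, not API features.

```python
import time

def wait_for_job(job_id, fetch, interval=3.0, max_wait=120.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is done or failed, or until max_wait seconds pass.

    fetch(job_id) must return the parsed JSON body of GET /jobs/{job_id}.
    """
    deadline = clock() + max_wait
    while True:
        job = fetch(job_id)
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(f"Job {job_id} failed: {job.get('error')}")
        # Stop if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Job {job_id} still pending after {max_wait}s")
        sleep(interval)

# Demo with a fake fetch that reports "pending" twice, then "done".
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "done", "result": {"result": "example text"}},
])
result = wait_for_job("demo-job", fetch=lambda _id: next(responses),
                      sleep=lambda seconds: None)
print(result["result"])
```

In real use, `fetch` could be `lambda jid: requests.get(f"https://api.scrapellm.com/jobs/{jid}").json()`.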

Job status response

Field         Type    Description
job_id        string  The unique job UUID
status        string  pending · done · failed
result        object  Full scrape response; present when status is "done"
error         string  Error message; present when status is "failed"
created_at    string  ISO 8601 UTC timestamp when the job was submitted
completed_at  string  ISO 8601 UTC timestamp when the job finished; null while pending
Jobs are retained for 24 hours. After that, GET /jobs/{job_id} returns 404.
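
If you prefer typed access over raw dictionaries, the fields above map naturally onto a small dataclass. This is a minimal sketch: the `Job` class and `parse_ts` helper are hypothetical, the sample payload is invented, and the exact timestamp format (here with a trailing "Z") is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

def parse_ts(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is assumed to mean UTC."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

@dataclass
class Job:
    job_id: str
    status: str                      # "pending", "done", or "failed"
    result: Optional[dict] = None    # only set when status is "done"
    error: Optional[str] = None      # only set when status is "failed"
    created_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None

    @classmethod
    def from_json(cls, body):
        return cls(
            job_id=body["job_id"],
            status=body["status"],
            result=body.get("result"),
            error=body.get("error"),
            created_at=parse_ts(body.get("created_at")),
            completed_at=parse_ts(body.get("completed_at")),
        )

# Sample payload invented for illustration, shaped like the fields above.
job = Job.from_json({
    "job_id": "00000000-0000-0000-0000-000000000000",
    "status": "pending",
    "created_at": "2025-01-01T12:00:00Z",
    "completed_at": None,
})
print(job.status, job.created_at.isoformat())
```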

Job lifecycle

submit  →  pending  →  done
                    →  failed (after up to 3 automatic retries)
  • Credits deducted at submit time. If all retry attempts fail, credits are automatically restored.
  • Retries are automatic. Failed scrapes are retried up to 3 times before the job is marked failed.
  • No cancellation. Once submitted, a job runs to completion. Avoid submitting jobs you don’t intend to use.
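
The credit rules above can be modeled in a few lines. This is purely an illustration of the bookkeeping, not ScrapeLLM's implementation: `run_job` is a hypothetical name, the cost of 1 credit is a placeholder, and treating the limit as three attempts in total is an assumption.

```python
def run_job(attempt, cost, balance, max_attempts=3):
    """Model: deduct at submit, try up to max_attempts, refund if all fail.

    attempt() returns True when a scrape attempt succeeds.
    """
    balance -= cost                      # credits deducted at submit time
    for _ in range(max_attempts):
        if attempt():
            return "done", balance       # success keeps the deduction
    return "failed", balance + cost      # every attempt failed: credits restored

# Two failed attempts followed by a success: the job ends as "done".
outcomes = iter([False, False, True])
status, balance = run_job(lambda: next(outcomes), cost=1, balance=10)
print(status, balance)

# Every attempt fails: the job ends as "failed" and the balance is unchanged.
failed_status, failed_balance = run_job(lambda: False, cost=1, balance=10)
print(failed_status, failed_balance)
```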

Batch processing

Submit all jobs first, then poll — don’t submit-and-wait serially.
import time, requests

API_KEY = "YOUR_API_KEY"
SCRAPER = "chatgpt"

prompts = [
    "Best CRM for small business?",
    "Top email marketing platforms?",
    "Leading project management tools?",
    "Best accounting software for startups?",
    "Top HR software for SMBs?",
]

# 1. Submit all jobs
job_ids = []
for prompt in prompts:
    resp = requests.post(
        f"https://api.scrapellm.com/scrapers/{SCRAPER}/jobs",
        headers={"X-API-Key": API_KEY},
        params={"prompt": prompt, "country": "US"},
    )
    job_ids.append(resp.json()["job_id"])
    print(f"Submitted: {job_ids[-1]}")

# 2. Poll all until done
def poll(job_id):
    while True:
        job = requests.get(f"https://api.scrapellm.com/jobs/{job_id}").json()
        if job["status"] in ("done", "failed"):
            return job
        time.sleep(3)

results = [poll(jid) for jid in job_ids]

# 3. Process results
for prompt, result in zip(prompts, results):
    if result["status"] == "done":
        print(f"\n{prompt}")
        print(result["result"]["result"][:300])
    else:
        print(f"\n{prompt} → FAILED: {result.get('error')}")
Submit all jobs before polling any of them. This way all scrapes run in parallel on ScrapeLLM’s infrastructure rather than waiting for each one sequentially.
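
The `poll` helper above blocks on one job at a time, so a slow first job delays reading results that are already finished. One alternative is to check every pending job once per round. The sketch below assumes a caller-supplied `fetch(job_id)` that returns the parsed job JSON; `poll_all` is an illustrative name, and a production version would also bound the total wait time.

```python
import time

def poll_all(job_ids, fetch, interval=3.0, sleep=time.sleep):
    """Poll every pending job once per round until all are done or failed."""
    pending = list(job_ids)
    finished = {}
    while pending:
        still_pending = []
        for job_id in pending:
            job = fetch(job_id)
            if job["status"] in ("done", "failed"):
                finished[job_id] = job
            else:
                still_pending.append(job_id)
        pending = still_pending
        if pending:
            sleep(interval)  # wait only when something is still running
    return finished

# Demo with a fake fetch: "a" finishes on its first poll, "b" on its second.
calls = {}

def fake_fetch(job_id):
    calls[job_id] = calls.get(job_id, 0) + 1
    ready_after = {"a": 1, "b": 2}[job_id]
    status = "done" if calls[job_id] >= ready_after else "pending"
    return {"job_id": job_id, "status": status}

finished = poll_all(["a", "b"], fake_fetch, sleep=lambda seconds: None)
print(sorted(finished))
```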

Cross-scraper batching

Submit the same prompt to multiple scrapers simultaneously to compare AI responses:
import time, requests

API_KEY = "YOUR_API_KEY"
PROMPT = "What CRM do sales teams recommend?"
SCRAPERS = ["chatgpt", "perplexity", "grok", "gemini"]

# Submit to all scrapers at once
job_ids = {
    scraper: requests.post(
        f"https://api.scrapellm.com/scrapers/{scraper}/jobs",
        headers={"X-API-Key": API_KEY},
        params={"prompt": PROMPT, "country": "US"},
    ).json()["job_id"]
    for scraper in SCRAPERS
}

# Poll all
def poll(job_id):
    while True:
        job = requests.get(f"https://api.scrapellm.com/jobs/{job_id}").json()
        if job["status"] in ("done", "failed"):
            return job
        time.sleep(3)

results = {scraper: poll(jid) for scraper, jid in job_ids.items()}

for scraper, result in results.items():
    print(f"\n── {scraper.upper()} ──")
    if result["status"] == "done":
        print(result["result"]["result"][:300])
    else:
        print(f"FAILED: {result.get('error')}")
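
Once the responses are in, a simple way to compare them is to check which brands each scraper mentions. This is a rough sketch using case-insensitive substring matching: `brand_mentions` is a hypothetical helper, and the brand names and sample responses are invented placeholders.

```python
def brand_mentions(results, brands):
    """Map each scraper to the brands its response text mentions."""
    mentions = {}
    for scraper, job in results.items():
        if job["status"] != "done":
            mentions[scraper] = None          # failed job: no text to inspect
            continue
        text = job["result"]["result"].lower()
        mentions[scraper] = [b for b in brands if b.lower() in text]
    return mentions

# Invented sample data in the same shape as the polled job responses.
sample = {
    "chatgpt": {"status": "done",
                "result": {"result": "Many teams pick Acme CRM or Globex."}},
    "grok": {"status": "done",
             "result": {"result": "Globex is a common choice."}},
    "gemini": {"status": "failed", "error": "example error"},
}
summary = brand_mentions(sample, ["Acme CRM", "Globex"])
print(summary)
```

Substring matching will miss abbreviations and misspellings, so treat the output as a first pass rather than a precise count.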

Common questions

How long do jobs take?

Most complete in 5–30 seconds. Complex prompts with deep reasoning (e.g. Grok MODEL_MODE_EXPERT) may take up to 60 seconds. Set timeout up to 600 seconds on the job submit request if needed.

Do failed jobs consume credits?

No. Credits are deducted at submission but automatically restored if all 3 retry attempts fail.

Can I cancel a submitted job?

No. Once submitted, a job runs to completion. Only submit jobs you intend to use.

What polling interval should I use?

3 seconds is a reasonable default. Most scrapes complete in under 30 seconds, so at that interval you'd make roughly 10 poll requests per job at most. Polling more often than once per second provides no benefit.
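
To estimate request volume for a given interval, you can count the polls a simple loop makes. The sketch assumes a loop that polls immediately and then sleeps between polls, and it ignores request latency; `polls_needed` is an illustrative name.

```python
import math

def polls_needed(duration_s, interval_s=3.0):
    """Polls made before a job lasting duration_s is first seen as finished.

    Polls happen at t = 0, interval, 2 * interval, and so on, so the count
    includes the immediate first poll.
    """
    return math.ceil(duration_s / interval_s) + 1

for duration in (5, 30, 60):
    print(duration, polls_needed(duration))
```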

How long are jobs retained?

24 hours from submission. After that, GET /jobs/{job_id} returns HTTP 404.
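
If you store job IDs for later retrieval, you can check the retention window locally before making a request. A minimal sketch, assuming `created_at` has already been parsed into a timezone-aware datetime; `is_expired` is a hypothetical helper, and the exact boundary behavior at 24 hours is an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # retention window stated in this guide

def is_expired(created_at, now=None):
    """True once the 24-hour retention window after submission has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= created_at + RETENTION

submitted = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_expired(submitted, now=submitted + timedelta(hours=23)))
print(is_expired(submitted, now=submitted + timedelta(hours=25)))
```

A local check is only a shortcut. A 404 from GET /jobs/{job_id} remains the authoritative signal that the job is gone.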

What if the job returns failed after retries?

The scrape hit an unrecoverable error and every automatic retry failed. Credits are restored automatically. Resubmit with a more specific prompt, or try bypass_cache=true. For persistent failures on the same prompt, contact [email protected] with the job_id.
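
The resubmission advice can be wrapped in a small helper. This sketch assumes bypass_cache is passed as a query parameter alongside prompt and country. `run_with_resubmit`, its `submit` and `wait` callables, and the limit of one resubmission are all illustrative choices.

```python
def run_with_resubmit(submit, wait, params, max_resubmits=1):
    """Run a job; if it ends as "failed", resubmit with bypass_cache=true.

    submit(params) -> job_id and wait(job_id) -> final job dict are supplied
    by the caller, so this helper contains no network code of its own.
    """
    attempt_params = dict(params)
    job = None
    for _ in range(max_resubmits + 1):
        job = wait(submit(attempt_params))
        if job["status"] == "done":
            break
        # Later attempts bypass the cache, as suggested above.
        attempt_params = {**params, "bypass_cache": "true"}
    return job

# Demo with stand-ins: the first job fails, the resubmission succeeds.
submitted = []

def fake_submit(params):
    submitted.append(params)
    return f"job-{len(submitted)}"

def fake_wait(job_id):
    if job_id == "job-1":
        return {"job_id": job_id, "status": "failed", "error": "example error"}
    return {"job_id": job_id, "status": "done", "result": {"result": "ok"}}

final = run_with_resubmit(
    fake_submit, fake_wait,
    {"prompt": "Best CRM for small business?", "country": "US"},
)
print(final["status"], submitted[-1].get("bypass_cache"))
```

Keep `max_resubmits` small. Each resubmission is a new job with its own credit deduction, and jobs cannot be cancelled once submitted.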