Scraper Node
Overview
Scraper Node is a web scraping tool designed to transform web pages into structured, LLM-compatible data using Firecrawl. It extracts targeted content from specific web pages using customizable rules.

Node Type Information
| Type | Description | Status |
|---|---|---|
| Batch Trigger | Starts the flow on a schedule or batch event. Ideal for periodic data processing. | ❌ False |
| Event Trigger | Starts the flow based on external events (e.g., webhook, user interaction). | ❌ False |
| Action | Executes a task or logic as part of the flow (e.g., API call, transformation). | ✅ True |
This node is an Action node that scrapes web content from specified URLs using Firecrawl, extracting structured data for further processing.
Features
Key Functionalities
- Customizable URL Scraping: Allows users to specify the exact URL to scrape, ensuring targeted data extraction.
- Selective Content Scraping: Features options to scrape only the main content or include/exclude specific tags, providing control over the scraped data.
- Agent Workflows: Submit and monitor agent tasks with three modes (Async Agent, Sync Agent, and Check Agent) for intelligent web data extraction.
- TLS Verification: Includes an option to skip TLS verification, enabling compatibility with a wider range of websites.
- Device Emulation: Offers the ability to emulate mobile devices, ensuring accurate scraping of mobile-optimized pages.
- Adjustable Load Timing: Allows users to set a custom wait time (in milliseconds) for pages to load, accommodating different website performance speeds.
Benefits
- Precision: Enables targeted scraping by including or excluding specific tags and controlling the scraping scope.
- Flexibility: Supports mobile device emulation and TLS verification bypass for greater adaptability across websites.
- Efficiency: The adjustable wait time ensures optimal scraping speed without missing dynamic content.
- Ease of Use: A user-friendly interface with clear options makes it accessible for both technical and non-technical users.
Prerequisites
Before using the Scraper Node, ensure the following:
- A valid Firecrawl API Key.
- Access to the Firecrawl service host URL.
- Properly configured credentials for Firecrawl.
- A webhook endpoint for receiving notifications (required for the crawler).
Installation
Step 1: Obtain API Credentials
- Register on Firecrawl.
- Generate an API key from your account dashboard.
- Note the Host URL and Webhook Endpoint.
Step 2: Configure Firecrawl Credentials
Use the following format to set up your credentials:
| Key Name | Description | Example Value |
|---|---|---|
| Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
| Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
| Host | Base URL where Firecrawl service is hosted | https://api.firecrawl.dev |
Configuration Reference
Scraper Configuration
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Target URL to scrape | https://example.com/page |
| Main Content | Extract only the main content of the page, excluding headers, navs, footers, etc. | true |
| Skip TLS Verification | Bypass SSL certificate validation | false |
| Include Tags | HTML tags to include in extraction | p, h1, h2, article |
| Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
| Emulate Mobile Device | Simulate mobile browser access | true |
| Wait for Page Load | Time to wait for dynamic content (ms) | 123 |
Agent Workflow Configuration
The Scraper Node now supports agent workflows with three execution modes for intelligent web data extraction and task monitoring.
Agent Modes
- Async Agent: Submit agent tasks asynchronously and receive a task ID for later status checking.
- Sync Agent: Execute agent tasks synchronously and receive results immediately upon completion.
- Check Agent: Check the status of a previously submitted async agent task.
Agent Configuration (Async / Sync)
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Prompt | The prompt or instruction for the agent to execute | "Extract all product prices from the page" |
| URLs | List of URLs for the agent to process | [ "https://example.com/products" ] |
| Schema | JSON schema defining the expected output structure | {"price": "string", "name": "string"} |
| Max Credit | Maximum credits to use for the agent task | 100 |
| Strict URL Constraints | Enforce strict URL matching rules | true |
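As a sketch only, an Async Agent submission in low-code form might look like the following. The value keys (agentMode, prompt, urls, schema, maxCredit, strictUrlConstraints) are assumptions inferred from the parameter table above, not confirmed node keys; check the YAML your flow editor generates for the exact names.
```yaml
# Hypothetical sketch of an Async Agent submission.
# Key names below are assumed from the parameter table, not confirmed.
nodes:
  - nodeId: scraperNode_681
    nodeType: scraperNode
    nodeName: Scraper Agent
    values:
      credentials: my-firecrawl-creds
      agentMode: async                 # assumed values: async | sync | check
      prompt: Extract all product prices from the page
      urls:
        - https://example.com/products
      schema: '{"price": "string", "name": "string"}'
      maxCredit: 100
      strictUrlConstraints: true
    needs:
      - triggerNode_1Output
```
An async submission should return an Agent Task ID for later polling; a Sync Agent run would use the same fields but block until results are returned.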
Check Agent Configuration
| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| Agent Task ID | The task ID returned from an Async Agent submission | "8***************************7" |
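To poll that submission, a Check Agent step could pass the returned task ID along. Again a hedged sketch: the agentTaskId key and the templated reference to the previous node's output are illustrative assumptions, not documented syntax.
```yaml
# Hypothetical sketch of a Check Agent step.
# agentTaskId and the templated output reference are assumptions.
nodes:
  - nodeId: scraperNode_682
    nodeType: scraperNode
    nodeName: Check Scraper Agent
    values:
      credentials: my-firecrawl-creds
      agentMode: check
      agentTaskId: '{{scraperNode_681.output.taskId}}'
    needs:
      - scraperNode_681Output
```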
Low-Code Example
```yaml
nodes:
  - nodeId: scraperNode_680
    nodeType: scraperNode
    nodeName: Scraper
    values:
      credentials: ''
      url: https://lamatic.ai/docs
      onlyMainContent: false
      skipTLsVerification: false
      mobile: false
      waitFor: 123
      includeTags: []
      excludeTags: []
    needs:
      - triggerNode_1Output
```
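The empty includeTags and excludeTags lists above scrape the whole page. To scope extraction, they can be populated with the tag values from the Scraper Configuration table; a minimal sketch of just those two fields, assuming standard YAML list syntax:
```yaml
# Sketch: only the tag filters, using example values from the
# Scraper Configuration table above.
      includeTags:
        - p
        - h1
        - article
      excludeTags:
        - nav
        - footer
```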
Scraper Output
- markdown: A string containing the scraped content formatted as Markdown, if available.
- language: A string indicating the detected language of the scraped content.
- referrer: A string representing the referrer URL or source that led to the scraped page, if applicable.
- title: A string containing the title of the scraped page or resource.
- scrapeId: A string serving as a unique identifier for the scrape operation.
- sourceURL: A string specifying the original URL from which the content was scraped.
- url: A string indicating the URL of the scraped resource, potentially processed or resolved.
- statusCode: An integer reflecting the HTTP status code returned during the scrape operation.
Example Scraper Output
```json
{
  "markdown": "",
  "language": "",
  "referrer": "",
  "title": "",
  "scrapeId": "",
  "sourceURL": "",
  "url": "",
  "statusCode": 200
}
```
Troubleshooting
Common Issues
| Problem | Solution |
|---|---|
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Connection Issues | Verify that the host URL is correct and reachable. |
| Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
| Scraping Errors | Review the include/exclude tag lists for accuracy. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |
Debugging
- Check Firecrawl logs for detailed error information.
- Test the webhook endpoint to confirm it is receiving updates.