Firecrawl Integration
No sections found for this integration
The integration documentation may not have the expected structure
Overview
Firecrawl is a robust tool designed to transform websites into LLM-ready data by leveraging its Crawler and Scraper functionalities. Whether you need to map website structures or extract specific data, Firecrawl provides a seamless and customizable solution.
⚠️
Firecrawl nodes can now be used directly inside sync or async nodes. You no longer need to create a separate flow for crawling or scraping.
Features
✅ Key Functionalities
- Web Crawling: Systematically browse and index websites, discovering and mapping their structure.
- Web Scraping: Extract targeted content from specific web pages using customizable rules.
- Integration with Webhooks: Receive real-time updates about crawling and scraping activities.
- Dynamic Content Handling: Support for waiting on dynamic page loads and simulating mobile devices.
✅ Benefits
- Generate structured data for language models.
- Customize inclusion and exclusion of website sections.
- Handle both static and dynamic web content.
Prerequisites
Before using Firecrawl, ensure the following:
- A valid Firecrawl API Key (opens in a new tab).
- Access to the Firecrawl service host URL.
- Properly configured credentials for Firecrawl.
- A webhook endpoint for receiving notifications (required for the crawler).
⚠️
For Self Hosting,If the connection fails, whitelist the following IPs: https://www.cloudflare.com/ips/ (opens in a new tab)
Setup
Step 1: Obtain API Credentials
- Register on Firecrawl (opens in a new tab).
- Generate an API key from your account dashboard.
- Note the Host URL and Webhook Endpoint.
Step 2: Configure Firecrawl Credentials
Use the following format to set up your credentials:
Key Name | Description | Example Value |
---|---|---|
Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
Host | Base URL where Firecrawl service is hosted | https://api.firecrawl.dev |
Configuration Reference
Sync Mode Output Format
Batched Mode
{
"success": true,
"status": "completed",
"completed": 48,
"total": 50,
"creditsUsed": 13,
"expiresAt": "2025-08-01T12:30:00.000Z",
"data": [
{
"url": "https://example.com/page-1",
"content": "Lorem ipsum dolor sit amet...",
"metadata": {
"title": "Page 1 Title",
"description": "This is a sample description.",
"language": "en"
}
},
{
"url": "https://example.com/page-2",
"content": "Second page scraped content...",
"metadata": {
"title": "Page 2 Title",
"description": "Another sample description.",
"language": "en"
}
}
// ... more pages
]
}
Single Mode
{
"success": true,
"status": "completed",
"completed": 1,
"total": 2,
"creditsUsed": 1,
"expiresAt": "2025-08-02T12:30:00.000Z",
"data": [
{
"url": "https://example.com/page-1",
"content": "Lorem ipsum dolor sit amet...",
"metadata": {
"title": "Page 1 Title",
"description": "This is a sample description.",
"language": "en"
}
}
]
}
Async Mode Output Format
{
"success": true,
"id": "8***************************7",
"url": "https://api.firecrawl.dev/v1/crawl/8***************************7"
}
Crawler Configuration (Single)
Parameter | Description | Example Value |
---|---|---|
Credential Name | Select previously saved credentials | my-firecrawl-creds |
URL | Starting point URL for the crawler | https://example.com |
Exclude Path | URL patterns to exclude from the crawl | "admin/*", "private/*" |
Include Path | URL patterns to include in the crawl | "blog/*", "products/*" |
Crawl Depth | Maximum depth to crawl relative to the entered URL | 3 |
Crawl Limit | Maximum number of pages to crawl | 1000 |
Crawl Sub Pages | Toggle to enable or disable crawling sub pages | true |
Max Discovery Depth | Max depth for discovering new URLs during the crawl | 5 |
Ignore Sitemap | Ignore the sitemap.xml file for crawling | false |
Allow Backward Links | Allow crawling backward links (e.g., blog post → homepage) | true |
Allow External Links | Allow crawling external links (e.g., links to other domains) | false |
Ignore Query Parameters | Ignore specific query parameters in URLs | false |
Delay | Delay between requests to avoid overloading server (in seconds) | 2 |
Batch Crawler Configuration (Async / Sync)
Parameter | Description | Example Value |
---|---|---|
Credential Name | Select previously saved credentials | my-firecrawl-creds |
URL List | List of starting URLs to crawl | [ "https://x.com", "https://y.com" ] |
Include Path | Paths to include during crawl | "blog/*" |
Exclude Path | Paths to exclude during crawl | "admin/*" |
Crawl Depth | Depth to crawl for each URL | 3 |
Crawl Limit | Max pages per domain | 500 |
Max Discovery Depth | How far discovered links can go | 4 |
Allow External Links | Whether to crawl external domains | false |
Allow Backward Links | Whether to revisit previous pages | true |
Crawl Sub Pages | Enable sub-page traversal | true |
Ignore Sitemap | Skip sitemap.xml | false |
Delay | Throttle request delay in seconds | 1 |
Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
Webhook Headers | Headers to be sent to the webhook | {'Content-Type:application/json'} |
Webhook Metadata | Metadata to be sent to the webhook | {'status':'{{codeNode_540.status}}'} |
Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
Scraper Configuration (Single)
Parameter | Description | Example Value |
---|---|---|
Credential Name | Select previously saved credentials | my-firecrawl-creds |
URL | Target URL to scrape | https://example.com/page |
Main Content | Extract only main content (exclude header/footer/nav) | true |
Skip TLS Verification | Bypass SSL certificate validation | false |
Include Tags | HTML tags to include in extraction | p, h1, h2, article |
Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
Emulate Mobile Device | Simulate mobile browser access | true |
Wait for Page Load | Time to wait for dynamic content (in ms) | 123 |
Batch Scraper Configuration (Async)
Parameter | Description | Example Value |
---|---|---|
Credential Name | Select previously saved credentials | my-firecrawl-creds |
URL List | List of URLs to scrape in batch | [ "https://a.com", "https://b.com" ] |
Main Content | Extract only main content from each page | true |
Skip TLS Verification | Ignore SSL certificate errors | false |
Include Tags | HTML tags to extract | p, h1, h2 |
Exclude Tags | HTML tags to exclude from extraction | aside, footer |
Emulate Mobile Device | Use mobile browser viewport | true |
Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
Webhook Headers | Headers to be sent to the webhook | {'Content-Type:application/json'} |
Webhook Metadata | Metadata to be sent to the webhook | {'status':'{{codeNode_540.status}}'} |
Webhook Events | A multiselect list of events to be sent to the webhook | ["completed", "failed", "page", "started"] |
Map URL Configuration
Parameter | Description | Example Value |
---|---|---|
Credential Name | Select previously saved credentials | my-firecrawl-creds |
URL | Starting URL to map the structure | https://example.com |
Main Content | Extract only main content from each page | true |
Skip TLS Verification | Ignore SSL certificate errors | false |
Include Tags | HTML tags to extract | p, h1, h2 |
Exclude Tags | HTML tags to exclude from extraction | aside, footer |
Emulate Mobile Device | Use mobile browser viewport | true |
Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
Map URL Output Example
{
"success": true,
"links": [
"https://lamatic.ai/docs",
"https://lamatic.ai/docs/architecture",
"https://lamatic.ai/docs/career",
"https://lamatic.ai/docs/context",
"https://lamatic.ai/docs/context/vectordb",
"https://lamatic.ai/docs/context/vectordb/adding-data",
"https://lamatic.ai/docs/contributing",
"https://lamatic.ai/docs/demo",
"https://lamatic.ai/docs/deployments",
"https://lamatic.ai/docs/deployments/cache"
]
}
Troubleshooting
Common Issues
Problem | Solution |
---|---|
Invalid API Key | Ensure the API key is correct and has not expired. |
Connection Issues | Verify that the host URL is correct and reachable. |
Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
Crawling Errors | Review the inclusion/exclusion paths for accuracy. |
Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |
Debugging
- Check Firecrawl logs for detailed error information.
- Test the webhook endpoint to confirm it is receiving updates.
- If the connection fails, whitelist the following IPs: https://www.cloudflare.com/ips/ (opens in a new tab)