Firecrawl Integration

Overview

Firecrawl turns websites into LLM-ready data through its Crawler and Scraper nodes. Use it to map a site's structure, crawl its pages, or extract specific content from individual pages.

⚠️

Firecrawl nodes can now be used directly inside sync or async nodes. You no longer need to create a separate flow for crawling or scraping.

Features

✅ Key Functionalities

Web Crawling: Systematically browse and index websites, discovering and mapping their structure.
Web Scraping: Extract targeted content from specific web pages using customizable rules.
Integration with Webhooks: Receive real-time updates about crawling and scraping activities.
Dynamic Content Handling: Support for waiting on dynamic page loads and simulating mobile devices.

✅ Benefits

Generate structured data for language models.
Customize inclusion and exclusion of website sections.
Handle both static and dynamic web content.

Prerequisites

Before using Firecrawl, ensure the following:

A valid Firecrawl API Key.
Access to the Firecrawl service host URL.
Properly configured credentials for Firecrawl.
A webhook endpoint for receiving notifications (required for the crawler).

⚠️

For self-hosted deployments: if the connection fails, whitelist the IP ranges listed at https://www.cloudflare.com/ips/

Setup

Step 1: Obtain API Credentials

Register on Firecrawl.
Generate an API key from your account dashboard.
Note the Host URL and Webhook Endpoint.

Step 2: Configure Firecrawl Credentials

Use the following format to set up your credentials:

Key Name	Description	Example Value
Credential Name	Name to identify this set of credentials	`my-firecrawl-creds`
Firecrawl API Key	Authentication key for accessing Firecrawl services	`fc_api_xxxxxxxxxxxxx`
Host	Base URL where Firecrawl service is hosted	`https://api.firecrawl.dev`
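
To check a key and host outside the flow builder, the credential fields can be turned into a direct request. The sketch below is a minimal example, assuming the Firecrawl v1 `/v1/scrape` endpoint and bearer-token authentication; `build_scrape_request` is a hypothetical helper, not part of any SDK. It only builds the request, and the final comment shows how it could be sent.

```python
import json
import urllib.request


def build_scrape_request(host: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request from the credential fields.

    Assumption: the host exposes the Firecrawl v1 scrape endpoint at /v1/scrape
    and expects the API key as a bearer token.
    """
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + "/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Example values taken from the credentials table above.
req = build_scrape_request(
    "https://api.firecrawl.dev", "fc_api_xxxxxxxxxxxxx", "https://example.com/page"
)
print(req.get_method(), req.full_url)

# To actually send it (requires a real key and network access):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```

An authentication error on a request like this would point to the "Invalid API Key" case in Troubleshooting; a connection error points to the host URL.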

Configuration Reference

Sync Mode Output Format

Batched Mode

{
  "success": true,
  "status": "completed",
  "completed": 48,
  "total": 50,
  "creditsUsed": 13,
  "expiresAt": "2025-08-01T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    },
    {
      "url": "https://example.com/page-2",
      "content": "Second page scraped content...",
      "metadata": {
        "title": "Page 2 Title",
        "description": "Another sample description.",
        "language": "en"
      }
    }
    // ... more pages
  ]
}

Single Mode

{
  "success": true,
  "status": "completed",
  "completed": 1,
  "total": 2,
  "creditsUsed": 1,
  "expiresAt": "2025-08-02T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    }
  ]
}
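
Batched and Single mode return the same top-level fields, so one helper can summarize either. The following is a minimal sketch that relies only on the fields shown in the samples above; `summarize_crawl` is a hypothetical name used for illustration.

```python
def summarize_crawl(result: dict) -> dict:
    """Summarize a sync-mode result using the fields from the samples above."""
    pages = result.get("data", [])
    total = result.get("total", 0)
    completed = result.get("completed", 0)
    return {
        # Treat the run as successful only if both flags agree.
        "ok": bool(result.get("success")) and result.get("status") == "completed",
        "progress": f"{completed}/{total}",
        # Pages counted in "total" that were not completed.
        "missing": max(total - completed, 0),
        # Map each page URL to its title (None if metadata is absent).
        "titles": {p["url"]: p.get("metadata", {}).get("title") for p in pages},
    }


# The Single Mode sample from above, as a Python dict.
sample = {
    "success": True,
    "status": "completed",
    "completed": 1,
    "total": 2,
    "creditsUsed": 1,
    "expiresAt": "2025-08-02T12:30:00.000Z",
    "data": [
        {
            "url": "https://example.com/page-1",
            "content": "Lorem ipsum dolor sit amet...",
            "metadata": {
                "title": "Page 1 Title",
                "description": "This is a sample description.",
                "language": "en",
            },
        }
    ],
}

summary = summarize_crawl(sample)
print(summary["progress"], summary["titles"])
```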

Async Mode Output Format

{
  "success": true,
  "id": "8***************************7",
  "url": "https://api.firecrawl.dev/v1/crawl/8***************************7"
}
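
In async mode the response carries only a job `id` and a status `url`, so results have to be collected later, either through the webhook or by polling. The sketch below shows the polling variant under two assumptions: that requesting `url` returns an object with a `status` field like the sync-mode output, and that `"completed"` and `"failed"` are the terminal states. The `fetch` callable is a placeholder for whatever authenticated GET the caller uses.

```python
import time
from typing import Callable


def wait_for_crawl(
    start: dict,
    fetch: Callable[[str], dict],
    interval: float = 2.0,
    max_polls: int = 30,
) -> dict:
    """Poll the status URL from an async response until the job finishes.

    Assumption: fetch(url) returns a dict with a "status" field, and
    "completed" / "failed" are terminal states.
    """
    if not start.get("success"):
        raise RuntimeError("crawl job was not accepted")
    status_url = start["url"]
    for _ in range(max_polls):
        status = fetch(status_url)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {start.get('id')} did not finish after {max_polls} polls")


# Demonstration with a fake fetch function instead of a real HTTP call.
polled = []
responses = iter([
    {"status": "scraping"},
    {"status": "completed", "completed": 2, "total": 2, "data": []},
])


def fake_fetch(url: str) -> dict:
    polled.append(url)
    return next(responses)


final = wait_for_crawl(
    {"success": True, "id": "job-123", "url": "https://api.firecrawl.dev/v1/crawl/job-123"},
    fake_fetch,
    interval=0,
)
print(final["status"], len(polled))
```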

Crawler Configuration (Single)

Parameter	Description	Example Value
Credential Name	Select previously saved credentials	`my-firecrawl-creds`
URL	Starting point URL for the crawler	`https://example.com`
Exclude Path	URL patterns to exclude from the crawl	`"admin/", "private/"`
Include Path	URL patterns to include in the crawl	`"blog/", "products/"`
Crawl Depth	Maximum depth to crawl relative to the entered URL	`3`
Crawl Limit	Maximum number of pages to crawl	`1000`
Crawl Sub Pages	Toggle to enable or disable crawling sub pages	`true`
Max Discovery Depth	Max depth for discovering new URLs during the crawl	`5`
Ignore Sitemap	Ignore the sitemap.xml file for crawling	`false`
Allow Backward Links	Allow crawling backward links (e.g., blog post → homepage)	`true`
Allow External Links	Allow crawling external links (e.g., links to other domains)	`false`
Ignore Query Parameters	Treat URLs that differ only in their query parameters as the same page	`false`
Delay	Delay between requests to avoid overloading server (in seconds)	`2`
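
To make the interaction between Include Path, Exclude Path, and Crawl Depth concrete, here is a rough model of how a crawler might decide whether to visit a URL. It assumes simple prefix matching on the URL path and counts depth in path segments relative to the starting URL; Firecrawl's actual matching rules are not described here and may differ (for example, by using patterns rather than prefixes). Both function names are hypothetical.

```python
from urllib.parse import urlparse


def path_allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if the URL passes the include/exclude filters.

    Assumption: patterns are plain path prefixes such as "blog/" or "admin/".
    Exclusions win over inclusions; an empty include list allows everything.
    """
    path = urlparse(url).path.lstrip("/")
    if any(path.startswith(prefix) for prefix in exclude):
        return False
    return not include or any(path.startswith(prefix) for prefix in include)


def relative_depth(url: str, start_url: str) -> int:
    """Count how many path segments the URL sits below the start URL."""
    base = [s for s in urlparse(start_url).path.split("/") if s]
    segments = [s for s in urlparse(url).path.split("/") if s]
    return max(len(segments) - len(base), 0)


include = ["blog/", "products/"]
exclude = ["admin/", "private/"]

for candidate in (
    "https://example.com/blog/post-1",
    "https://example.com/admin/users",
    "https://example.com/about",
):
    print(candidate, path_allowed(candidate, include, exclude))

print(relative_depth("https://example.com/blog/2024/post", "https://example.com"))
```

Under this model, a page is crawled only if it passes the path filters and its depth does not exceed Crawl Depth.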

Batch Crawler Configuration (Async / Sync)

Parameter	Description	Example Value
Credential Name	Select previously saved credentials	`my-firecrawl-creds`
URL List	List of starting URLs to crawl	`[ "https://x.com", "https://y.com" ]`
Include Path	Paths to include during crawl	`"blog/*"`
Exclude Path	Paths to exclude during crawl	`"admin/*"`
Crawl Depth	Depth to crawl for each URL	`3`
Crawl Limit	Max pages per domain	`500`
Max Discovery Depth	How far discovered links can go	`4`
Allow External Links	Whether to crawl external domains	`false`
Allow Backward Links	Whether to follow links back to earlier pages (e.g., blog post → homepage)	`true`
Crawl Sub Pages	Enable sub-page traversal	`true`
Ignore Sitemap	Skip sitemap.xml	`false`
Delay	Throttle request delay in seconds	`1`
Callback Webhook	URL to receive notifications about crawl completion	`https://example.com/webhook`
Webhook Headers	Headers to be sent to the webhook	`{'Content-Type': 'application/json'}`
Webhook Metadata	Metadata to be sent to the webhook	`{'status':'{{codeNode_540.status}}'}`
Webhook Events	A multiselect list of events to be sent to the webhook	`["completed", "failed", "page", "started"]`
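
On the receiving side, the webhook endpoint has to tell the four event types apart. The handler below is a sketch only: the payload shape (a `type` field such as `"page"` or `"crawl.page"`, a `data` list of pages, and an `error` message on failure) is an assumption and should be checked against a real delivery before relying on it.

```python
def handle_event(payload: dict, state: dict) -> str:
    """Update crawl state from one webhook delivery and return the event name.

    Assumption: payload["type"] holds the event name, optionally prefixed
    (e.g. "crawl.page"), and page events carry scraped pages in payload["data"].
    """
    event = str(payload.get("type", "")).rsplit(".", 1)[-1]
    if event == "started":
        state["pages"] = []
        state["done"] = False
    elif event == "page":
        state.setdefault("pages", []).extend(payload.get("data", []))
    elif event == "completed":
        state["done"] = True
    elif event == "failed":
        state["done"] = True
        state["error"] = payload.get("error", "unknown error")
    else:
        raise ValueError(f"unexpected webhook event: {payload.get('type')!r}")
    return event


# Simulated deliveries for one crawl job.
state: dict = {}
deliveries = [
    {"type": "crawl.started", "id": "job-123"},
    {"type": "crawl.page", "id": "job-123", "data": [{"url": "https://example.com/page-1"}]},
    {"type": "crawl.completed", "id": "job-123"},
]
seen = [handle_event(d, state) for d in deliveries]
print(seen, state["done"], len(state["pages"]))
```

Keeping the handler a pure function of the payload makes it easy to test with recorded deliveries before wiring it into an HTTP server.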

Scraper Configuration (Single)

Parameter	Description	Example Value
Credential Name	Select previously saved credentials	`my-firecrawl-creds`
URL	Target URL to scrape	`https://example.com/page`
Main Content	Extract only main content (exclude header/footer/nav)	`true`
Skip TLS Verification	Bypass SSL certificate validation	`false`
Include Tags	HTML tags to include in extraction	`p, h1, h2, article`
Exclude Tags	HTML tags to exclude from extraction	`nav, footer, aside`
Emulate Mobile Device	Simulate mobile browser access	`true`
Wait for Page Load	Time to wait for dynamic content (in ms)	`123`
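
The effect of Include Tags and Exclude Tags can be pictured with a small local filter. This is an illustration of the idea, not Firecrawl's extraction logic: it assumes text is kept when it sits inside at least one included tag and inside no excluded tag. `TagFilter` is a hypothetical class built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Collect text inside included tags, skipping anything under excluded tags."""

    def __init__(self, include: set[str], exclude: set[str]) -> None:
        super().__init__()
        self.include = include
        self.exclude = exclude
        self.stack: list[str] = []   # currently open tags, outermost first
        self.chunks: list[str] = []  # text fragments that passed the filter

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed tags like <br>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if any(t in self.exclude for t in self.stack):
            return
        if any(t in self.include for t in self.stack):
            self.chunks.append(text)


html = (
    "<nav><p>Menu</p></nav>"
    "<article><h1>Title</h1><p>Body text</p><aside><p>Ad</p></aside></article>"
    "<footer>Copyright</footer>"
)
parser = TagFilter(include={"p", "h1", "h2", "article"}, exclude={"nav", "footer", "aside"})
parser.feed(html)
print(parser.chunks)
```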

Batch Scraper Configuration (Async)

Parameter	Description	Example Value
Credential Name	Select previously saved credentials	`my-firecrawl-creds`
URL List	List of URLs to scrape in batch	`[ "https://a.com", "https://b.com" ]`
Main Content	Extract only main content from each page	`true`
Skip TLS Verification	Ignore SSL certificate errors	`false`
Include Tags	HTML tags to extract	`p, h1, h2`
Exclude Tags	HTML tags to exclude from extraction	`aside, footer`
Emulate Mobile Device	Use mobile browser viewport	`true`
Wait for Page Load	Delay for dynamic content to appear (in ms)	`200`
Callback Webhook	URL to receive notifications about scrape completion	`https://example.com/webhook`
Webhook Headers	Headers to be sent to the webhook	`{'Content-Type': 'application/json'}`
Webhook Metadata	Metadata to be sent to the webhook	`{'status':'{{codeNode_540.status}}'}`
Webhook Events	A multiselect list of events to be sent to the webhook	`["completed", "failed", "page", "started"]`

Map URL Configuration

Parameter	Description	Example Value
Credential Name	Select previously saved credentials	`my-firecrawl-creds`
URL	Starting URL to map the structure	`https://example.com`
Main Content	Extract only main content from each page	`true`
Skip TLS Verification	Ignore SSL certificate errors	`false`
Include Tags	HTML tags to extract	`p, h1, h2`
Exclude Tags	HTML tags to exclude from extraction	`aside, footer`
Emulate Mobile Device	Use mobile browser viewport	`true`
Wait for Page Load	Delay for dynamic content to appear (in ms)	`200`

Map URL Output Example

{
  "success": true,
  "links": [
    "https://lamatic.ai/docs",
    "https://lamatic.ai/docs/architecture",
    "https://lamatic.ai/docs/career",
    "https://lamatic.ai/docs/context",
    "https://lamatic.ai/docs/context/vectordb",
    "https://lamatic.ai/docs/context/vectordb/adding-data",
    "https://lamatic.ai/docs/contributing",
    "https://lamatic.ai/docs/demo",
    "https://lamatic.ai/docs/deployments",
    "https://lamatic.ai/docs/deployments/cache"
  ]
}
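
The `links` array is flat, so a common next step is to group it by section before deciding what to crawl or scrape. A minimal sketch, assuming the links share a common path prefix (`/docs` in the sample above); `group_by_section` is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(links: list[str], prefix: str = "/docs") -> dict[str, list[str]]:
    """Group mapped links by the first path segment after the given prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for link in links:
        path = urlparse(link).path
        rest = path[len(prefix):] if path.startswith(prefix) else path
        rest = rest.strip("/")
        section = rest.split("/")[0] if rest else "(root)"
        groups[section].append(link)
    return dict(groups)


# The "links" array from the sample output above.
links = [
    "https://lamatic.ai/docs",
    "https://lamatic.ai/docs/architecture",
    "https://lamatic.ai/docs/career",
    "https://lamatic.ai/docs/context",
    "https://lamatic.ai/docs/context/vectordb",
    "https://lamatic.ai/docs/context/vectordb/adding-data",
    "https://lamatic.ai/docs/contributing",
    "https://lamatic.ai/docs/demo",
    "https://lamatic.ai/docs/deployments",
    "https://lamatic.ai/docs/deployments/cache",
]

sections = group_by_section(links)
for name, urls in sections.items():
    print(name, len(urls))
```

The grouped result can then feed the URL List of a batch crawl or batch scrape, one section at a time.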

Troubleshooting

Common Issues

Problem	Solution
Invalid API Key	Ensure the API key is correct and has not expired.
Connection Issues	Verify that the host URL is correct and reachable.
Webhook Errors	Check if the webhook endpoint is active and correctly configured.
Crawling Errors	Review the inclusion/exclusion paths for accuracy.
Dynamic Content Not Loaded	Increase the `Wait for Page Load` time in the configuration.

Debugging

Check Firecrawl logs for detailed error information.
Test the webhook endpoint to confirm it is receiving updates.
If the connection fails, whitelist the IP ranges listed at https://www.cloudflare.com/ips/