
Firecrawl Integration


Overview

Firecrawl transforms websites into LLM-ready data through its Crawler and Scraper capabilities. Whether you need to map a site's structure or extract specific content, Firecrawl provides a seamless, customizable solution.

⚠️

Firecrawl nodes can now be used directly in sync or async mode. You no longer need to create a separate flow for crawling or scraping.

Features

✅ Key Functionalities

  • Web Crawling: Systematically browse and index websites, discovering and mapping their structure.
  • Web Scraping: Extract targeted content from specific web pages using customizable rules.
  • Integration with Webhooks: Receive real-time updates about crawling and scraping activities.
  • Dynamic Content Handling: Support for waiting on dynamic page loads and simulating mobile devices.

✅ Benefits

  • Generate structured data for language models.
  • Customize inclusion and exclusion of website sections.
  • Handle both static and dynamic web content.

Prerequisites

Before using Firecrawl, ensure the following:

  • A valid Firecrawl API key.
  • Access to the Firecrawl service host URL.
  • Properly configured credentials for Firecrawl.
  • A webhook endpoint for receiving notifications (required for the crawler).
⚠️

For self-hosted deployments: if the connection fails, whitelist the Cloudflare IP ranges listed at https://www.cloudflare.com/ips/.

Setup

Step 1: Obtain API Credentials

  1. Register on Firecrawl.
  2. Generate an API key from your account dashboard.
  3. Note the Host URL and Webhook Endpoint.

Step 2: Configure Firecrawl Credentials

Use the following format to set up your credentials:

| Key Name | Description | Example Value |
|---|---|---|
| Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
| Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
| Host | Base URL where the Firecrawl service is hosted | https://api.firecrawl.dev |
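
For reference, here is a minimal sketch (Python with requests, not Lamatic-specific) of how these credentials map onto a raw Firecrawl API call: the Host is the base URL, and the API key is sent as a Bearer token. The /v1/scrape endpoint and response shape follow Firecrawl's public v1 API; adjust if your host or version differs.

import requests

FIRECRAWL_HOST = "https://api.firecrawl.dev"   # the "Host" credential
FIRECRAWL_API_KEY = "fc_api_xxxxxxxxxxxxx"     # the "Firecrawl API Key" credential

# Smoke-test the credentials with a single scrape request.
response = requests.post(
    f"{FIRECRAWL_HOST}/v1/scrape",
    headers={"Authorization": f"Bearer {FIRECRAWL_API_KEY}"},
    json={"url": "https://example.com"},
    timeout=60,
)
response.raise_for_status()
print(response.json()["data"]["metadata"]["title"])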

Configuration Reference

Sync Mode Output Format

Batched Mode

{
  "success": true,
  "status": "completed",
  "completed": 48,
  "total": 50,
  "creditsUsed": 13,
  "expiresAt": "2025-08-01T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    },
    {
      "url": "https://example.com/page-2",
      "content": "Second page scraped content...",
      "metadata": {
        "title": "Page 2 Title",
        "description": "Another sample description.",
        "language": "en"
      }
    }
    // ... more pages
  ]
}

Single Mode

{
  "success": true,
  "status": "completed",
  "completed": 1,
  "total": 2,
  "creditsUsed": 1,
  "expiresAt": "2025-08-02T12:30:00.000Z",
  "data": [
    {
      "url": "https://example.com/page-1",
      "content": "Lorem ipsum dolor sit amet...",
      "metadata": {
        "title": "Page 1 Title",
        "description": "This is a sample description.",
        "language": "en"
      }
    }
  ]
}
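
Both sync variants share the success/status/data shape shown above. A minimal sketch for consuming it downstream (result is a hypothetical variable holding the node's parsed JSON output):

def extract_pages(result: dict) -> list[dict]:
    # Guard against partial or failed runs before reading the data array.
    if not result.get("success") or result.get("status") != "completed":
        raise RuntimeError(f"Run not completed: {result.get('status')!r}")
    # Keep just the fields most flows need downstream.
    return [
        {
            "url": page["url"],
            "title": page.get("metadata", {}).get("title"),
            "content": page.get("content", ""),
        }
        for page in result.get("data", [])
    ]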

Async Mode Output Format

{
  "success": true,
  "id": "8***************************7",
  "url": "https://api.firecrawl.dev/v1/crawl/8***************************7"
}
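
In async mode the node returns immediately with a job id; results are fetched later via the returned url (or delivered to your webhook). A minimal polling sketch, assuming the url field accepts the same Bearer token used above:

import time
import requests

def wait_for_job(status_url: str, api_key: str, interval: float = 5.0) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        job = requests.get(status_url, headers=headers, timeout=30).json()
        # Terminal states, matching the sync-mode examples above.
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)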

Crawler Configuration (Single)

| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting point URL for the crawler | https://example.com |
| Exclude Path | URL patterns to exclude from the crawl | "admin/*", "private/*" |
| Include Path | URL patterns to include in the crawl | "blog/*", "products/*" |
| Crawl Depth | Maximum depth to crawl relative to the entered URL | 3 |
| Crawl Limit | Maximum number of pages to crawl | 1000 |
| Crawl Sub Pages | Toggle to enable or disable crawling sub pages | true |
| Max Discovery Depth | Max depth for discovering new URLs during the crawl | 5 |
| Ignore Sitemap | Ignore the sitemap.xml file for crawling | false |
| Allow Backward Links | Allow crawling backward links (e.g., blog post → homepage) | true |
| Allow External Links | Allow crawling external links (e.g., links to other domains) | false |
| Ignore Query Parameters | Ignore specific query parameters in URLs | false |
| Delay | Delay between requests to avoid overloading the server (in seconds) | 2 |
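
For orientation, a hedged sketch of how these parameters map onto Firecrawl's /v1/crawl request body. The field names (excludePaths, maxDepth, limit, ...) come from Firecrawl's public v1 API; the exact mapping from the node's labels is an assumption, so verify against your deployment.

import requests

# Assumed mapping from the node's parameters to Firecrawl v1 crawl options.
payload = {
    "url": "https://example.com",               # URL
    "excludePaths": ["admin/*", "private/*"],   # Exclude Path
    "includePaths": ["blog/*", "products/*"],   # Include Path
    "maxDepth": 3,                              # Crawl Depth
    "limit": 1000,                              # Crawl Limit
    "maxDiscoveryDepth": 5,                     # Max Discovery Depth
    "ignoreSitemap": False,                     # Ignore Sitemap
    "allowBackwardLinks": True,                 # Allow Backward Links
    "allowExternalLinks": False,                # Allow External Links
    "delay": 2,                                 # Delay (seconds)
}
job = requests.post(
    "https://api.firecrawl.dev/v1/crawl",
    headers={"Authorization": "Bearer fc_api_xxxxxxxxxxxxx"},
    json=payload,
    timeout=60,
).json()  # returns {"success": ..., "id": ..., "url": ...} as in Async Mode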

Batch Crawler Configuration (Async / Sync)

| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of starting URLs to crawl | ["https://x.com", "https://y.com"] |
| Include Path | Paths to include during the crawl | "blog/*" |
| Exclude Path | Paths to exclude during the crawl | "admin/*" |
| Crawl Depth | Depth to crawl for each URL | 3 |
| Crawl Limit | Max pages per domain | 500 |
| Max Discovery Depth | How far discovered links can go | 4 |
| Allow External Links | Whether to crawl external domains | false |
| Allow Backward Links | Whether to revisit previous pages | true |
| Crawl Sub Pages | Enable sub-page traversal | true |
| Ignore Sitemap | Skip sitemap.xml | false |
| Delay | Throttle request delay (in seconds) | 1 |
| Callback Webhook | URL to receive notifications about crawl completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | Multiselect list of events to send to the webhook | ["completed", "failed", "page", "started"] |
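
The four webhook parameters above bundle into a single object in Firecrawl's v1 API. A minimal sketch of that portion of the request body (the url/headers/metadata/events field names follow Firecrawl's v1 webhook format; confirm against your version):

# Assumed shape of the webhook block inside a crawl request.
payload = {
    "url": "https://x.com",
    "webhook": {
        "url": "https://example.com/webhook",                   # Callback Webhook
        "headers": {"Content-Type": "application/json"},        # Webhook Headers
        "metadata": {"status": "{{codeNode_540.status}}"},      # Webhook Metadata
        "events": ["completed", "failed", "page", "started"],   # Webhook Events
    },
}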

Scraper Configuration (Single)

| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Target URL to scrape | https://example.com/page |
| Main Content | Extract only main content (exclude header/footer/nav) | true |
| Skip TLS Verification | Bypass SSL certificate validation | false |
| Include Tags | HTML tags to include in extraction | p, h1, h2, article |
| Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
| Emulate Mobile Device | Simulate mobile browser access | true |
| Wait for Page Load | Time to wait for dynamic content (in ms) | 123 |
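
A hedged sketch of how these options map onto Firecrawl's /v1/scrape request body. onlyMainContent, includeTags, excludeTags, mobile, waitFor, and skipTlsVerification are Firecrawl v1 field names; the mapping from the node's labels is an assumption.

import requests

# Assumed mapping from the node's parameters to Firecrawl v1 scrape options.
payload = {
    "url": "https://example.com/page",
    "onlyMainContent": True,                       # Main Content
    "skipTlsVerification": False,                  # Skip TLS Verification
    "includeTags": ["p", "h1", "h2", "article"],   # Include Tags
    "excludeTags": ["nav", "footer", "aside"],     # Exclude Tags
    "mobile": True,                                # Emulate Mobile Device
    "waitFor": 123,                                # Wait for Page Load (ms)
}
doc = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": "Bearer fc_api_xxxxxxxxxxxxx"},
    json=payload,
    timeout=60,
).json()["data"]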

Batch Scraper Configuration (Async)

| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL List | List of URLs to scrape in batch | ["https://a.com", "https://b.com"] |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
| Callback Webhook | URL to receive notifications about batch scrape completion | https://example.com/webhook |
| Webhook Headers | Headers to be sent to the webhook | {"Content-Type": "application/json"} |
| Webhook Metadata | Metadata to be sent to the webhook | {"status": "{{codeNode_540.status}}"} |
| Webhook Events | Multiselect list of events to send to the webhook | ["completed", "failed", "page", "started"] |
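
A minimal sketch of the equivalent raw call, assuming Firecrawl's v1 batch endpoint /v1/batch/scrape, which takes a urls array plus the same scrape options; like async crawls, it returns an id and a status url to poll.

import requests

job = requests.post(
    "https://api.firecrawl.dev/v1/batch/scrape",
    headers={"Authorization": "Bearer fc_api_xxxxxxxxxxxxx"},
    json={
        "urls": ["https://a.com", "https://b.com"],         # URL List
        "onlyMainContent": True,                            # Main Content
        "waitFor": 200,                                     # Wait for Page Load (ms)
        "webhook": {"url": "https://example.com/webhook"},  # Callback Webhook
    },
    timeout=60,
).json()
print(job["id"], job["url"])  # poll as in Async Mode above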

Map URL Configuration

| Parameter | Description | Example Value |
|---|---|---|
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting URL to map the structure | https://example.com |
| Main Content | Extract only main content from each page | true |
| Skip TLS Verification | Ignore SSL certificate errors | false |
| Include Tags | HTML tags to extract | p, h1, h2 |
| Exclude Tags | HTML tags to exclude from extraction | aside, footer |
| Emulate Mobile Device | Use mobile browser viewport | true |
| Wait for Page Load | Delay for dynamic content to appear (in ms) | 200 |
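
A hedged sketch of the equivalent raw call against Firecrawl's /v1/map endpoint; it returns the discovered links in the shape shown in the output example below.

import requests

result = requests.post(
    "https://api.firecrawl.dev/v1/map",
    headers={"Authorization": "Bearer fc_api_xxxxxxxxxxxxx"},
    json={"url": "https://example.com"},
    timeout=60,
).json()
for link in result.get("links", []):
    print(link)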

Map URL Output Example

{
  "success": true,
  "links": [
    "https://lamatic.ai/docs",
    "https://lamatic.ai/docs/architecture",
    "https://lamatic.ai/docs/career",
    "https://lamatic.ai/docs/context",
    "https://lamatic.ai/docs/context/vectordb",
    "https://lamatic.ai/docs/context/vectordb/adding-data",
    "https://lamatic.ai/docs/contributing",
    "https://lamatic.ai/docs/demo",
    "https://lamatic.ai/docs/deployments",
    "https://lamatic.ai/docs/deployments/cache"
  ]
}

Troubleshooting

Common Issues

| Problem | Solution |
|---|---|
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Connection Issues | Verify that the host URL is correct and reachable. |
| Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
| Crawling Errors | Review the inclusion/exclusion paths for accuracy. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |

Debugging
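
When a node fails, the fastest way to isolate the problem is usually to replay the request directly against the Firecrawl API and inspect the error body. A minimal helper sketch (not Lamatic-specific):

import requests

def firecrawl_post(host: str, path: str, api_key: str, payload: dict) -> dict:
    # Replay a node's request directly against the Firecrawl API.
    resp = requests.post(
        f"{host}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=60,
    )
    if not resp.ok:
        # Log status and body before raising: Firecrawl error responses
        # typically include a JSON body explaining the failure.
        print(f"HTTP {resp.status_code}: {resp.text}")
        resp.raise_for_status()
    return resp.json()

For example, firecrawl_post("https://api.firecrawl.dev", "/v1/scrape", api_key, {"url": "https://example.com"}) reproduces a single scrape outside the flow.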
