
Firecrawl Documentation

Firecrawl turns websites into LLM-ready data through two components: a Crawler, which discovers and maps a site's structure, and a Scraper, which extracts targeted content from specific pages. Both can be customized to control which sections and elements are included.


Features

✅ Key Functionalities

  • Web Crawling: Systematically browse and index websites, discovering and mapping their structure.
  • Web Scraping: Extract targeted content from specific web pages using customizable rules.
  • Integration with Webhooks: Receive real-time updates about crawling and scraping activities.
  • Dynamic Content Handling: Support for waiting on dynamic page loads and simulating mobile devices.

✅ Benefits

  • Generate structured data for language models.
  • Customize inclusion and exclusion of website sections.
  • Handle both static and dynamic web content.

Prerequisites

Before using Firecrawl, ensure the following:

  • A valid Firecrawl API key.
  • Access to the Firecrawl service host URL.
  • Firecrawl credentials configured as described under Installation.
  • A webhook endpoint for receiving notifications (required for the crawler).

Installation

Step 1: Obtain API Credentials

  1. Register on Firecrawl.
  2. Generate an API key from your account dashboard.
  3. Note the Host URL and Webhook Endpoint.

Step 2: Configure Firecrawl Credentials

Use the following format to set up your credentials:

| Key Name | Description | Example Value |
| --- | --- | --- |
| Credential Name | Name to identify this set of credentials | my-firecrawl-creds |
| Firecrawl API Key | Authentication key for accessing Firecrawl services | fc_api_xxxxxxxxxxxxx |
| Host | Base URL where the Firecrawl service is hosted | https://api.firecrawl.dev |
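
Before saving credentials, it can help to sanity-check the three fields locally. The following is a minimal sketch, not part of Firecrawl: the `FirecrawlCredentials` class and its `problems` method are hypothetical names introduced for illustration, and the checks only catch empty values and a malformed Host URL. They do not verify that the key is accepted by the service.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class FirecrawlCredentials:
    """Hypothetical container mirroring the three credential fields."""
    name: str      # Credential Name
    api_key: str   # Firecrawl API Key
    host: str      # Host

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the fields look sane."""
        found = []
        if not self.name.strip():
            found.append("Credential Name is empty")
        if not self.api_key.strip():
            found.append("Firecrawl API Key is empty")
        parsed = urlparse(self.host)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            found.append("Host must be an absolute http(s) URL")
        return found


creds = FirecrawlCredentials(
    name="my-firecrawl-creds",
    api_key="fc_api_xxxxxxxxxxxxx",  # placeholder from the table, not a real key
    host="https://api.firecrawl.dev",
)
print(creds.problems())  # []
```

A Host value without a scheme, such as `api.firecrawl.dev`, is reported as a problem, which is one common cause of the connection issues listed under Troubleshooting.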

Configuration Reference

Crawler Configuration

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Starting point URL for the crawler | https://example.com |
| Notification Webhook | Endpoint to receive crawl updates and results | https://your-webhook.com/callback |
| Exclude Path | URL patterns to exclude from the crawl | "admin/*", "private/*" |
| Include Path | URL patterns to include in the crawl | "blog/*", "products/*" |
| Crawl Depth | Maximum depth to crawl relative to the entered URL | 3 |
| Crawl Limit | Maximum number of pages to crawl | 1000 |
| Crawl Sub Pages | Toggle to enable or disable crawling sub pages | true |
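
To reason about how Exclude Path, Include Path, and Crawl Depth might interact, the sketch below models them locally. It rests on several assumptions that this page does not confirm: patterns are treated as shell-style globs matched against the path relative to the starting URL, exclusions take precedence over inclusions, an empty include list means "everything", and depth is counted in path segments. The service's actual matching rules may differ (for example, patterns may be regular expressions), and `in_scope` is a hypothetical helper, not a Firecrawl function.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def path_segments(url):
    """Split a URL's path into its non-empty segments."""
    return [part for part in urlparse(url).path.split("/") if part]


def in_scope(url, start_url, include, exclude, max_depth):
    """Decide whether `url` would be crawled under the assumed rules."""
    if urlparse(url).netloc != urlparse(start_url).netloc:
        return False  # assumed: the crawl stays on the starting host
    base, segments = path_segments(start_url), path_segments(url)
    if segments[:len(base)] != base:
        return False  # not underneath the starting URL
    relative = segments[len(base):]
    if not relative:
        return True  # the starting URL itself
    if len(relative) > max_depth:
        return False  # deeper than Crawl Depth allows
    rel_path = "/".join(relative)
    if any(fnmatchcase(rel_path, pattern) for pattern in exclude):
        return False  # assumed: Exclude Path wins over Include Path
    # assumed: an empty Include Path list means "include everything"
    return not include or any(fnmatchcase(rel_path, pattern) for pattern in include)


START = "https://example.com"
INCLUDE = ["blog/*", "products/*"]
EXCLUDE = ["admin/*", "private/*"]

print(in_scope("https://example.com/blog/post-1", START, INCLUDE, EXCLUDE, 3))          # True
print(in_scope("https://example.com/admin/users", START, INCLUDE, EXCLUDE, 3))          # False
print(in_scope("https://example.com/blog/2024/05/01/post", START, INCLUDE, EXCLUDE, 3)) # False
```

Running candidate URLs through a model like this before starting a crawl is a cheap way to spot include and exclude patterns that are broader or narrower than intended.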

Scraper Configuration

| Parameter | Description | Example Value |
| --- | --- | --- |
| Credential Name | Select previously saved credentials | my-firecrawl-creds |
| URL | Target URL to scrape | https://example.com/page |
| Main Content | Extract only the main content of the page, excluding headers, navs, footers, etc. | true |
| Skip TLS Verification | Bypass SSL certificate validation | false |
| Include Tags | HTML tags to include in extraction | p, h1, h2, article |
| Exclude Tags | HTML tags to exclude from extraction | nav, footer, aside |
| Emulate Mobile Device | Simulate mobile browser access | true |
| Wait for Page Load | Time to wait for dynamic content (ms) | 123 |
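
The effect of Include Tags and Exclude Tags can be illustrated with a small local filter. This is only a conceptual sketch built on Python's standard `html.parser`, not Firecrawl's extraction logic: the `TagFilter` class and `extract` function are hypothetical, the sketch assumes that excluded tags suppress everything nested inside them, and it expects well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser


class TagFilter(HTMLParser):
    """Collect text inside included tags, skipping anything inside excluded tags."""

    def __init__(self, include_tags, exclude_tags):
        super().__init__()
        self.include = set(include_tags)
        self.exclude = set(exclude_tags)
        self.include_depth = 0  # number of included tags currently open
        self.exclude_depth = 0  # number of excluded tags currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude:
            self.exclude_depth += 1
        elif tag in self.include:
            self.include_depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude and self.exclude_depth:
            self.exclude_depth -= 1
        elif tag in self.include and self.include_depth:
            self.include_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside an included tag and outside every excluded tag.
        if text and self.include_depth and not self.exclude_depth:
            self.chunks.append(text)


def extract(html, include_tags, exclude_tags):
    """Return the text chunks that survive the include/exclude filter."""
    parser = TagFilter(include_tags, exclude_tags)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<nav><p>Home</p></nav>"
    "<article><h1>Title</h1><p>Body text.</p><aside><p>Ad</p></aside></article>"
    "<footer>Legal</footer>"
)
print(extract(page, ["p", "h1", "h2", "article"], ["nav", "footer", "aside"]))
# ['Title', 'Body text.']
```

In this model the `<p>` elements inside `<nav>` and `<aside>` are dropped even though `p` is an included tag, because the enclosing excluded tag takes precedence.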

Troubleshooting

Common Issues

| Problem | Solution |
| --- | --- |
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Connection Issues | Verify that the host URL is correct and reachable. |
| Webhook Errors | Check if the webhook endpoint is active and correctly configured. |
| Crawling Errors | Review the inclusion/exclusion paths for accuracy. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |

Debugging

  • Check Firecrawl logs for detailed error information.
  • Test the webhook endpoint to confirm it is receiving updates.
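
One way to test a webhook endpoint is to point the Notification Webhook at a throwaway receiver and confirm that requests arrive. The sketch below uses only Python's standard library and simply records any JSON body sent via POST. The payload format is not documented on this page, so the sketch makes no assumptions about its fields; `WebhookHandler`, `received`, and `make_server` are hypothetical names, and port 8000 is an arbitrary choice.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # JSON payloads seen so far, newest last


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # covers invalid JSON and undecodable bytes
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        print("webhook received:", payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port=8000):
    """Build the receiver; call .serve_forever() on the result to start listening."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling `make_server().serve_forever()` starts the receiver on localhost. Because it binds to 127.0.0.1, a hosted Firecrawl service can only reach it through a publicly reachable address or tunnel that forwards to this port. Sending a test POST with any HTTP client is enough to confirm the receiver itself works before involving Firecrawl.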
