POST /crawl

Crawl multiple URLs based on options

Example request:
curl --request POST \
  --url https://api.firecrawl.dev/v1/crawl \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "excludePaths": [
    "<string>"
  ],
  "includePaths": [
    "<string>"
  ],
  "maxDepth": 2,
  "ignoreSitemap": true,
  "limit": 10,
  "allowBackwardLinks": false,
  "allowExternalLinks": false,
  "webhook": "<string>",
  "scrapeOptions": {
    "formats": [
      "markdown"
    ],
    "headers": {},
    "includeTags": [
      "<string>"
    ],
    "excludeTags": [
      "<string>"
    ],
    "onlyMainContent": true,
    "removeBase64Images": true,
    "mobile": false,
    "waitFor": 123
  }
}
'

Example response:

{
  "success": true,
  "id": "<string>",
  "url": "<string>"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string<uri>
required

The base URL to start crawling from

excludePaths
string[]

Specifies URL patterns to exclude from the crawl by comparing website paths against the provided regex patterns. For example, if you set "excludePaths": ["blog/*"] for the base URL firecrawl.dev, any results matching that pattern will be excluded, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
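
For illustration, a request body like the one below (reusing the pattern and domain from the example above) would crawl firecrawl.dev while skipping everything under the blog path; the values are placeholders, not required settings:

{
  "url": "https://www.firecrawl.dev",
  "excludePaths": ["blog/*"]
}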

includePaths
string[]

Specifies URL patterns to include in the crawl by comparing website paths against the provided regex patterns. Only the paths that match the specified patterns will be included in the response. For example, if you set "includePaths": ["blog/*"] for the base URL firecrawl.dev, only results matching that pattern will be included, such as https://www.firecrawl.dev/blog/firecrawl-launch-week-1-recap.
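
Conversely, a sketch of a request body that restricts the crawl to blog posts only, again reusing the example pattern:

{
  "url": "https://www.firecrawl.dev",
  "includePaths": ["blog/*"]
}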

maxDepth
integer
default:2

Maximum depth to crawl relative to the entered URL.

ignoreSitemap
boolean
default:true

Ignore the website sitemap when crawling

limit
integer
default:10000

Maximum number of pages to crawl. Default limit is 10000.
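
These scope controls can be combined in a single request body; a sketch with illustrative values (not defaults):

{
  "url": "https://www.firecrawl.dev",
  "maxDepth": 3,
  "ignoreSitemap": false,
  "limit": 100
}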

allowBackwardLinks
boolean
default:false

Enables the crawler to navigate from a specific URL to previously linked pages.

allowExternalLinks
boolean
default:false

Allows the crawler to follow links to external websites.

webhook
string

The URL to send the webhook to. This will trigger for crawl started (crawl.started), every page crawled (crawl.page), and when the crawl is completed (crawl.completed or crawl.failed). The response will be the same as that of the /scrape endpoint.
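
A minimal sketch of a request body that registers a webhook; the receiver URL is a hypothetical placeholder:

{
  "url": "https://www.firecrawl.dev",
  "webhook": "https://example.com/firecrawl-events"
}

Firecrawl will then notify that URL on crawl.started, on each crawl.page, and on crawl.completed or crawl.failed.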

scrapeOptions
object
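
The fields accepted here mirror the scrapeOptions shown in the request example at the top of this page; a sketch with illustrative values:

"scrapeOptions": {
  "formats": ["markdown"],
  "onlyMainContent": true,
  "removeBase64Images": true,
  "mobile": false,
  "waitFor": 1000
}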

Response

Successful response

success
boolean
id
string
url
string<uri>
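
The returned id is typically used to poll for crawl progress. Assuming the companion status endpoint follows the same versioned path (an assumption, not documented on this page), a sketch of the follow-up request would be:

curl --request GET \
  --url https://api.firecrawl.dev/v1/crawl/<id> \
  --header 'Authorization: Bearer <token>'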