
Installation

To install the Firecrawl Rust SDK, add the following to your Cargo.toml:
Rust
# Add this to your Cargo.toml
[dependencies]
firecrawl = "^1.0"
tokio = { version = "^1", features = ["full"] }

Usage

First, you need to get an API key from firecrawl.dev. Then, initialize a FirecrawlApp. From there, you can call functions like FirecrawlApp::scrape_url, which wrap our API. Here is an example of how to use the SDK in Rust:
Rust
use firecrawl::{crawl::{CrawlOptions, CrawlScrapeOptions, CrawlScrapeFormats}, FirecrawlApp, scrape::{ScrapeOptions, ScrapeFormats}};

#[tokio::main]
async fn main() {
    // Initialize the FirecrawlApp with your API key
    let app = FirecrawlApp::new("fc-YOUR_API_KEY").expect("Failed to initialize FirecrawlApp");

    // Scrape a URL
    let options = ScrapeOptions {
        formats: vec![ ScrapeFormats::Markdown, ScrapeFormats::HTML ].into(),
        ..Default::default()
    };

    let scrape_result = app.scrape_url("https://firecrawl.dev", options).await;

    match scrape_result {
        Ok(data) => println!("Scrape Result:\n{}", data.markdown.unwrap()),
        Err(e) => eprintln!("Scrape failed: {}", e),
    }

    // Crawl a website
    let crawl_options = CrawlOptions {
        scrape_options: CrawlScrapeOptions {
            formats: vec![ CrawlScrapeFormats::Markdown, CrawlScrapeFormats::HTML ].into(),
            ..Default::default()
        }.into(),
        limit: 100.into(),
        ..Default::default()
    };

    let crawl_result = app
        .crawl_url("https://mendable.ai", crawl_options)
        .await;

    match crawl_result {
        Ok(data) => println!("Crawl Result (used {} credits):\n{:#?}", data.credits_used, data.data),
        Err(e) => eprintln!("Crawl failed: {}", e),
    }
}

Scraping a URL

To scrape a single URL, use the scrape_url method. It takes the URL as a parameter and returns the scraped data as a Document.
Rust
let options = ScrapeOptions {
    formats: vec![ ScrapeFormats::Markdown, ScrapeFormats::HTML ].into(),
    ..Default::default()
};

let scrape_result = app.scrape_url("https://firecrawl.dev", options).await;

match scrape_result {
    Ok(data) => println!("Scrape Result:\n{}", data.markdown.unwrap()),
    Err(e) => eprintln!("Scrape failed: {}", e),
}

Scraping with Extract

With Extract, you can easily extract structured data from any URL. You specify your schema in JSON Schema format using the serde_json::json! macro, so make sure the serde_json crate is in your dependencies as well.
Rust
let json_schema = json!({
    "type": "object",
    "properties": {
        "top": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "points": {"type": "number"},
                    "by": {"type": "string"},
                    "commentsURL": {"type": "string"}
                },
                "required": ["title", "points", "by", "commentsURL"]
            },
            "minItems": 5,
            "maxItems": 5,
            "description": "Top 5 stories on Hacker News"
        }
    },
    "required": ["top"]
});

let llm_extraction_options = ScrapeOptions {
    formats: vec![ ScrapeFormats::Extract ].into(),
    extract: ExtractOptions {
        schema: json_schema.into(),
        ..Default::default()
    }.into(),
    ..Default::default()
};

let llm_extraction_result = app
    .scrape_url("https://news.ycombinator.com", llm_extraction_options)
    .await;

match llm_extraction_result {
    Ok(data) => println!("LLM Extraction Result:\n{:#?}", data.extract.unwrap()),
    Err(e) => eprintln!("LLM extraction failed: {}", e),
}

Crawling a Website

To crawl a website, use the crawl_url method. This will wait for the crawl to complete, which may take a long time depending on your starting URL and options.
Rust
// Configure the crawl options
let crawl_options = CrawlOptions {
    scrape_options: CrawlScrapeOptions {
        formats: vec![ CrawlScrapeFormats::Markdown, CrawlScrapeFormats::HTML ].into(),
        ..Default::default()
    }.into(),
    limit: 100.into(),
    ..Default::default()
};

// Crawl the website
let crawl_result = app
    .crawl_url("https://mendable.ai", crawl_options)
    .await;

match crawl_result {
    Ok(data) => println!("Crawl Result (used {} credits):\n{:#?}", data.credits_used, data.data),
    Err(e) => eprintln!("Crawl failed: {}", e),
}

Crawling Asynchronously

To crawl without waiting for the result, use the crawl_url_async method. It takes the same parameters, but returns a CrawlAsyncResponse struct containing the crawl's ID. You can use that ID with the check_crawl_status method to check the status at any time. Note that completed crawls are deleted after 24 hours.
Rust
let crawl_id = app.crawl_url_async("https://mendable.ai", None).await?.id;

// ... later ...

let status = app.check_crawl_status(crawl_id).await?;

if status.status == CrawlStatusTypes::Completed {
    println!("Crawl is done: {:#?}", status.data);
} else {
    // ... wait some more ...
}
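The "wait some more" step above can be turned into a simple polling loop. Below is a minimal sketch of a hypothetical helper (not part of the SDK) that assumes check_crawl_status accepts a string ID as in the example above; the 5-second interval is an arbitrary choice, and a production version would also break out on failed or cancelled states rather than looping forever:

```rust
use firecrawl::{crawl::CrawlStatusTypes, FirecrawlApp};
use std::time::Duration;

// Hypothetical helper: poll the crawl status until it completes.
async fn wait_for_crawl(
    app: &FirecrawlApp,
    crawl_id: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    loop {
        let status = app.check_crawl_status(crawl_id).await?;
        if status.status == CrawlStatusTypes::Completed {
            println!("Crawl is done: {:#?}", status.data);
            return Ok(());
        }
        // Not done yet; wait before polling again.
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}
```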

Map a URL (Alpha)

Map all links associated with a starting URL.
Rust
let map_result = app.map_url("https://firecrawl.dev", None).await;

match map_result {
    Ok(data) => println!("Mapped URLs: {:#?}", data),
    Err(e) => eprintln!("Map failed: {}", e),
}

Error Handling

The SDK handles errors returned by the Firecrawl API and by our dependencies, combining them into the FirecrawlError enum, which implements Error, Debug, and Display. All of our methods return a Result<T, FirecrawlError>.
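Because FirecrawlError implements std::error::Error, it composes with the usual ? operator instead of explicit match blocks. A minimal sketch (the API key is a placeholder, and passing None for default scrape options is assumed to work the same way it does for map_url above):

```rust
use firecrawl::FirecrawlApp;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Any FirecrawlError propagates out of main via `?` and is
    // printed through its Display implementation.
    let app = FirecrawlApp::new("fc-YOUR_API_KEY")?;
    let doc = app.scrape_url("https://firecrawl.dev", None).await?;
    println!("{}", doc.markdown.unwrap_or_default());
    Ok(())
}
```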