Open
Labels
bug (Something isn't working)
Description
Describe the Bug
The worker keeps retrying to scrape a web page via the PDF engine, never succeeds, and never stops.
To Reproduce
POST to /v2/scrape
{
"url": "https://zhuanlan.zhihu.com/p/1904292801329488682"
}
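For convenience, the request above could be sent from a small script such as the sketch below. The base URL `http://localhost:3002` is an assumption about the self-hosted deployment, and the helper names `buildScrapeRequest` and `reproduce` are made up for this example. The client-side timeout is only there so the script itself does not hang while the worker loops.

```typescript
// Assumption: base URL of a self-hosted instance; adjust to your deployment.
const BASE_URL = "http://localhost:3002";

interface ScrapeRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request from the reproduction steps: POST /v2/scrape with a JSON body.
function buildScrapeRequest(targetUrl: string, baseUrl: string = BASE_URL): ScrapeRequest {
  return {
    endpoint: baseUrl + "/v2/scrape",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Sends the request and aborts on the client side after timeoutMs, so the
// caller does not wait forever if the worker never finishes.
async function reproduce(timeoutMs: number = 60000): Promise<number> {
  const { endpoint, init } = buildScrapeRequest(
    "https://zhuanlan.zhihu.com/p/1904292801329488682",
  );
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(endpoint, { ...init, signal: controller.signal });
    return response.status;
  } finally {
    clearTimeout(timer);
  }
}
```

Depending on how the instance is configured, an `Authorization` header may also be needed; that is not covered here.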
Expected Behavior
The scrape either succeeds with content or fails without content. It must not loop forever.
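One way to guarantee the "no dead loop" part of this expectation is a hard cap on retry rounds. The sketch below is a generic illustration, not Firecrawl code: `runWithRoundLimit` and `StepOutcome` are hypothetical names, and the right cap would depend on how many feature-flag changes are legitimately possible.

```typescript
// Hypothetical helper, not part of Firecrawl: run a retry step until it
// reports a final outcome, or give up after maxRounds.
type StepOutcome<T> = { done: true; value: T } | { done: false };

function runWithRoundLimit<T>(
  step: (round: number) => StepOutcome<T>,
  maxRounds: number,
): T {
  for (let round = 1; round <= maxRounds; round++) {
    const outcome = step(round);
    if (outcome.done) {
      return outcome.value;
    }
  }
  // Terminal failure: the caller gets an error instead of an endless loop.
  throw new Error("Gave up after " + maxRounds + " rounds without a result");
}
```

With such a cap, a scrape that keeps requesting another round ends in a failure without content rather than occupying the worker indefinitely.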
Screenshots

Environment:
- OS: Windows
- Deployment Type: Self-Hosted
- Firecrawl Version: main (d1418c8)
- Node.js Version: 22.18.0
Logs
Please refer to the screenshot above.
Additional Context
- When the request is received, the enabled engines are `fetch`, `pdf`, and `docx`.
- The first run of `buildFallbackList` returns only `fetch`, which is good.
- The `fetch` scraper returns some content, with status code 403.
- The scrape loop treats this as a "likely proxy error" and attempts to switch to stealth:

  firecrawl/apps/api/src/scraper/scrapeURL/index.ts
  Lines 285 to 288 in d1418c8

      if (isLikelyProxyError && meta.options.proxy === "auto" && !meta.featureFlags.has("stealthProxy")) {
        meta.logger.info("Scrape via " + engine + " deemed unsuccessful due to proxy inadequacy. Adding stealthProxy flag.");
        throw new AddFeatureError(["stealthProxy"]);
      }

- The outer loop adds the `stealthProxy` feature flag and re-runs the scrape loop:

  firecrawl/apps/api/src/scraper/scrapeURL/index.ts
  Lines 641 to 652 in d1418c8

      if (
        error instanceof AddFeatureError &&
        (meta.internalOptions.forceEngine === undefined || Array.isArray(meta.internalOptions.forceEngine))
      ) {
        meta.logger.debug(
          "More feature flags requested by scraper: adding " + error.featureFlags.join(", "),
          { error, existingFlags: meta.featureFlags },
        );
        meta.featureFlags = new Set(
          [...meta.featureFlags].concat(error.featureFlags),
        );

- In this round, `buildFallbackList` returns `pdf` and `docx`. I'm not quite sure why. I understand that the `fetch` engine does not support stealth, but I don't know why `pdf` and `docx` come up here when they were removed in the first run. (Why are they not removed this time?)
- The scrape now runs with the PDF engine and fails with AntiBotError.
- The outer loop catches the error and removes the `pdf` feature flag:

  firecrawl/apps/api/src/scraper/scrapeURL/index.ts
  Lines 678 to 683 in d1418c8

      meta.logger.debug("PDF was blocked by anti-bot, prefetching with chrome-cdp");
      meta.featureFlags = new Set(
        [...meta.featureFlags].filter(
          (x) => x !== "pdf",
        ),
      );

- Next round. This time, `buildFallbackList` still returns `pdf` and `docx`.
- Dead loop.
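The sequence above can be replayed with a toy model. Everything in this sketch is hypothetical: `observedFallbackList` simply hardcodes the engine lists reported in this issue (it does not reproduce the real `buildFallbackList` logic), and it assumes the flag set holds only `stealthProxy` when the PDF engine is blocked, so filtering out `"pdf"` changes nothing. Under those assumptions, the third round starts from exactly the same state as the second, which is what makes the loop endless.

```typescript
// Toy model of the retry rounds described in this issue. All names are
// hypothetical; none of this is Firecrawl's actual code.
type Engine = "fetch" | "pdf" | "docx";

// Stand-in for buildFallbackList: hardcodes the lists observed in the issue.
function observedFallbackList(flags: Set<string>): Engine[] {
  return flags.has("stealthProxy") ? ["pdf", "docx"] : ["fetch"];
}

type RoundResult =
  | { kind: "addFlags"; flags: string[] }
  | { kind: "removeFlags"; flags: string[] };

// Stand-in for one pass of the scrape loop, following the reported behavior.
function runRound(engines: Engine[]): RoundResult {
  if (engines[0] === "fetch") {
    // The 403 response is treated as a likely proxy error.
    return { kind: "addFlags", flags: ["stealthProxy"] };
  }
  // The PDF engine is blocked by anti-bot; the handler drops the "pdf" flag.
  return { kind: "removeFlags", flags: ["pdf"] };
}

// Replays rounds and stops as soon as a (flags, engines) state repeats.
function simulate(maxRounds: number): { rounds: number; stuck: boolean; history: string[] } {
  let flags = new Set<string>();
  const seen = new Set<string>();
  const history: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const engines = observedFallbackList(flags);
    const key = Array.from(flags).sort().join(",") + "|" + engines.join(",");
    history.push(key);
    if (seen.has(key)) {
      // Same flags and same engine list as an earlier round: no progress.
      return { rounds: round, stuck: true, history };
    }
    seen.add(key);
    const result = runRound(engines);
    if (result.kind === "addFlags") {
      flags = new Set(Array.from(flags).concat(result.flags));
    } else {
      const toRemove = result.flags;
      flags = new Set(Array.from(flags).filter((f) => toRemove.indexOf(f) === -1));
    }
  }
  return { rounds: maxRounds, stuck: false, history };
}
```

Calling `simulate(10)` in this model reports a repeated state in round 3. A guard of this kind (or a plain cap on rounds) would let the scrape fail cleanly, but it does not answer the open question of why `pdf` and `docx` are selected once `stealthProxy` is set.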