[Bug] Dead loop when scraping a forbidden webpage

# Describe the Bug
The worker keeps scraping the web page **via PDF** in an endless loop, never succeeding and never returning an error.

# To Reproduce
POST to `/v2/scrape`
```json
{
  "url": "https://zhuanlan.zhihu.com/p/1904292801329488682"
}
```

# Expected Behavior
The scrape either succeeds with content or fails without content. It should never loop forever.

# Screenshots

<img width="977" height="330" alt="Image" src="https://github.com/user-attachments/assets/39b55f84-95fe-4046-9ad6-2ebd351d121b" />

# Environment:
- OS: Windows
- Deployment Type: Self-Hosted
- Firecrawl Version: main (d1418c8)
- Node.js Version: 22.18.0

# Logs
See the screenshot above.

# Additional Context
1. When the request is received, the enabled engines are `fetch`, `pdf`, and `docx`.
2. The first run of `buildFallbackList` returns only `fetch`, which is expected.
3. The `fetch` scraper returns some content with status code 403.
4. The scrape loop treats this as a "likely proxy error" and requests stealth mode by throwing `AddFeatureError(["stealthProxy"])`: https://github.com/firecrawl/firecrawl/blob/d1418c86e8c442b3008f5512a8e71771badda61f/apps/api/src/scraper/scrapeURL/index.ts#L285-L288
5. The outer loop catches the error, adds the `stealthProxy` feature flag, and calls the scrape loop again: https://github.com/firecrawl/firecrawl/blob/d1418c86e8c442b3008f5512a8e71771badda61f/apps/api/src/scraper/scrapeURL/index.ts#L641-L652
6. In this round, `buildFallbackList` returns `pdf` and `docx`. I'm not sure why. I understand that the `fetch` engine does not support stealth, but I don't see why `pdf` and `docx` are selected here when they were excluded in the first run. (Why are they not excluded this time?)
7. The scrape now runs with the `pdf` engine and fails with an AntiBotError.
8. The outer loop catches the error and removes the `pdf` feature flag: https://github.com/firecrawl/firecrawl/blob/d1418c86e8c442b3008f5512a8e71771badda61f/apps/api/src/scraper/scrapeURL/index.ts#L678-L683
9. In the next round, `buildFallbackList` still returns `pdf` and `docx`, so step 7 happens again.
10. Steps 7 to 9 repeat forever: a dead loop.
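To make the trace above easier to follow, here is a minimal, hypothetical TypeScript model of the two loops. It is not Firecrawl's code: the engine-support table, the "last resort" rule for `pdf`/`docx`, and all function names (`buildFallbackListModel`, `scrapeLoopModel`, `simulateScrapeURL`) are assumptions chosen only so the model reproduces steps 1 to 10. The round cap exists only so the model terminates; according to the trace, the real loop has no such exit.

```typescript
// Hypothetical model of the scrape loop traced in steps 1-10.
// Not Firecrawl's implementation: the support table, the "last resort"
// rule and every name below are assumptions made for illustration.

type Engine = "fetch" | "pdf" | "docx";
type Flag = "stealthProxy" | "pdf";

class AddFeatureError extends Error {
  featureFlags: Flag[];
  constructor(featureFlags: Flag[]) {
    super("More feature flags requested: " + featureFlags.join(", "));
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable on older targets
    this.featureFlags = featureFlags;
  }
}

class AntiBotError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const ENGINES: Engine[] = ["fetch", "pdf", "docx"];

// Assumption: which feature flags each engine can satisfy.
const SUPPORTS: Record<Engine, Flag[]> = {
  fetch: [],
  pdf: ["stealthProxy", "pdf"],
  docx: ["stealthProxy", "pdf"],
};

// Assumption: pdf/docx are chosen only when no general-purpose engine
// (here just `fetch`) supports every active flag.
function buildFallbackListModel(flags: Set<Flag>): Engine[] {
  const eligible = ENGINES.filter((e) =>
    [...flags].every((f) => SUPPORTS[e].includes(f)),
  );
  const general = eligible.filter((e) => e === "fetch");
  return general.length > 0 ? general : eligible;
}

// One scrape-loop run: each engine fails the way the trace describes.
function scrapeLoopModel(engines: Engine[]): string {
  if (engines.length === 0) throw new Error("no engines left");
  const engine = engines[0];
  if (engine === "fetch") {
    // Steps 3-4: fetch gets a 403, treated as a likely proxy error.
    throw new AddFeatureError(["stealthProxy"]);
  }
  // Step 7: the first remaining engine is blocked by anti-bot protection.
  throw new AntiBotError(engine + " was blocked by anti-bot");
}

interface Outcome {
  trace: string[]; // fallback list used in each round
  looped: boolean; // true if the round cap was hit
}

function simulateScrapeURL(maxRounds: number): Outcome {
  let flags = new Set<Flag>();
  const trace: string[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const engines = buildFallbackListModel(flags);
    trace.push(engines.join(","));
    try {
      scrapeLoopModel(engines);
      return { trace, looped: false };
    } catch (error) {
      if (error instanceof AddFeatureError) {
        // Step 5: add the requested flags and retry.
        flags = new Set([...flags, ...error.featureFlags]);
      } else if (error instanceof AntiBotError) {
        // Step 8: drop the "pdf" flag and retry. In this model that flag
        // was never set, so the next round sees the same flags (step 9).
        flags = new Set([...flags].filter((x) => x !== "pdf"));
      } else {
        throw error;
      }
    }
  }
  return { trace, looped: true };
}

console.log(simulateScrapeURL(5));
```

Running it prints a trace of `fetch` followed by four identical `pdf,docx` rounds, with `looped: true`. In the model, the filter in step 8 cannot change anything because the flag set that drives `buildFallbackListModel` is identical before and after it, which matches the behaviour described in steps 6 and 9.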


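Code referenced in steps 5 and 8, excerpted from `apps/api/src/scraper/scrapeURL/index.ts` at commit d1418c8:

```ts
// Lines 641-652: the outer loop adds the feature flags requested by the scraper
if (
  error instanceof AddFeatureError &&
  (meta.internalOptions.forceEngine === undefined || Array.isArray(meta.internalOptions.forceEngine))
) {
  meta.logger.debug(
    "More feature flags requested by scraper: adding " +
      error.featureFlags.join(", "),
    { error, existingFlags: meta.featureFlags },
  );
  meta.featureFlags = new Set(
    [...meta.featureFlags].concat(error.featureFlags),
  );
  // ... (excerpt ends here)
```

```ts
// Lines 678-683: after the PDF anti-bot error, the "pdf" flag is dropped
meta.logger.debug("PDF was blocked by anti-bot, prefetching with chrome-cdp");
meta.featureFlags = new Set(
  [...meta.featureFlags].filter(
    (x) => x !== "pdf",
  ),
);
```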

Code referenced in step 4, excerpted from `apps/api/src/scraper/scrapeURL/index.ts` at commit d1418c8 (lines 285-288):

```ts
if (isLikelyProxyError && meta.options.proxy === "auto" && !meta.featureFlags.has("stealthProxy")) {
  meta.logger.info("Scrape via " + engine + " deemed unsuccessful due to proxy inadequacy. Adding stealthProxy flag.");
  throw new AddFeatureError(["stealthProxy"]);
}
```
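If it helps triage, one possible guard is sketched below in TypeScript. It is a hypothetical illustration, not a patch against Firecrawl: it assumes that a round whose feature-flag set has already been tried cannot make progress, so the outer loop should fail with an error instead of retrying. `RetryStateGuard`, `ScrapeLoopError`, and `flagKey` are made-up names.

```typescript
// Hypothetical guard against the dead loop described in this issue.
// Not Firecrawl code: RetryStateGuard, ScrapeLoopError and flagKey are
// illustrative names, and keying on feature flags alone is an assumption.

class ScrapeLoopError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof reliable on older targets
  }
}

// Order-independent key for a set of feature flags.
function flagKey(flags: ReadonlySet<string>): string {
  return [...flags].sort().join("|");
}

class RetryStateGuard {
  private readonly seen = new Set<string>();

  // Call at the start of every outer-loop round. Throws if a round with
  // exactly this flag set has already run, on the assumption that the
  // fallback list depends only on the flags, so a repeat cannot progress.
  enter(flags: ReadonlySet<string>): void {
    const key = flagKey(flags);
    if (this.seen.has(key)) {
      throw new ScrapeLoopError(
        `feature flags [${key}] were already attempted; failing instead of retrying`,
      );
    }
    this.seen.add(key);
  }
}

// Replaying the flag sets seen in steps 2, 6 and 9 of the trace:
const guard = new RetryStateGuard();
try {
  guard.enter(new Set<string>()); // round 1: fetch
  guard.enter(new Set(["stealthProxy"])); // round 2: pdf, docx
  guard.enter(new Set(["stealthProxy"])); // round 3: same flags again
} catch (error) {
  console.log(error instanceof ScrapeLoopError ? error.message : error);
}
```

The third `enter` call throws, and the example prints `feature flags [stealthProxy] were already attempted; failing instead of retrying`. Because only finitely many flag combinations exist, a loop that varies nothing but the flags must eventually revisit one, so this guard always terminates. Keying on the flag set alone is a design choice: if the real fallback list also depends on other state (for example `meta.internalOptions.forceEngine`), that state would need to be part of the key. A plain round cap would also stop the loop, but it could not tell a stuck loop apart from one that is still making progress.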