
Bug: extract method with schema does not include schema properties in the request #1044

@ViktorTrojan

Description



Environment Information

Please provide the following information to help us reproduce and resolve your issue:

Stagehand:

  • Language/SDK: TypeScript
  • Stagehand version: latest

AI Provider:

  • Provider: Custom OpenAI (via OpenAI-compatible endpoint)
  • Model: custom_openai_model

Issue Description

When calling the extract method with a Zod schema, the outgoing request to the LLM contains a response_format object of type json_schema, but the schema property inside it is nearly empty. None of the properties defined in the Zod schema passed to the method are included; the schema contains only {"$schema": "http://json-schema.org/draft-07/schema#"}.

Steps to Reproduce

  1. Configure Stagehand with a custom OpenAI client (like the one in the reproduction code below).
  2. Call page.extract() with an instruction and a Zod schema.
  3. Inspect the request body sent to the custom OpenAI endpoint.
  4. Notice that the json_schema.schema field is missing the properties defined in the Zod object.
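Step 3 can be done without a separate proxy by wrapping the HTTP layer. The sketch below is a minimal, hypothetical helper (withRequestLogging and FetchLike are illustrative names, not Stagehand or OpenAI APIs): it parses each JSON request body, hands it to a callback, and then forwards the call unchanged. It assumes the installed openai package accepts a custom fetch implementation in its constructor options; check your version before relying on that.

```typescript
// Minimal fetch-like signature, so the sketch does not depend on DOM typings.
type FetchLike = (
  input: string,
  init?: { method?: string; body?: unknown },
) => Promise<unknown>;

// Hypothetical helper: wraps a fetch implementation and passes every request
// body to `onBody` (parsed as JSON when possible) before forwarding the call.
function withRequestLogging(
  inner: FetchLike,
  onBody: (body: unknown) => void,
): FetchLike {
  return async (input, init) => {
    const body = init?.body;
    if (typeof body === "string") {
      // onBody runs synchronously, before the request is forwarded.
      try {
        onBody(JSON.parse(body));
      } catch {
        onBody(body); // not JSON: pass the raw string through
      }
    }
    return inner(input, init);
  };
}
```

Wrapping the global fetch with this helper (a type cast may be needed) and passing the result as the OpenAI client's fetch option would print or capture the exact request body that the custom endpoint receives.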

Minimal Reproduction Code

import { Stagehand } from '@browserbasehq/stagehand';
import { z as z3 } from 'zod/v3';
import OpenAI from 'openai';

// NOTE: CustomOpenAIClient is not exported, as mentioned in #1043
// This is a simplified version for reproduction.
class CustomOpenAIClient {
  config: { modelName: string; client: OpenAI };
  constructor(config: { modelName: string; client: OpenAI }) {
    this.config = config;
  }
  // ... implementation details
}

const stagehand = new Stagehand({
  llmClient: new CustomOpenAIClient({
    modelName: "custom_openai_model",
    client: new OpenAI({
      apiKey: "not-needed-for-local-endpoint",
      baseURL: "http://localhost:1235/v1",
    }),
  }),
});

// Start the session; stagehand.page is only available after init().
await stagehand.init();
const page = stagehand.page;

// Steps that reproduce the issue (navigation to the target page omitted)
const result_job_extract = await page.extract({
  instruction: `
    user content here ...
  `,
  schema: z3.object({
    list_of_apartments: z3.array(
      z3.object({
        address: z3.string().describe("the address of the apartment"),
        price: z3.string().describe("the price of the apartment"),
        square_feet: z3.string().describe("the square footage of the apartment"),
      }),
    ),
  }),
});
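For comparison, the sketch below hand-writes roughly what json_schema.schema would be expected to contain for the Zod schema above: an object with a list_of_apartments array whose items have address, price and square_feet string properties carrying their .describe() texts. This is an approximation, not the output of any specific converter; details such as additionalProperties and the required arrays are assumptions. The propertyPaths helper is hypothetical and only lists nested property names so two schemas can be compared.

```typescript
// Loose JSON Schema shape covering only the keywords used below.
type JsonSchema = {
  $schema?: string;
  type?: string;
  description?: string;
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  required?: string[];
  additionalProperties?: boolean;
};

// Assumed shape for each apartment object (hand-written, not converter output).
const apartment: JsonSchema = {
  type: "object",
  properties: {
    address: { type: "string", description: "the address of the apartment" },
    price: { type: "string", description: "the price of the apartment" },
    square_feet: { type: "string", description: "the square footage of the apartment" },
  },
  required: ["address", "price", "square_feet"],
  additionalProperties: false,
};

// Assumed top-level schema that the request would be expected to carry.
const expectedSchema: JsonSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    list_of_apartments: { type: "array", items: apartment },
  },
  required: ["list_of_apartments"],
  additionalProperties: false,
};

// Collects dotted property paths, descending into object properties and array items.
function propertyPaths(schema: JsonSchema, prefix = ""): string[] {
  const paths: string[] = [];
  const props: Record<string, JsonSchema> = schema.properties ?? {};
  for (const [key, child] of Object.entries(props)) {
    const path = prefix ? `${prefix}.${key}` : key;
    paths.push(path);
    // Arrays contribute their item schema; objects contribute themselves.
    paths.push(...propertyPaths(child.items ?? child, path));
  }
  return paths;
}

console.log(propertyPaths(expectedSchema));
```

For expectedSchema this lists list_of_apartments plus its three nested fields; run against the logged schema, which has only $schema, the same helper returns an empty list.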

Error Messages / Log trace

Here is the problematic request body sent to the LLM. The json_schema.schema field is missing the properties from the Zod schema.

{
  "messages": [
    {
      "role": "user",
      "content": "You are extracting content on behalf of a user. If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS. You will be given: 1. An instruction 2. A list of DOM elements to extract from. Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. If a user is attempting to extract links or URLs, you MUST respond with ONLY the IDs of the link elements. Do not attempt to extract links directly from the text unless absolutely necessary. "
    },
    {
      "role": "user",
      "content": "user content here ..."
    }
  ],
  "temperature": 0.1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "model": "custom_openai_model",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "Extraction",
      "strict": true,
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#"
      }
    }
  },
  "stream": false
}
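One way to confirm the symptom programmatically is to look for a properties key under response_format.json_schema.schema. The sketch below (RequestBody and schemaPropertyNames are illustrative names, not part of Stagehand) types only the relevant slice of the request body and applies the check to the response_format copied from the log above.

```typescript
// Only the slice of the chat-completions request body relevant to this bug.
type RequestBody = {
  response_format?: {
    type?: string;
    json_schema?: { name?: string; strict?: boolean; schema?: Record<string, unknown> };
  };
};

// Returns the top-level property names of response_format.json_schema.schema,
// or an empty array when the schema has no "properties" object.
function schemaPropertyNames(body: RequestBody): string[] {
  const props = body.response_format?.json_schema?.schema?.["properties"];
  if (props === null || typeof props !== "object") return [];
  return Object.keys(props);
}

// response_format as it appears in the logged request.
const logged: RequestBody = {
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "Extraction",
      strict: true,
      schema: { $schema: "http://json-schema.org/draft-07/schema#" },
    },
  },
};

const names = schemaPropertyNames(logged);
// prints "schema is empty (bug reproduced)"
console.log(names.length === 0 ? "schema is empty (bug reproduced)" : `properties: ${names.join(", ")}`);
```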

Screenshots / Videos

N/A

Related Issues

#1043: CustomOpenAIClient is not exported (see the note in the reproduction code).
