## Background

`retriever.py` currently:
- Hard-codes the semantic configuration name, top-10 limit, and index fields
- Falls back to mock documents instead of retrying or surfacing detailed errors
- Does not pass retrieved passages to an LLM for answer synthesis; the kernel only echoes a count
- Ties the embedding model to server-side reranking, even though GPT-4.1-nano is now available for fast, low-cost generation
Moving to a configurable retrieval → generation flow will let us:
- Tune query parameters per environment (dev, staging, prod) without code changes
- Drop the mock-doc path in favour of proper retries and telemetry
- Use GPT-4.1-nano to draft quick RAG answers while keeping GPT-4o or 4-turbo for high-quality fallbacks
- Prepare the ground for multi-vector hybrid search in a future milestone
## Scope of Work

- [ ] **Configuration**
  - Expose the following via `settings` + `.env`:
    - `SEARCH_INDEX`
    - `SEMANTIC_CONFIGURATION`
    - `SEARCH_TOP_K` (default `10`)
    - `GPT_FAST_MODEL` (default `gpt-4.1-nano`)
    - `GPT_FALLBACK_MODEL` (default `gpt-4o`)
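A minimal sketch of how the settings loader could look, assuming the `.env` file is loaded into the process environment at startup (e.g. by python-dotenv); the function name and defaults here are illustrative, not existing code:

```python
import os

# Hypothetical settings helper - reads the knobs named in this issue
# from the environment, which a .env loader would populate at startup.
def load_retrieval_settings() -> dict:
    """Return the retrieval/generation configuration as a plain dict."""
    return {
        "search_index": os.getenv("SEARCH_INDEX", "default-index"),
        "semantic_configuration": os.getenv("SEMANTIC_CONFIGURATION", "default-semantic-config"),
        "search_top_k": int(os.getenv("SEARCH_TOP_K", "10")),
        "gpt_fast_model": os.getenv("GPT_FAST_MODEL", "gpt-4.1-nano"),
        "gpt_fallback_model": os.getenv("GPT_FALLBACK_MODEL", "gpt-4o"),
        "rag_passages": int(os.getenv("RAG_PASSAGES", "4")),
    }
```

Keeping the loader in one place means dev/staging/prod only differ in their `.env` files, with no code changes.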
- [ ] **Retrieval logic**
  - Replace `self.search_client.search(...)` with a helper that maps env values → SDK params.
  - Add exponential-backoff retry (3 attempts, 1-4-8 s) on `ServiceRequestError`, `HttpResponseError`, and `ClientAuthenticationError`.
  - Return a typed `RetrievalResult` dataclass containing: `id`, `content`, `title`, `source`, `score`, `reranker_score`.
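The dataclass and retry shape above could be sketched as follows. The exception types are parameterized here so the sketch stands alone; in `retriever.py` the tuple would be `(ServiceRequestError, HttpResponseError, ClientAuthenticationError)` from `azure.core.exceptions`. `with_backoff` is a hypothetical helper name:

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RetrievalResult:
    """Typed passage record proposed in this issue."""
    id: str
    content: str
    title: str
    source: str
    score: float
    reranker_score: Optional[float] = None

def with_backoff(fn: Callable, retryable: tuple, attempts: int = 3, delays=(1, 4, 8)):
    """Run fn(), retrying on the given exception types.

    With attempts=3 the 1 s and 4 s delays are used between tries; the
    8 s slot only matters if the attempt count is raised later.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except retryable as exc:
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(delays[attempt])
    raise last_exc
```

The search call itself would then be `with_backoff(lambda: self.search_client.search(...), retryable=...)`, replacing the mock-document fallback with a surfaced exception after the final attempt.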
- [ ] **Generation logic**
  - After fetching docs, send the top N passages (env var `RAG_PASSAGES`, default 4) to GPT-4.1-nano with a dry prompt: *You are a retrieval assistant. Provide a concise answer to the user based only on the passages provided. Use bullet points when listing items.*
  - If the nano model times out or returns an empty response, retry with `GPT_FALLBACK_MODEL`.
  - Attach source-attribution footnotes: `[1]`, `[2]`, … mapping to `source`.
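A sketch of the fast-then-fallback step, assuming a `complete(model, prompt)` callable that stands in for the real chat-completion client and returns the model's text (empty or `None` on a bad response); `synthesize_answer` is a hypothetical helper, not existing code:

```python
from typing import Callable, List, Optional

# Prompt text taken verbatim from this issue.
SYSTEM_PROMPT = (
    "You are a retrieval assistant. Provide a concise answer to the user "
    "based only on the passages provided. Use bullet points when listing items."
)

def synthesize_answer(
    question: str,
    passages: List[dict],  # each: {"content": ..., "source": ...}
    complete: Callable[[str, str], Optional[str]],
    fast_model: str = "gpt-4.1-nano",
    fallback_model: str = "gpt-4o",
) -> str:
    # Number passages so the model can cite [1], [2], ...
    numbered = "\n\n".join(
        f"[{i}] {p['content']}" for i, p in enumerate(passages, start=1)
    )
    prompt = f"{SYSTEM_PROMPT}\n\nPassages:\n{numbered}\n\nQuestion: {question}"

    answer = None
    try:
        answer = complete(fast_model, prompt)
    except TimeoutError:
        pass
    if not answer:  # timed out or empty -> retry with the fallback model
        answer = complete(fallback_model, prompt)

    # Attach source-attribution footnotes mapping [n] -> source.
    footnotes = "\n".join(
        f"[{i}] {p['source']}" for i, p in enumerate(passages, start=1)
    )
    return f"{answer}\n\n{footnotes}"
```

Injecting `complete` keeps the fallback logic unit-testable without network calls.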
- [ ] **Streaming updates**
  - In `invoke_stream`, yield interim tokens from the LLM call so the UI can display gradual output.
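The streaming path can be as simple as forwarding the SDK's token iterator; `llm_stream` below is a stand-in for the real streaming-completion call, shown synchronously for brevity even if the actual `invoke_stream` is async:

```python
from typing import Iterator

def invoke_stream(prompt: str, llm_stream) -> Iterator[str]:
    """Yield interim tokens so the UI can render gradual output."""
    buffer = []
    for token in llm_stream(prompt):
        buffer.append(token)  # keep the full answer for logging/footnotes
        yield token
    # "".join(buffer) is the complete answer, available after the stream ends.
```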
- [ ] **Tests**
  - Unit test for config overrides via `pytest-env`.
  - Integration test that ensures:
    - At least one passage is returned for a known query (`"Microsoft revenue"`).
    - The GPT response contains a footnote matching one of the passage sources.
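A sketch of the integration assertions; `retrieve` and `synthesize_answer` are the helpers this issue proposes, stubbed here only so the shape of the checks is concrete (with pytest-env, the config overrides would instead live under `[pytest] env =` in `pytest.ini`):

```python
import re

def retrieve(query):  # stub - the real helper hits Azure AI Search
    return [{"content": "Microsoft reported revenue of ...", "source": "10-K"}]

def synthesize_answer(question, passages):  # stub - the real helper calls the LLM
    notes = "\n".join(f"[{i}] {p['source']}" for i, p in enumerate(passages, 1))
    return f"- Revenue figure found [1]\n\n{notes}"

def test_known_query_returns_passages():
    assert len(retrieve("Microsoft revenue")) >= 1

def test_answer_has_matching_footnote():
    passages = retrieve("Microsoft revenue")
    answer = synthesize_answer("Microsoft revenue", passages)
    sources = {p["source"] for p in passages}
    cited = set(re.findall(r"\[\d+\] (.+)", answer))
    assert cited & sources  # at least one footnote maps to a real source
```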
## Acceptance Criteria

- [ ] All hard-coded values removed from `RetrieverAgent`.
- [ ] Queries, semantic config, and top-K are driven by env vars.
- [ ] GPT-4.1-nano is invoked first; the fallback engages only on error or empty content.
- [ ] End-to-end latency (retrieval + generation) averages < 1.2 s for a 30-token query against the dev index.
- [ ] Tests pass in CI and the README section "Fast RAG path" is updated.
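The latency criterion can be checked with a small timing harness; `rag_answer` is a stand-in for the full retrieve-then-generate call against the dev index:

```python
import time
import statistics

def average_latency(rag_answer, query: str, runs: int = 10) -> float:
    """Average wall-clock seconds per end-to-end RAG call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        rag_answer(query)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```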
## Additional Notes

- When `use_agentic_retrieval` is true, prefer `KnowledgeAgentRetrievalClient` but still respect env overrides.
- Consider moving the mock-document logic to a separate debug utility instead of deleting it outright.
- Follow the project style guide: hyphens, not em dashes, in log or user-visible messages.
⌛ Effort: Medium