Low Document-Level Hit Rate in `local_search` #1981

Furi5 · 2025-06-20T09:25:19Z

I have built a document-level retrieval benchmark on a corpus of over 1,000 scientific papers. The goal is to assess whether local_search can retrieve the correct source document for a given factual question.

My evaluation shows a very low document-level Hit Rate: 3.44% at k=1, rising to only 21.63% at k=10. This suggests that for tasks requiring precise document provenance, the default local_search configuration may not be optimal.

My Methodology

Corpus: A collection of 1,000+ scientific papers in a specific domain [e.g., pharmacology].
Benchmark Construction: I created a set of question-document pairs. Each question is designed to have its answer contained within one specific "target document" in the corpus.
GraphRAG Indexing: I indexed the entire corpus using the graphrag --init command with the --init-method fast setting.
Retrieval Step: For each question in my benchmark, I run the local_search function.
Evaluation: I inspect the result.context_data["sources"] field from the local_search output. A "hit" is counted if the ID of the "target document" is present in this list of sources. The Hit Rate @k is the percentage of questions for which the target document is found within the top k sources.
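
For reference, the metric described in the Evaluation step can be sketched as follows. This is a minimal illustration, not my exact benchmark code. It assumes the ranked document IDs for each question have already been extracted from result.context_data["sources"] into plain Python lists; that extraction depends on the GraphRAG version and is not shown. The function name and the toy IDs are hypothetical.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_sources: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of questions whose target document ID is in the top-k sources.

    ranked_sources[i] is the ordered list of document IDs returned for
    question i (assumed to be pre-extracted from the search result);
    targets[i] is the ID of the single target document for question i.
    """
    if len(ranked_sources) != len(targets):
        raise ValueError("ranked_sources and targets must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not targets:
        return 0.0

    hits = sum(
        1
        for sources, target in zip(ranked_sources, targets)
        if target in sources[:k]  # a hit only counts within the first k entries
    )
    return hits / len(targets)


# Toy data for illustration only (hypothetical document IDs).
ranked = [["d1", "d2", "d3"], ["d9", "d4"], ["d7"]]
targets = ["d2", "d4", "d5"]

for k in (1, 2, 3):
    print(f"Hit Rate @{k}: {hit_rate_at_k(ranked, targets, k):.2%}")
```

If several returned sources map to the same document, the list should probably be de-duplicated (keeping first occurrences) before slicing, otherwise the effective k is smaller than intended.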

Results

Here is the Hit Rate table from my benchmark:

| Hit Rate @1 | Hit Rate @2 | Hit Rate @3 | Hit Rate @4 | Hit Rate @5 | Hit Rate @6 | Hit Rate @7 | Hit Rate @8 | Hit Rate @9 | Hit Rate @10 |
|---|---|---|---|---|---|---|---|---|---|
| 3.44% | 8.33% | 11.20% | 13.11% | 15.79% | 17.03% | 19.14% | 20.38% | 21.34% | 21.63% |
