Home
This parser supports many of the constructs contained in the Lucene Query Syntax.
- conjunction operators: `AND`, `OR`, `||`, `&&`, `NOT`
- prefix operators: `+`, `-`
- quoted values: `"foo bar"`
- named fields: `foo:bar`
- range expressions: `foo:[bar TO baz]`, `foo:{bar TO baz}`
- proximity search expressions: `"foo bar"~5`
- boost expressions: `foo^5`, `"foo bar"^5`
- fuzzy search expressions: `foo~`, `foo~0.5`
- parentheses grouping: `(foo OR bar) AND baz`
- field groups: `foo:(bar OR baz)`
The parser returns the query as an expression tree. Each node in the tree is a dictionary.
There are three basic types of expression dictionaries: node, field, and range expressions.
A node expression generally has the following structure:
{
'left': dictionary, // field expression or node
'operator': string, // operator value
'right': dictionary, // field expression or node
'field': string // field name (for field group syntax) [OPTIONAL]
}
A field expression has the following structure:
{
'field': string, // field name
'term': string, // term value
'prefix': string, // prefix operator (+/-) [OPTIONAL]
'boost': float, // boost value (a value > 1 must be an integer) [OPTIONAL]
'similarity': float, // similarity value (must be > 0 and < 1) [OPTIONAL]
'proximity': integer // proximity value [OPTIONAL]
}
A range expression has the following structure:
{
'field': string, // field name
'term_min': string, // minimum value (left side) of range
'term_max': string, // maximum value (right side) of range
'inclusive': boolean // inclusive ([...]) or exclusive ({...})
}
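As a minimal sketch of how the three dictionary types might be told apart and traversed, the snippet below classifies an expression by the keys documented above and collects every field name in a tree. The helper names `kindOf` and `collectFields` are hypothetical, and `sampleTree` is hand-written to match the documented structures rather than actual parser output.

```javascript
// Hypothetical helper: classify an expression dictionary by its documented keys.
function kindOf(expr) {
  if ('term_min' in expr || 'term_max' in expr) return 'range';
  if ('term' in expr) return 'field';
  return 'node';
}

// Hypothetical helper: recursively collect each distinct field name in the tree.
function collectFields(expr, out = []) {
  if (!expr) return out; // a node may lack 'left' or 'right'
  if (kindOf(expr) === 'node') {
    collectFields(expr.left, out);
    collectFields(expr.right, out);
  } else if (!out.includes(expr.field)) {
    out.push(expr.field);
  }
  return out;
}

// Hand-written sample (assumed shape, not real parser output).
const sampleTree = {
  left: { field: 'title', term: 'foo' },
  operator: 'AND',
  right: { field: 'date', term_min: '2010', term_max: '2012', inclusive: true }
};

console.log(kindOf(sampleTree));        // node
console.log(collectFields(sampleTree)); // the array ['title', 'date']
```

Checking for `term_min`/`term_max` before `term` keeps the classification unambiguous, since node expressions used for field groups also carry a `field` key.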
Unnamed (default) fields are given the field name `<implicit>`.
Wildcards (`fo*`, `f?o`) will be part of the term value.
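Because wildcards arrive inside the term string, code that consumes the tree has to interpret them itself. The following is a rough sketch of one way to do that, assuming `*` matches any run of characters and `?` matches exactly one. Both helper names are hypothetical and are not part of the parser.

```javascript
// Hypothetical helper: true if a term contains a wildcard character (* or ?).
function hasWildcard(term) {
  return /[*?]/.test(term);
}

// Hypothetical helper: turn a wildcard term into an anchored RegExp.
function wildcardToRegExp(term) {
  // Escape regex metacharacters other than * and ?.
  const escaped = term.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  // Translate the wildcards: * becomes ".*", ? becomes ".".
  const pattern = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp('^' + pattern + '$');
}

console.log(hasWildcard('fo*'));                   // true
console.log(wildcardToRegExp('fo*').test('food')); // true
console.log(wildcardToRegExp('f?o').test('fooo')); // false
```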
Escaping special characters, as described in the [Lucene documentation](http://lucene.apache.org/java/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters), is not supported and will generally break the parser.
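Since escaped input is unsupported, a caller may want to detect it before handing a query to the parser. Here is a small sketch under the assumption that any backslash in the query indicates Lucene-style escaping. The function name `usesEscaping` is hypothetical.

```javascript
// Hypothetical guard: report whether a query string contains a backslash,
// the character Lucene uses to escape special characters.
function usesEscaping(query) {
  return query.indexOf('\\') !== -1;
}

console.log(usesEscaping('foo\\:bar')); // true  (the string is foo\:bar)
console.log(usesEscaping('foo:bar'));   // false
```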
Conjunction operators that appear at the beginning of the query violate the logic of the syntax and are currently "mostly" ignored: only the last element is returned.
For example:
Query: OR
Return: { "operator": "OR" }
Query: OR AND
Return: { "operator": "AND" }
Query: OR AND foo
Return: { "left": { "field": "<implicit>", "term": "foo" } }
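Given the results above, a caller could check whether a parse result actually contains an operand before using it. The sketch below assumes that a usable result always has a `left` key, as in the third example. The function name `hasOperands` is hypothetical.

```javascript
// Hypothetical check: a result with no 'left' key held only operators.
function hasOperands(result) {
  return result !== null && typeof result === 'object' && 'left' in result;
}

// Results copied from the examples above.
console.log(hasOperands({ operator: 'OR' }));                                // false
console.log(hasOperands({ left: { field: '<implicit>', term: 'foo' } }));    // true
```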
To run the unit tests, just open SpecRunner.html in any browser. Unit tests are built with Jasmine.
The parser is generated by PEG.js, a PEG implementation in JavaScript, from the grammar file lucene-query-parser.grammar.
To test the grammar without using the generated parser, or to modify it, try out PEG.js online. This is a handy way to run an arbitrary query, see what the results look like, and debug a problem with the parser for a given input.