You're viewing version 2.16 of the OpenSearch documentation. This version is no longer maintained.
Match query
Use the match query for full-text search on a specific document field. If you run a match query on a text field, the match query analyzes the provided search string and returns documents that match any of the string’s terms. If you run a match query on an exact-value field, it returns documents that match the exact value. The preferred way to search exact-value fields is to use a filter because, unlike a query, a filter is cached.
The following example shows a basic match query for the word wind in the title field:
GET _search
{
"query": {
"match": {
"title": "wind"
}
}
}
To pass additional parameters, you can use the expanded syntax:
GET _search
{
"query": {
"match": {
"title": {
"query": "wind",
"analyzer": "stop"
}
}
}
}
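When requests are generated from application code, one small helper can produce both syntaxes. The following Python sketch is illustrative only: the function name match_query is hypothetical and is not part of any OpenSearch client. It only builds the request body as a dictionary.

```python
import json


def match_query(field, text, **params):
    """Build a match query body (hypothetical helper, not a client API)."""
    if not params:
        # Short syntax: the field maps directly to the search string.
        return {"query": {"match": {field: text}}}
    # Expanded syntax: the field maps to an object holding "query" plus options.
    return {"query": {"match": {field: {"query": text, **params}}}}


print(json.dumps(match_query("title", "wind")))
print(json.dumps(match_query("title", "wind", analyzer="stop")))
```

The resulting dictionary can be serialized and sent as the body of a search request by whichever HTTP or OpenSearch client you use.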
Examples
The following examples use an index named testindex that contains the following documents:
PUT testindex/_doc/1
{
"title": "Let the wind rise"
}
PUT testindex/_doc/2
{
"title": "Gone with the wind"
}
PUT testindex/_doc/3
{
"title": "Rise is gone"
}
Operator
If a match query is run on a text field, the text is analyzed with the analyzer specified in the analyzer parameter. Then the resulting tokens are combined into a Boolean query using the operator specified in the operator parameter. The default operator is OR, so the query wind rise is changed into wind OR rise. In this example, this query returns documents 1–3 because each document has a term that matches the query. To specify the and operator, use the following query:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wind rise",
"operator": "and"
}
}
}
}
The query is constructed as wind AND rise and returns document 1 as the matching document:
Response
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
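The effect of the operator can be reproduced offline. The following Python sketch is a simplified model, not OpenSearch code: it assumes that lowercasing and splitting on whitespace is close enough to the standard analyzer for these three titles, and the helper names tokenize and match are hypothetical.

```python
DOCS = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}


def tokenize(text):
    # Crude stand-in for the standard analyzer: lowercase, split on whitespace.
    return set(text.lower().split())


def match(query, operator="or"):
    terms = tokenize(query)
    # "and" requires every term to be present; "or" requires at least one.
    combine = all if operator == "and" else any
    return [
        doc_id
        for doc_id, title in DOCS.items()
        if combine(term in tokenize(title) for term in terms)
    ]


print(match("wind rise"))         # [1, 2, 3]
print(match("wind rise", "and"))  # [1]
```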
Minimum should match
You can control the minimum number of terms that a document must match to be returned in the results by specifying the minimum_should_match parameter:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wind rise",
"operator": "or",
"minimum_should_match": 2
}
}
}
}
Now documents are required to match both terms, so only document 1 is returned (this is equivalent to the and operator):
Response
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
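As a rough illustration of how the threshold works, the following Python sketch counts how many query terms each title contains and keeps the documents that reach the minimum. It is a simplified model under stated assumptions: tokens are produced by lowercasing and splitting on whitespace, only a positive integer value of minimum_should_match is modeled, and the function names are hypothetical.

```python
DOCS = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}


def matching_term_count(query, title):
    # Number of distinct query terms that also occur in the title.
    title_tokens = set(title.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in title_tokens)


def match(query, minimum_should_match=1):
    return [
        doc_id
        for doc_id, title in DOCS.items()
        if matching_term_count(query, title) >= minimum_should_match
    ]


print(match("wind rise", 1))  # [1, 2, 3]
print(match("wind rise", 2))  # [1]
```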
Analyzer
Because the preceding examples don’t explicitly specify an analyzer, the default standard analyzer is used. The standard analyzer does not perform stemming, so if you run the query the wind rises with the and operator, you receive no results because the token rises does not match the token rise. To change the search analyzer, specify it in the analyzer parameter. For example, the following query uses the english analyzer:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "the wind rises",
"operator": "and",
"analyzer": "english"
}
}
}
}
The english analyzer removes the stop word the and performs stemming, producing the tokens wind and rise. Both tokens are present in document 1, which is returned in the results:
Response
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
Empty query
In some cases, an analyzer might remove all tokens from a query. For example, the english analyzer removes stop words, so in the query and OR or, all tokens are removed. To check the analyzer behavior, you can use the Analyze API:
GET testindex/_analyze
{
"analyzer" : "english",
"text" : "and OR or"
}
As expected, the query produces no tokens:
{
"tokens": []
}
You can specify the behavior for an empty query in the zero_terms_query parameter. Setting zero_terms_query to all returns all documents in the index, and setting it to none returns no documents:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "and OR or",
"analyzer" : "english",
"zero_terms_query": "all"
}
}
}
}
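The two settings can be modeled in a few lines. The following Python sketch is an approximation for illustration: STOP_WORDS is a small, hand-picked subset rather than the english analyzer's real stop word list, stemming is not modeled, and the function names are hypothetical.

```python
DOCS = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}

# Illustrative subset only; the real analyzer uses a longer list.
STOP_WORDS = {"and", "or", "the"}


def analyze(text):
    # Lowercase, split on whitespace, and drop stop words.
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def match(query, zero_terms_query="none"):
    terms = analyze(query)
    if not terms:
        # Nothing left to search for: fall back to the configured behavior.
        return list(DOCS) if zero_terms_query == "all" else []
    return [
        doc_id
        for doc_id, title in DOCS.items()
        if any(term in title.lower().split() for term in terms)
    ]


print(analyze("and OR or"))        # []
print(match("and OR or", "all"))   # [1, 2, 3]
print(match("and OR or", "none"))  # []
```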
Fuzziness
To account for typos, you can specify fuzziness for your query as either of the following:
- An integer that specifies the maximum allowed Damerau–Levenshtein distance (the number of edits).
- AUTO:
  - Strings of 0–2 characters must match exactly.
  - Strings of 3–5 characters allow 1 edit.
  - Strings longer than 5 characters allow 2 edits.
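The AUTO rules listed above translate directly into a length check. The following Python sketch only restates those thresholds. The function name auto_fuzziness is hypothetical and is not an OpenSearch API.

```python
def auto_fuzziness(term):
    """Maximum number of edits allowed for a term under the AUTO rules above."""
    length = len(term)
    if length <= 2:
        return 0  # 0-2 characters: must match exactly
    if length <= 5:
        return 1  # 3-5 characters: 1 edit
    return 2      # longer than 5 characters: 2 edits


for term in ("to", "wnid", "rising"):
    print(term, auto_fuzziness(term))
# to 0
# wnid 1
# rising 2
```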
Setting fuzziness to the default AUTO value works best in most cases:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO"
}
}
}
}
The token wnid matches wind, and the query returns documents 1 and 2:
Response
{
"took": 31,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.47501624,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.47501624,
"_source": {
"title": "Let the wind rise"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 0.47501624,
"_source": {
"title": "Gone with the wind"
}
}
]
}
}
Prefix length
Misspellings rarely occur at the beginning of words. Thus, you can specify the number of leading characters that must match exactly for a document to be returned in the results. For example, you can change the preceding query to include a prefix_length:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO",
"prefix_length": 2
}
}
}
}
The preceding query returns no results. If you change the prefix_length to 1, documents 1 and 2 are returned because the first letter of the token wnid is not misspelled.
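The prefix requirement can be pictured as a comparison of leading characters that runs before any fuzzy matching. The following Python sketch models only that comparison. The function name prefix_matches is hypothetical, and the edit-distance check that would follow is omitted.

```python
def prefix_matches(query_term, indexed_term, prefix_length):
    # The first prefix_length characters must be identical;
    # fuzziness would apply only to the remaining characters.
    return query_term[:prefix_length] == indexed_term[:prefix_length]


print(prefix_matches("wnid", "wind", 2))  # False: "wn" differs from "wi"
print(prefix_matches("wnid", "wind", 1))  # True: both start with "w"
print(prefix_matches("wnid", "wind", 0))  # True: no prefix is required
```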
Transpositions
In the preceding example, the word wnid contained a transposition (in was changed to ni). By default, transpositions are allowed in fuzzy matching, but you can disallow them by setting fuzzy_transpositions to false:
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO",
"fuzzy_transpositions": false
}
}
}
}
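Now the query returns no results because, without transpositions, the distance between wnid and wind is 2, which exceeds the 1 edit that AUTO allows for a 4-character term.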
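To see why the transposition setting changes the outcome, it helps to compute the edit distance both ways. The following Python sketch implements the optimal string alignment variant of the Damerau–Levenshtein distance with a switch for transpositions. It is a textbook illustration, not the algorithm OpenSearch runs internally, and the function name edit_distance is hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance using insertions, deletions, substitutions,
    and (optionally) swaps of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                # Swap of two adjacent characters counts as one edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("wind", "wnid", transpositions=True))   # 1
print(edit_distance("wind", "wnid", transpositions=False))  # 2
```

With transpositions disabled, wnid is 2 edits away from wind, which is more than the single edit allowed for a 4-character term under AUTO.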
Synonyms
If you use a synonym_graph filter and auto_generate_synonyms_phrase_query is set to true (default), OpenSearch parses the query into terms and then combines the terms to generate a phrase query for multi-term synonyms. For example, if you specify ba, batting average as synonyms and search for ba, OpenSearch searches for ba OR "batting average".
To match multi-term synonyms with conjunctions, set auto_generate_synonyms_phrase_query to false:
GET /testindex/_search
{
"query": {
"match": {
"text": {
"query": "good ba",
"auto_generate_synonyms_phrase_query": false
}
}
}
}
For the term ba, the query produced is ba OR (batting AND average).
Parameters
The query accepts the name of the field (<field>) as a top-level parameter:
GET _search
{
"query": {
"match": {
"<field>": {
"query": "text to search for",
...
}
}
}
}
The <field> accepts the following parameters. All parameters except query are optional.
Parameter | Data type | Description |
---|---|---|
query | String | The query string to use for search. Required. |
auto_generate_synonyms_phrase_query | Boolean | Specifies whether to create a match phrase query automatically for multi-term synonyms. For example, if you specify ba, batting average as synonyms and search for ba, OpenSearch searches for ba OR "batting average" (if this option is true) or ba OR (batting AND average) (if this option is false). Default is true. |
analyzer | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the <field>. If no analyzer is specified for the <field>, the analyzer is the default analyzer for the index. |
boost | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is 1. |
enable_position_increments | Boolean | When true, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted “gap” between terms. Default is true. |
fuzziness | String | The number of character edits (insertions, deletions, substitutions, or transpositions) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between wined and wind is 1. Valid values are non-negative integers or AUTO. The default, AUTO, chooses a value based on the length of each term and is a good choice for most use cases. |
fuzzy_rewrite | String | Determines how OpenSearch rewrites the query. Valid values are constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, and top_terms_blended_freqs_N. If the fuzziness parameter is not 0, the query uses a fuzzy_rewrite method of top_terms_blended_freqs_${max_expansions} by default. Default is constant_score. |
fuzzy_transpositions | Boolean | Setting fuzzy_transpositions to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the fuzziness option. For example, the distance between wind and wnid is 1 if fuzzy_transpositions is true (swap “n” and “i”) and 2 if it is false (delete “n”, insert “n”). If fuzzy_transpositions is false, rewind and wnid have the same distance (2) from wind, despite the more human-centric opinion that wnid is an obvious typo. The default is a good choice for most use cases. |
lenient | Boolean | Setting lenient to true ignores data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type float. Default is false. |
max_expansions | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in fuzziness. Then OpenSearch tries to match those terms. Default is 50. |
minimum_should_match | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the or operator, the number of terms that need to match for the document to be considered a match. For example, if minimum_should_match is 2, wind often rising does not match The Wind Rises. If minimum_should_match is 1, it matches. For details, see Minimum should match. |
operator | String | If the query string contains multiple search terms, whether all terms need to match (AND) or only one term needs to match (OR) for a document to be considered a match. Valid values are OR (the string to be is interpreted as to OR be) and AND (the string to be is interpreted as to AND be). Default is OR. |
prefix_length | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is 0. |
zero_terms_query | String | In some cases, the analyzer removes all terms from a query string. For example, the stop analyzer removes all terms from the string an but this. In those cases, zero_terms_query specifies whether to match no documents (none) or all documents (all). Valid values are none and all. Default is none. |