Link Search Menu Expand Document Documentation Menu

You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Terms query

Use the terms query to search for multiple terms in the same field. For example, the following query searches for lines with the IDs 61809 and 61810:

GET shakespeare/_search
{
  "query": {
    "terms": {
      "line_id": [
        "61809",
        "61810"
      ]
    }
  }
}

A document is returned if it matches any of the terms in the array.

By default, the maximum number of terms allowed in a terms query is 65,536. To change the maximum number of terms, update the index.max_terms_count setting.

The ability to highlight results for terms queries may not be guaranteed, depending on the highlighter type and the number of terms in the query.

Parameters

The query accepts the following parameters. All parameters are optional.

Parameter Data type Description
<field> String The field in which to search. A document is returned in the results only if its field value exactly matches at least one term, with the correct spacing and capitalization.
boost Floating-point A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
value_type String Specifies the types of values used for filtering. Valid values are default and bitmap. If omitted, the value defaults to default.

Terms lookup

Terms lookup retrieves the field values of a single document and uses them as search terms. You can use terms lookup to search for a large number of terms.

To use terms lookup, you must enable the _source mapping field because terms lookup fetches values from a document. The _source field is enabled by default.

Terms lookup tries to fetch the document field values from a shard on a local data node. Thus, using an index with a single primary shard that has full replicas on all applicable data nodes reduces network traffic.

Example

As an example, create an index that contains student data, mapping student_id as a keyword:

PUT students
{
  "mappings": {
    "properties": {
      "student_id": { "type": "keyword" }
    }
  }
}

Next, index three documents that correspond to students:

PUT students/_doc/1
{
  "name": "Jane Doe",
  "student_id" : "111"
}

PUT students/_doc/2
{
  "name": "Mary Major",
  "student_id" : "222"
}

PUT students/_doc/3
{
  "name": "John Doe",
  "student_id" : "333"
}

Create a separate index that contains class information, including the class name and an array of student IDs corresponding to the students enrolled in the class:

PUT classes/_doc/101
{
  "name": "CS101",
  "enrolled" : ["111" , "222"]
}

To search for students enrolled in the CS101 class, specify the document ID of the document that corresponds to the class, the index of that document, and the path of the field in which the terms are located:

GET students/_search
{
  "query": {
    "terms": {
      "student_id": {
        "index": "classes",
        "id": "101",
        "path": "enrolled"
      }
    }
  }
}

The response contains the documents in the students index for every student whose ID matches one of the values in the enrolled array:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Jane Doe",
          "student_id": "111"
        }
      },
      {
        "_index": "students",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Mary Major",
          "student_id": "222"
        }
      }
    ]
  }
}

Example: Nested fields

The second example demonstrates querying nested fields. Consider an index with the following document:

PUT classes/_doc/102
{
  "name": "CS102",
  "enrolled_students" : {
    "id_list" : ["111" , "333"]
  }
}

To search for students enrolled in CS102, use the dot path notation to specify the full path to the field in the path parameter:

ET students/_search
{
  "query": {
    "terms": {
      "student_id": {
        "index": "classes",
        "id": "102",
        "path": "enrolled_students.id_list"
      }
    }
  }
}

The response contains the matching documents:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "students",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Jane Doe",
          "student_id": "111"
        }
      },
      {
        "_index": "students",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "John Doe",
          "student_id": "333"
        }
      }
    ]
  }
}

Parameters

The following table lists the terms lookup parameters.

Parameter Data type Description
index String The name of the index in which to fetch field values. Required.
id String The document ID of the document from which to fetch field values. Required.
path String The name of the field from which to fetch field values. Specify nested fields using dot path notation. Required.
routing String Custom routing value of the document from which to fetch field values. Optional. Required if a custom routing value was provided when the document was indexed.
boost Floating-point A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.

Bitmap filtering

Introduced 2.17

The terms query can filter for multiple terms simultaneously. However, when the number of terms in the input filter increases to a large value (around 10,000), the resulting network and memory overhead can become significant, making the query inefficient. In such cases, consider encoding your large terms filter using a roaring bitmap for more efficient filtering.

The following example assumes that you have two indexes: a products index, which contains all the products sold by a company, and a customers index, which stores filters representing customers who own specific products.

First, create a products index and map product_id as a keyword:

PUT /products
{
  "mappings": {
    "properties": {
      "product_id": { "type": "keyword" }
    }
  }
}

Next, index three documents that correspond to products:

PUT students/_doc/1
{
  "name": "Product 1",
  "product_id" : "111"
}

PUT students/_doc/2
{
  "name": "Product 2",
  "product_id" : "222"
}

PUT students/_doc/3
{
  "name": "Product 3",
  "product_id" : "333"
}

To store customer bitmap filters, you’ll create a customer_filter binary field in the customers index. Specify store as true to store the field:

PUT /customers
{
  "mappings": {
    "properties": {
      "customer_filter": {
        "type": "binary",
        "store": true
      }
    }
  }
}

For each customer, you need to generate a bitmap that represents the product IDs of the products the customer owns. This bitmap effectively encodes the filter criteria for that customer. In this example, you’ll create a terms filter for a customer whose ID is customer123 and who owns products 111, 222, and 333.

To encode a terms filter for the customer, first create a roaring bitmap for the filter. This example creates a bitmap using the [PyRoaringBitMap] library, so first run pip install pyroaring to install the library. Then serialize the bitmap and encode it using a Base64 encoding scheme:

from pyroaring import BitMap
import base64

# Create a bitmap, serialize it into a byte string, and encode into Base64
bm = BitMap([111, 222, 333]) # product ids owned by a customer
encoded = base64.b64encode(BitMap.serialize(bm))

# Convert the Base64-encoded bytes to a string for storage or transmission
encoded_bm_str = encoded.decode('utf-8')

# Print the encoded bitmap
print(f"Encoded Bitmap: {encoded_bm_str}")

Next, index the customer filter into the customers index. The document ID for the filter is the same as the ID for the corresponding customer (in this example, customer123). The customer_filter field contains the bitmap you generated for this customer:

POST customers/_doc/customer123
{
  "customer_filter": "OjAAAAEAAAAAAAIAEAAAAG8A3gBNAQ==" 
}

Now you can run a terms query on the products index to look up a specific customer in the customers index. Because you’re looking up a stored field instead of _source, set store to true. In the value_type field, specify the data type of the terms input as bitmap:

POST /products/_search
{
  "query": {
    "terms": {
      "product_id": {
        "index": "customers",
        "id": "customer123",
        "path": "customer_filter",
        "store": true               
      },
      "value_type": "bitmap"      
    }
  }
}

You can also directly pass the bitmap to the terms query. In this example, the product_id field contains the customer filter bitmap for the customer whose ID is customer123:

POST /products/_search
{
  "query": {
    "terms": {
      "product_id": "OjAAAAEAAAAAAAIAEAAAAG8A3gBNAQ==",
      "value_type": "bitmap"
    }
  }
}

350 characters left

Have a question? .

Want to contribute? or .