Cardinality aggregations
The cardinality aggregation is a single-value metric aggregation that counts the number of unique or distinct values of a field.
The cardinality count is approximate. See Controlling precision for more information.
Parameters
The cardinality aggregation takes the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
field | Required | String | The field for which the cardinality is estimated. |
precision_threshold | Optional | Numeric | The threshold below which counts are expected to be close to accurate. See Controlling precision for more information. |
execution_hint | Optional | String | How to run the aggregation. Valid values are ordinals and direct. |
missing | Optional | Same as the field’s type | The value to use for documents that are missing the field. If not provided, missing values are ignored. |
Example
The following example request finds the number of unique product IDs in the OpenSearch Dashboards sample e-commerce data:
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id"
      }
    }
  }
}
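The same request body can also be assembled programmatically. The following Python sketch is illustrative only: build_cardinality_agg is a hypothetical helper, and its client-side checks (the allowed execution_hint values from the parameter table and the 40,000 maximum described under Controlling precision) are choices made for this sketch, not a description of how OpenSearch itself validates requests.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the parameter table.
VALID_EXECUTION_HINTS = {"ordinals", "direct"}
# Maximum documented under "Controlling precision".
MAX_PRECISION_THRESHOLD = 40_000


def build_cardinality_agg(
    name: str,
    field: str,
    precision_threshold: Optional[int] = None,
    execution_hint: Optional[str] = None,
    missing: Any = None,
) -> Dict[str, Any]:
    """Return a search request body with one cardinality aggregation.

    Hypothetical helper: the checks below are this sketch's own choice.
    """
    if not field:
        raise ValueError("field is required")
    agg: Dict[str, Any] = {"field": field}
    if precision_threshold is not None:
        if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
            raise ValueError("precision_threshold must be between 0 and 40000")
        agg["precision_threshold"] = precision_threshold
    if execution_hint is not None:
        if execution_hint not in VALID_EXECUTION_HINTS:
            raise ValueError("execution_hint must be 'ordinals' or 'direct'")
        agg["execution_hint"] = execution_hint
    if missing is not None:
        agg["missing"] = missing
    # "size": 0 suppresses document hits so only the aggregation is returned.
    return {"size": 0, "aggs": {name: {"cardinality": agg}}}


body = build_cardinality_agg("unique_products", "products.product_id")
```

The resulting dictionary matches the JSON body of the example request and can be serialized with json.dumps or passed to a client of your choice.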
Example response
As shown in the following example response, the aggregation returns the cardinality count in the value field of the unique_products object:
{
  "took": 176,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "unique_products": {
      "value": 7033
    }
  }
}
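Client code usually needs only the count itself. As a minimal sketch, the following Python snippet parses a trimmed copy of the response above and reads the count by aggregation name. The function name cardinality_value is hypothetical.

```python
import json

# Trimmed copy of the example response; only the relevant keys are kept.
response_text = '{"took": 176, "aggregations": {"unique_products": {"value": 7033}}}'


def cardinality_value(response: dict, agg_name: str) -> int:
    """Return the count reported by the named cardinality aggregation."""
    return response["aggregations"][agg_name]["value"]


count = cardinality_value(json.loads(response_text), "unique_products")
print(count)
```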
Controlling precision
An accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn’t scale well; it can require huge amounts of memory and cause high latencies.
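To make the cost concrete, the exact approach can be sketched in a few lines of Python. This is an illustration of the hash-set technique only, not OpenSearch code, and the sample values are made up.

```python
from typing import Hashable, Iterable


def exact_cardinality(values: Iterable[Hashable]) -> int:
    """Count distinct values exactly by keeping every one in a hash set."""
    seen = set()
    for value in values:
        seen.add(value)  # memory grows with the number of distinct values
    return len(seen)


product_ids = [101, 102, 101, 103, 102, 101]
print(exact_cardinality(product_ids))  # prints 3
```

The set holds one entry per distinct value, so memory use grows linearly with the cardinality being measured.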
You can control the trade-off between memory and accuracy by using the precision_threshold setting. This parameter sets the threshold below which counts are expected to be close to accurate. Counts higher than this value may be less accurate.
The default value of precision_threshold is 3,000. The maximum supported value is 40,000.
The cardinality aggregation uses the HyperLogLog++ algorithm. Cardinality counts are typically very accurate up to the precision threshold and are within 6% of the true count in most other cases, even with a threshold as low as 100.
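For intuition about how a fixed amount of memory can yield a close estimate, the following Python sketch implements the classic HyperLogLog estimator. It is a simplified stand-in, not the HyperLogLog++ variant that OpenSearch uses: it omits the bias correction and sparse representation that HyperLogLog++ adds. The register count (2**p), the SHA-1 based hash, and the function name hll_estimate are assumptions made for this example, and p is not the same thing as precision_threshold.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p small registers."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Derive a 64-bit hash from the first 8 bytes of a SHA-1 digest.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)              # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)       # standard constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


# 100,000 values containing 20,000 distinct strings.
estimate = hll_estimate(f"product-{i % 20_000}" for i in range(100_000))
print(round(estimate))
```

Memory use depends only on the number of registers, not on how many values are processed, which is the property that lets the estimate scale to large data sets at the cost of a small error.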
Precomputing hashes
For high-cardinality string fields, storing precomputed hash values of the field at index time and computing the cardinality of the hashes can save compute and memory resources. Use this approach with caution; it is more efficient only for sets with long strings and/or high cardinality. Numeric fields and less memory-consuming string sets are better processed directly.
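The idea can be sketched in Python as follows. This only illustrates the technique: the field names url and url_hash, the sample data, and the choice of an 8-byte BLAKE2b digest are assumptions for the example, and the snippet does not show an OpenSearch mapping or the hash function OpenSearch uses internally.

```python
import hashlib


def hash64(text: str) -> int:
    """Reduce a string of any length to a fixed-size 64-bit integer."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


# Hypothetical long string values, with repeats.
urls = [f"https://example.com/catalog/item/{i % 500}?session=long-tracking-token" for i in range(2000)]

# "Index time": store the hash next to the original value.
documents = [{"url": u, "url_hash": hash64(u)} for u in urls]

# "Query time": count distinct fixed-size hashes instead of long strings.
distinct_by_hash = len({doc["url_hash"] for doc in documents})
distinct_by_value = len({doc["url"] for doc in documents})
print(distinct_by_hash, distinct_by_value)
```

The two counts agree unless two different strings collide on the same 64-bit hash, which is very unlikely at this scale. For short strings or numeric values, the extra stored field costs more than it saves.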
Example: Controlling precision
Set the precision threshold to 10000 unique values:
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id",
        "precision_threshold": 10000
      }
    }
  }
}
The response is similar to the result with the default threshold, but the returned value is slightly different. Vary the precision_threshold parameter to see how it affects the cardinality estimate.
Configuring aggregation execution
You can control how an aggregation runs using the execution_hint setting. This setting supports two options:
direct – Uses field values directly.
ordinals – Uses ordinals of the field.
If you don’t specify execution_hint, OpenSearch automatically chooses the best option for the field.
Setting ordinals on a non-ordinal field has no effect. Similarly, direct has no effect on ordinal fields.
This is an expert-level setting. Ordinals use byte arrays, where the array size depends on the field’s cardinality. High-cardinality fields can consume significant heap memory, increasing the risk of out-of-memory errors.
Example: Controlling execution
The following request runs a cardinality aggregation using ordinals:
GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id",
        "execution_hint": "ordinals"
      }
    }
  }
}
Missing values
You can assign a value to missing instances of the aggregated field. See Missing aggregations for more information.
Replacing missing values in a cardinality aggregation adds the replacement value to the list of unique values. If the replacement value does not already occur in the field, this increases the actual cardinality by one.
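The effect of a replacement value on the count can be modeled with a short Python sketch. It uses exact counting for clarity, whereas the real aggregation is approximate. The function name cardinality_with_missing and the sample data are hypothetical, and None stands in for a document that lacks the field.

```python
def cardinality_with_missing(values, missing=None):
    """Count distinct values, optionally substituting a value for missing ones."""
    distinct = set()
    for value in values:
        if value is None:
            if missing is None:
                continue      # default behavior: missing values are ignored
            value = missing   # replacement is counted like any other value
        distinct.add(value)
    return len(distinct)


categories = ["shoes", None, "hats", "shoes", None]

print(cardinality_with_missing(categories))                  # prints 2
print(cardinality_with_missing(categories, missing="N/A"))   # prints 3
print(cardinality_with_missing(categories, missing="hats"))  # prints 2
```

The last call shows that a replacement value that already occurs in the field does not change the count.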