
Cardinality aggregations

The cardinality aggregation is a single-value metric aggregation that counts the number of distinct (unique) values of a field.

Cardinality count is approximate. See Controlling precision for more information.

Parameters

The cardinality aggregation takes the following parameters.

Parameter | Required/Optional | Data type | Description
field | Required | String | The field for which the cardinality is estimated.
precision_threshold | Optional | Numeric | The threshold below which counts are expected to be close to accurate. Default is 3,000; maximum is 40,000. See Controlling precision for more information.
execution_hint | Optional | String | A hint for how to run the aggregation. Valid values are ordinals and direct. See Configuring aggregation execution for more information.
missing | Optional | Same as the field’s type | The value to use for documents that lack the field. If not provided, documents without the field are ignored.
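The parameters in the table map directly onto the JSON body of the request. The following Python sketch builds that body from keyword arguments. The helper names `cardinality_agg` and `search_body` are hypothetical, and rejecting out-of-range values on the client is a choice made by this sketch, not necessarily how OpenSearch itself handles them. Only the limits stated on this page are checked: a `precision_threshold` of at most 40,000 and the two `execution_hint` values.

```python
import json
from typing import Any, Optional

# Limits stated on this page; the client-side checks are this sketch's own choice.
MAX_PRECISION_THRESHOLD = 40_000
VALID_EXECUTION_HINTS = {"ordinals", "direct"}


def cardinality_agg(
    field: str,
    precision_threshold: Optional[int] = None,
    execution_hint: Optional[str] = None,
    missing: Optional[Any] = None,
) -> dict:
    """Build one cardinality aggregation (hypothetical helper); unset options are omitted."""
    params: dict = {"field": field}
    if precision_threshold is not None:
        if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
            raise ValueError(f"precision_threshold must be between 0 and {MAX_PRECISION_THRESHOLD}")
        params["precision_threshold"] = precision_threshold
    if execution_hint is not None:
        if execution_hint not in VALID_EXECUTION_HINTS:
            raise ValueError(f"execution_hint must be one of {sorted(VALID_EXECUTION_HINTS)}")
        params["execution_hint"] = execution_hint
    if missing is not None:
        params["missing"] = missing
    return {"cardinality": params}


def search_body(agg_name: str, agg: dict) -> dict:
    """Wrap a named aggregation in a search body that returns no hits ("size": 0)."""
    return {"size": 0, "aggs": {agg_name: agg}}


if __name__ == "__main__":
    body = search_body("unique_products", cardinality_agg("products.product_id", precision_threshold=10_000))
    print(json.dumps(body, indent=2))
```

Printing the body reproduces the shape of the precision example on this page, so the output can be pasted after `GET opensearch_dashboards_sample_data_ecommerce/_search`.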

Example

The following example request finds the number of unique product IDs in the OpenSearch Dashboards sample e-commerce data:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id"
      }
    }
  }
}

Example response

As shown in the following example response, the aggregation returns the cardinality count in the value field of the unique_products aggregation:

{
  "took": 176,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4675,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "unique_products": {
      "value": 7033
    }
  }
}
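A client reads the count from `aggregations.<name>.value`, not from `hits.total`. The following Python sketch parses a trimmed copy of the example response above; the helper name `cardinality_value` is hypothetical. Because 7,033 distinct product IDs exceed the 4,675 matching documents, some documents must hold more than one value in `products.product_id`, which is why the two numbers should not be confused.

```python
import json

# Trimmed copy of the example response on this page.
RESPONSE_TEXT = """
{
  "hits": {"total": {"value": 4675, "relation": "eq"}, "max_score": null, "hits": []},
  "aggregations": {"unique_products": {"value": 7033}}
}
"""


def cardinality_value(response: dict, agg_name: str) -> int:
    """Return the (approximate) distinct count reported by a named cardinality aggregation."""
    return response["aggregations"][agg_name]["value"]


response = json.loads(RESPONSE_TEXT)
print(cardinality_value(response, "unique_products"))  # prints 7033
print(response["hits"]["total"]["value"])  # prints 4675: matching documents, not distinct values
```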

Controlling precision

An exact cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn’t scale well: memory use grows with the number of distinct values, so high-cardinality fields can require huge amounts of memory and cause high latencies.
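To make the scaling problem concrete, the following Python sketch computes an exact distinct count with a hash set and reports the size of the set's hash table as the number of distinct values grows. The product IDs are hypothetical, and `sys.getsizeof` measures only the set's own table, not the strings it references, so real memory use is higher still.

```python
import sys
from typing import Hashable, Iterable, Tuple


def exact_cardinality(values: Iterable[Hashable]) -> Tuple[int, int]:
    """Count distinct values exactly by loading all of them into a hash set.

    Returns (distinct_count, table_bytes). sys.getsizeof reports the size of the
    set's own hash table only, not the memory of the values it references.
    """
    seen = set(values)
    return len(seen), sys.getsizeof(seen)


# Hypothetical product IDs: each ID appears twice, so 2 * n inputs hold n distinct values.
for n in (1_000, 10_000, 100_000):
    ids = [f"product-{i % n}" for i in range(2 * n)]
    distinct, table_bytes = exact_cardinality(ids)
    print(f"{distinct:>7,} distinct values -> set table of {table_bytes:,} bytes")
```

The table grows roughly in proportion to the distinct count, and in a distributed search each shard would have to build and ship such a set before the results could be merged.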

You can control the trade-off between memory and accuracy by using the precision_threshold setting. This parameter sets the threshold below which counts are expected to be close to accurate. Counts higher than this value may be less accurate.

The default value of precision_threshold is 3,000. The maximum supported value is 40,000.

The cardinality aggregation uses the HyperLogLog++ algorithm. Cardinality counts are typically very accurate up to the precision threshold and are within 6% of the true count in most other cases, even with a threshold as low as 100.
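The idea behind HyperLogLog can be sketched in a few lines: hash each value, use the first `p` bits of the hash to pick one of `2^p` registers, and keep in each register the largest "rank" (position of the leftmost 1-bit) seen among the remaining bits. Memory is fixed by the register count, not by the number of values. The following Python sketch is the textbook HyperLogLog with a linear-counting correction for small cardinalities. It is a simplified illustration, not OpenSearch's HyperLogLog++ code; HyperLogLog++ refines this scheme with, among other things, bias correction and a sparse representation for small counts. The choice of BLAKE2b as the hash and the register parameter `p` are assumptions of this sketch, and how OpenSearch maps `precision_threshold` onto its internal structures is not shown here.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Unsigned 64-bit hash of a string (BLAKE2b with an 8-byte digest)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Textbook HyperLogLog with a linear-counting correction for small counts.

    A simplified illustration only: this is not OpenSearch's HyperLogLog++ code.
    """

    def __init__(self, p: int = 14) -> None:
        # The alpha formula used in estimate() assumes at least 128 registers (p >= 7).
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p                       # number of registers
        self.registers = bytearray(self.m)    # one small counter per register

    def add(self, value: str) -> None:
        h = hash64(value)
        rest_bits = 64 - self.p
        idx = h >> rest_bits                  # top p bits select a register
        rest = h & ((1 << rest_bits) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in `rest` (all zeros -> rest_bits + 1).
        rank = rest_bits - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)    # linear counting for small cardinalities
        return raw


hll = HyperLogLog(p=14)                       # 16,384 one-byte registers
for i in range(100_000):
    hll.add(f"product-{i}")
for i in range(50_000):                       # repeated values never raise a register
    hll.add(f"product-{i}")
print(round(hll.estimate()))
```

With `2^p` registers the standard error of plain HyperLogLog is about `1.04 / sqrt(2^p)`, roughly 0.8% for `p = 14`. Raising `p` trades memory for accuracy, which is the same kind of trade-off that `precision_threshold` exposes.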

Precomputing hashes

For high-cardinality string fields, storing a hash of each value in a separate field at index time and computing the cardinality of that hash field can save compute and memory resources. Use this approach with caution; it is more efficient only for fields with long strings, high cardinality, or both. Numeric fields and string fields that use little memory are better processed directly.
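One way to precompute hashes is on the client, before indexing. The following Python sketch hashes a long string to a signed 64-bit integer so that it fits a numeric `long` field, and then targets that field in the cardinality aggregation. The field names `description` and `description_hash` and the helper names are hypothetical, and the choice of BLAKE2b is an assumption of this sketch. OpenSearch also provides an optional mapper-murmur3 plugin that can compute hashes at index time; consult its documentation before relying on it.

```python
import hashlib


def signed_hash64(value: str) -> int:
    """Hash a string to a signed 64-bit integer so it fits an OpenSearch `long` field."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def with_precomputed_hash(doc: dict, source_field: str, hash_field: str) -> dict:
    """Return a copy of doc with a hash of doc[source_field] stored in hash_field."""
    out = dict(doc)
    out[hash_field] = signed_hash64(doc[source_field])
    return out


# Hypothetical document with a long string field, prepared before indexing.
doc = {"description": "A very long product description. " * 20}
indexed = with_precomputed_hash(doc, "description", "description_hash")

# Hypothetical request body: count distinct hashes instead of distinct long strings.
agg_body = {"size": 0, "aggs": {"unique_descriptions": {"cardinality": {"field": "description_hash"}}}}
```

Two different strings could in principle share a 64-bit hash and be counted once, but at realistic data sizes that effect is far smaller than the approximation error of the aggregation itself.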

Example: Controlling precision

The following request sets the precision threshold to 10,000 unique values:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id",
        "precision_threshold": 10000
      }
    }
  }
}

The response has the same structure as the one returned with the default threshold, but the returned value may differ slightly. Vary the precision_threshold parameter to see how it affects the cardinality estimate.

Configuring aggregation execution

You can control how the aggregation runs by using the execution_hint setting. This setting supports two options:

  • direct – Uses field values directly.
  • ordinals – Uses ordinals of the field.

If you don’t specify execution_hint, OpenSearch automatically chooses the best option for the field.

Setting ordinals on a non-ordinal field has no effect. Similarly, direct has no effect on ordinal fields.

This is an expert-level setting. Ordinals use byte arrays, where the array size depends on the field’s cardinality. High-cardinality fields can consume significant heap memory, increasing the risk of out-of-memory errors.
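The memory warning follows from how ordinals work. For fields such as keyword fields, each distinct term in a segment has an integer ordinal, so an ordinals-based count can mark visited ordinals in an array sized by the segment's term count, and that array is allocated in full even when only a few documents match. The following Python sketch contrasts the two strategies on a toy segment. It is a simplified illustration of the idea, not OpenSearch's internal code; the `Segment` class, `doc_ordinals`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one index segment of a keyword-like field."""
    terms: List[str]          # sorted distinct terms; a term's ordinal is its index
    doc_ordinals: List[int]   # ordinal of the value held by each document


def build_segment(values: List[str]) -> Segment:
    """Assign ordinals by sorting the distinct terms (one value per document)."""
    terms = sorted(set(values))
    ordinal_of = {t: i for i, t in enumerate(terms)}
    return Segment(terms, [ordinal_of[v] for v in values])


def count_direct(seg: Segment, matching_docs: List[int]) -> int:
    """Direct-style: read each matching document's value and collect it in a set."""
    return len({seg.terms[seg.doc_ordinals[d]] for d in matching_docs})


def count_via_ordinals(seg: Segment, matching_docs: List[int]) -> int:
    """Ordinals-style: mark visited ordinals in a byte array with one slot per term."""
    visited = bytearray(len(seg.terms))   # size follows field cardinality, not match count
    for d in matching_docs:
        visited[seg.doc_ordinals[d]] = 1
    return sum(visited)


seg = build_segment(["b", "a", "b", "c", "a"])
print(count_direct(seg, [0, 1, 2]), count_via_ordinals(seg, [0, 1, 2]))  # prints 2 2
```

Both functions return the same count; they differ in what they allocate. On a field with millions of distinct terms, `count_via_ordinals` allocates millions of slots even for a query that matches three documents, which is the heap risk described above.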

Example: Controlling execution

The following request runs a cardinality aggregation using ordinals:

GET opensearch_dashboards_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "unique_products": {
      "cardinality": {
        "field": "products.product_id",
        "execution_hint": "ordinals"
      }
    }
  }
}

Missing values

You can use the missing parameter to assign a value to documents that lack the aggregated field. See Missing aggregations for more information.

Replacing missing values in a cardinality aggregation adds the replacement value to the set of unique values. If the replacement value does not already occur in the field, the cardinality increases by one.
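The effect of the missing parameter on the count can be modeled with a plain list in which `None` stands for a document that lacks the field. The following Python sketch is an exact-count model for illustration only; the function name `distinct_count` and the `color` field in the request body are hypothetical. It shows that the replacement adds one to the count only when it is not already one of the field's values.

```python
from typing import Any, Iterable, Optional


def distinct_count(values: Iterable[Any], missing: Optional[Any] = None) -> int:
    """Exact distinct count; None marks a document without the field.

    With missing=None, such documents are skipped; otherwise `missing` is counted
    in their place.
    """
    seen = set()
    for v in values:
        if v is None:              # document lacks the field
            if missing is None:    # no replacement configured: ignore it
                continue
            v = missing
        seen.add(v)
    return len(seen)


docs = ["red", "blue", None, "red", None]
print(distinct_count(docs))                   # prints 2
print(distinct_count(docs, missing="N/A"))    # prints 3: a new value is added
print(distinct_count(docs, missing="red"))    # prints 2: "red" is already counted

# Hypothetical request body using the missing parameter on an illustrative field.
body = {"size": 0, "aggs": {"unique_colors": {"cardinality": {"field": "color", "missing": "N/A"}}}}
```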