
Auto-interval date histogram

Like the date histogram aggregation, the auto_date_histogram is a multi-bucket aggregation over date values. Instead of requiring you to specify an interval, it automatically chooses one based on the number of buckets you request and the time range of your data. The actual number of buckets returned is always less than or equal to the number you request. This aggregation is particularly useful when you are working with time-series data and want to visualize or analyze it over different time ranges without manually specifying the interval size.

Intervals

The bucket interval is chosen based on the collected data to ensure that the number of returned buckets is less than or equal to the requested number.

The following table lists the possible returned intervals for each time unit.

Unit | Intervals
Seconds | Multiples of 1, 5, 10, and 30
Minutes | Multiples of 1, 5, 10, and 30
Hours | Multiples of 1, 3, and 12
Days | Multiples of 1 and 7
Months | Multiples of 1 and 3
Years | Multiples of 1, 5, 10, 20, 50, and 100

If an interval would produce more buckets than you requested, OpenSearch moves to the next larger interval, so the number of returned buckets can be much smaller than the requested number. For example, if you request 70 buckets but daily intervals would produce more than 70, OpenSearch groups the data into 7-day intervals instead. This reduces the bucket count by a factor of about 7, so the response might contain only 10 buckets. Moving to a larger interval keeps the result manageable and avoids excessive detail when the data spans a long time range.
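The selection rule can be pictured with a small sketch. The following Python is an illustration only, not OpenSearch's implementation: it walks the interval table above from smallest to largest and returns the first interval for which the data's time range fits in the requested number of buckets. The function name pick_interval, the "+1" worst-case bucket estimate, and the fixed lengths used for months (30 days) and years (365 days) are assumptions made for the example.

```python
import math

# Candidate intervals in seconds, smallest first, following the table above.
# Months and years use fixed approximate lengths (30 and 365 days); this is an
# assumption for illustration, since real calendar rounding varies.
SECOND, MINUTE, HOUR, DAY = 1, 60, 3600, 86400
MONTH, YEAR = 30 * DAY, 365 * DAY
CANDIDATES = (
    [(f"{n}s", n * SECOND) for n in (1, 5, 10, 30)]
    + [(f"{n}m", n * MINUTE) for n in (1, 5, 10, 30)]
    + [(f"{n}h", n * HOUR) for n in (1, 3, 12)]
    + [(f"{n}d", n * DAY) for n in (1, 7)]
    + [(f"{n}M", n * MONTH) for n in (1, 3)]
    + [(f"{n}y", n * YEAR) for n in (1, 5, 10, 20, 50, 100)]
)


def pick_interval(range_seconds, max_buckets):
    """Return the smallest candidate interval needing at most max_buckets."""
    for label, length in CANDIDATES:
        # Worst case: the range straddles one extra bucket boundary.
        needed = math.floor(range_seconds / length) + 1
        if needed <= max_buckets:
            return label
    return CANDIDATES[-1][0]  # fall back to the largest interval


# 70 days of data with 70 requested buckets: daily buckets could need 71,
# so the sketch falls back to 7-day buckets.
print(pick_interval(70 * DAY, 70))
```

Under these assumptions, a request for 70 buckets over 70 days of data yields the 7d interval, which corresponds to roughly 10 or 11 buckets rather than 70.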

Example

In the following example, you’ll search an index containing blog posts.

First, create a mapping for this index and specify the date_posted field as the date type:

PUT blogs
{
  "mappings" : {
    "properties" :  {
      "date_posted" : {
        "type" : "date",
        "format" : "yyyy-MM-dd"
      }
    }
  }
}

Next, index the following documents into the blogs index:

PUT blogs/_doc/1
{
  "name": "Semantic search in OpenSearch",
  "date_posted": "2022-04-17"
}

PUT blogs/_doc/2
{
  "name": "Sparse search in OpenSearch",
  "date_posted": "2022-05-02"
}

PUT blogs/_doc/3
{
  "name": "Distributed tracing with Data Prepper",
  "date_posted": "2022-04-25"
}

PUT blogs/_doc/4
{
  "name": "Observability in OpenSearch",
  "date_posted": "2023-03-23"
}

To use the auto_date_histogram aggregation, specify the field containing the date or timestamp values. For example, to aggregate blog posts by date_posted into two buckets, send the following request:

GET /blogs/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "auto_date_histogram": {
        "field": "date_posted",
        "buckets": 2
      }
    }
  }
}

The response shows that the blog posts were aggregated into two buckets. The interval was automatically set to 1 year, with all three 2022 blog posts collected in one bucket and the 2023 blog post in another:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "histogram": {
      "buckets": [
        {
          "key_as_string": "2022-01-01",
          "key": 1640995200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2023-01-01",
          "key": 1672531200000,
          "doc_count": 1
        }
      ],
      "interval": "1y"
    }
  }
}

Returned buckets

Each bucket contains the following information:

{
  "key_as_string": "2023-01-01",
  "key": 1672531200000,
  "doc_count": 1
}

In OpenSearch, dates are internally stored as 64-bit integers representing timestamps in milliseconds since the epoch. In the aggregation response, each bucket key is returned as such a timestamp. The key_as_string value shows the same timestamp but formatted as a date string based on the format parameter. The doc_count field contains the number of documents in the bucket.
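To make the relationship between key and key_as_string concrete, the following Python sketch converts a bucket key from milliseconds since the epoch into a UTC date string. The helper name key_to_string is hypothetical, and the Python strftime pattern %Y-%m-%d is used here as a stand-in for the yyyy-MM-dd mapping format; the two pattern syntaxes are not interchangeable.

```python
from datetime import datetime, timezone


def key_to_string(key_ms, fmt="%Y-%m-%d"):
    """Format a bucket key (milliseconds since the epoch) as a UTC date string."""
    return datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc).strftime(fmt)


print(key_to_string(1672531200000))  # 2023-01-01
print(key_to_string(1640995200000))  # 2022-01-01
```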

Parameters

Auto-interval date histogram aggregations accept the following parameters.

Parameter | Data type | Description
field | String | The field on which to aggregate. The field must contain date or timestamp values. Either field or script is required.
buckets | Integer | The desired number of buckets. The returned number of buckets is less than or equal to the desired number. Optional. Default is 10.
minimum_interval | String | The minimum interval to be used. Specifying a minimum interval can make the aggregation process more efficient. Valid values are year, month, day, hour, minute, and second. Optional.
time_zone | String | The time zone to use for bucketing and rounding. You can specify the time_zone parameter as a UTC offset, such as -04:00, or an IANA time zone ID, such as America/New_York. Optional. Default is UTC. For more information, see Time zone.
format | String | The format for returning dates representing bucket keys. Optional. Default is the format specified in the field mapping. For more information, see Date format.
script | String | A document-level or value-level script for aggregating values into buckets. Either field or script is required.
missing | String | Specifies how to handle documents in which the field value is missing. By default, such documents are ignored. If you specify a date value in the missing parameter, all documents in which the field value is missing are collected into the bucket with the specified date. Optional.
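As an illustration of how several of these parameters combine, the following Python snippet builds a request body as a dictionary and prints it as JSON. The parameter values (12 buckets, a month minimum interval, the America/New_York time zone, and a missing date of 2022-01-01) are hypothetical and chosen only for the example; the printed body could be sent to GET /blogs/_search with any HTTP client.

```python
import json

# Hypothetical parameter values chosen only for illustration.
body = {
    "size": 0,
    "aggs": {
        "histogram": {
            "auto_date_histogram": {
                "field": "date_posted",
                "buckets": 12,                     # return at most 12 buckets
                "minimum_interval": "month",       # never bucket finer than months
                "time_zone": "America/New_York",   # bucket in this time zone
                "format": "yyyy-MM-dd",            # format of key_as_string
                "missing": "2022-01-01",           # date assigned to documents without the field
            }
        }
    },
}

print(json.dumps(body, indent=2))
```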

Date format

If you don’t specify the format parameter, the format defined in the field mapping is used (as seen in the preceding response). To modify the format, specify the format parameter:

GET /blogs/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "auto_date_histogram": {
        "field": "date_posted",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

The key_as_string field is now returned in the specified format:

{
  "key_as_string": "2023-01-01 00:00:00",
  "key": 1672531200000,
  "doc_count": 1
}

Alternatively, you can specify one of the built-in date formats:

GET /blogs/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "auto_date_histogram": {
        "field": "date_posted",
        "format": "basic_date_time_no_millis"
      }
    }
  }
}

The key_as_string field is now returned in the specified format:

{
  "key_as_string": "20230101T000000Z",
  "key": 1672531200000,
  "doc_count": 1
}

Time zone

By default, dates are stored and processed in UTC. The time_zone parameter allows you to specify a different time zone for bucketing. You can specify the time_zone parameter as a UTC offset, such as -04:00, or an IANA time zone ID, such as America/New_York.

As an example, index the following documents into the blogs1 index:

PUT blogs1/_doc/1
{
  "name": "Semantic search in OpenSearch",
  "date_posted": "2022-04-17T01:00:00.000Z"
}

PUT blogs1/_doc/2
{
  "name": "Sparse search in OpenSearch",
  "date_posted": "2022-04-17T04:00:00.000Z"
}

First, run an aggregation without specifying a time zone:

GET /blogs1/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "auto_date_histogram": {
        "field": "date_posted",
        "buckets": 2,
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

The response contains two buckets at a 3-hour interval. The first bucket key is 01:00 UTC on April 17, 2022:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "histogram": {
      "buckets": [
        {
          "key_as_string": "2022-04-17 01:00:00",
          "key": 1650157200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-04-17 04:00:00",
          "key": 1650168000000,
          "doc_count": 1
        }
      ],
      "interval": "3h"
    }
  }
}

Now, specify a time_zone of -02:00:

GET /blogs1/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "auto_date_histogram": {
        "field": "date_posted",
        "buckets": 2,
        "format": "yyyy-MM-dd HH:mm:ss",
        "time_zone": "-02:00"
      }
    }
  }
}

The response contains the same two buckets, but the key_as_string values are now expressed in the -02:00 time zone. The numeric keys are unchanged, and the first bucket starts at 23:00 on April 16, 2022:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "histogram": {
      "buckets": [
        {
          "key_as_string": "2022-04-16 23:00:00",
          "key": 1650157200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-04-17 02:00:00",
          "key": 1650168000000,
          "doc_count": 1
        }
      ],
      "interval": "3h"
    }
  }
}
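The shift in key_as_string can be reproduced outside OpenSearch. The following Python sketch formats the bucket keys from the two preceding responses, first in UTC and then at a -02:00 offset. The helper name format_key is hypothetical, and the strftime pattern stands in for the yyyy-MM-dd HH:mm:ss format used in the requests.

```python
from datetime import datetime, timedelta, timezone


def format_key(key_ms, offset_hours=0):
    """Format an epoch-millisecond bucket key at a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(key_ms / 1000, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


print(format_key(1650157200000))      # 2022-04-17 01:00:00
print(format_key(1650157200000, -2))  # 2022-04-16 23:00:00
print(format_key(1650168000000, -2))  # 2022-04-17 02:00:00
```

The same instant in time produces a different string depending on the offset, which is why the key values match across both responses while the key_as_string values differ.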

When using time zones with daylight saving time (DST) changes, the sizes of buckets that are near the transition may differ slightly from the sizes of neighboring buckets.
