You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Diversified sampler
The diversified_sampler aggregation lets you reduce bias in the distribution of the sample pool by deduplicating documents that contain the same value in a given field. It does so by using the max_docs_per_value and field settings, which limit the maximum number of documents collected on a shard for each value of the provided field. The max_docs_per_value setting is optional and determines the maximum number of documents that will be returned per field value. The default value of this setting is 1.
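The effect of the per-value cap can be modeled in a few lines of Python. This is only a conceptual sketch, not OpenSearch's implementation: the function name diversified_sample, the in-memory list of documents, and the assumption that documents arrive already ordered by relevance are all illustrative.

```python
from collections import Counter

def diversified_sample(docs, field, shard_size, max_docs_per_value=1):
    """Conceptual model of diversified sampling on one shard.

    Hypothetical helper for illustration only; it is not part of any
    OpenSearch client and does not mirror the server's internals.
    """
    seen = Counter()   # how many docs were kept for each field value
    sample = []
    for doc in docs:   # assumption: docs are already ordered by score
        value = doc[field]
        if seen[value] >= max_docs_per_value:
            continue   # this value has reached its cap, skip the doc
        seen[value] += 1
        sample.append(doc)
        if len(sample) >= shard_size:
            break      # the shard-level limit has been reached
    return sample

# Made-up documents with three distinct "response" values.
docs = [
    {"response": "200", "agent": "Firefox"},
    {"response": "200", "agent": "MSIE"},
    {"response": "404", "agent": "Firefox"},
    {"response": "503", "agent": "MSIE"},
    {"response": "404", "agent": "MSIE"},
]

sample = diversified_sample(docs, "response", shard_size=1000)
agent_counts = Counter(doc["agent"] for doc in sample)
print(len(sample), dict(agent_counts))
```

With the default cap of 1, the sketch keeps one document per distinct response value, so any sub-aggregation (here, a count by agent) runs only over that reduced pool.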
Similarly to the sampler aggregation, you can use the shard_size setting to control the maximum number of documents collected on any one shard, as shown in the following example:
GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword"
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "agent.keyword"
          }
        }
      }
    }
  }
}
Example response
...
  "aggregations" : {
    "sample" : {
      "doc_count" : 3,
      "terms" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
            "doc_count" : 2
          },
          {
            "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
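The request above relies on the default max_docs_per_value of 1. To allow more documents per field value, the setting can be placed next to field and shard_size. The following Python sketch builds such a request body as a plain dictionary. The helper name build_diversified_sampler_query and the value 3 are illustrative assumptions, and sending the body to a cluster is left to whichever client you use.

```python
import json

def build_diversified_sampler_query(field, shard_size, max_docs_per_value):
    """Build a diversified_sampler request body (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            "sample": {
                "diversified_sampler": {
                    "shard_size": shard_size,
                    "field": field,
                    # Optional setting; 1 is used when it is omitted.
                    "max_docs_per_value": max_docs_per_value,
                },
                "aggs": {
                    "terms": {
                        "terms": {"field": "agent.keyword"}
                    }
                },
            }
        },
    }

# Illustrative value: keep up to 3 documents per response code.
body = build_diversified_sampler_query("response.keyword", 1000, 3)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a GET opensearch_dashboards_sample_data_logs/_search request.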