Star-tree field type
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
A star-tree index precomputes aggregations, accelerating the performance of aggregation queries. If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.
OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
For more information, see Star-tree index.
Prerequisites
To use a star-tree index, follow the instructions in Enabling a star-tree index.
Examples
The following examples show how to use a star-tree index.
Star-tree index mappings
Define star-tree index mappings in the composite
section in mappings
.
The following example API request creates a corresponding star-tree index namedrequest_aggs
. To compute metric aggregations for request_size
and latency
fields with queries on port
and status
fields, configure the following mappings:
PUT logs
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0,
"index.composite_index": true
},
"mappings": {
"composite": {
"request_aggs": {
"type": "star_tree",
"config": {
"max_leaf_docs": 10000,
"skip_star_node_creation_for_dimensions": [
"port"
],
"ordered_dimensions": [
{
"name": "status"
},
{
"name": "port"
}
],
"metrics": [
{
"name": "request_size",
"stats": [
"sum",
"value_count",
"min",
"max"
]
},
{
"name": "latency",
"stats": [
"sum",
"value_count",
"min",
"max"
]
}
]
}
}
},
"properties": {
"status": {
"type": "integer"
},
"port": {
"type": "integer"
},
"request_size": {
"type": "integer"
},
"latency": {
"type": "scaled_float",
"scaling_factor": 10
}
}
}
}
Star-tree index configuration options
You can customize your star-tree implementation using the following config
options in the mappings
section. These options cannot be modified without reindexing.
Parameter | Description |
---|---|
ordered_dimensions | A list of fields based on which metrics will be aggregated in a star-tree index. Required. |
metrics | A list of metric fields required in order to perform aggregations. Required. |
max_leaf_docs | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, child nodes will be created based on the unique value of the next field in the ordered_dimension (if any). Default is 10000 . A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see Star-tree indexing structure. |
skip_star_node_creation_for_dimensions | A list of dimensions for which a star-tree index will skip star node creation. When true , this reduces storage size at the expense of query performance. Default is false . For more information about star nodes, see Star-tree indexing structure. |
Ordered dimensions
The ordered_dimensions
parameter contains fields based on which metrics will be aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the ordered_dimensions
.
When using the ordered_dimesions
parameter, follow these best practices:
- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- Currently, fields supported by the
ordered_dimensions
parameter are all numeric field types, with the exception ofunsigned_long
. For more information, see GitHub issue #15231. - Support for other field types, such as
keyword
andip
, will be added in future versions. For more information, see GitHub issue #16232. - A minimum of
2
and a maximum of10
dimensions are supported per star-tree index.
The ordered_dimensions
parameter supports the following property.
Parameter | Required/Optional | Description |
---|---|---|
name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping . Ensure that the doc_values setting is enabled for any associated fields. |
Metrics
Configure any metric fields on which you need to perform aggregations. Metrics
are required as part of a star-tree index configuration.
When using metrics
, follow these best practices:
- Currently, fields supported by
metrics
are all numeric field types, with the exception ofunsigned_long
. For more information, see GitHub issue #15231. - Supported metric aggregations include
Min
,Max
,Sum
,Avg
, andValue_count
.Avg
is a derived metric based onSum
andValue_count
and is not indexed when a query is run. The remaining base metrics are indexed.
- A maximum of
100
base metrics are supported per star-tree index.
If Min
, Max
, Sum
, and Value_count
are defined as metrics
for each field, then up to 25 such fields can be configured, as shown in the following example:
{
"metrics": [
{
"name": "field1",
"stats": [
"sum",
"value_count",
"min",
"max"
],
...,
...,
"name": "field25",
"stats": [
"sum",
"value_count",
"min",
"max"
]
}
]
}
Properties
The metrics
parameter supports the following properties.
Parameter | Required/Optional | Description |
---|---|---|
name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping . Ensure that the doc_values setting is enabled for any associated fields. |
stats | Optional | A list of metric aggregations computed for each field. You can choose between Min , Max , Sum , Avg , and Value Count .Default is Sum and Value_count .Avg is a derived metric statistic that will automatically be supported in queries if Sum and Value_Count are present as part of metric stats . |
Supported queries and aggregations
For more information about supported queries and aggregations, see Supported queries and aggregations for a star-tree index.