
You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Derived field type

Introduced 2.15

Derived fields allow you to create new fields dynamically by executing scripts on existing fields. The existing fields can be retrieved either from the _source field, which contains the original document, or from a field’s doc values. Once you define a derived field, either in an index mapping or within a search request, you can use the field in a query in the same way you would use a regular field.

When to use derived fields

Derived fields offer flexibility in field manipulation and prioritize storage efficiency. However, because they are computed at query time, they can reduce query performance. Derived fields are particularly useful in scenarios requiring real-time data transformation, such as:

  • Log analysis: Extracting timestamps and log levels from log messages.
  • Performance metrics: Calculating response times from start and end timestamps.
  • Security analytics: Real-time IP geolocation and user-agent parsing for threat detection.
  • Experimental use cases: Testing new data transformations, creating temporary fields for A/B testing, or generating one-time reports without altering mappings or reindexing data.

Despite the potential performance impact of query-time computations, the flexibility and storage efficiency of derived fields make them a valuable tool for these applications.

Current limitations

Currently, derived fields have the following limitations:

  • Scoring and sorting: Not yet supported.
  • Aggregations: Starting with OpenSearch 2.17, derived fields support most aggregation types. The following aggregations are not supported: geographic (geodistance, geohash grid, geohex grid, geotile grid, geobounds, geocentroid), significant terms, significant text, and scripted metric.
  • Dashboard support: These fields are not displayed in the list of available fields in OpenSearch Dashboards. However, you can still use them for filtering if you know the derived field name.
  • Chained derived fields: One derived field cannot be used to define another derived field.
  • Join field type: Derived fields are not supported for the join field type.

We are planning to address these limitations in future versions.

Prerequisites

Before using a derived field, be sure to satisfy the following prerequisites:

  • Enable _source or doc_values: Ensure that either the _source field or doc values are enabled for the fields used in your script.
  • Enable expensive queries: Ensure that search.allow_expensive_queries is set to true.
  • Feature control: Derived fields are enabled by default. You can enable or disable derived fields by using the following settings:
    • Index level: Update the index.query.derived_field.enabled setting.
    • Cluster level: Update the search.derived_field.enabled setting.
    Both settings are dynamic, so they can be changed without reindexing or node restarts.
  • Performance considerations: Before using derived fields, evaluate the performance implications to ensure that derived fields meet your scale requirements.
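
The settings named above can be applied through the cluster settings and index settings APIs. The following Python sketch only builds the request bodies and prints them next to their target endpoints; it doesn't contact a cluster. The use of `persistent` for the cluster-level settings and the `logs` index name are assumptions made for illustration.

```python
import json

# Cluster-level prerequisites. Using "persistent" (rather than "transient")
# is an assumption; pick whichever fits your cluster.
cluster_settings = {
    "persistent": {
        "search.allow_expensive_queries": True,
        "search.derived_field.enabled": True,
    }
}

# Index-level feature switch for a hypothetical index named "logs".
index_settings = {"index.query.derived_field.enabled": True}

requests_to_send = [
    ("PUT", "/_cluster/settings", cluster_settings),
    ("PUT", "/logs/_settings", index_settings),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Each printed body can be pasted into a console request against the corresponding endpoint.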

Defining derived fields

You can define derived fields in index mappings or directly within a search request.

Example setup

To try the examples on this page, first create the following logs index:

PUT logs
{
  "mappings": {
    "properties": {
      "request": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "clientip": {
        "type": "keyword"
      }
    }
  }
}

Add sample documents to the index:

POST _bulk
{ "index" : { "_index" : "logs", "_id" : "1" } }
{ "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778", "clientip": "61.177.2.0" }
{ "index" : { "_index" : "logs", "_id" : "2" } }
{ "request": "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474", "clientip": "185.92.2.0" }
{ "index" : { "_index" : "logs", "_id" : "3" } }
{ "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711", "clientip": "61.177.2.0" }
{ "index" : { "_index" : "logs", "_id" : "4" } }
{ "request": "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140", "clientip": "129.178.2.0" }
{ "index" : { "_index" : "logs", "_id" : "5" } }
{ "request": "894470400 DELETE /images/102384s.gif HTTP/1.0 200 785", "clientip": "227.177.2.0" }

Defining derived fields in index mappings

To derive the timestamp, method, and size fields from the request field indexed in the logs index, configure the following mappings:

PUT /logs/_mapping
{
  "derived": {
    "timestamp": {
      "type": "date",
      "format": "MM/dd/yyyy",
      "script": {
        "source": """
        emit(Long.parseLong(doc["request.keyword"].value.splitOnToken(" ")[0]))
        """
      }
    },
    "method": {
      "type": "keyword",
      "script": {
        "source": """
        emit(doc["request.keyword"].value.splitOnToken(" ")[1])
        """
      }
    },
    "size": {
      "type": "long",
      "script": {
        "source": """
        emit(Long.parseLong(doc["request.keyword"].value.splitOnToken(" ")[5]))
        """
      }
    }
  }
}

Note that the timestamp field has an additional format parameter that specifies the format in which to display date fields. If you don’t include a format parameter, then the format defaults to strict_date_time_no_millis. For more information about supported date formats, see Parameters.
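
To see which tokens the three scripts pick out, the following Python sketch applies the same split-and-index logic to one of the sample request strings. It's an offline illustration only: `str.split(" ")` stands in for the Painless `splitOnToken(" ")` call, and `derive_fields` is a hypothetical helper name.

```python
def derive_fields(request: str) -> dict:
    """Mimic the timestamp, method, and size scripts for one request string."""
    tokens = request.split(" ")  # stand-in for splitOnToken(" ")
    return {
        "timestamp": int(tokens[0]),  # token 0: parsed as a long
        "method": tokens[1],          # token 1: HTTP method
        "size": int(tokens[5]),       # token 5: parsed as a long
    }


sample = "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778"
derived = derive_fields(sample)
print(derived)
```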

Parameters

The following table lists the parameters accepted by derived field types. All parameters are dynamic and can be modified without reindexing documents.

Parameter | Required/Optional | Description
type | Required | The type of the derived field. Supported types are boolean, date, geo_point, ip, keyword, text, long, double, float, and object.
script | Required | The script associated with the derived field. Any value emitted from the script must be emitted using emit(). The type of the emitted value must match the type of the derived field. Scripts have access to both the doc_values and _source fields if those are enabled. The doc value of a field can be accessed using doc['field_name'].value, and the source can be accessed using params._source["field_name"].
format | Optional | The format used for parsing dates. Only applicable to date fields. Valid values are strict_date_time_no_millis, strict_date_optional_time, and epoch_millis. For more information, see Formats.
ignore_malformed | Optional | A Boolean value that specifies whether to ignore malformed values when running a query on a derived field. Default value is false (throw an exception when encountering malformed values).
prefilter_field | Optional | An indexed text field provided to boost the performance of derived fields. Specifies an existing indexed field on which to filter prior to filtering on the derived field. For more information, see Prefilter field.

Emitting values in scripts

The emit() function is available only within the derived field script context. It emits one value, or multiple values for a multi-valued field, for each document on which the script runs.

The following table lists the emit() function formats for the supported field types.

Type | Emit format | Multi-valued fields supported
boolean | emit(boolean) | No
double | emit(double) | Yes
date | emit(long timeInMillis) | Yes
float | emit(float) | Yes
geo_point | emit(double lat, double lon) | Yes
ip | emit(String ip) | Yes
keyword | emit(String) | Yes
long | emit(long) | Yes
object | emit(String json) (must be valid JSON) | Yes
text | emit(String) | Yes

By default, a type mismatch between a derived field and its emitted value will result in the search request failing with an error. If ignore_malformed is set to true, then the failing document is skipped and the search request succeeds.

The size limit of the emitted values is 1 MB per document.
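
The effect of ignore_malformed can be pictured with a small Python emulation. This is a conceptual sketch, not OpenSearch's implementation: `run_long_script` is a hypothetical stand-in for a long derived field script, the request strings are made up, and a `ValueError` plays the role of a malformed value.

```python
def run_long_script(request: str) -> int:
    """Stand-in for a `long` derived field script: parse the last token."""
    return int(request.split(" ")[-1])  # raises ValueError on malformed input


def evaluate(requests, ignore_malformed=False):
    """Collect emitted values, skipping or failing on malformed documents."""
    values = []
    for request in requests:
        try:
            values.append(run_long_script(request))
        except ValueError:
            if not ignore_malformed:
                raise  # default behavior: the whole search fails
            # ignore_malformed=true: skip only the failing document
    return values


docs = ["GET /a.gif 200 778", "GET /b.gif 200 n/a", "GET /c.gif 200 711"]
print(evaluate(docs, ignore_malformed=True))
```

Calling `evaluate(docs)` without the flag re-raises the error, which corresponds to the search request failing.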

Searching derived fields defined in index mappings

To search derived fields, use the same syntax as when searching regular fields. For example, the following request searches for documents whose derived timestamp field falls within the specified range:

POST /logs/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "1970-01-11T08:20:30.400Z",   
        "lte": "1970-01-11T08:26:00.400Z"
      }
    }
  },
  "fields": ["timestamp"]
}

The response contains the matching documents:

{
  "took": 315,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "timestamp": [
            "1970-01-11T08:20:30.400Z"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "request": "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474",
          "clientip": "185.92.2.0"
        },
        "fields": {
          "timestamp": [
            "1970-01-11T08:22:20.400Z"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "timestamp": [
            "1970-01-11T08:24:10.400Z"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "4",
        "_score": 1,
        "_source": {
          "request": "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140",
          "clientip": "129.178.2.0"
        },
        "fields": {
          "timestamp": [
            "1970-01-11T08:26:00.400Z"
          ]
        }
      }
    ]
  }
}
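
The timestamps in the response follow from reading the first token of each request string as epoch milliseconds. The following Python sketch reproduces that arithmetic for the five sample documents and applies the same range bounds. The `to_iso` helper is a hypothetical name, not part of any OpenSearch client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_iso(epoch_millis: int) -> str:
    """Format epoch milliseconds as an ISO 8601 UTC string with milliseconds."""
    dt = EPOCH + timedelta(milliseconds=epoch_millis)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# First token of each sample request, keyed by document ID.
timestamps = {"1": 894030400, "2": 894140400, "3": 894250400,
              "4": 894360400, "5": 894470400}

# The gte and lte bounds of the range query, expressed as epoch milliseconds.
lower, upper = 894030400, 894360400
matches = [doc_id for doc_id, ts in timestamps.items() if lower <= ts <= upper]

for doc_id in matches:
    print(doc_id, to_iso(timestamps[doc_id]))
```

Document 5 falls outside the upper bound, which is why only four documents are returned.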

Defining and searching derived fields in a search request

You can also define derived fields directly in a search request and query them along with regular indexed fields. For example, the following request creates the url and status derived fields and searches those fields along with the regular request and clientip fields:

POST /logs/_search
{
  "derived": {
    "url": {
      "type": "text",
      "script": {
        "source": """
        emit(doc["request.keyword"].value.splitOnToken(" ")[2])
        """
      }
    },
    "status": {
      "type": "keyword",
      "script": {
        "source": """
        emit(doc["request.keyword"].value.splitOnToken(" ")[4])
        """
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "clientip": "61.177.2.0"
          }
        },
        {
          "match": {
            "url": "images"
          }
        },
        {
          "term": {
            "status": "200"
          }
        }
      ]
    }
  },
  "fields": ["request", "clientip", "url", "status"]
}

The response contains the matching documents:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 2.8754687,
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 2.8754687,
        "_source": {
          "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "request": [
            "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778"
          ],
          "clientip": [
            "61.177.2.0"
          ],
          "url": [
            "/english/images/france98_venues.gif"
          ],
          "status": [
            "200"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "3",
        "_score": 2.8754687,
        "_source": {
          "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "request": [
            "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711"
          ],
          "clientip": [
            "61.177.2.0"
          ],
          "url": [
            "/english/venues/images/venue_header.gif"
          ],
          "status": [
            "200"
          ]
        }
      }
    ]
  }
}

Derived fields use the default analyzer specified in the index analysis settings during search. You can override the default analyzer or specify a search analyzer within a search request in the same way as with regular fields. For more information, see Analyzers.

When both an index mapping and a search definition are present for a field, the search definition takes precedence.

Retrieving fields

You can retrieve derived fields using the fields parameter in the search request in the same way as with regular fields, as shown in the preceding examples. You can also use wildcards to retrieve all derived fields that match a given pattern.

Highlighting

Derived fields of type text support highlighting using the unified highlighter. For example, the following request highlights matches in the derived url field:

POST /logs/_search
{
  "derived": {
    "url": {
      "type": "text",
      "script": {
        "source": """
        emit(doc["request.keyword"].value.splitOnToken(" ")[2])
        """
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "clientip": "61.177.2.0"
          }
        },
        {
          "match": {
            "url": "images"
          }
        }
      ]
    }
  },
  "fields": ["request", "clientip", "url"],
  "highlight": {
    "fields": {
      "url": {}
    }
  }
}

The response includes the highlighted matches in the url field:

{
  "took": 45,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.8754687,
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "_score": 1.8754687,
        "_source": {
          "request": "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "request": [
            "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778"
          ],
          "clientip": [
            "61.177.2.0"
          ],
          "url": [
            "/english/images/france98_venues.gif"
          ]
        },
        "highlight": {
          "url": [
            "/english/<em>images</em>/france98_venues.gif"
          ]
        }
      },
      {
        "_index": "logs",
        "_id": "3",
        "_score": 1.8754687,
        "_source": {
          "request": "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
          "clientip": "61.177.2.0"
        },
        "fields": {
          "request": [
            "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711"
          ],
          "clientip": [
            "61.177.2.0"
          ],
          "url": [
            "/english/venues/images/venue_header.gif"
          ]
        },
        "highlight": {
          "url": [
            "/english/venues/<em>images</em>/venue_header.gif"
          ]
        }
      }
    ]
  }
}

Aggregations

Starting with OpenSearch 2.17, derived fields support most aggregation types.

Geographic, significant terms, significant text, and scripted metric aggregations are not supported.

For example, the following request creates a simple terms aggregation on the method derived field:

POST /logs/_search
{
  "size": 0,
  "aggs": {
    "methods": {
      "terms": {
        "field": "method"
      }
    }
  }
}

The response contains the following buckets:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "methods" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "GET",
          "doc_count" : 2
        },
        {
          "key" : "POST",
          "doc_count" : 2
        },
        {
          "key" : "DELETE",
          "doc_count" : 1
        }
      ]
    }
  }
}
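
As a cross-check of the bucket counts, the following Python sketch derives the method token from each sample request string and counts the values. It only emulates the result offline with `collections.Counter`; it isn't how OpenSearch executes the aggregation.

```python
from collections import Counter

requests = [
    "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
    "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474",
    "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
    "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140",
    "894470400 DELETE /images/102384s.gif HTTP/1.0 200 785",
]

# Token 1 of each request string is the derived `method` value.
method_counts = Counter(request.split(" ")[1] for request in requests)

# Order buckets by descending count, then by key, as in the response.
buckets = sorted(method_counts.items(), key=lambda item: (-item[1], item[0]))
print(buckets)
```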

Performance

Derived fields are not indexed but are computed dynamically by retrieving values from the _source field or doc values. As a result, queries on derived fields run more slowly than queries on indexed fields. To improve performance, try the following:

  • Prune the search space by adding query filters on indexed fields in conjunction with derived fields.
  • Use doc values instead of _source in the script for faster access, whenever applicable.
  • Consider using a prefilter_field to automatically prune the search space without explicit filters in the search request.

Prefilter field

Specifying a prefilter field helps to prune the search space without adding explicit filters in the search request. The prefilter field specifies an existing indexed field (prefilter_field) on which to filter automatically when constructing the query. The prefilter_field must be a text field (either text or match_only_text).

For example, you can add a prefilter_field to the method derived field. Update the index mapping to prefilter on the request field:

PUT /logs/_mapping
{
  "derived": {
    "method": {
      "type": "keyword",
      "script": {
        "source": """
        emit(doc["request.keyword"].value.splitOnToken(" ")[1])
        """
      },
      "prefilter_field": "request"
    }
  }
}

Now search using a query on the method derived field:

POST /logs/_search
{
  "profile": true,
  "query": {
    "term": {
      "method": {
        "value": "GET"
      }
    }
  },
  "fields": ["method"]
}

OpenSearch automatically adds a filter on the request field to your query:

"#request:GET #DerivedFieldQuery (Query: [ method:GET])"

You can use the profile option to analyze derived field performance, as shown in the preceding example.
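
Conceptually, the prefilter narrows the candidate set with a cheap check on the indexed field before the script runs. The following Python sketch illustrates that two-stage idea on the sample data and counts script executions. It's a simplified model under stated assumptions: a whitespace token check stands in for the filter on the indexed request field, and `derive_method` and `search_method` are hypothetical names.

```python
requests = [
    "894030400 GET /english/images/france98_venues.gif HTTP/1.0 200 778",
    "894140400 GET /french/playing/mascot/mascot.html HTTP/1.1 200 5474",
    "894250400 POST /english/venues/images/venue_header.gif HTTP/1.0 200 711",
    "894360400 POST /images/home_fr_button.gif HTTP/1.1 200 2140",
    "894470400 DELETE /images/102384s.gif HTTP/1.0 200 785",
]


def derive_method(request, stats):
    """Stand-in for the `method` script; counts how often it runs."""
    stats["script_runs"] += 1
    return request.split(" ")[1]


def search_method(value, prefilter):
    """Return the matching requests and the number of script executions."""
    stats = {"script_runs": 0}
    candidates = requests
    if prefilter:
        # Cheap stage: keep only documents whose request text contains the term.
        candidates = [r for r in requests if value in r.split(" ")]
    hits = [r for r in candidates if derive_method(r, stats) == value]
    return hits, stats["script_runs"]


print(search_method("GET", prefilter=False)[1])
print(search_method("GET", prefilter=True)[1])
```

On the sample data, the script runs for all five documents without the prefilter and for only the two GET candidates with it, while the matching documents stay the same.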

Derived object fields

A script can emit a valid JSON object so that you can query subfields without indexing them, in the same way as with regular fields. This is useful for large JSON objects that require occasional searches on some subfields. In this case, indexing the subfields is expensive, while defining derived fields for each subfield also adds a lot of resource overhead. If you don’t explicitly provide the subfield type, then the subfield type is inferred.

For example, the following request defines a derived_request_object derived field as an object type:

PUT logs_object
{
  "mappings": {
    "properties": {
      "request_object": { "type": "text" }
    },
    "derived": {
      "derived_request_object": {
        "type": "object",
        "script": {
          "source": "emit(params._source[\"request_object\"])"
        }
      }
    }
  }
}

Consider the following documents, in which the request_object is a string representation of a JSON object:

POST _bulk
{ "index" : { "_index" : "logs_object", "_id" : "1" } }
{ "request_object": "{\"@timestamp\": 894030400, \"clientip\":\"61.177.2.0\", \"request\": \"GET /english/venues/images/venue_header.gif HTTP/1.0\", \"status\": 200, \"size\": 711}" }
{ "index" : { "_index" : "logs_object", "_id" : "2" } }
{ "request_object": "{\"@timestamp\": 894140400, \"clientip\":\"129.178.2.0\", \"request\": \"GET /images/home_fr_button.gif HTTP/1.1\", \"status\": 200, \"size\": 2140}" }
{ "index" : { "_index" : "logs_object", "_id" : "3" } }
{ "request_object": "{\"@timestamp\": 894240400, \"clientip\":\"227.177.2.0\", \"request\": \"GET /images/102384s.gif HTTP/1.0\", \"status\": 400, \"size\": 785}" }
{ "index" : { "_index" : "logs_object", "_id" : "4" } }
{ "request_object": "{\"@timestamp\": 894340400, \"clientip\":\"61.177.2.0\", \"request\": \"GET /english/images/venue_bu_city_on.gif HTTP/1.0\", \"status\": 400, \"size\": 1397}\n" }
{ "index" : { "_index" : "logs_object", "_id" : "5" } }
{ "request_object": "{\"@timestamp\": 894440400, \"clientip\":\"132.176.2.0\", \"request\": \"GET /french/news/11354.htm HTTP/1.0\", \"status\": 200, \"size\": 3460, \"is_active\": true}" }

The following query searches the @timestamp subfield of the derived_request_object:

POST /logs_object/_search
{
  "query": {
    "range": {
      "derived_request_object.@timestamp": {
        "gte": "894030400",   
        "lte": "894140400"
      }
    }
  },
  "fields": ["derived_request_object.@timestamp"]
}

The response contains the matching documents:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "logs_object",
        "_id": "1",
        "_score": 1,
        "_source": {
          "request_object": """{"@timestamp": 894030400, "clientip":"61.177.2.0", "request": "GET /english/venues/images/venue_header.gif HTTP/1.0", "status": 200, "size": 711}"""
        },
        "fields": {
          "derived_request_object.@timestamp": [
            894030400
          ]
        }
      },
      {
        "_index": "logs_object",
        "_id": "2",
        "_score": 1,
        "_source": {
          "request_object": """{"@timestamp": 894140400, "clientip":"129.178.2.0", "request": "GET /images/home_fr_button.gif HTTP/1.1", "status": 200, "size": 2140}"""
        },
        "fields": {
          "derived_request_object.@timestamp": [
            894140400
          ]
        }
      }
    ]
  }
}
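
The object script emits the raw JSON string, and subfields are then resolved from the parsed object. The following Python sketch mirrors that idea with `json.loads` on abbreviated versions of the sample documents (only three subfields are kept for brevity) and applies the same @timestamp range. The `subfield` helper is a hypothetical name.

```python
import json

# Abbreviated request_object strings from the sample documents, keyed by ID.
docs = {
    "1": '{"@timestamp": 894030400, "clientip": "61.177.2.0", "status": 200}',
    "2": '{"@timestamp": 894140400, "clientip": "129.178.2.0", "status": 200}',
    "3": '{"@timestamp": 894240400, "clientip": "227.177.2.0", "status": 400}',
    "4": '{"@timestamp": 894340400, "clientip": "61.177.2.0", "status": 400}',
    "5": '{"@timestamp": 894440400, "clientip": "132.176.2.0", "status": 200}',
}


def subfield(raw, name):
    """Parse the emitted JSON string and return one subfield, or None if absent."""
    return json.loads(raw).get(name)


# Same bounds as the range query on derived_request_object.@timestamp.
matches = [
    doc_id
    for doc_id, raw in docs.items()
    if 894030400 <= subfield(raw, "@timestamp") <= 894140400
]
print(matches)
```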

You can also highlight a derived object field:

POST /logs_object/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "derived_request_object.clientip": "61.177.2.0"
          }
        },
        {
          "match": {
            "derived_request_object.request": "images"
          }
        }
      ]
    }
  },
  "fields": ["derived_request_object.*"],
  "highlight": {
    "fields": {
      "derived_request_object.request": {}
    }
  }
}

The response adds highlighting to the derived_request_object.request field:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 2,
    "hits": [
      {
        "_index": "logs_object",
        "_id": "1",
        "_score": 2,
        "_source": {
          "request_object": """{"@timestamp": 894030400, "clientip":"61.177.2.0", "request": "GET /english/venues/images/venue_header.gif HTTP/1.0", "status": 200, "size": 711}"""
        },
        "fields": {
          "derived_request_object.request": [
            "GET /english/venues/images/venue_header.gif HTTP/1.0"
          ],
          "derived_request_object.clientip": [
            "61.177.2.0"
          ]
        },
        "highlight": {
          "derived_request_object.request": [
            "GET /english/venues/<em>images</em>/venue_header.gif HTTP/1.0"
          ]
        }
      },
      {
        "_index": "logs_object",
        "_id": "4",
        "_score": 2,
        "_source": {
          "request_object": """{"@timestamp": 894340400, "clientip":"61.177.2.0", "request": "GET /english/images/venue_bu_city_on.gif HTTP/1.0", "status": 400, "size": 1397}
"""
        },
        "fields": {
          "derived_request_object.request": [
            "GET /english/images/venue_bu_city_on.gif HTTP/1.0"
          ],
          "derived_request_object.clientip": [
            "61.177.2.0"
          ]
        },
        "highlight": {
          "derived_request_object.request": [
            "GET /english/<em>images</em>/venue_bu_city_on.gif HTTP/1.0"
          ]
        }
      }
    ]
  }
}

Inferred subfield type

Type inference is based on the same logic as Dynamic mapping. Instead of inferring the subfield type from the first document, a random sample of documents is used to infer the type. If the subfield isn’t found in any documents from the random sample, type inference fails and logs a warning. For subfields that seldom occur in documents, consider defining the explicit field type. Using dynamic type inference for such subfields may result in a query returning no results, as it would for a missing field.
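
The following Python sketch illustrates why sampling can miss a rare subfield. It's a conceptual model, not OpenSearch's inference code: the sample is passed in explicitly rather than drawn at random so that the outcome is reproducible, and `infer_type` and the type names it returns are illustrative.

```python
import json


def infer_type(sampled_docs, subfield):
    """Infer a subfield type from the first sampled document that contains it."""
    for raw in sampled_docs:
        value = json.loads(raw).get(subfield)
        if value is None:
            continue
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "float"
        return "text"
    return None  # subfield absent from the sample: inference fails


common = '{"status": 200, "size": 711}'
rare = '{"status": 200, "size": 3460, "is_active": true}'

print(infer_type([common, common], "is_active"))
print(infer_type([common, rare], "is_active"))
```

When the sample contains no document with the subfield, the sketch returns no type, which corresponds to the inference failure described above.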

Explicit subfield type

To define the explicit subfield type, provide the type parameter in the properties object. In the following example, the derived_request_object.is_active field is defined as boolean. Because this field is only present in one of the documents, its type inference might fail, so it’s important to define the explicit type:

POST /logs_object/_search
{
  "derived": {
    "derived_request_object": {
      "type": "object",
      "script": {
        "source": "emit(params._source[\"request_object\"])"
      },
      "properties": {
        "is_active": "boolean"
      }
    }
  },
  "query": {
    "term": {
      "derived_request_object.is_active": true
    }
  },
  "fields": ["derived_request_object.is_active"]
}

The response contains the matching documents:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "logs_object",
        "_id": "5",
        "_score": 1,
        "_source": {
          "request_object": """{"@timestamp": 894440400, "clientip":"132.176.2.0", "request": "GET /french/news/11354.htm HTTP/1.0", "status": 200, "size": 3460, "is_active": true}"""
        },
        "fields": {
          "derived_request_object.is_active": [
            true
          ]
        }
      }
    ]
  }
}