Decimal digit token filter
The decimal_digit
token filter is used to normalize decimal digit characters (0–9) into their ASCII equivalents in various scripts. This is useful when you want to ensure that all digits are treated uniformly in text analysis, regardless of the script in which they are written.
Example
The following example request creates a new index named my_index
and configures an analyzer with a decimal_digit
filter:
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"my_decimal_digit_filter": {
"type": "decimal_digit"
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["my_decimal_digit_filter"]
}
}
}
}
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "123 ١٢٣ १२३"
}
text
breakdown:
- “123” (ASCII digits)
- “١٢٣” (Arabic-Indic digits)
- “१२३” (Devanagari digits)
The response contains the generated tokens:
{
"tokens": [
{
"token": "123",
"start_offset": 0,
"end_offset": 3,
"type": "<NUM>",
"position": 0
},
{
"token": "123",
"start_offset": 4,
"end_offset": 7,
"type": "<NUM>",
"position": 1
},
{
"token": "123",
"start_offset": 8,
"end_offset": 11,
"type": "<NUM>",
"position": 2
}
]
}