obfuscate

The obfuscate processor obfuscates fields in your documents to protect sensitive data.

Usage

In this example, a document contains a log field and a phone field, as shown in the following object:

{
  "id": 1,
  "phone": "(555) 555 5555",
  "log": "My name is Bob and my email address is abc@example.com"
}

To obfuscate the log and phone fields, add an obfuscate processor for each field and name the field in the source option. The following example uses two obfuscate processors because each processor can obfuscate only one source.

The first obfuscate processor in the pipeline uses several configuration options to mask the data in the log field, as shown in the following example. The second processor sets only source, so the whole phone field is masked using the default mask settings. For more information about these options, see Configuration.

pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "log"
        target: "new_log"
        patterns:
          - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"
        action:
          mask:
            mask_character: "#"
            mask_character_length: 6
    - obfuscate:
        source: "phone"
  sink:
    - stdout:

When run, the obfuscate processor parses the fields into the following output:

{
  "id": 1,
  "phone": "***",
  "log": "My name is Bob and my email address is abc@example.com",
  "new_log": "My name is Bob and my email address is ######"
}

Configuration

Use the following configuration options with the obfuscate processor.

| Parameter | Required | Description |
| :--- | :--- | :--- |
| source | Yes | The source field to obfuscate. |
| target | No | The new field in which to store the obfuscated value. This leaves the original source field unchanged. When no target is provided, the source field is updated with the obfuscated value. |
| patterns | No | A list of regex patterns that allow you to obfuscate specific parts of a field. Only the parts that match the regex pattern are obfuscated. When not provided, the processor obfuscates the whole field. |
| single_word_only | No | When set to true, a word boundary \b is added to the pattern, so obfuscation is applied only to words that stand alone in the input text. Default is false, meaning that obfuscation patterns are applied to all occurrences. Available in Data Prepper 2.8 or later. |
| obfuscate_when | No | Specifies the condition under which the obfuscate processor performs matching. Default is no condition. |
| tags_on_match_failure | No | The tag to add to an event if the obfuscate processor fails to match the pattern. |
| action | No | The obfuscation action. As of Data Prepper 2.3, only the mask action is supported. |
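As a minimal sketch of how target, patterns, and single_word_only might be combined, consider the following pipeline. The field name masked_log and the four-digit regex are illustrative assumptions, not values taken from this reference.

```yaml
pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "log"
        # Hypothetical target name: the masked value is written here,
        # so the original "log" field is left unchanged.
        target: "masked_log"
        patterns:
          # Illustrative regex (an assumption): any run of four digits.
          - "[0-9]{4}"
        # Adds a \b word boundary to the pattern
        # (requires Data Prepper 2.8 or later).
        single_word_only: true
  sink:
    - stdout:
```

Based on the option descriptions above, a standalone value such as 1234 should be masked, while four digits embedded in a longer token such as order12345 should be left alone. Setting single_word_only to false (the default) would apply the pattern to every occurrence.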

You can customize the mask action with the following optional configuration options.

| Parameter | Default | Description |
| :--- | :--- | :--- |
| mask_character | * | The character to use when masking. Valid characters are !, #, $, %, &, *, and @. |
| mask_character_length | 3 | The number of characters to mask in the field. The value must be between 1 and 10. |

Predefined patterns

When using the patterns configuration option, you can use a set of predefined obfuscation patterns for common fields. The obfuscate processor supports the following predefined patterns.

You cannot use multiple patterns for one obfuscate processor. Use one pattern for each obfuscate processor.

| Pattern name | Examples |
| :--- | :--- |
| %{EMAIL_ADDRESS} | abc@test.com, 123@test.com, abc123@test.com, abc_123@test.com, a-b@test.com, a.b@test.com, abc@test-test.com, abc@test.com.cn, abc@test.mail.com.org |
| %{IP_ADDRESS_V4} | 1.1.1.1, 192.168.1.1, 255.255.255.0 |
| %{BASE_NUMBER} | 1.1, .1, 2000 |
| %{CREDIT_CARD_NUMBER} | 5555555555554444, 4111111111111111, 1234567890123456, 1234 5678 9012 3456, 1234-5678-9012-3456 |
| %{US_PHONE_NUMBER} | 1555 555 5555, 5555555555, 1-555-555-5555, 1-(555)-555-5555, 1(555) 555 5555, (555) 555 5555, +1-555-555-5555 |
| %{US_SSN_NUMBER} | 123-11-1234 |
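
The following sketch shows one way the predefined patterns might be used in place of a handwritten regex. It reuses the log and phone fields from the usage example and assumes that a predefined pattern name is passed as a quoted entry in the patterns list. In keeping with the restriction above, each obfuscate processor uses a single pattern.

```yaml
pipeline:
  source:
    http:
  processor:
    - obfuscate:
        source: "log"
        patterns:
          # Predefined pattern, quoted because a YAML scalar cannot start with a bare %.
          - "%{EMAIL_ADDRESS}"
    - obfuscate:
        source: "phone"
        patterns:
          - "%{US_PHONE_NUMBER}"
  sink:
    - stdout:
```

Because neither processor sets target or action, the matched text in each source field would be replaced in place using the default mask settings (mask_character * and mask_character_length 3).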