Codec processor combinations
At ingestion time, data received by the s3 source can be parsed by codecs. Codecs compress and decompress large data sets in a specific format before the data is ingested through a Data Prepper pipeline processor.
While most codecs can be used with most processors, the codec and processor combinations described below can make your pipeline more efficient for the following input types.
JSON array
A JSON array is an ordered list of elements, which can be of different types. Because this input type requires the data to arrive as a JSON array, the data contained within the array must be tabular.
The JSON array input type does not require a processor.
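The following minimal sketch illustrates why no processor is needed: once the array is parsed, each tabular element is already a complete event. The function name `json_array_to_events` is hypothetical and only simulates the idea; it does not reproduce Data Prepper's codec implementation.

```python
import json
from typing import Any


def json_array_to_events(payload: str) -> list[dict[str, Any]]:
    """Hypothetical stand-in for a JSON array codec: one event per element."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    events = []
    for element in data:
        # Tabular data: every element is expected to be an object with named fields.
        if not isinstance(element, dict):
            raise ValueError(f"expected an object, got {type(element).__name__}")
        events.append(element)
    return events


payload = '[{"status": 200, "path": "/"}, {"status": 404, "path": "/missing"}]'
events = json_array_to_events(payload)
print(len(events))        # 2
print(events[1]["path"])  # /missing
```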
NDJSON
Unlike a JSON array, NDJSON (newline-delimited JSON) delimits each row of data with a newline, meaning that data is processed line by line instead of as a single array.
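To make the per-line behavior concrete, here is a small sketch that treats every non-empty line as an independent JSON document and turns it into one event. The function name `ndjson_to_events` is hypothetical; this is an illustration of the format, not Data Prepper's parser.

```python
import json
from typing import Any


def ndjson_to_events(payload: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: parse newline-delimited JSON, one event per line."""
    events = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as an empty line between records
        # Each line is a complete JSON document, parsed on its own.
        events.append(json.loads(line))
    return events


payload = '{"status": 200}\n{"status": 404}\n\n{"status": 500}\n'
events = ndjson_to_events(payload)
print([e["status"] for e in events])  # [200, 404, 500]
```

Because no line depends on another, there is no enclosing array to parse before the first event can be produced.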
CSV
The CSV data type inputs data as a table. It can be used without a codec or without a processor, but it requires one or the other: for example, either just the csv processor or just the csv codec.
The CSV input type is most effective when used with the following codec processor combinations.
csv codec
When the csv codec is used without a processor, it automatically detects headers from the CSV and uses them for index mapping.
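A minimal sketch of header detection, assuming the first row of the CSV is the header: Python's standard `csv.DictReader` takes its field names from that row and uses them as the keys of every following row, much as the csv codec maps headers to fields. The wrapper name `csv_codec` is hypothetical and is not Data Prepper's API.

```python
import csv
import io


def csv_codec(payload: str) -> list[dict[str, str]]:
    """Hypothetical sketch of a CSV codec that reads the header row
    and uses it to name the fields of every event."""
    reader = csv.DictReader(io.StringIO(payload))
    # DictReader takes its field names from the first row (the header).
    return [dict(row) for row in reader]


payload = "host,status\nweb-1,200\nweb-2,503\n"
events = csv_codec(payload)
print(events)
# [{'host': 'web-1', 'status': '200'}, {'host': 'web-2', 'status': '503'}]
```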
newline codec
The newline codec parses each row as a single log event. The codec detects a header only when header_destination is configured. The csv processor then splits each event into columns. The header that the newline codec stores in header_destination can then be used by the csv processor to name those columns.
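The two-step flow can be sketched as follows. The functions `newline_codec` and `csv_processor` are hypothetical simulations, and the assumption that the header line is copied into each event under the header_destination key is an illustration of the behavior described above, not Data Prepper's Java implementation. The `header_key` parameter is likewise hypothetical and is not a real csv processor option.

```python
import csv
from typing import Optional


def newline_codec(payload: str, header_destination: Optional[str] = None) -> list[dict[str, str]]:
    """Hypothetical sketch: one event per line. The first line is treated as
    a header only when header_destination is set; it is then copied into every event."""
    lines = [line for line in payload.splitlines() if line]
    header = None
    if header_destination is not None and lines:
        header, lines = lines[0], lines[1:]
    events = []
    for line in lines:
        event = {"message": line}
        if header is not None:
            event[header_destination] = header
        events.append(event)
    return events


def csv_processor(event: dict[str, str], header_key: str) -> dict[str, str]:
    """Hypothetical sketch: split the message into columns named by the
    header stored in event[header_key]."""
    column_names = next(csv.reader([event[header_key]]))
    values = next(csv.reader([event["message"]]))
    return {**event, **dict(zip(column_names, values))}


payload = "host,status\nweb-1,200\nweb-2,503\n"
events = newline_codec(payload, header_destination="header")
rows = [csv_processor(e, header_key="header") for e in events]
print(rows[0]["host"], rows[0]["status"])  # web-1 200
```

Without header_destination, the sketch emits the header line as an ordinary event, which mirrors the statement that the codec detects a header only when that option is configured.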
Avro
Apache Avro helps streamline streaming data pipelines. It is most efficient when used with the
avro codec inside an