You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Asynchronous batch ingestion

Introduced 2.17

Use the Asynchronous Batch Ingestion API to ingest data into your OpenSearch cluster from files stored on remote file servers, such as Amazon Simple Storage Service (Amazon S3) or OpenAI. For detailed configuration steps, see Asynchronous batch ingestion.

Path and HTTP methods

POST /_plugins/_ml/_batch_ingestion

Request body fields

The following table lists the available request fields.

Field	Data type	Required/Optional	Description
`index_name`	String	Required	The name of the index into which the data is ingested.
`field_map`	Object	Required	Maps fields from the source file to specific fields in an OpenSearch index for ingestion.
`ingest_fields`	Array	Optional	Lists fields from the source file that should be ingested directly into the OpenSearch index without any additional mapping.
`credential`	Object	Required	Contains the authentication information for accessing external data sources, such as Amazon S3 or OpenAI.
`data_source`	Object	Required	Specifies the type and location of the external file(s) from which the data is ingested.
`data_source.type`	String	Required	Specifies the type of the external data source. Valid values are `s3` and `openAI`.
`data_source.source`	Array	Required	Specifies one or more file locations from which the data is ingested. For `s3`, specify the path of each file in the Amazon S3 bucket (for example, `["s3://offlinebatch/output/sagemaker_batch.json.out"]`). For `openAI`, specify the file IDs of the input or output files (for example, `["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]`).
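
As a minimal sketch of how the fields in the preceding table fit together, the following Python snippet assembles a request body, checks the required top-level fields on the client side, and builds (but does not send) the HTTP request. The helper name `build_batch_ingestion_request`, the host `https://localhost:9200`, and the validation rules are assumptions made for illustration; the API performs its own validation, which may differ.

```python
import json
import urllib.request

# Required top-level fields and valid source types, taken from the table above.
REQUIRED_FIELDS = ("index_name", "field_map", "credential", "data_source")
VALID_SOURCE_TYPES = ("s3", "openAI")


def build_batch_ingestion_request(body, host="https://localhost:9200"):
    """Check a request body and wrap it in a POST request object.

    Hypothetical helper: the default host and the checks are illustrative only.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    data_source = body["data_source"]
    if data_source.get("type") not in VALID_SOURCE_TYPES:
        raise ValueError(f"data_source.type must be one of {VALID_SOURCE_TYPES}")
    source = data_source.get("source")
    if not isinstance(source, list) or not source:
        raise ValueError("data_source.source must be a non-empty array")

    # The request is only constructed here; nothing is sent over the network.
    return urllib.request.Request(
        url=f"{host}/_plugins/_ml/_batch_ingestion",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example_body = {
    "index_name": "my-nlp-index",
    "field_map": {"chapter": "$.content[0]", "_id": "$.id"},
    "credential": {
        "region": "us-east-1",
        "access_key": "<your access key>",
        "secret_key": "<your secret key>",
    },
    "data_source": {
        "type": "s3",
        "source": ["s3://offlinebatch/output/sagemaker_batch.json.out"],
    },
}
request = build_batch_ingestion_request(example_body)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; authentication and TLS settings depend on your cluster and are omitted here.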

Example request: Ingesting a single file

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index",
  "field_map": {
    "chapter": "$.content[0]",
    "title": "$.content[1]",
    "chapter_embedding": "$.SageMakerOutput[0]",
    "title_embedding": "$.SageMakerOutput[1]",
    "_id": "$.id"
  },
  "ingest_fields": ["$.id"],
  "credential": {
    "region": "us-east-1",
    "access_key": "<your access key>",
    "secret_key": "<your secret key>",
    "session_token": "<your session token>"
  },
  "data_source": {
    "type": "s3",
    "source": ["s3://offlinebatch/output/sagemaker_batch.json.out"]
  }
}
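
To make the `field_map` paths in the preceding request more concrete, the following Python sketch resolves them against one sample record. The record contents and the `resolve_path` helper are hypothetical, and the helper handles only the `$`, `.key`, and `[index]` forms used in this example; it is not the plugin's JSONPath implementation, which may behave differently.

```python
import re

# Matches one path step: either ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve_path(record, path):
    """Resolve a simple path such as '$.content[0]' against a dict."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path}")
    value = record
    pos = 1  # skip the leading '$'
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at position {pos}: {path}")
        key, index = match.groups()
        value = value[key] if key is not None else value[int(index)]
        pos = match.end()
    return value


# Hypothetical record, shaped to fit the paths used in the example request.
record = {
    "id": "1",
    "content": ["Chapter 1 text", "Title text"],
    "SageMakerOutput": [[0.1, 0.2], [0.3, 0.4]],
}

field_map = {
    "chapter": "$.content[0]",
    "title": "$.content[1]",
    "chapter_embedding": "$.SageMakerOutput[0]",
    "title_embedding": "$.SageMakerOutput[1]",
    "_id": "$.id",
}

# Each target field receives the value found at its source path.
document = {target: resolve_path(record, path) for target, path in field_map.items()}
print(document)
```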

Example request: Ingesting multiple files

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-nlp-index-openai",
  "field_map": {
    "question": "source[1].$.body.input[0]",
    "answer": "source[1].$.body.input[1]",
    "question_embedding": "source[0].$.response.body.data[0].embedding",
    "answer_embedding": "source[0].$.response.body.data[1].embedding",
    "_id": ["source[0].$.custom_id", "source[1].$.custom_id"]
  },
  "ingest_fields": ["source[2].$.custom_field1", "source[2].$.custom_field2"],
  "credential": {
    "openAI_key": "<your openAI key>"
  },
  "data_source": {
    "type": "openAI",
    "source": ["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]
  }
}
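
In the preceding request, each path carries a `source[N]` prefix. The following Python sketch splits that prefix from the path and groups the fields by the file they read from. It assumes that `source[N]` is a zero-based index into `data_source.source`, which is consistent with the example (embeddings come from the output file listed first) but is not stated explicitly here. The helper names `split_source_path` and `fields_by_source` are hypothetical.

```python
import re

# "source[<index>]." followed by a path that starts with "$".
_SOURCE_PREFIX = re.compile(r"^source\[(\d+)\]\.(\$.*)$")


def split_source_path(expression):
    """Split 'source[1].$.body.input[0]' into (1, '$.body.input[0]')."""
    match = _SOURCE_PREFIX.match(expression)
    if match is None:
        raise ValueError(f"expected a 'source[N].$...' expression: {expression}")
    return int(match.group(1)), match.group(2)


def fields_by_source(field_map, ingest_fields, sources):
    """Group target fields (and directly ingested paths) by source file."""
    usage = {name: [] for name in sources}
    for target, expressions in field_map.items():
        # A value may be one expression or a list of them (as for "_id").
        if isinstance(expressions, str):
            expressions = [expressions]
        for expression in expressions:
            index, _ = split_source_path(expression)
            usage[sources[index]].append(target)
    for expression in ingest_fields:
        index, path = split_source_path(expression)
        usage[sources[index]].append(path)
    return usage


sources = ["file-<your output file id>", "file-<your input file id>", "file-<your other file>"]
field_map = {
    "question": "source[1].$.body.input[0]",
    "answer": "source[1].$.body.input[1]",
    "question_embedding": "source[0].$.response.body.data[0].embedding",
    "answer_embedding": "source[0].$.response.body.data[1].embedding",
    "_id": ["source[0].$.custom_id", "source[1].$.custom_id"],
}
ingest_fields = ["source[2].$.custom_field1", "source[2].$.custom_field2"]

usage = fields_by_source(field_map, ingest_fields, sources)
for name, fields in usage.items():
    print(name, "->", fields)
```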

Example response

{
  "task_id": "cbsPlpEBMHcagzGbOQOx",
  "task_type": "BATCH_INGEST",
  "status": "CREATED"
}
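
The response contains a `task_id` that identifies the ingestion task. As a minimal sketch, the following Python snippet parses the response and derives a path for checking the task later. The helper name `task_status_path` is hypothetical, and the `/_plugins/_ml/tasks/<task_id>` path is an assumption based on the ML Commons Get Task API rather than something stated on this page; verify it against the documentation for your version.

```python
import json

# The example response shown above, as it would arrive over HTTP.
response_text = """
{
  "task_id": "cbsPlpEBMHcagzGbOQOx",
  "task_type": "BATCH_INGEST",
  "status": "CREATED"
}
"""


def task_status_path(text):
    """Return the (assumed) Get Task path for a batch ingestion response."""
    response = json.loads(text)
    if response.get("task_type") != "BATCH_INGEST":
        raise ValueError(f"unexpected task_type: {response.get('task_type')}")
    return f"/_plugins/_ml/tasks/{response['task_id']}"


print(task_status_path(response_text))
```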
