Getting started with vector search

This guide shows you how to use your own vectors in OpenSearch. You’ll learn to create a vector index, add location data, and run a vector search to find the nearest hotels on a coordinate plane. While this example uses two-dimensional vectors for simplicity, the same approach applies to higher-dimensional vectors used in semantic search and recommendation systems.

Prerequisite: Install OpenSearch

If you don't have OpenSearch installed, follow these steps to create a cluster.

Before you start, ensure that Docker is installed and running in your environment.
This demo configuration is insecure and should not be used in production environments.

Download and run OpenSearch:

docker pull opensearchproject/opensearch:latest && docker run -it -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:latest

OpenSearch is now running on port 9200. To verify that OpenSearch is running, send the following request:

curl http://localhost:9200

You should get a response that looks like this:

{
  "name" : "a937e018cee5",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "GLAjAG6bTeWErFUy_d-CLw",
  "version" : {
    "distribution" : "opensearch",
    "number" : <version>,
    "build_type" : <build-type>,
    "build_hash" : <build-hash>,
    "build_date" : <build-date>,
    "build_snapshot" : false,
    "lucene_version" : <lucene-version>,
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

For more information, see Installation quickstart and Install and upgrade OpenSearch.

Step 1: Create a vector index

First, create an index that will store sample hotel data. To signal to OpenSearch that this is a vector index, set index.knn to true. You’ll store the vectors in a knn_vector field named location. The vectors you’ll ingest are two-dimensional, and the distance between vectors is calculated using Euclidean distance (the l2 space type):

PUT /hotels-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "space_type": "l2"
      }
    }
  }
}
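
If you prefer to send the request from a script instead of a console, the same index definition can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: BASE_URL assumes the unsecured demo cluster from the prerequisite section, and the helper names build_index_body and create_index are made up for this illustration.

```python
import json
import urllib.request

# Assumption: the unsecured single-node demo cluster from the prerequisite.
BASE_URL = "http://localhost:9200"


def build_index_body(field: str, dimension: int, space_type: str = "l2") -> dict:
    """Return the settings and mappings for an index with one vector field."""
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "space_type": space_type,
                }
            }
        },
    }


def create_index(name: str, body: dict, base_url: str = BASE_URL) -> dict:
    """Send PUT /<name> with the given body and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same definition as the console request above.
hotels_body = build_index_body("location", dimension=2)
```

Calling create_index("hotels-index", hotels_body) against a running cluster should be equivalent to the console request. Building the body is kept separate from sending it so that the definition can be inspected without a cluster.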

Step 2: Add data to your index

Next, add data to your index. Each document represents a hotel. The location field in each document contains a two-dimensional vector specifying the hotel’s location:

POST /_bulk
{ "index": { "_index": "hotels-index", "_id": "1" } }
{ "location": [5.2, 4.4] }
{ "index": { "_index": "hotels-index", "_id": "2" } }
{ "location": [5.2, 3.9] }
{ "index": { "_index": "hotels-index", "_id": "3" } }
{ "location": [4.9, 3.4] }
{ "index": { "_index": "hotels-index", "_id": "4" } }
{ "location": [4.2, 4.6] }
{ "index": { "_index": "hotels-index", "_id": "5" } }
{ "location": [3.3, 4.5] }
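
The bulk body is newline-delimited JSON: one action line followed by one document line for each hotel, and the body must end with a newline. As a rough sketch, the payload above could be generated from a Python dictionary. The helper name build_bulk_payload is hypothetical, and the hotel data is copied from the request above.

```python
import json

# Hotel vectors keyed by document ID, copied from the bulk request above.
hotels = {
    "1": [5.2, 4.4],
    "2": [5.2, 3.9],
    "3": [4.9, 3.4],
    "4": [4.2, 4.6],
    "5": [3.3, 4.5],
}


def build_bulk_payload(index_name: str, vectors_by_id: dict, field: str = "location") -> str:
    """Return an NDJSON bulk body with one action line and one document line per vector."""
    lines = []
    for doc_id, vector in vectors_by_id.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps({field: vector}))
    # The Bulk API expects the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload("hotels-index", hotels)
```

The resulting string can be sent as the body of POST /_bulk with an NDJSON content type (application/x-ndjson).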

Step 3: Search your data

Now search for hotels closest to the pin location [5, 4]. To search for the top three closest hotels, set k to 3:

POST /hotels-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "location": {
        "vector": [5, 4],
        "k": 3
      }
    }
  }
}
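
To build intuition for what this query asks for, the following sketch ranks the five sample hotels by exact Euclidean distance from the pin in plain Python. It illustrates the idea of a k-nearest-neighbor search only and is not how OpenSearch implements it: on large indexes OpenSearch typically uses approximate search, so results there may differ from an exact ranking. The function name nearest is made up for this example.

```python
import math

# Hotel vectors keyed by document ID, copied from Step 2.
hotels = {
    "1": [5.2, 4.4],
    "2": [5.2, 3.9],
    "3": [4.9, 3.4],
    "4": [4.2, 4.6],
    "5": [3.3, 4.5],
}


def nearest(query: list, vectors_by_id: dict, k: int) -> list:
    """Return the IDs of the k vectors closest to query by Euclidean distance."""
    ranked = sorted(
        vectors_by_id,
        key=lambda doc_id: math.dist(query, vectors_by_id[doc_id]),
    )
    return ranked[:k]


top_three = nearest([5, 4], hotels, k=3)
print(top_three)  # ['2', '1', '3']
```

For a dataset this small, the exact ranking and the search response are expected to agree.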

The following image shows the hotels on the coordinate plane. The query point is labeled Pin, and each hotel is labeled with its document number.

The response contains the hotels closest to the specified pin location:

{
  "took": 1093,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 0.952381,
    "hits": [
      {
        "_index": "hotels-index",
        "_id": "2",
        "_score": 0.952381,
        "_source": {
          "location": [
            5.2,
            3.9
          ]
        }
      },
      {
        "_index": "hotels-index",
        "_id": "1",
        "_score": 0.8333333,
        "_source": {
          "location": [
            5.2,
            4.4
          ]
        }
      },
      {
        "_index": "hotels-index",
        "_id": "3",
        "_score": 0.72992706,
        "_source": {
          "location": [
            4.9,
            3.4
          ]
        }
      }
    ]
  }
}
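
The _score values in this response are consistent with the conversion OpenSearch documents for the l2 space type, score = 1 / (1 + d), where d is the squared Euclidean distance between the query vector and the document vector. The following sketch recomputes the three scores by hand. The function name l2_score is hypothetical, and the vectors are copied from the response above.

```python
def l2_score(query: list, vector: list) -> float:
    """Convert squared Euclidean distance into a score: 1 / (1 + d)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)


pin = [5, 4]
returned_hotels = {
    "2": [5.2, 3.9],
    "1": [5.2, 4.4],
    "3": [4.9, 3.4],
}

scores = {doc_id: l2_score(pin, vector) for doc_id, vector in returned_hotels.items()}
for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
# 2 0.9524
# 1 0.8333
# 3 0.7299
```

Because the score decreases as distance grows, a higher _score means a closer hotel, and a hotel located exactly at the pin would score 1.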

Generating vector embeddings automatically

If your data isn’t already in vector format, you can generate vector embeddings directly within OpenSearch. This allows you to transform text or images into their numerical representations for similarity search. For more information, see Generating vector embeddings automatically.

Next steps

