Retrieval-augmented generation (RAG) has become a critical approach for building trustworthy, domain-specific artificial intelligence (AI) systems. By combining retrieval systems with large language models (LLMs), RAG allows applications to ground their outputs in external knowledge sources. However, building reliable RAG systems remains challenging, especially when working with complex enterprise documents and large-scale retrieval. Two key bottlenecks frequently arise: accurate document ingestion and high-quality retrieval. This is where Docling and OpenSearch together provide a powerful solution. Docling ensures precise document parsing and structuring, while OpenSearch enables scalable, metadata-aware search and retrieval. The result is a RAG foundation that can accurately represent and efficiently retrieve knowledge across diverse document types.
What is Docling?
Docling is an open-source document processing toolkit that transforms complex documents into structured, machine-readable data for AI applications, including generative AI systems. It can parse a wide range of document formats, such as PDF, DOCX, and PPTX, while preserving essential structural information like layout, tables, and reading order. The parsed content can then be exported as Markdown, JSON, or HTML, making it easy to incorporate document data into modern AI workflows. Originally developed at IBM Research, Docling was donated as an incubation-stage project to the LF AI & Data Foundation in April 2025. It has since seen rapid community adoption, with more than 42,000 GitHub stars, usage across 2,400 GitHub organizations, and 1.5 million monthly downloads from PyPI. Docling integrates seamlessly with the broader generative AI ecosystem, providing flexible serializers, metadata enrichment, and hierarchical chunking mechanisms, all key enablers for high-quality RAG workflows.
Why combine Docling and OpenSearch for RAG?
Together, Docling and OpenSearch address both sides of the RAG challenge:
- Docling ensures that the input documents are transformed into structured, semantically meaningful chunks with rich metadata.
- OpenSearch provides a scalable, high-performance search engine capable of storing embeddings, running vector similarity searches, and filtering or aggregating results using metadata.
This combination helps developers build AI applications that are accurate, explainable, and robust when dealing with real-world data.
Leveraging Docling and OpenSearch for advanced RAG
The integration between Docling and OpenSearch unlocks several key benefits for developers building RAG applications.
Faithful document conversion with Docling
Docling can parse and convert a variety of document formats, including PDF, DOCX, and HTML, into a structured representation in JSON (the DoclingDocument). This representation retains hierarchical relationships, such as sections and subsections, and preserves complex data like tables and figures. Docling also supports multimodal inputs: it can transcribe audio files and run vision models on images to produce descriptive captions. By fusing these capabilities, developers can build a RAG pipeline that draws from multiple formats in a single, coherent representation.
Example: parsing a PDF into structured data using Docling’s Python API:
from docling.document_converter import DocumentConverter

# Document path or URL
source = "https://arxiv.org/pdf/2408.09869"

# Convert to structured format (DoclingDocument)
converter = DocumentConverter()
doc = converter.convert(source).document

# Inspect the parsed structure
print(len(doc.tables))
#> 3

# Export to markdown format
print(doc.export_to_markdown())
#> "## Docling Technical Report[...]"
Chunking and custom serialization
Docling provides flexible chunking mechanisms that allow developers to segment documents into meaningful, structured units for retrieval and generative AI tasks. The HierarchicalChunker splits content into semantically coherent segments, such as sections, paragraphs, tables, and figures, while preserving the logical document hierarchy in metadata. This structure-aware approach improves both the precision and interpretability of retrieval results. Building on this foundation, Docling introduces the HybridChunker, which applies tokenization-aware refinements on top of hierarchical chunking. The hybrid approach ensures that the resulting chunks are optimally sized for embedding models, maintaining semantic integrity while respecting model token limits. In addition, Docling supports custom serializers, such as Markdown serializers for tabular data. These serializers make it easier for generative models to understand the structure and context of the information. By combining hybrid chunking with structured serialization and OpenSearch’s vector indexing, developers can build RAG pipelines that deliver high-fidelity document understanding, scalable storage, and accurate retrieval.
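For a concrete feel of this step, here is a minimal sketch, assuming a converted DoclingDocument named doc (for example, the one produced in the conversion example above); the contextualize call enriches each chunk with its heading context before embedding:
from docling.chunking import HybridChunker

# Tokenization-aware chunking on top of Docling's hierarchical structure
chunker = HybridChunker()  # optionally configure the tokenizer and max tokens

for chunk in chunker.chunk(dl_doc=doc):
    # Prepend heading and caption context so the embedded text stays self-contained
    enriched_text = chunker.contextualize(chunk=chunk)
    print(enriched_text[:120], "...")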
Context-aware retrieval using OpenSearch
OpenSearch supports vector search with metadata filtering, allowing retrievals that consider both semantic similarity and contextual fields provided by Docling, such as section type, table presence, or document source. This enables domain-specific retrieval strategies, such as focusing on quantitative data or specific document sections, resulting in more relevant and accurate generative outputs.
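As an illustrative sketch rather than a prescribed recipe, a metadata filter can be combined with vector retrieval through LlamaIndex once the OpenSearch-backed index built later in this article is in place; the file_name key and value below are hypothetical, so substitute whichever fields your Docling-generated chunk metadata actually carries:
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# Hypothetical filter: only retrieve chunks that originate from one source file
filters = MetadataFilters(
    filters=[MetadataFilter(key="file_name", value="docling_report.pdf")]
)

retriever = index.as_retriever(similarity_top_k=5, filters=filters)
for result in retriever.retrieve("table structure recognition accuracy"):
    print(result.score, result.node.metadata.get("headings"))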
Context expansion for better answers
Docling retains hierarchical relationships in chunk metadata, enabling context expansion during retrieval. For instance, when a subsection is retrieved, related chunks from the parent section can be included automatically. This expansion ensures that the model receives coherent, contextually complete inputs, reducing hallucinations and improving factual accuracy.
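One possible way to sketch this expansion, assuming the retrieved nodes keep a Docling-style headings path in their metadata (the exact key names may differ in your pipeline):
def expand_with_section_siblings(retrieved_nodes, all_nodes):
    """Add every chunk that shares a parent section with a retrieved chunk."""
    # Section identity here is the Docling headings path stored in node metadata
    wanted_sections = {
        tuple(node.metadata.get("headings", [])) for node in retrieved_nodes
    }
    expanded = list(retrieved_nodes)
    seen = {node.node_id for node in retrieved_nodes}
    for node in all_nodes:
        section = tuple(node.metadata.get("headings", []))
        if section in wanted_sections and node.node_id not in seen:
            expanded.append(node)
            seen.add(node.node_id)
    return expanded
The expanded node set can then be passed to the generation step in place of the raw retrieval results.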
Integrating Docling and OpenSearch in RAG workflows
The LlamaIndex framework simplifies RAG orchestration by connecting document parsers, vector stores, and LLMs. Docling integrates naturally into this workflow as the ingestion and structuring component, while OpenSearch serves as the vector and metadata store. The following steps walk through an example of the integration workflow.
Load the files
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.docling import DoclingReader

my_docs = "/path/to/my/documents"

# Export each PDF to Docling's JSON (DoclingDocument) representation
reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
dir_reader = SimpleDirectoryReader(
    input_dir=my_docs,
    file_extractor={".pdf": reader},
)
documents = dir_reader.load_data()
Define the transformations
Before ingesting data, define the transformations to apply to the DoclingDocument:
- DoclingNodeParser executes the document-based chunking.
- MetadataTransform ensures that the generated chunk metadata is correctly formatted for indexing in OpenSearch.
from docling.chunking import HybridChunker
from llama_index.core.schema import TransformComponent
from llama_index.node_parser.docling import DoclingNodeParser

# Document-based chunking driven by Docling's HybridChunker
node_parser = DoclingNodeParser(chunker=HybridChunker())

class MetadataTransform(TransformComponent):
    """Cast the binary_hash metadata field to a string so it indexes cleanly in OpenSearch."""

    def __call__(self, nodes, **kwargs):
        for node in nodes:
            binary_hash = node.metadata.get("origin", {}).get("binary_hash", None)
            if binary_hash is not None:
                node.metadata["origin"]["binary_hash"] = str(binary_hash)
        return nodes
Calculate, insert, and index the embeddings
Create an OpensearchVectorClient, which encapsulates the logic for a single OpenSearch index with vector search enabled. Then initialize the index using the converted files, the Docling node parser, and the OpenSearch client that you just created. The DoclingDocument objects will be chunked, and the calculated embeddings will be stored and indexed in OpenSearch:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.opensearch import (
    OpensearchVectorClient,
    OpensearchVectorStore,
)

opensearch_endpoint = "http://localhost:9200"  # Set the OpenSearch endpoint
index_name = "docling-rag"  # Set the name of the OpenSearch index
text_field = "content"
embed_field = "embedding"
embed_model = OllamaEmbedding(model_name="granite-embedding:30m")  # Set a LlamaIndex embedding object
embed_dim = len(embed_model.get_text_embedding("hi"))  # Infer the embedding dimension

client = OpensearchVectorClient(
    endpoint=opensearch_endpoint,
    index=index_name,
    dim=embed_dim,
    embedding_field=embed_field,
    text_field=text_field,
)
vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Chunk the DoclingDocuments, compute embeddings, and index them in OpenSearch
index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[node_parser, MetadataTransform()],
    storage_context=storage_context,
    embed_model=embed_model,
)
Assemble and run the RAG system
With LlamaIndex’s query engine, you can simply run a RAG system as follows:
from llama_index.llms.ollama import Ollama
from rich.console import Console

gen_model = Ollama(model="granite4:micro")  # Set a LlamaIndex LLM object
console = Console(width=88)

query = "Which are the main AI models in Docling?"
query_engine = index.as_query_engine(llm=gen_model)
res = query_engine.query(query)
console.print(f"👤: {query}\n🤖: {res.response.strip()}")
#> 👤: Which are the main AI models in Docling?
#  🤖: Docling primarily utilizes two AI models. The first one is a layout analysis model,
#     serving as an accurate object-detector for page elements. The second model is
#     TableFormer, a state-of-the-art table structure recognition model. Both models are
#     pre-trained and their weights are hosted on Hugging Face. They also power the
#     deepsearch-experience, a cloud-native service for knowledge exploration tasks.
This example shows how easily developers can combine the best of Docling’s document understanding with OpenSearch’s search capabilities to build robust RAG applications.
Learn more
To explore these integrations and capabilities in more detail, see the following resources:
By combining Docling’s advanced document understanding with OpenSearch’s scalable retrieval, you can build RAG systems that deliver grounded, high-quality answers to complex questions.