Executive Takeaway
To support its vast cloud ecosystem, SAP is advancing its observability strategy with a strong focus on open standards. A key pillar of this strategy is SAP Cloud Logging, a managed service built on OpenSearch that operates over 11,000 instances. In a significant modernization effort, the service adopted OpenTelemetry for telemetry collection and re-architected its ingestion pipelines using Data Prepper. This initiative not only provided a low-risk path to OpenSearch 2.x but also empowered its users with a unified view of logs, metrics, and traces, leading to faster incident response. This project demonstrates a scalable and open model for building a future-ready observability foundation, improving reliability across thousands of cloud workloads.
The Situation
The SAP Cloud Logging service operates one of the largest OpenSearch deployments in the world, with more than 11,000 Kubernetes-hosted instances supporting critical observability services. The company needed to:
- Modernize its logging and monitoring platform.
- Unify data across logs, metrics, and traces.
- Migrate its entire fleet from OpenSearch 1.x to 2.x with near-zero downtime.
The Challenge
SAP Cloud Logging, the Kubernetes-based service that delivers OpenSearch-powered observability, faced two pressing needs:
- Zero-downtime migration from OpenSearch 1.x to 2.x across thousands of production instances.
- Unified telemetry access to allow developers to filter and correlate logs, metrics, and traces in one place, enabling faster troubleshooting and prevention of issues before they impacted customers.
Previously, telemetry data was scattered across multiple systems and formats, each surfacing different kinds of insights. This fragmentation slowed investigations and made cross-signal correlation difficult, and at SAP’s scale, even minor inefficiencies created significant operational overhead. With the OpenTelemetry approach, SAP can now link and correlate data between these systems, enabling a unified view without losing the unique value each source provides.
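To illustrate what linking signals can look like in practice, the minimal sketch below joins log records to trace spans through a shared trace ID, a field that OpenTelemetry log records can carry. The record shapes, the field names (`trace_id`, `service`, `message`, `name`), and the `correlate` helper are hypothetical and simplified for illustration; they are not SAP Cloud Logging's schema or implementation.

```python
from collections import defaultdict


def correlate(logs, spans):
    """Group log records and spans that share a trace ID (hypothetical helper)."""
    by_trace = defaultdict(lambda: {"spans": [], "logs": []})
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    for log in logs:
        trace_id = log.get("trace_id")
        if trace_id:  # logs emitted outside a trace carry no trace ID
            by_trace[trace_id]["logs"].append(log)
    return dict(by_trace)


# Hypothetical sample data, not taken from a real system.
spans = [
    {"trace_id": "t1", "name": "GET /orders", "service": "shop"},
    {"trace_id": "t2", "name": "GET /health", "service": "shop"},
]
logs = [
    {"trace_id": "t1", "service": "shop", "message": "db timeout"},
    {"service": "shop", "message": "startup complete"},
]

result = correlate(logs, spans)
```

Here the "db timeout" log ends up next to the `GET /orders` span because both carry the same trace ID, while the startup log, which belongs to no trace, is left out of the join.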
The Solution
SAP’s strategic vision is to adopt OpenTelemetry as the target standard for observability, and the SAP Cloud Logging service has become a key driver of this initiative. The service now offers native OpenTelemetry Protocol (OTLP) ingestion, which is enabled on over 1,000 of its managed service instances. These instances monitor environments such as Cloud Foundry and Kubernetes and collect telemetry from a multitude of underlying workloads, giving users a consistent, vendor-neutral data format.
Implementation Highlights:
- Pipelines per signal type: Configured in the OpenTelemetry Collector to handle logs, metrics, and traces separately, batching events for efficient throughput.
- Data Prepper ingestion: Routing telemetry into OpenSearch with:
  - Back-pressure handling to protect pipelines during indexing slowdowns.
  - Peer-forwarding to manage trace data across multiple Data Prepper instances.
- Consistent schema design: Index templates built on OpenTelemetry semantic conventions ensured reliable field mapping and eliminated conflicts (e.g., from Kubernetes labels).
- Cross-signal visualization: OpenSearch Dashboards enabled filtering by namespace, pod, or service across all telemetry types for a single, correlated view of workload health.
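The Kubernetes label conflict mentioned above is worth a small illustration. OpenSearch interprets dots in field names as object paths, so a label set containing both `app` and `app.kubernetes.io/name` asks for `app` to be a plain value and a parent object at the same time. The sketch below detects such clashes and shows one common workaround, replacing dots in keys. The helper names `find_conflicts` and `dedot` and the `_` replacement character are assumptions for illustration; the source does not say which technique SAP's index templates use.

```python
def find_conflicts(keys):
    """Return keys that are both a leaf value and a parent object path."""
    keys = set(keys)
    conflicts = set()
    for key in keys:
        parts = key.split(".")
        # Every proper dotted prefix of a key would be mapped as an object.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in keys:
                conflicts.add(prefix)
    return conflicts


def dedot(labels, replacement="_"):
    """Replace dots in label keys so each key maps to one flat field."""
    return {key.replace(".", replacement): value for key, value in labels.items()}


# Hypothetical label set of the kind found on Kubernetes workloads.
labels = {
    "app": "checkout",
    "app.kubernetes.io/name": "checkout",
    "tier": "backend",
}

before = find_conflicts(labels)        # "app" is both a value and a parent
after = find_conflicts(dedot(labels))  # no dotted keys remain, so no clash
```

A fixed schema with predictable field names avoids this class of problem up front, rather than leaving it to dynamic mapping to discover at indexing time.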
The Results
- Low-risk migration path for 11,000+ instances to OpenSearch 2.x, paving the way for future upgrades to 3.x.
- Unified observability at scale: Service users and development teams can filter and correlate telemetry across thousands of workloads in seconds.
- Improved pipeline resilience with Data Prepper’s back-pressure and buffering capabilities.
- OpenSearch Catalog compatibility: Schema supports reusing and contributing community-built dashboards and visualizations.
“With OpenTelemetry and OpenSearch, we can finally look at all our observability data in one place, filter it instantly, and understand exactly what’s happening across thousands of workloads,” said Karsten Schnitter, Software Architect at SAP. “That changes how quickly we can act.”
Why It Matters
This initiative at SAP shows how large-scale operators can combine open standards for telemetry collection with an open source search and analytics platform to achieve unified observability without sacrificing control.
For engineering and operations leaders, this model demonstrates how to:
- Reduce complexity by standardizing on open formats.
- Improve mean time to resolution (MTTR) through cross-signal correlation.
- Future-proof observability infrastructure with community-driven innovation.
Learn More
Watch the full OpenSearchCon talk.