
Bringing Stability and Resilience to the OpenSearch Kubernetes Operator

January 27, 2026

In the last couple of months, the OpenSearch Kubernetes Operator has seen significant improvements across security, stability, flexibility, and operational capabilities. We’re excited to announce the 3.0 alpha release of the OpenSearch Kubernetes Operator. This release includes over 100 meaningful changes addressing critical bugs, feature requests, and quality-of-life improvements.

What’s new in 3.0 since v2.8.0?

The OpenSearch Kubernetes Operator 3.0 release focused on resilience and correctness. Earlier versions had significant issues affecting cluster stability during error conditions and upgrades, which we observed across multiple customers, environments, and scales. This release involved substantial rewrites to achieve stability.

The following sections highlight key changes in this release. While not exhaustive, they illustrate the scope of the updates and the significance of this release.

Operational excellence

This release contains the following operational excellence improvements.

Quorum-safe rolling restarts and SmartScaler enabled by default – The operator now performs intelligent, safe rolling restarts across node pools while maintaining cluster quorum throughout. This prevents split-brain scenarios, reduces downtime, and improves upgrade reliability in complex deployments, such as multi-Availability-Zone and multi-tier environments.
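
The following minimal sketch shows a cluster with a dedicated cluster-manager pool and a separate data pool. This is the kind of multi-pool layout that rolling restarts operate across. The sketch assumes that the `spec.general` and `spec.nodePools` fields documented for the 2.x operator carry over unchanged to the `opensearch.org/v1` API group. The cluster name, version, and disk sizes are placeholders.

```yaml
# Sketch only: fields follow the 2.x OpenSearchCluster user guide and are assumed
# unchanged under opensearch.org/v1. Names, versions, and sizes are placeholders.
apiVersion: opensearch.org/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  general:
    serviceName: my-cluster
    version: "3.0.0"
  nodePools:
    # Three cluster-manager-eligible nodes: quorum is 2, so at most one of
    # them can be down at any moment without losing quorum.
    - component: managers
      replicas: 3
      diskSize: "10Gi"
      roles:
        - "cluster_manager"
    # Data nodes live in their own pool.
    - component: data
      replicas: 3
      diskSize: "100Gi"
      roles:
        - "data"
```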

Multi-namespace support – You can now manage clusters across multiple namespaces, improving support for multi-tenant deployments and organizational separation. This long-requested capability enables more flexible cluster management in shared environments.

Namespace-scoped role-based access control (RBAC) – Helm charts can now grant the operator namespace-scoped RBAC permissions instead of cluster-wide ones, supporting multi-tenant deployments and organizational separation.

Data integrity testing framework – A new comprehensive testing framework validates data integrity during cluster operations and includes auxiliary system tests to verify recovery and upgrade correctness. All related changes in this release were validated through rigorous automated testing.

PVC-backed bootstrap pods – Bootstrap pods now use PersistentVolumeClaims (PVCs) with configurable disk sizes for more reliable initialization, especially for large clusters or slow networks.

Bootstrap pod customization – Custom annotations and labels on bootstrap pods improve integration with monitoring, networking, and security policies. Plugins can now also be installed during bootstrap.

Streamlining TLS certificates

This release includes the following enhancements that simplify TLS certificate management.

TLS certificate hot reloading – Clusters automatically reload TLS certificates without pod restarts, enabling seamless rotation. Additionally, a new DisableSSL option is available for development and testing environments.

Configurable certificate duration – You can now configure certificate lifetimes to align with your organization’s security policies.
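
In 2.x, operator-generated TLS was configured under `spec.security.tls`, with separate `transport` and `http` sections. The following rough sketch assumes those fields are unchanged in 3.0. The commented-out `duration` key is a hypothetical placeholder for the new certificate-lifetime setting, not its confirmed name or location.

```yaml
# Fragment of an OpenSearchCluster spec; tls fields assumed from the 2.x user guide.
spec:
  security:
    tls:
      transport:
        generate: true   # operator issues node-to-node (transport) certificates
        perNode: true    # a separate certificate for each node
      http:
        generate: true   # operator issues REST API (HTTP) certificates
        # duration: "8760h"   # hypothetical key for certificate lifetime
```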

Flexibility and customization

This release adds the following flexibility and customization options.

Init containers and sidecars – Now fully supported for both OpenSearch and OpenSearch Dashboards pods, enabling custom initialization, log shipping, monitoring agents, and service mesh integration.

Network File System (NFS) volume support – Use NFS volumes alongside standard PersistentVolumeClaims.

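Custom PVC labels and annotations, and topology spread constraints – Custom PVC metadata improves integration with storage operators, backup solutions, and policy engines, while topology spread constraints control how pods are distributed across zones and nodes.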
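
The following sketch shows a zone-aware data pool. It makes two assumptions:

- The node pool accepts a `topologySpreadConstraints` list in the standard Kubernetes format.
- The operator labels pods with `opensearch.org/opensearch-cluster: <cluster name>`.

`my-cluster` is a placeholder.

```yaml
# Fragment of an OpenSearchCluster spec. topologySpreadConstraints on a node
# pool is an assumption; the entries use standard Kubernetes fields.
spec:
  nodePools:
    - component: data
      replicas: 3
      diskSize: "100Gi"
      roles:
        - "data"
      topologySpreadConstraints:
        - maxSkew: 1                                  # zones differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              opensearch.org/opensearch-cluster: my-cluster   # assumed pod label
```

Matching only on the cluster label counts pods from every pool. In practice, you would likely also match on a node-pool label so that each pool is spread independently.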

Host aliases – Custom host aliases simplify integration with external services that require custom DNS resolution.
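
Kubernetes host aliases are written into a pod's `/etc/hosts` file. The following minimal sketch places the field under `spec.general`, but where it actually lives in the `OpenSearchCluster` spec is an assumption. Each entry uses the standard Kubernetes `ip` and `hostnames` keys. The address and hostname are placeholders.

```yaml
# Fragment of an OpenSearchCluster spec. The location of hostAliases is
# hypothetical; each entry uses the standard Kubernetes HostAlias fields.
spec:
  general:
    hostAliases:
      - ip: "10.0.0.50"                  # placeholder address
        hostnames:
          - "repo.internal.example.com"  # resolved via the pod's /etc/hosts
```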

Operator infrastructure improvements

This release includes the following operator infrastructure improvements.

Removed kube-rbac-proxy – Replaced with native controller-runtime authentication and authorization, simplifying deployment architecture and reducing resource overhead.

Prometheus metrics and monitoring – Monitoring resources now use the API version that matches the upstream Prometheus Operator, and custom monitoring labels are supported.
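
The 2.x user guide documents `spec.general.monitoring.enable` and `scrapeInterval`. The following sketch assumes both are unchanged in 3.0 and uses `labels` as a hypothetical name for the new custom-label support. A label such as `release: prometheus` is a common way to match a Prometheus `serviceMonitorSelector`, but the right value depends on your Prometheus setup.

```yaml
# Fragment of an OpenSearchCluster spec; enable and scrapeInterval assumed
# from the 2.x user guide, labels is a hypothetical field name.
spec:
  general:
    monitoring:
      enable: true
      scrapeInterval: 30s
      labels:
        release: prometheus   # example label for a Prometheus serviceMonitorSelector
```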

gRPC port – Now exposed and fully configurable, enabling you to use the new gRPC support in OpenSearch 3.0.

Critical bug fixes

This release includes the following critical bug fixes:

  • Resolved deadlocks when upgrading to OpenSearch 3.0.0
  • Fixed version constraint checking for pre-release versions
  • Fixed node version mismatch detection during upgrades
  • Corrected additionalConfig and environment variable application
  • Fixed admin certificate generation and SSL verification mode handling
  • Fixed JVM heap size parameters not being applied
  • Prevented illegal Pod spec updates causing recreation loops

Quality-of-life improvements

This release includes the following quality-of-life improvements:

  • Default pod anti-affinity rules that prevent colocating cluster pods
  • Priority class support for operator pods
  • Proper user/group ownership handling for data directories
  • Enhanced dedicated coordinator node pool support
  • Configurable JSON log message key
  • Namespace override through a Helm environment variable 
  • appProtocol field for better service mesh integration

Migration notes

The operator is transitioning from the opensearch.opster.io/v1 API group to opensearch.org/v1, reflecting alignment with OpenSearch project branding.

Timeline

  • Operator 3.0: Both API groups are supported
  • Deprecation period: 2–3 major releases, during which the old API group logs deprecation warnings
  • Future release: opensearch.opster.io removed

Automatic migration

The operator includes a migration controller that automatically handles the transition:

  • Auto-sync: Resources created with opensearch.opster.io/v1 automatically create corresponding opensearch.org/v1 resources.
  • Status sync: Bidirectional synchronization between old and new resources.
  • Deletion handling: Deleting an old API resource deletes the corresponding new resource.

Manual migration steps

For a clean migration, update your manifests:

# Old
apiVersion: opensearch.opster.io/v1
# New
apiVersion: opensearch.org/v1

Also update label selectors (for example, change opster.io/opensearch-cluster to opensearch.org/opensearch-cluster).
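
Any of your own resources that select operator-managed pods need the same label change. For example, after migration a custom Service in front of the cluster's REST port might look like the following. This sketch assumes that the operator sets the label value to the cluster name, as the 2.x operator does for the old opster.io/opensearch-cluster label. `my-cluster` is a placeholder.

```yaml
# Standard Kubernetes Service selecting the cluster's pods by the new label key.
# The label value (the cluster name) is an assumption; my-cluster is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-rest
  namespace: default
spec:
  selector:
    # Before migration: opster.io/opensearch-cluster: my-cluster
    opensearch.org/opensearch-cluster: my-cluster
  ports:
    - name: http
      protocol: TCP
      port: 9200        # default OpenSearch REST port
      targetPort: 9200
```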

For Helm deployments, update the configuration as follows:

apiGroup: opensearch.org  # Default (recommended)

For detailed steps and resource mappings, see the Migration Guide.

Breaking changes

Review these breaking changes before upgrading:

  • SmartScaler – Now enabled by default. Disable explicitly if not needed.
  • kube-rbac-proxy – Removed; update any custom monitoring configurations that depended on it.
  • SetVMMaxMapCount – Now defaults to true.
  • Validation webhooks – Active and reject invalid configurations. Test in non-production environments first.
  • Security TLS – Now enabled by default for both the transport layer and the REST API.
  • Default password – No fixed default is set. Specify a password explicitly or accept the auto-generated password.
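
The following sketch shows how two of these defaults might be set explicitly. Field names follow the 2.x user guide (`setVMMaxMapCount`, `security.config.adminCredentialsSecret`) and are assumed to carry over to 3.0. The Secret name and its `username` and `password` keys are also assumptions. In 2.x, a custom admin password also had to be reflected in the security plugin configuration, so verify the 3.0 documentation before relying on this alone.

```yaml
# Fragment of an OpenSearchCluster spec; field names assumed from the 2.x user guide.
spec:
  general:
    # Now defaults to true; set false if your environment does not allow the
    # operator to adjust the vm.max_map_count kernel setting.
    setVMMaxMapCount: false
  security:
    config:
      # Points the operator at admin credentials you manage yourself
      adminCredentialsSecret:
        name: admin-credentials-secret
---
# Secret referenced above; key names are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: admin-credentials-secret
type: Opaque
stringData:
  username: admin
  password: change-me   # placeholder, replace with a strong password
```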

Looking ahead

This release marks a major step forward for the OpenSearch Kubernetes Operator, delivering improved operational safety, expanded customization, and enhanced production readiness. If you’re using the OpenSearch Kubernetes Operator, we encourage you to upgrade to the newest version, because earlier operator releases exhibited significant issues in production.

Although this is an alpha release, we consider it stable enough for existing users to try. We recommend starting in lower environments and progressing to higher environments as confidence grows.

In the weeks following this release, assuming that no major issues arise (or that any discovered issues are resolved and tested), we will release a beta version of the operator. Once the beta has gone two weeks without any release blockers, we plan to ship the generally available OpenSearch Kubernetes Operator 3.0.

Next steps

We encourage you to try the new operator. A significant amount of effort went into building and testing it, and we look forward to your feedback. File issues, open PRs, and join the conversation on GitHub, the OpenSearch public Slack #k8s-operator channel, and the OpenSearch Community forum.

Acknowledgements

We want to acknowledge and thank the team at BigData Boutique, including Jose Barato, Ryan Patterson, and Lior Friedler, as well as veteran maintainer Prudhvi Godithi, for their hard work and help in achieving this important milestone. We also extend our gratitude to the OpenSearch community contributors who tested, provided feedback, and helped refine the new operator along the way.
