OpenSearchCon 2024 North America Session: Lucene And Beyond - Core Storage Extension In OpenSearch · OpenSearch

OpenSearch is tightly coupled to the Lucene core APIs, which provide the following functionality:

Encoding
Transactions
Merges
Search
And more…

In this presentation, I will discuss how the OpenSearch storage encoding can be extended to popular formats (e.g., Parquet, Avro) that are readable by big data systems such as Apache Spark. This provides a strategic long-term benefit for the project, because it allows OpenSearch to integrate more easily with big data systems without the need to reindex and transform the data. In addition to easing integrations, it allows OpenSearch to benefit from new encoding developments that are happening outside of Lucene, such as new compression algorithms. I will also discuss the various ways of solving this problem and why, in my case, I chose to extend the encoding via a new extension mechanism that involves an external writer. The approach is quite generic and makes it possible to extend many other aspects of the Lucene codec with native implementations in languages such as Rust or Python.
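To make the idea of an external writer more concrete, here is a minimal sketch of the delegation pattern, under loudly stated assumptions: `ExternalWriter`, `ColumnarWriter`, and `DelegatingStoredFields` are hypothetical names invented for illustration and are not part of Lucene, OpenSearch, or the mechanism presented in the talk. The toy writer only pivots documents into in-memory columns, standing in for a real Parquet or Avro encoder.

```python
from typing import Any, Dict, List, Protocol


class ExternalWriter(Protocol):
    """Hypothetical extension point: receives documents and owns the encoding."""

    def write(self, doc: Dict[str, Any]) -> None: ...

    def finish(self) -> Dict[str, List[Any]]: ...


class ColumnarWriter:
    """Toy stand-in for a Parquet/Avro-style writer: pivots rows into columns."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Any]] = {}
        self.rows = 0

    def write(self, doc: Dict[str, Any]) -> None:
        # Backfill None for any field first seen in this document.
        for name in doc:
            if name not in self.columns:
                self.columns[name] = [None] * self.rows
        # Append this row's value (or None) to every known column.
        for name, values in self.columns.items():
            values.append(doc.get(name))
        self.rows += 1

    def finish(self) -> Dict[str, List[Any]]:
        return self.columns


class DelegatingStoredFields:
    """Hypothetical codec-side facade that forwards documents to the writer."""

    def __init__(self, writer: ExternalWriter) -> None:
        self.writer = writer

    def add_document(self, doc: Dict[str, Any]) -> None:
        self.writer.write(doc)

    def flush(self) -> Dict[str, List[Any]]:
        return self.writer.finish()


fields = DelegatingStoredFields(ColumnarWriter())
fields.add_document({"title": "a", "year": 2024})
fields.add_document({"title": "b", "lang": "en"})
columns = fields.flush()
print(columns)
```

The point of the sketch is only the separation of concerns: the indexing side hands documents to a pluggable writer, so the writer can be swapped for one that emits a different format without changing the caller.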

Details

Wednesday, September 25 2:35pm-3:15pm in MainStage

Track: Operating OpenSearch

Speakers

Samuel Herman

Architect at Oracle
