Managed Ingestion to Open Lakehouse for the AI Era

RAG, AI Infrastructure, Streaming, Open Data Lake

As AI workloads scale, data ingestion has become the silent bottleneck. Models demand high-quality, current, and compliant data—yet most pipelines remain brittle and opaque. This talk dives into how managed ingestion frameworks solve that challenge by integrating streaming and batch data into Iceberg tables with full governance.

We’ll explore ingestion automation patterns—schema inference, type evolution, deduplication, compaction, and CDC handling—alongside built-in observability and lineage capture. Using open technologies such as Apache Iceberg, Trino, and Starburst’s ingestion orchestration, you’ll see how to ingest once and serve many downstream AI use cases, from analytics to RAG.
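As a rough illustration of two of these patterns, deduplication and CDC handling, here is a minimal in-memory sketch. It is not Starburst's implementation: the `ChangeEvent` type, its `seq` ordering field, and the `merge_changes` helper are hypothetical names chosen for this example, and a real pipeline would apply the same logic against an Iceberg table rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC record (hypothetical shape, for illustration only)."""
    key: str                    # primary key of the affected row
    seq: int                    # assumed to increase with each source change
    op: str                     # "upsert" or "delete"
    row: Optional[dict] = None  # new row image; None for deletes


def merge_changes(current: dict, events: list) -> dict:
    """Return a new table state with a batch of change events applied."""
    # Deduplicate: keep only the highest-seq event per key. Replayed
    # events with an equal or lower seq are ignored.
    latest = {}
    for ev in events:
        prev = latest.get(ev.key)
        if prev is None or ev.seq > prev.seq:
            latest[ev.key] = ev

    # Apply the surviving events to a copy of the current state.
    merged = dict(current)
    for key, ev in latest.items():
        if ev.op == "delete":
            merged.pop(key, None)
        else:
            merged[key] = ev.row
    return merged


table = {"b": {"v": 0}}
batch = [
    ChangeEvent("a", 1, "upsert", {"v": 1}),
    ChangeEvent("a", 1, "upsert", {"v": 1}),  # duplicate delivery
    ChangeEvent("a", 2, "upsert", {"v": 2}),  # later update wins
    ChangeEvent("b", 3, "delete"),
    ChangeEvent("c", 4, "upsert", {"v": 9}),
]
table = merge_changes(table, batch)
```

The sketch assumes a single, totally ordered `seq` per source; sources without such an ordering would need a different tie-breaking rule.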

Attendees will leave with a clear blueprint for turning ingestion from a maintenance burden into a competitive advantage for AI-driven organizations.

Key Takeaways:

• How governance, lineage, and compaction make ingestion AI-ready.

• Real-world architecture patterns for scalable managed ingestion.

Jitender Aswani is SVP of Engineering at Starburst, leading the company’s AI and data platform strategy. His teams build Starburst’s open, federated data infrastructure, powered by Trino and Apache Iceberg, to help enterprises unify analytics and AI across clouds, SaaS, and on-prem systems.