
Databricks By The Bay

Updated: Sep 12

Some friendships in tech conferences run deep, and ours with Databricks is one of them. Over the years, their engineers have stood on our stages multiple times.



This year, the tradition continues. We’re thrilled to congratulate Lisa N Cao, a valued member of our program committee, on her new role as Staff Developer Relations.

Lisa will also be speaking on the challenges of data platforming for AI, the critical role of metadata and data catalogs, and why vendor-neutral solutions are key to building governed, enterprise-ready AI systems.



In 2023, Jerry Peng delivered a standout talk for anyone working in real-time analytics: “Latency Goes Sub-Second in Apache Spark Structured Streaming.” Jerry showed how Databricks had improved offset management, cutting latency by up to 75%—from 700–900 milliseconds down to just 150–250 milliseconds—even at throughputs of one million events per second.


2021 was a banner year: Databricks brought an impressive lineup of speakers to our ninth Scale By The Bay conference. John O’Dwyer, Developer Advocate, talked about distributed systems, Spark, Delta Lake, and MLflow.



Itai Yaffe and Yakir Buskilla were also in the spotlight, expanding Databricks’ technical footprint in our program.


In 2020, we welcomed even more firepower. Matei Zaharia, Databricks co-founder, CTO, and creator of Apache Spark, shared his vision for the future.




Li Haoyi returned with his trademark Scala expertise.



By 2019, Databricks had made a significant mark on our agenda. Li Haoyi and Ahir Reddy delivered “Speedy Scala Builds at Databricks,” a talk that turned the common pain of slow Scala compilation into a success story of JVM tuning and cloud infrastructure.



Our shared journey began back in 2018, when Karthik Ramasamy joined us as a speaker.



Databricks didn’t just participate; they invested in the community, stepping up as a Gold Partner in both 2019 and 2021. That spirit of collaboration has only grown stronger.


From Spark to Data Intelligence


Founded in 2013, Databricks grew out of the team that created Apache Spark. Over the years, they’ve also been the driving force behind Delta Lake, MLflow, Unity Catalog, and the lakehouse architecture. From the start, their mission was simple but ambitious: unify big data and AI in a way that makes it accessible, governed, and genuinely useful for organizations of all sizes.


Today, that vision takes the form of the Data Intelligence Platform. The platform enables teams to search, query, and act on data using natural language, while keeping everything under enterprise-grade governance through Unity Catalog. AI agents can be embedded directly into workflows, turning raw data into decisions without losing transparency or control.


They now serve over 10,000 organizations worldwide, from startups to Fortune 500 giants.

On the AI front, they’ve made bold moves. In March 2024, they released DBRX, an open large language model that Databricks reported as outperforming Llama 2 and Mixtral on standard benchmarks. Strategic acquisitions (Okera for governance, MosaicML for generative AI training, Tabular for Apache Iceberg-based data management, and Neon for serverless Postgres) have all strengthened their platform.


Databricks’ direction has only accelerated in recent years. In mid-2025, they launched Databricks One, a streamlined interface that opens the platform to business users. With features like the Genie natural language interface, custom dashboards, and domain-specific apps—all tied into the same governed data—Databricks One represents a shift from data preparation to decision-making.


This evolution reflects Databricks’ focus on making enterprise-grade AI and data capabilities accessible to everyone.





