Cloud databases are a $39 billion market growing 50%
The shift of data to the cloud is massive. Gartner estimates that the database management system (DBMS) market is $80 billion, with the cloud segment contributing ~50% or $39.2 billion. The market grew by $14.5 billion (22.3%) in 2021, and 90% of the growth came from the cloud.
You can see the momentum of cloud players not only from the chart above but also from the shift in market leaders. Microsoft Azure, Amazon Web Services, and Google Cloud Platform (the cloud trinity) overtook Oracle, IBM, and SAP.
To put the opportunity into perspective: Startups often want to be in a $10B market because capturing just one percent means $100 million in revenue, the threshold for becoming a mature stand-alone company. Yet the cloud database segment grew by $13 billion in just one year.
One driver of this growth is software companies integrating with and building on top of cloud databases. Having connections to the cloud trinity data services is table stakes, but what I’m seeing in the startup scene is a quinternity with Snowflake and Databricks filling out the group.
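The arithmetic behind these figures can be checked with a quick sketch. The inputs are the Gartner numbers quoted above; the variable names are mine, and the outputs are rounded estimates rather than reported figures.

```python
# Figures quoted in the text, in billions of USD. Names are illustrative.
total_growth_2021 = 14.5       # DBMS market growth in 2021
cloud_share_of_growth = 0.90   # share of that growth attributed to the cloud

# Growth of the cloud segment alone.
cloud_growth = total_growth_2021 * cloud_share_of_growth

# The startup rule of thumb: one percent of a $10B market.
target_market = 10.0
one_percent_revenue = target_market * 0.01   # still in billions

print(f"Cloud DBMS growth in 2021: ${cloud_growth:.2f}B")
print(f"1% of a $10B market: ${one_percent_revenue * 1000:.0f}M")
print(f"One year of cloud growth vs. that whole market: {cloud_growth / target_market:.1f}x")
```

In other words, a single year of cloud segment growth is larger than the entire market a typical startup would be happy to target.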
Databricks and Snowflake as data platforms
Excluding the cloud trinity, Snowflake and Databricks are the largest cloud data vendors. Snowflake’s last fiscal year revenue was $1.2 billion. Databricks’ ARR in December 2021 was $800 million, up 80% YoY. Extrapolating 40% growth since then (another half a year at the same 80% annual rate), the estimated current ARR is $1.1 billion. For simplicity, let’s say ARR = revenue (it never is). On that basis, Databricks and Snowflake each command ~3% market share in the cloud segment. As the cloud segment grows, I’d guess that the top five DBMS players will eventually be AWS, GCP, Microsoft, Databricks, and Snowflake.
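Here is a rough sketch of that back-of-the-envelope estimate. All inputs are the figures quoted above. The compounded variant is my addition for comparison, and treating ARR as revenue is the same simplification flagged in the text.

```python
# Inputs quoted in the text, in billions of USD.
cloud_segment = 39.2
snowflake_revenue = 1.2
databricks_arr_dec_2021 = 0.8
annual_growth = 0.80

# Linear extrapolation used in the text: half a year is half of 80%.
databricks_arr_linear = databricks_arr_dec_2021 * (1 + annual_growth / 2)

# Alternative (an assumption, not from the text): compound the annual rate.
databricks_arr_compounded = databricks_arr_dec_2021 * (1 + annual_growth) ** 0.5


def share(revenue: float, market: float) -> float:
    """Return revenue as a percentage of the market."""
    return 100 * revenue / market


snowflake_share = share(snowflake_revenue, cloud_segment)
databricks_share = share(databricks_arr_linear, cloud_segment)

print(f"Databricks ARR, linear: ${databricks_arr_linear:.2f}B")
print(f"Databricks ARR, compounded: ${databricks_arr_compounded:.2f}B")
print(f"Snowflake share: {snowflake_share:.1f}%")
print(f"Databricks share: {databricks_share:.1f}%")
```

Either extrapolation rounds to about $1.1 billion, so the ~3% share estimate does not hinge on which one you pick.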
In June 2021, Snowflake launched a program called “Powered by Snowflake” with five founding participants selling applications built on top of Snowflake. This excludes technology partners in adjacent data engineering spaces like data catalogs. The program gives companies dedicated support and co-marketing opportunities to build and sell data products using Snowflake. The list has grown >9x in less than a year to 47 today, with use cases from education to manufacturing. This understates the scale because many companies integrate directly with Snowflake without being part of the program, like Kubit, which provides product analytics for the modern data stack.
This makes Snowflake a data platform by definition: third-party software developers are increasingly technically and economically dependent on Snowflake. What about Databricks?
Compared to Snowflake, which made its product generally available in 2015, Databricks is newer to the data warehouse scene:
2019: started the Delta Lake project
2020: popularized the Lakehouse concept
2021: announced Databricks SQL general availability, with a vigorous response from Snowflake
While Databricks has catching up to do, I’m looking forward to the applications built on top of Databricks, which has historically been treated as a data lake where unstructured data is funneled: think images, videos, audio, etc. The breadth of use cases expands exponentially vs. building with structured data alone.
What about OLTP vendors like MongoDB? They’re great, but the value of OLAP databases like Databricks and Snowflake as platforms is bringing data from several sources into one accessible interface. Developers need to combine multiple data sources, like correlating clickstream data with customer information coming from Salesforce to highlight product qualified leads. In this context, OLTP is more a data source than a database.
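To make the clickstream example concrete, here is a minimal sketch of that kind of join. It uses Python's built-in sqlite3 as a stand-in for a cloud warehouse. The table names, columns, sample rows, and the "three key events on a free plan" rule are all hypothetical choices for illustration, not any vendor's schema or definition of a product qualified lead.

```python
import sqlite3

# In-memory database standing in for a warehouse holding both sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clickstream (account_id TEXT, event TEXT);
CREATE TABLE crm_accounts (account_id TEXT, name TEXT, plan TEXT);

-- Hypothetical product usage events.
INSERT INTO clickstream VALUES
  ('a1', 'report_created'), ('a1', 'report_created'), ('a1', 'report_created'),
  ('a2', 'report_created'),
  ('a3', 'report_created'), ('a3', 'report_created'), ('a3', 'report_created');

-- Hypothetical customer records, as if synced from a CRM.
INSERT INTO crm_accounts VALUES
  ('a1', 'Acme', 'free'), ('a2', 'Globex', 'free'), ('a3', 'Initech', 'paid');
""")

PQL_THRESHOLD = 3  # assumed cutoff for "engaged enough to hand to sales"

# Free-plan accounts with at least PQL_THRESHOLD key events.
pqls = conn.execute(
    """
    SELECT c.name, COUNT(*) AS key_events
    FROM clickstream AS e
    JOIN crm_accounts AS c ON c.account_id = e.account_id
    WHERE c.plan = 'free' AND e.event = 'report_created'
    GROUP BY c.account_id, c.name
    HAVING COUNT(*) >= ?
    ORDER BY key_events DESC
    """,
    (PQL_THRESHOLD,),
).fetchall()

print(pqls)
```

The query is only possible because both sources sit behind one SQL interface, which is the platform value described above.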
SaaS 3.0: Data applications
Grossly simplifying the technical SaaS stack, what we’re seeing is the third-generation SaaS architecture. SaaS started out siloed, then became integrated via APIs, and is now being integrated via data platforms.
This makes sense from a macro-efficiency standpoint. For n SaaS apps in the world, connecting every app to every other app in the SaaS 2.0 world takes n(n-1)/2 integrations, one per pair, which grows quadratically. In the SaaS 3.0 world it takes n × m integrations, where m is the number of data platforms.
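A small sketch makes the scaling difference tangible. It assumes one integration per pair of apps in the point-to-point case, which gives n(n-1)/2, and one integration per app per platform in the platform case. The function names and the choice of m = 5 (the quinternity) are mine.

```python
def point_to_point(n: int) -> int:
    """Integrations needed when every app connects directly to every other app."""
    return n * (n - 1) // 2


def via_platforms(n: int, m: int) -> int:
    """Integrations needed when every app connects to each of m data platforms."""
    return n * m


M_PLATFORMS = 5  # assumed: the three hyperscalers plus Snowflake and Databricks

for n in (10, 100, 1000):
    print(
        f"n={n}: point-to-point={point_to_point(n)}, "
        f"via platforms={via_platforms(n, M_PLATFORMS)}"
    )
```

With only a handful of apps the platform route is not cheaper, but the point-to-point count grows with the square of n while the platform count grows linearly, so the gap widens quickly as the number of apps increases.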
Software developers can dedicate more engineering effort toward novel applications that can be built with centralized data. So far, what’s bubbling up in the startup scene is tooling for new GTM motions (e.g. product-led growth), new billing models (e.g. usage-based billing), and accurate real-time analytics (e.g. full sample product analytics). At the same time, it will be a bloodier competitive space for SaaS companies because data gravity will weaken, lowering switching costs.
But it is great news for the data platforms: they are going to eat a larger piece of the pie.
Till next time, Kenn
Definitely one of the most interesting trends going on in the data world. I've heard Martin Casado & Patrick Chase discuss this as well.
I think the "lowering of the switching costs" point at the end is one piece in a much larger puzzle when it comes to answering how this affects the SaaS space. Probably warrants another article, which I'm happy to contribute to!