In the 1960s, NASA was gearing up for its biggest mission yet: putting a man on the moon.
When the agency needed a way to track the millions of parts required to build a rocket for the job, IBM built it the world's first commercial database system, IMS, in 1968.
Still today, companies largely rely on IBM and fellow industry giants such as Oracle, which have dominated commercial data storage for decades. But databases have evolved substantially over the past 50 years.
The 2000s marked a turn toward specialty database products. Newcomers included Vertica, which sped up query processing by storing data in columns as opposed to rows, and Apache Cassandra, a tool built to handle large amounts of data across several servers for distributed storage. Cloud infrastructure hit the market and became standard in the 2010s. Popular cloud offerings such as AWS Redshift, Google BigQuery, and eventually Snowflake gave companies a brand-new way forward: better scalability, fewer maintenance requirements (and far less hassle in managing compute versus storage), and increased security.
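The columnar idea that Vertica popularized can be sketched in a few lines. This is an illustrative toy, not Vertica's implementation: the point is that an analytical query over one field only has to touch that field's array in a column store, rather than every field of every row.

```python
# Row-oriented layout: each record stored together, as in a classic OLTP database.
rows = [
    {"order_id": 1, "customer": "a", "amount": 120.0},
    {"order_id": 2, "customer": "b", "amount": 75.5},
    {"order_id": 3, "customer": "a", "amount": 42.0},
]

# Column-oriented layout: each field stored contiguously.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["a", "b", "a"],
    "amount": [120.0, 75.5, 42.0],
}

# An aggregate like SUM(amount) walks every full record here...
row_total = sum(r["amount"] for r in rows)

# ...but touches only the single "amount" array here, which is why columnar
# engines can scan (and compress) analytical data so efficiently.
col_total = sum(columns["amount"])

assert row_total == col_total == 237.5
```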
Now, the sector is shifting further to keep pace with growing and increasingly complicated data needs. At this point, 94% of the world's companies use cloud storage services for everything from logging customer transactions to running their businesses remotely. The 2020s are being dubbed a cloud database renaissance (again), with the market expected to more than triple in value over the next decade, from $16 billion today to $59 billion in 2032. Industry leaders like Oracle, IBM, Google, AWS, and Snowflake are evolving their cloud products even as new startups emerge with brand-new solutions.
So what's pushing this renaissance forward?
Increasing Heterogeneity in Data Means Purpose-Built Databases and a Lower Technical Barrier to Usage
On a macro scale, company leaders have grown more willing to experiment with how they store their ever-accumulating data. For context, there are 97 zettabytes of data worldwide today, a total expected to nearly double by 2025. As data volume doubles and usage expands in parallel, companies must continually optimize how they store, access, and use that data so teams can make quick, educated decisions about their businesses.
Thanks to remote work and the digital nature of business today, finding modern ways to manage and store data is a pressing need across every industry, not just at technology companies. I've written about this trend previously: because enterprises, and the data their customers work with, are anything but homogeneous, teams have a newfound desire to experiment, diverge, and try new tactics in how they store their data.
Given this more creative approach, the market continues to grow for both challengers and legacy companies offering purpose-built solutions for specific use cases. A good example is MotherDuck, a new serverless cloud version of the open source DuckDB being developed by two of the lead engineers who originally built Google's BigQuery. DuckDB is often described as SQLite for analytical workloads; MotherDuck aims to deliver that same embedded-analytics experience without the parallel requirement of managing infrastructure on the back end, giving the data community a cloud version of this fast, embedded analytical datastore.
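The "embedded database" model that DuckDB borrows from SQLite is worth making concrete. The sketch below uses Python's built-in sqlite3 module to show the pattern: the engine runs inside the application process, with no server to provision. DuckDB exposes a very similar in-process API but is column-oriented and tuned for analytics; MotherDuck's pitch, as described above, is running that engine as a managed cloud service.

```python
import sqlite3

# ":memory:" gives an in-process database with zero infrastructure to manage.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user TEXT, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 10.0), ("b", 5.0), ("a", 2.5)],
)

# An analytical-style aggregation, executed entirely inside the process.
total = con.execute("SELECT SUM(amount) FROM events").fetchone()[0]
print(total)  # 17.5
```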
In addition to more industries needing upleveled tooling, additional (and often less technical) teams are using databases to do their jobs. This diversification of users touching the database presents an opportunity for startups that lower the technical barrier to usage, which is quickly becoming table stakes for new offerings. For example, many new database companies offer API-like experiences that let users access databases while forgoing native query languages. This innovation in tooling can meaningfully expand the number of addressable users within an enterprise beyond the technical teams. As databases begin to behave more like applications, everyday users can access the systems they need without deep technical expertise. With touchpoints to a wider array of users, these databases may have a more straightforward path to Snowflake-like ubiquity.
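A hypothetical sketch of the API-like access pattern described above: instead of writing SQL, a less technical user expresses a filter through a REST-style request body. The endpoint shape and field names here are invented for illustration; real products that expose databases over HTTP (e.g. Supabase's auto-generated REST API or Airtable's API) each define their own.

```python
import json

def build_query(table: str, filters: dict, limit: int = 50) -> str:
    """Serialize a filter request the way a hypothetical database HTTP API
    might expect it, e.g. as the body of a POST to /query."""
    return json.dumps({"table": table, "where": filters, "limit": limit})

# A non-technical user's "query": no SQL, just structured fields.
payload = build_query("customers", {"country": "DE", "active": True})
print(payload)
```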
Many New Startups Are Being Born from Hyperscaler DNA
Companies’ willingness to experiment with newly developed database solutions is encouraging for startups in the sector, as fighting (and winning) against Google and AWS is no easy feat. While the big three cloud providers are aggregating new databases into fully managed offerings, newer startups can rise to the challenge of grabbing customer mindshare by understanding what wins over individual user groups while simultaneously offering enterprises a clear ROI.
What’s even more telling is how industry veterans from within the hyperscaler organizations are leaving to build new, customer-oriented database solutions from the ground up. Several former AWS, Google, and Microsoft employees have left the large tech companies with the mission of building database technologies that appeal to the user base itself, not corporate IT departments.
Example Databases born from Hyperscaler DNA
MotherDuck: MotherDuck, mentioned above, was founded by Jordan Tigani and Tino Tereshko, former lead engineers on Google’s BigQuery. Jordan, who was also Chief Product Officer at SingleStore, and Tino understandably have a detailed perspective on where other databases, Google’s in particular, fall short in user experience.
PlanetScale: The company was founded by former Google and YouTube employees Jiten Vaidya and Sugu Sougoumarane. Jiten and Sugu saw where popular databases like MySQL fall short and how companies consequently struggle to scale relational databases. PlanetScale was conceived to circumvent the tradeoff between going serverless and building more complex applications. It uses Vitess (which the two founders developed while at YouTube) to scale MySQL horizontally across thousands of machines without sacrificing performance.
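The core idea behind scaling MySQL horizontally, as Vitess does, is sharding: deterministically routing each row to one of many database instances based on a sharding key. Vitess's real routing layer uses keyspaces and "vindexes" and is far more sophisticated; the hash-and-modulo scheme below is just a toy stand-in to show the principle.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Deterministically map a sharding key (e.g. a customer id) to a shard."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Each "shard" stands in for a separate MySQL instance.
shards = [[] for _ in range(NUM_SHARDS)]

for customer_id in ["cust-1", "cust-2", "cust-3", "cust-42"]:
    shards[shard_for(customer_id)].append(customer_id)

# The same key always routes to the same shard, so later reads find their data.
assert shard_for("cust-1") == shard_for("cust-1")
assert sum(len(s) for s in shards) == 4
```

Because writes for different keys land on different machines, capacity grows roughly with the number of shards, which is how a single logical MySQL database can span thousands of servers.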
Neon: The company was founded by Nikita Shamgunov, a former senior engineer at Microsoft and the founder of MemSQL (later SingleStore). Neon is building serverless cloud Postgres that can scale compute and storage independently and dynamically. By pairing Postgres with a new storage engine, Neon aims to maintain high consistency while providing a scalable, high-performance database.
Dragonfly: The company was founded by Oded Poncz and Roman Gershman, former engineers at Google and AWS. Dragonfly is a modern replacement for Redis and Memcached. Its architecture scales vertically to support any workload, limited only by the physical properties of the underlying hardware. The company’s mission is to build a well-designed, ultra-fast, cost-efficient in-memory datastore for cloud workloads that takes advantage of the latest advancements in hardware.
So What Next?
While the startups above are just a few examples of next-generation databases, several others have sprouted from within the data ecosystem and are quickly attracting attention. Given this pace of innovation in database technology, we will likely continue to see both new entrants and failures, as we have over the last few decades.
As that evolution in the market continues, I expect a few things to happen in the coming years:
Commoditization of Serverless: As more startups build “Serverless for…” versions of legacy databases, serverless will come to be seen as table stakes. While I certainly get excited about cloud-based storage with abstracted, automated ops, consumption-based billing, distributed architecture, and a first-rate SQL API, serverless no longer feels like a unique competitive advantage for today’s offerings.
Newer efforts to replace SQL: New databases will keep trying to unseat SQL as the mainstay query language in order to build strong customer attachment to their own platforms. While several startups have failed in this attempt, others like SurrealDB and EdgeDB are building tremendous early momentum with their own SQL-like query languages.
Database Access will be even more critical: With continued diversification of database usage among enterprises and a wider array of users (both technical and non-technical), database access tools and integrated development environments (IDEs) will serve increasingly critical purposes. Once users are happy with these tools, they will be hard-pressed to switch.
Cloud Database Spend will require optimization: Much as we’ve seen with cloud cost optimization in engineering, cloud databases and data platforms will become exorbitantly expensive. A new cohort of data cloud optimization tools will scale quickly, and large companies like Snowflake will be forced to partner with them. Blue Sky is an example of a company making tremendous progress here.
Headline as Data-Obsessed Investors
Each of the new startups above is taking a different approach to building a unique new data standard, but all are working toward a similar goal: build and sustain a tool that users refuse to let go of. Yes, the hyperscalers are quickly releasing their own next-generation product add-ons, but there is no one-size-fits-all database, and there likely never will be. As highly data-driven investors ourselves (we absolutely love data), we at Headline know how important data can be in achieving business outcomes. If you’re building in this space, we’d love to get to know you.