dbt Labs

Thesis

The move to the cloud has created one of the biggest transformations over the past decade, with cloud data warehouses playing a huge role. Before the cloud gained widespread adoption, traditional data teams relied on a process called ETL (Extract, Transform, Load) for moving data into a data warehouse to allow an organization to generate insights. ETL is a time-intensive and manual process by which raw data is extracted from its source, transformed and loaded into a data warehouse. In particular, the “transformation” stage required data engineers to write a lot of SQL scripts to transform data before it could be loaded. There was also a lack of technical data experts capable of pulling raw data and generating insights.

Source: Open Source Data Stack Conference

As the cloud data stack gained traction, cheap cloud storage in modern data warehouses enabled a better process called ELT (Extract, Load, Transform). Teams can store and scale large amounts of data more cost-efficiently, so they can load raw data first and transform it afterward. dbt, the tool built by dbt Labs, is a data modeling tool that supports an automated ‘Transform’ step in the ELT pipeline, allowing data teams to generate insights from raw data. dbt combines modular SQL with software engineering best practices to make data transformation reliable, automated, and faster for data teams in the cloud era.
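To make the ELT ordering concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table and column names (`raw_orders`, `customer_revenue`) are invented for illustration. It shows the general pattern only, not how dbt itself is implemented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw records land in the warehouse untouched.
raw_orders = [
    (1, "alice", "2022-01-03", 1250),
    (2, "bob", "2022-01-03", 4000),
    (3, "alice", "2022-01-04", 700),
]
conn.execute(
    "CREATE TABLE raw_orders "
    "(id INTEGER, customer TEXT, ordered_at TEXT, amount_cents INTEGER)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: runs afterwards, inside the warehouse, as plain SQL.
conn.execute(
    """
    CREATE TABLE customer_revenue AS
    SELECT customer,
           SUM(amount_cents) / 100.0 AS revenue_usd,
           COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY customer
    """
)

customer_revenue = conn.execute(
    "SELECT customer, revenue_usd, order_count "
    "FROM customer_revenue ORDER BY customer"
).fetchall()
print(customer_revenue)
```

Because the raw table is kept as loaded, the transformation can be rewritten and rerun later without going back to the source systems, which is the practical advantage of loading before transforming.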

dbt Labs has emerged as a standard for managing the complex transformation of large volumes of raw data into valuable insights. Beyond that, dbt Labs is a growing community of users built around an open-source tool capable of improving the data team experience by solving an integral part of the modern data stack.

Founding Story

The company’s founding story begins in 2016 with a data analytics consulting shop founded by Tristan Handy, Drew Banin, and Connor McArthur called Fishtown Analytics. Fishtown’s main focus was helping series A and B venture-funded companies navigate the data tooling landscape and implement advanced data analytics into their workflows. As a side gig, Tristan and his team at Fishtown also began building an open-source project called dbt (data build tool).

As Fishtown continued to use dbt in 100% of its client engagements and shared it as open-source code, the community of data teams benefiting from the dbt project grew exponentially. By 2020, dbt’s community had become so large that the Fishtown team decided it was time to dedicate itself fully to serving the dbt community. To reflect this shift in focus, Fishtown Analytics changed its name to dbt Labs in 2021.

Prior to becoming the CEO of dbt Labs, Tristan Handy worked in data analytics. As a result, Tristan was well aware of the friction involved in generating insights from large volumes of raw data. He also saw the tension between data analysts and data engineers, as analysts did not possess the skills needed to dig into the data on their own. These experiences inspired Tristan to create dbt Labs. The other co-founders are Connor McArthur, who serves as the Chief Technology Officer (CTO), and Drew Banin, who stepped down as Chief Product Officer in February 2022.

Product

Today’s modern stack has become increasingly complex and multi-faceted due to the power and extensiveness of the cloud. dbt’s products primarily play a role within the “Extract, Load & Transform” component of the data production pipeline. It works closely with everything that happens in the data warehouse.

Source: Emergence

Companies implement a tool like Fivetran to extract data from its source (Google Sheets, Amazon S3, etc.) and load it into a data warehouse like Snowflake or Databricks. A customer then installs dbt to transform that raw data into a structured format that is ready to be used for business analytics.

Source: Modern approach to DataOps using DBT

dbt Core

This is the open-source version available for any customer to adjust the code to their needs, and add new features and functionality. dbt Core includes three core use cases:

dbt For Modeling: This is a granular solution that offers full SQL features for teams that want to use SQL to analyze large datasets and debug code within the data warehouse.
dbt For Data Testing: This solution offers customers the ability to routinely conduct tests to ensure data is being transformed correctly. Teams are able to use this feature to deploy data at scale using continuous integration (CI) and continuous deployment (CD).
dbt For Data Documentation: This solution ensures that all data and information that flows in and out of the data warehouse are properly documented. It ensures there is clear data lineage and information can be easily traced.
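The modeling and testing use cases above can be sketched in a few lines of Python. This is an illustrative approximation, not dbt's API: it assumes SQLite as the warehouse, treats a "model" as a named SELECT materialized as a view, and treats a "test" as a query that returns offending rows, so zero rows means the test passes. All table, model, and test names are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [(1, "A@example.com"), (2, None), (2, "b@example.com")],
)

# A "model" here is just a named SELECT, materialized as a view.
models = {
    "stg_customers": (
        "SELECT id AS customer_id, LOWER(email) AS email FROM raw_customers"
    ),
}
for name, select_sql in models.items():
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")

# Each test selects offending rows; an empty result means the test passes.
tests = {
    "email_not_null": "SELECT * FROM stg_customers WHERE email IS NULL",
    "customer_id_unique": (
        "SELECT customer_id FROM stg_customers "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    ),
}
failures = {
    name: len(conn.execute(sql).fetchall()) for name, sql in tests.items()
}
print(failures)
```

Running checks like these on every change is what allows a CI/CD pipeline to block a deployment when a transformation starts producing nulls or duplicate keys.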

dbt Cloud

This is dbt Labs’ proprietary and commercial offering. The main feature of dbt Cloud is a browser-based IDE that allows companies to build, run, test, and version control with dbt. The cloud solution is a hosted service that helps data analysts and engineers to put data into production for dbt deployments. It comes equipped with turnkey support for scheduling jobs, CI/CD, serving documentation, monitoring, and alerting. The cloud solution has two key offerings:

dbt Cloud Enterprise: This is a self-hosted cloud where dbt Labs provides all the infrastructure and the customer only has to pay for the solution.
dbt Cloud Integrations: The integration deployment models fall into two categories: Multi-Tenant and Single Tenant. These deployments are hosted on infrastructure managed by dbt Labs. Both models leverage AWS infrastructure.

Market

Customers

Any company that wants to implement data analytics and machine learning with a data team and a cloud data warehouse is a good candidate to use dbt. dbt’s 1,800+ paying customers range from teams of one to teams of 1,000+ analysts. If dbt finds success with one team in an organization, the tool is likely to be adopted by other data teams, improving processes across the board and expanding dbt’s footprint inside the company.

dbt and its open-source community have played a key role in popularizing the role of the Analytics Engineer. This is a type of engineer that brings together the software engineering and data analyst workflows within a data pipeline. These folks work with data and prepare it for analysis. Analytics engineers perform the work of data engineers to sync data to the data warehouse before it is transformed.

As companies continue to become increasingly data-driven, a wide range of industries will have data to transform more effectively. dbt has already had some success in industries such as oil & gas, banking and financial services, and healthcare. JetBlue is an example of how the increasing popularity of cloud data warehouses is enabling dbt to attract customers across multiple industries. JetBlue had historically used legacy, on-premise data warehouses and transformation tools, and it dealt with a data engineering bottleneck where not enough people were able to contribute to the data transformation process. In its search for a solution, JetBlue adopted Snowflake as its cloud data warehouse and dbt as its transformation tool.

dbt enables multiple stakeholders within an organization’s data team, including business analysts, to engage with their data more effectively. Non-technical members are able to conduct more analysis, and by engaging multiple types of users dbt extends its addressable customer base.

Market Size

The rise of the cloud data warehouse over the past decade contributed to the rise in popularity of companies like dbt Labs. The cloud data warehouse market is expected to continue growing as more companies flock to the cloud. In 2021, ~50% of enterprise data was stored using cloud data warehouses. Cloud data warehouse spend in 2021 grew 24% YoY to $383 billion. Julia Schottenstein, product manager at dbt Labs, stated that dbt believes data transformation and the semantic layer can easily be 15%+ of current warehouse spend, a $50 billion opportunity.
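As a rough check of the arithmetic behind that estimate, the sketch below applies the quoted 15% share to the $383 billion spend figure. The variable names are illustrative, and the implied 2020 spend is derived here from the stated 24% growth rate rather than reported in the source.

```python
# Figures quoted in the paragraph above, in billions of US dollars.
warehouse_spend_2021_bn = 383
yoy_growth_pct = 24
transformation_share_pct = 15

# 15% of current warehouse spend: the low end of the "15%+" claim.
opportunity_bn = warehouse_spend_2021_bn * transformation_share_pct / 100

# Derived, not from the source: the 2020 spend implied by 24% growth.
implied_spend_2020_bn = warehouse_spend_2021_bn / (1 + yoy_growth_pct / 100)

print(f"Opportunity at 15%: ~${opportunity_bn:.0f}B")
print(f"Implied 2020 spend: ~${implied_spend_2020_bn:.0f}B")
```

At exactly 15% the result lands somewhat above the $50 billion quoted, so the stated figure reads as a rounded, conservative version of the same calculation.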

Competition

As companies move to the cloud and demand for deriving insights from data increases, a number of competitors have emerged within the data stack. dbt competes primarily against in-house data solutions, large cloud providers, and other emerging data transformation tools. The most basic source of competition for dbt is the many in-house solutions that data teams build to perform data transformation and orchestration internally.

Large data warehouse providers, such as Snowflake and the cloud providers, have an opportunity to recreate dbt’s products. That risk is even more acute given dbt’s open-source offering. Companies like Fivetran primarily focus on the Extract and Load stages, but could offer a dbt runner with similar data transformation capabilities. dbt’s core differentiator is its emphasis on data transformation at scale, compared to other companies that treat transformation as one more point solution. Julia Schottenstein, product manager at dbt Labs, describes the advantage dbt has over Extract/Load companies:

“The other reason why people buy dbt Cloud instead of using a dbt runner (Open Source) in a Fivetran or another EL product is because we offer a far richer overall product experience. The people who do transformation work are often different from those who do set up the extract load flow. So, it's a separation of tools that match a separation of responsibilities. The transformation layer needs to be closer to the business needs versus the pipes of data movement, which is what Fivetran or the EL provider owns.”

Within the transformation layer, there are a handful of companies offering a similar product to dbt, including Matillion, Datameer, and Mozart Data. Matillion particularly handles transformation with a GUI product within the data warehouse. dbt’s CEO, Tristan Handy, does not currently view them as meaningful competitors given dbt’s user base is greater than that of all the transformation alternatives combined, largely driven by dbt’s open-source community.

“We got [to data transformation] early on, and we were open-source since the very beginning. Open-source is very challenging to compete with because it is free. There is also this dynamic where the community ends up not only being a source of growth but a source of product improvement. Every time we release a new version of dbt, there are a dozen plus members of the community who literally contributed code to that release.”

Tristan alludes to a unique open-source flywheel. The product is free, which makes it easier for more developers to contribute to the project, which in turn accelerates the growth of the community of developer advocates and customers using the product. Databricks’ open-source strategy of offering all of its technology within the data stack is an example of a company benefiting from this flywheel.

Business Model

dbt Labs employs both an open-source and SaaS business model to complement its two main products: dbt Core and dbt Cloud. Together, the open-source and SaaS offerings create an open core model. The “core” product offers a feature-limited version of the software as free and open source, while the company also offers “commercial” versions in the form of proprietary, subscription-based software. Similar to the freemium model, the main goal of the open-core model is to monetize commercially produced, free-to-use open-source software by offering enhanced proprietary features at an additional price. The business model closely aligns with their product suite:

dbt Core is free to use under an Apache License. As with any open source company, the number of active companies using the dbt Core product vastly outweighs the number of companies paying for dbt Cloud.

dbt Cloud is a proprietary and commercial offering which operates under a SaaS model. As the “premium” version, dbt Cloud provides enhanced features, especially for enterprise-level data teams. dbt Cloud offers three pricing tiers: Developer, Team, and Enterprise.

Developer Plan: a free, single-seat plan tailored toward individuals who want to learn what dbt can do.
Team Plan: geared toward collaboration and has additional functionality including access to dbt’s API; offered at $50 per developer seat per month.
Enterprise Plan: advanced features and custom pricing packages best suited for large customers with security, compliance, and governance needs.

Traction

One of the core metrics for measuring dbt’s growth is the number of active companies using the open-source analytics product. In 2017, one year after the company was founded (then called Fishtown Analytics), there were 50 companies actively using the dbt product. At the time of the company’s Series A in 2020, the number of active companies using open-source dbt had reached ~1,700.

As of March 2022, there were 9,000 companies using open-source dbt. Although not all active companies on the platform are paying customers, dbt’s paying customer base has also gained traction as of late. In February 2022 when dbt announced its Series D round, 1,800 of the 9,000 customers were paying for dbt. Throughout 2021, dbt tripled its paying customer base, beginning the year with ~600 and ending the year with ~1,800.
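The ratios implied by these figures can be worked out directly. The sketch below uses only the numbers quoted in this section; the variable names are illustrative, and the approximate ("~") figures are treated as exact for the purpose of the calculation.

```python
# Figures quoted in the Traction section (approximate values treated as exact).
active_companies_2017 = 50
active_companies_series_a = 1_700
active_companies_2022 = 9_000
paying_start_2021 = 600
paying_end_2021 = 1_800

# Share of active companies that pay, as of the Series D announcement.
paying_share = paying_end_2021 / active_companies_2022

# Growth of the paying base during 2021.
paying_growth = paying_end_2021 / paying_start_2021

# Growth of the open-source base between each quoted milestone.
growth_2017_to_series_a = active_companies_series_a / active_companies_2017
growth_series_a_to_2022 = active_companies_2022 / active_companies_series_a

print(f"Paying share of active companies: {paying_share:.0%}")
print(f"Paying customer growth in 2021: {paying_growth:.1f}x")
print(f"Active companies, 2017 to Series A: {growth_2017_to_series_a:.0f}x")
print(f"Active companies, Series A to March 2022: {growth_series_a_to_2022:.1f}x")
```

Roughly one in five active companies being a paying customer is the conversion figure that the open-core model described earlier depends on.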

While dbt’s CEO, Tristan Handy, declined to disclose their revenue, he confirmed they had grown revenue by 6x in 2021. Other sources indicated revenue was in the double-digit millions. During 2021, dbt grew their team from ~50 to ~200 people.

Source: dbt

Within the Data Analytics group, dbt Labs has the highest Net Sentiment score relative to peers, with 5% mind share of the sector. Net sentiment represents the % of respondents allocating spending or currently evaluating the company over competitors. dbt ranks very high on spending priority amongst CTOs and executives.

Source: ETR

Key Opportunities

The Semantic Layer of the Modern Data Stack

The modern data stack that governs the end-to-end flow of data from extraction to insights has evolved over the past decade with the move to the cloud. dbt believes the next big trend within the data stack is the semantic layer. Tristan Handy describes this opportunity, stating:

“On our move-the-ecosystem-forward initiatives, there are a bunch of irons in the fire. The biggest one is something we’re calling the semantic layer, which is a brand new way for Business Intelligence (BI) and analytics tools to access a single set of business concepts (metrics, entities, and more).”

To visualize this fundamental architectural change to the modern data stack, the following picture depicts the data flow in the current modern data stack.

Source: dbt

The main issue with this flow of data is that organizations, especially large ones with extensive data sources, have a wide variety of tools for different projects catering to different users. This leads to duplicated tools accessing different copies of data from a company’s warehouse. dbt’s new semantic layer is designed to fix these issues. dbt’s code sits between the warehouse and those tools, and definitions in it can use any existing programming construct that dbt authors already use. This layer aims to solve the “single source of truth” problem and create the flow of data depicted below, where dbt is able to orchestrate and coordinate the flow of data.

Source: dbt
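The “single source of truth” idea can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Metric` class, the `compile_metric` helper, and the `orders` table are invented for illustration and are not dbt's semantic layer API. The point is only that two downstream tools asking for the same metric receive SQL generated from one shared definition.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical central definition of a business metric."""
    name: str
    table: str
    expression: str


def compile_metric(metric: Metric, dimension: str) -> str:
    """Generate SQL for a metric sliced by one dimension."""
    return (
        f"SELECT {dimension}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {dimension} ORDER BY {dimension}"
    )


# Defined once; every consuming tool reuses this definition.
REVENUE = Metric(name="revenue", table="orders", expression="SUM(amount)")

# Assumption: in-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, channel TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("east", "web", 100), ("east", "store", 50), ("west", "web", 70)],
)

# Two imaginary consumers slice the same metric along different dimensions.
dashboard_rows = conn.execute(compile_metric(REVENUE, "region")).fetchall()
notebook_rows = conn.execute(compile_metric(REVENUE, "channel")).fetchall()
print(dashboard_rows)
print(notebook_rows)
```

Because both queries derive from the same definition, the totals agree no matter which tool asks, which is the property that per-tool metric definitions tend to lose.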

Leveraging the Snowflake Partnership

dbt is tightly coupled with Snowflake within the data warehouse. As a result, dbt drives a significant amount of consumption within Snowflake. Jamin Ball, a partner at Altimeter who invested in dbt, believes this is going to be a key driver for dbt Labs.

"dbt and Snowflake have a very symbiotic relationship given dbt customers drive Snowflake compute simply by using dbt. While Snowflake may be a competitor in the future, they realize dbt's success translates to Snowflake success which creates a win-win environment."

Within the Snowflake ecosystem, dbt Labs is also the company with the highest net sentiment among customers according to one survey (net sentiment being the % of respondents allocating spend or currently evaluating the company over competitors). Over time, there will be opportunities for more synergies and product partnerships that could be struck between both companies.

Source: ETR

Technology Partner Program

dbt’s user community has been a key part of the company’s strategy. One recent example of dbt’s focus on community is its Technology Partner Program launched in August 2022. The program involves partnerships with consulting service providers, and technologies that share dbt’s viewpoint of advancing the modern data stack. With more than 50 partners in the program, data practitioners and members of the dbt community will be able to gain more value from dbt by extending the product’s capabilities through integrations.

An example of an existing dbt technology partnership is with Monte Carlo, a data observability platform. Barr Moses, CEO of Monte Carlo, explains how the partnership with dbt brings increased functionality to users:

“Monte Carlo works hand-in-hand with dbt to bring improved data reliability to joint users, solving an important pain point: data downtime. Partnering with dbt Labs allows analytics engineering teams to deliver more trustworthy data products by pairing end-to-end data observability with robust testing.”

Key Risks

Threat From Cloud Providers

The cloud data warehouse market is dominated by a handful of companies including AWS, Azure, and Snowflake. For dbt, CEO Tristan Handy believes the major cloud data platforms can be expected to launch some form of managed dbt service in the near future. dbt drives traffic to several of the major cloud providers as they access raw data from the data warehouse. Handy emphasizes two key aspects of dbt’s relationship with the cloud providers: (1) dbt drives billions of dollars in spend to the major cloud data platforms, and (2) the major cloud platforms have a history of selling managed versions of open source software. Handy speaks to a possible threat that could arise from the cloud providers:

“Put those two things together and our expectation is that at least one, if not more, of the cloud platforms, will launch some sort of managed dbt service in the coming year. This just seems like an inevitability as the community grows and exerts ever more gravity on the ecosystem.”

Technological Disruption

One of the large cloud data providers could develop a technology that reduces or eliminates the need for the transformation component of the data pipeline. For example, Snowflake recently launched a product called Unistore. Snowflake’s Unistore is a new workload that allows customers to unite transactional and analytical data together in a single platform. Unistore was created to reduce the movement of data between systems which would reduce the need for using external “Extract & Load” tools. Snowflake could decide to extend their product deeper into data transformation.

Valuation

Since the company’s Series A in April 2020, dbt has raised a total of $414 million, including its $222 million Series D announced in February 2022 at a $4.2 billion valuation. That round was led by existing investor Altimeter, with participation from Amplify Partners, Andreessen Horowitz, and Sequoia. Other notable investors include Salesforce Ventures, as well as Databricks and Snowflake.

The increasing universality of data, and the market opportunity in managing it, has created investor fervor in the category. In 2021, data analytics and infrastructure companies raised over $8.5 billion combined in funding. Companies like Databricks raised over $2 billion alone, and companies like Clickhouse and Airbyte raised 2-3+ rounds of funding in the same year. In terms of valuations, companies like Monte Carlo raised at a $1.6 billion valuation with $5-7 million of revenue (a 228-320x revenue multiple). Airbyte raised at $1.5 billion on less than $1 million of revenue.
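The revenue multiples quoted above follow from dividing valuation by revenue. The sketch below reproduces them from the figures in the paragraph; the function name is illustrative, and since Airbyte's revenue is given only as "less than $1 million", its result is a lower bound rather than an exact multiple.

```python
def revenue_multiple(valuation_usd_m: float, revenue_usd_m: float) -> float:
    """Valuation divided by revenue, both in millions of US dollars."""
    return valuation_usd_m / revenue_usd_m


# Monte Carlo: $1.6 billion valuation on $5-7 million of revenue.
monte_carlo_low = revenue_multiple(1_600, 7)
monte_carlo_high = revenue_multiple(1_600, 5)

# Airbyte: $1.5 billion on less than $1 million of revenue.
# Using $1 million gives the smallest multiple consistent with that statement.
airbyte_floor = revenue_multiple(1_500, 1)

print(f"Monte Carlo: {monte_carlo_low:.1f}x to {monte_carlo_high:.0f}x")
print(f"Airbyte: more than {airbyte_floor:.0f}x")
```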

In the public markets, data companies like Snowflake and MongoDB have seen their revenue multiples collapse from highs of 94x and 41x, respectively, to their current levels of 31.9x and 11.5x.

Source: Koyfin

Companies that have raised at multi-billion valuations on single to low-double digit millions of revenue are now competing in a very crowded space, where current market conditions may not support those valuations. Time will tell which companies grow into those valuations and which struggle to live up to the weight of them.

Summary

dbt has become a major player in the modern data stack and has grown an active and sizable community of users who are both engaging with the open source dbt project and becoming paying customers. As data continues to rapidly grow and evolve, dbt will benefit from that major tailwind. Success going forward for dbt will likely come from deepening their presence within the broader data workflow. Thanks to Jamin Ball for sharing his thoughts on this piece.
