Thesis
The rapid adoption and acceleration of cloud computing across all industries has led to an explosion of data-generating SaaS applications. The average enterprise deployed 187 SaaS apps in 2021. Consequently, enterprise data became fragmented and siloed behind the walled gardens of point solutions and platforms alike. Data fragmentation creates extra hurdles for organizations seeking to combine datasets from multiple sources and analyze them together.
Traditionally, enterprises overcame data silos by tasking data engineering teams with manually building custom data connectors for each data source. This process is inherently time-consuming and costly. Fivetran stepped in to provide automated, pre-built data connectors that enable organizations to pull data from multiple sources and load it into a central warehouse, which acts as the single source for enterprise data.
Fivetran is among several well-funded startups aiming to tackle the “data pipeline problem” through automation. It has so far benefited from multiple secular trends, including the explosion of data volumes, the rise of cost-effective cloud data warehouses, and the ubiquity of SaaS applications. However, some in the industry believe that data integration could become a winner-take-all market in which the leader owns 80% of the share. This presents Fivetran with both the opportunity to strengthen its product offerings and force its way to market leadership, and the risk of losing the race to the top position.
Founding Story
Fivetran was founded in 2013 by George Fraser (CEO) and Taylor Brown (COO). They initially developed a business intelligence tool that captured and analyzed data, which got them accepted to Y Combinator in the winter of 2013. At the same time, they built internal software that connected their BI tool to Amazon Redshift. In 2014, one of their early customers noticed this behind-the-scenes software and asked the founders to build a similar Salesforce-Redshift connector. This was a eureka moment for Fraser and Brown. They realized that data integration was an unsolved problem and saw the opportunity to build fully automated data pipelines for data engineers. Fraser and Brown ditched their original idea, redesigned their website, and rebranded as a data pipeline builder. The company fully pivoted to the data integration space in 2015 and has since developed more than 200 data connectors.
Product
Data is at the core of every business decision, whether it’s predicting customer behavior, growing sales, identifying new business opportunities, or improving customer service. Getting data into a central hub has always been a challenge for organizations. Traditionally, companies conducted data integration through the ETL (Extract, Transform, Load) methodology. With ETL, data engineers would first develop their own custom-built data pipelines to extract raw data from various sources. They would then clean, merge, and manipulate the data, often shrinking its volume in the process. Finally, they would load that data into a data warehouse for storage and modeling. This process is highly manual and time-consuming, and it requires regular maintenance. Meanwhile, the rise of modern data warehousing technologies, coupled with the plummeting cost of compute and storage, has led to the emergence of a new data integration approach: ELT.
Source: Striim
ELT workflows enable organizations to extract, or stream, data from multiple sources and move it to the storage destination in its raw format, without altering it or reducing its volume. Data transformation is then performed as the final step of the journey. Companies like Fivetran are automating the first two stages of the ELT pipeline through a series of pre-built data connectors, with the aim of eliminating manual ETL coding. Fivetran’s main products and capabilities include the data connectors themselves, database replication, transformation, and embedding.
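The ordering that distinguishes ELT from ETL can be illustrated with a minimal Python sketch. It is not Fivetran's implementation: the in-memory SQLite database merely stands in for a cloud data warehouse, and the sample records, table names, and function names are all hypothetical. The point is that raw records are loaded unchanged, and cleaning happens afterwards inside the warehouse.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": 1, "customer": " Alice ", "amount_cents": 1250},
    {"order_id": 2, "customer": "bob", "amount_cents": 990},
    {"order_id": 2, "customer": "bob", "amount_cents": 990},  # duplicate record
]


def extract():
    """Extract: return the source records unchanged."""
    return list(RAW_ORDERS)


def load(conn, rows):
    """Load: write the raw records into the warehouse without reshaping them."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount_cents)", rows
    )


def transform(conn):
    """Transform: clean and model the data inside the warehouse, after loading."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT DISTINCT
            order_id,
            LOWER(TRIM(customer)) AS customer,
            amount_cents / 100.0 AS amount_usd
        FROM raw_orders
        """
    )


def run_elt():
    """Run the three stages in ELT order: extract, load, then transform."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    load(conn, extract())
    transform(conn)
    return conn
```

Because the raw table is kept alongside the modeled one, the transformation can be rewritten and rerun later without extracting from the source again, which is one practical appeal of loading before transforming.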
Data Connectors
Fivetran offers a catalogue of data integrations that extract data from production sources and move it into storage destinations. There are two types of Fivetran connectors: Pull and Push. Pull connectors actively retrieve and download data from the sources, whereas with Push connectors, the source systems send the data to Fivetran. Once either type of connector has extracted the data, Fivetran automatically performs light data preparation tasks such as normalization and de-duplication. Finally, the data is loaded to the target destination for transformation and storage.
Data Connectors are Fivetran’s flagship product. As of November 2022, the company offers over 200 connections to applications like Google Sheets, Airtable, Zendesk, Salesforce, LinkedIn, PayPal, Mailchimp, Atlassian Jira, and Shopify.
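The pull and push patterns, followed by light preparation, might be sketched as below. This is an illustrative Python sketch only, not Fivetran's connector code: the class names, the `prepare` helper, and the choice to keep the last record seen for each key are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class PullConnector:
    """Hypothetical connector that actively fetches rows from a source on each sync."""

    def __init__(self, fetch: Callable[[], List[Row]]) -> None:
        self._fetch = fetch

    def sync(self) -> List[Row]:
        return self._fetch()


class PushConnector:
    """Hypothetical connector that buffers rows sent by the source until the next sync."""

    def __init__(self) -> None:
        self._buffer: List[Row] = []

    def receive(self, row: Row) -> None:
        # Called by the source system (for example, via a webhook handler).
        self._buffer.append(row)

    def sync(self) -> List[Row]:
        rows, self._buffer = self._buffer, []
        return rows


def prepare(rows: List[Row], key: str = "id") -> List[Row]:
    """Light preparation: normalize column names and de-duplicate by key.

    Assumption for this sketch: when a key repeats, the last record wins.
    """
    latest: Dict[object, Row] = {}
    for row in rows:
        normalized = {str(name).strip().lower(): value for name, value in row.items()}
        latest[normalized[key]] = normalized
    return list(latest.values())
```

Both connector types expose the same `sync` method, so the preparation and loading steps downstream do not need to know how the rows were obtained.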
CDC Database Replication
Data replication is the process of creating several copies of the same data across multiple databases. Change Data Capture (CDC) replication is a technique that involves capturing transactional changes as they happen in the source database and applying them to the target database. Fivetran’s CDC product offers real-time database monitoring and transfers only the data that has changed to the connected storage. This solution is ideal for large enterprises moving high volumes of data from one database to another, consolidating multiple databases, or migrating from on-premise systems to the cloud or from one cloud vendor to another. The company started offering this solution after acquiring HVR, a data replication software vendor, for $700 million in September 2021.
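The core idea of CDC, applying only the captured changes rather than recopying the whole table, can be shown with a small Python sketch. The change-event format used here (`op`, `key`, `row`) is hypothetical and chosen for illustration; it does not represent the HVR or Fivetran change-log format.

```python
from typing import Dict, List

Table = Dict[int, dict]


def apply_changes(target: Table, changes: List[dict]) -> Table:
    """Apply an ordered list of captured changes to a target table keyed by primary key.

    Each change is a dict with a hypothetical shape:
    {"op": "insert" | "update" | "delete", "key": <primary key>, "row": <new values>}
    """
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)  # deleting an absent key is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target
```

Only three small events cross the wire in a case like the one below, regardless of how large the source table is:

```python
replica = {1: {"name": "a"}, 2: {"name": "b"}}
apply_changes(replica, [
    {"op": "update", "key": 1, "row": {"name": "a2"}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"name": "c"}},
])
```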
Transformation
Fivetran also offers a Transformation service that’s powered by dbt Labs’ open-source framework, dbt Core. With this solution, users can conduct data cleaning, transformation, modeling and documentation tasks within the same Fivetran environment.
Embedded Data Pipelines
Fivetran’s data pipeline embedding product, also known as Powered by Fivetran (PBF), is an API solution designed for and sold to B2B analytics and data insight vendors. It embeds Fivetran’s automated connections and replication capabilities into the vendor’s web applications and allows users to connect their data sources to the vendor’s platform.
Market
Customer
Prior to the HVR acquisition, Fivetran’s customers were predominantly startups and small-to-midsized companies, and its product was limited to data connection use cases. The acquisition of HVR, which offered data replication tools to enterprises moving large datasets, opened up the enterprise market for Fivetran. In terms of persona, Fivetran’s products are largely sold to data engineers and analysts tasked with building ELT pipelines. The company has more than 4,000 customers as of November 2022, including Okta, DocuSign, Intercom, Square, Lionsgate, Condé Nast, Databricks, JetBlue, and Coupa.
Market Size
The data integration market is projected to grow at a 5-year CAGR of 11% from $11.6 billion in 2021 to $19.6 billion in 2026. Further, manual data integration tasks are expected to be reduced by 50% by 2024 through the adoption of automated data integration tools. That market growth is driven by a number of key factors:
The explosion of data volumes
The emergence of cloud-based data storage
The ubiquity of SaaS applications
The explosion of data volumes: The amount of data created globally in the last decade has increased from 2 zettabytes in 2010 to 64 zettabytes in 2020 and is expected to reach 181 zettabytes in 2025. This presents new opportunities for companies to tap into their data and glean business insights. However, the size and complexity of data resulted in the need for automated and faster data integration tools, propelling the growth of this market.
Source: Statista
Emergence of cloud-based data storage: Modern data architectures such as cloud-based data warehouses and lakehouses rose to prominence in the 2010s and enticed organizations with faster, more cost-effective repositories that can store large data volumes of all types and formats. 50% of enterprise data is currently stored in cloud-based data warehouses, and that share is expected to rise in the years ahead. Data connectors play a key role in extracting data from multiple sources and ingesting it into data warehouses for storage, transformation, BI, advanced analytics, and ML tasks.
Ubiquity of SaaS applications: As part of the digital transformation movement, organizations have been moving away from self-managed enterprise technologies in favor of fully managed, cloud-delivered SaaS applications, point solutions, and infrastructure. Companies are increasingly relying on connectors to pull data from hundreds of SaaS tools into a single cloud storage hub.
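As a quick sanity check on the market-size projection cited at the start of this section, the compound annual growth rate implied by growth from $11.6 billion in 2021 to $19.6 billion in 2026 can be recomputed directly. The function name below is illustrative; the formula is the standard CAGR definition.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the projection: $11.6B in 2021 growing to $19.6B in 2026.
implied_rate = cagr(11.6, 19.6, 5)
print(f"Implied 5-year CAGR: {implied_rate:.1%}")
```

The result comes out at roughly 11%, consistent with the cited growth rate.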
Competition
The data integration market is undoubtedly crowded. Fivetran faces competition from fellow startups, established data integration platforms, and cloud providers. The company directly competes with established players like Informatica and Talend, which initially focused on data movement and ingestion for on-premise databases and warehouses but have since developed data connectors for cloud workloads. Both companies primarily sell to enterprises requiring data integration solutions that can support complex tasks involving multidirectional data.
Fivetran also competes directly with Airbyte, which offers open-source data connection alternatives. Airbyte currently markets over 160 connectors and plans to reach 500 by the end of 2022. Matillion, Qlik, CloverDX, and Denodo also offer data pipeline solutions with similar features. Public cloud providers offer their own data integration tools too, including Amazon’s AWS Glue, Microsoft’s Azure Data Factory (ADF), and Google’s Data Fusion.
While there is little product differentiation among the data connector tools of competing vendors, HVR and its CDC replication software could give Fivetran an edge, as they expand the product’s use cases, offer new upselling opportunities, and enable deeper integration into customer data ecosystems.
Business Model
Fivetran employs a consumption-based pricing model, which it adopted in February 2020. Under this model, customers are charged only for the amount of data extracted and loaded, measured by the number of monthly active rows. The cost per row decreases as monthly consumption increases. Customers are offered a 14-day free trial before being charged for usage. The company has four pricing plans: Starter, Standard, Enterprise, and Business Critical.
Source: Fivetran
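To make the pricing mechanics concrete, here is a minimal Python sketch of consumption-based pricing in which the per-row rate falls as monthly active rows increase. The tier boundaries and rates are invented for illustration and are not Fivetran's actual prices; the graduated-tier structure itself is also an assumption, since the text above only states that cost per row decreases with consumption.

```python
# Hypothetical tiers: (upper bound of monthly active rows, USD per row in that tier).
# These numbers are made up for illustration and are NOT Fivetran's price list.
TIERS = [
    (1_000_000, 0.0005),
    (10_000_000, 0.0002),
    (float("inf"), 0.0001),
]


def monthly_cost(monthly_active_rows: int) -> float:
    """Graduated pricing: each tier's rate applies only to the rows falling within it."""
    cost = 0.0
    lower = 0
    for upper, rate in TIERS:
        if monthly_active_rows <= lower:
            break
        billable = min(monthly_active_rows, upper) - lower
        cost += billable * rate
        lower = upper
    return cost


def average_cost_per_row(monthly_active_rows: int) -> float:
    """Effective per-row price, which declines as consumption grows."""
    return monthly_cost(monthly_active_rows) / monthly_active_rows
```

Under these made-up tiers, doubling usage from one million to two million monthly active rows raises the bill by less than double, so the effective cost per row falls.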
Traction
Fivetran generated $1.9 million in revenue in 2017 before tripling in 2018. Revenue grew 129% in the 12 months ending February 2020 and doubled to $34.3 million for the full year 2020. Revenue grew a further 141% to $83 million in 2021. However, 2021 growth was driven in large part by the HVR acquisition, which contributed about $30 million in revenue. Fivetran had $200 million in cash as of August 2022 and forecasts revenue of $189 million for the fiscal year ending January 2023. Fivetran nearly doubled its customer base in 2018, from 279 to 525. It then doubled the number of customers in 2020 before growing it by 75% in 2021.
Valuation
Fivetran has raised a total of $728 million in funding to date from investors including Andreessen Horowitz (a16z), General Catalyst, CEAS Investments, Matrix Partners, and D1 Capital Partners. The last round was a Series D in September 2021, led by a16z at a post-money valuation of $5.6 billion.
Fivetran’s 2021 revenue of $83 million puts it at a revenue multiple of ~67x. This is below the roughly 1,500x multiple implied by Airbyte’s $1.5 billion valuation on less than $1 million of 2021 revenue, but well above that of its publicly traded competitor Informatica, which trades at 4.1x LTM revenue.
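The multiples above are simple valuation-to-revenue ratios and can be reproduced from the figures quoted in this section. The sketch below uses only those figures; since Airbyte's revenue is given only as less than $1 million, using $1 million as the denominator yields a lower bound on its multiple.

```python
def revenue_multiple(valuation_usd: float, revenue_usd: float) -> float:
    """Valuation divided by revenue over the same reference period."""
    return valuation_usd / revenue_usd


# Fivetran: $5.6B post-money valuation on $83M of 2021 revenue.
fivetran_multiple = revenue_multiple(5.6e9, 83e6)

# Airbyte: $1.5B valuation on less than $1M of revenue, so this is a lower bound.
airbyte_multiple_floor = revenue_multiple(1.5e9, 1e6)

print(f"Fivetran: ~{fivetran_multiple:.1f}x")
print(f"Airbyte: at least {airbyte_multiple_floor:,.0f}x")
```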
Comparable high-growth data software companies like Snowflake, MongoDB, and Confluent have also all seen their revenue multiples sharply compress from the highs of November 2021 and trade at a deep discount to Fivetran.
Source: Koyfin
Key Opportunities
Reverse ETL
Fivetran currently focuses on extracting data from the production sources and loading it to a central storage system, or replicating data from one database to another. This process is unidirectional and ends once data arrives in the central repository. However, organizations often need to move the combined and transformed data from the warehouses to downstream business tools, such as SaaS applications and customer databases to power workflows like marketing campaigns, finance, and customer support. This last-mile journey of migrating data out of storage is known as Reverse ETL and it also involves data connectors. There are several startups, including Hightouch and Census, that offer stand-alone Reverse ETL solutions.
However, due to the lack of end-to-end vendors in the market, customers will often stitch together an ETL tool, like Fivetran, with a Reverse ETL solution, like Hightouch, for a full breadth of data movement capabilities. Fivetran has the opportunity to build a product for the last mile or acquire a player in the Reverse ETL space and become a truly bi-directional, end-to-end data pipeline provider.
Source: Airbyte
Key Risks
Zero-Copy Data Sharing
Several high-profile SaaS platforms have begun providing their own native integration tools for data warehouses. For instance, Salesforce announced in September 2022 that it would offer a real-time, direct connection to Snowflake without requiring a third-party data connector. Other large applications like Stripe, Zendesk, and HubSpot could follow suit in the future, making third-party connectors redundant and consequently denting Fivetran’s enterprise business.
Competition From Public Cloud Providers
Data integration has already attracted a dozen well-funded startups that offer similar products, but the biggest threat to Fivetran likely comes from the data warehouse and public cloud providers. Amazon AWS, Google Cloud, and Microsoft Azure already offer competing solutions for their storage platforms. Although these solutions currently lack key capabilities, such as integrations for on-prem databases, and offer limited support, the providers could improve them to the point where intermediaries like Fivetran are no longer needed.
Summary
Fivetran continues to grow rapidly through product innovation, strategic acquisitions, and deep integrations with complementary products like those of dbt Labs. It has also benefited from several strong industry trends that are expected to persist in the long run, despite near-to-medium-term macro headwinds. However, the company faces significant competition, an existential threat from SaaS platforms offering their own direct, real-time connections to data warehouses, and a rich valuation that could limit its potential M&A suitors amid growing consolidation in the broader software market.