OneSchema is an embeddable spreadsheet importer and validator that can automate the process of moving CSV files with up to 20 million rows, while also validating the data to ensure that formatting is correct and the data is complete. According to the company, OneSchema can automate 90% of the workflow for mapping, transforming, and validating CSV files.

Founding Date

Jan 1, 2021

Headquarters

San Francisco, California

Total Funding

$ 6M

Stage

seed

Employees

11-50

Careers at OneSchema

Memo

Updated

June 21, 2024

Reading Time

15 min

Thesis

Increasingly, data is everywhere. In every device, every online interaction, every financial transaction. In 2006, British mathematician Clive Humby coined the phrase, “data is the new oil.” At that point the world was generating 161 billion gigabytes of data. In 2021, the number had grown 500x to 79 zettabytes, and is expected to continue growing to 150 zettabytes by 2025.

While data may be the new oil, we still need infrastructure to transport and use the oil. Extract, transform, and load (ETL) is the methodology for “combining data from multiple sources into a large, central repository.” While there are large established ETL platforms like Informatica PowerCenter and Talend Open Studio, some would joke that Microsoft Excel is the most common ETL tool in the world. People can use Excel to manipulate comma-separated value (CSV) files; a text-file format that uses “commas to separate values, and newlines to separate records.”

While data transfer methods, like APIs, have become increasingly common, the reality is that a large portion of data transfers still happen through CSVs. One former executive from a Fortune 500 company shared with Contrary Research that large companies will often be processing over 100 million CSV files per year, often spending tens of millions of dollars to manage the data. However, the process of manually validating and cleaning that data can be time-intensive and error-prone.

That’s where OneSchema* comes in. OneSchema is an embeddable spreadsheet importer and validator that can automate the process of moving CSV files with up to 20 million rows, while also validating the data to ensure that formatting is correct and the data is complete. According to the company, OneSchema can automate 90% of the workflow for mapping, transforming, and validating CSV files.

Weekly Newsletter

Subscribe to the Research Rundown

Founding Story

OneSchema was founded by Christina Gilbert and Andrew Luo (CEO) in February 2021.

Prior to founding the company, Gilbert spent four years as a product manager at Google. Meanwhile, Luo had previously been an early software engineer at Affinity, an investor CRM product with large investors as customers, such as Mass Mutual, Swiss Re, Bain Capital Ventures, and others.

His experience building at Affinity required helping customers manage large amounts of data, often coming from other CRMs in the form of CSVs. In a May 2024 interview with Contrary Research, Luo described the experience this way:

“At Affinity, when we built an importer we thought it would be simple. A CSV parser in front of a SQL database. But it turns out that every data type in the world, whether its a number, an address, a phone number, or anything, can have 20+ different variations. Explaining the difference between October 10th, 2023 and 10/10/23 requires explanations that have exponentially more code to explain why something is wrong. So handling edge cases requires months of engineering time.”

While Affinity expected the project to take two months, it ended up taking two years. A key part of that process was building an engine to understand data that wasn’t normalized. The process of effectively normalizing data for ingestion required a surprising amount of engineering resources. That complexity led Luo and Gilbert to build OneSchema to solve that problem. In particular, the focus would be on enabling non-technical users to manage the data transformation and ingestion process without requiring engineering resources.

Product

Source: OneSchema

OneSchema’s core value proposition is to manage the process of data exchange via CSVs more easily. OneSchema CEO Andrew Luo, in May 2024 interview with Contrary Research, described the product surface area this way:

“As the standard for cross-company data exchange, there will be one-time uploads of CSVs as well as recurring uploads. For one-time uploads, OneSchema’s Embeddable Importer wraps a button inside your web application, and makes it possible to automatically clean spreadsheets that get uploaded. For companies that frequently receive CSVs through SFTP, email, or API, that’s where File Feeds comes in. Non-technical users can record a set of transformations and replay them on all subsequent spreadsheet integrations.”

Embeddable Importer

Source: OneSchema

For use cases that require one-off CSV uploads, this process typically requires a CSV to be imported and then processed. The process for validating the data in a CSV manually can be time-intensive and error-prone. Products like self-serve CRMs, restaurant POS, and copywriting tools will require their customers to upload relevant CSVs. These could include data like contact information, menu SKUs, or outreach targets.

Traditionally, building an importer from scratch will fail to address a wide variety of edge cases like data formats, cleanliness, or missing data. This can lead to onboarding delays, customer churn, increased number of customer service tickets, and negative user experience.

OneSchema’s Embeddable Importer is a widget installed in a customer’s application that is connected to OneSchema’s data validation engine, and can more easily ingest CSVs while controlling for a wide variety of edge cases in the data ingestion process. The Importer can handle files up to 4 GB in one second and can manage spreadsheets with up to 20 million rows. The Importer also enables users to more quickly clean their data in bulk with tools like autofix, find-and-replace, and others.

FileFeeds

Source: OneSchema

In some use cases, data is frequently being transferred via CSV. For example, transaction files from an ERP, inventory files from a POS, or employee files from an HRIS. Often, this kind of data may need to be transferred weekly, or even daily. Often managing these data transfers requires engineering resources to write Python scripts. However, these scripts can be brittle, and can break either due to volume or complexity. Fixing that problem requires engineering resources each time.

OneSchema FileFeeds enables customers to build workflows for specific data flows dependent on customer need. OneSchema claims that its FileFeeds product can “automate 90% of the workflow for mapping, transforming, and validating CSV files received through an API, [Secure File Transfer Protocol (SFTP)] server, or email server.”

Customers can create data pipelines for workflows around specific customer needs. Users can use mappings around data needs, and record transformations around particular spreadsheets. One example may be working with retail data where the software needs 0, 1, and 2 as inputs, but the retailer uploads data with S, M, and L. The user can record that specific transformation, formatting the data into the required format, and then replay that data across every upload that gets received.

Source: OneSchema

In addition to bespoke confirmations, OneSchema also offers a library of standard data transformations like replacing values, adding and splitting columns, and standardizing data formats. OneSchema can also recommend specific transformations based on its estimation of the data issue and the most likely correction to resolve any issues.

Data Normalization Engine

As previously mentioned, one of the core issues with manually building a CSV importer is the lack of capability around edge cases. Each time a new issue arises in a custom-built CSV importer it will likely break the importer, and require engineering resources to fix. OneSchema addresses this limitation by building all of its data transformation products on the foundation of a data normalization engine. Andrew Luo, in a May 2024 interview with Contrary Research, described the product this way:

“The core of OneSchema is the data normalization engine. This is the set of validations and transformations that are built on the knowledge of every spreadsheet OneSchema has ever ingested. This is our unique differentiator given that its no-code; you don’t need engineers to write brittle Python scripts or one-off transformations in code. Someone on your team, even non-technical people, can configure this using a no-code UI without needing an engineer.”

Market

Customer

OneSchema’s target customer is quite broad, including anyone that is moving data via CSV either semi-often or frequently. If “data is the new oil,” that includes almost everyone. In particular, there are two types of customers that feel the pain most poignantly. First, mid-market companies see these data transfers as taking up valuable engineering resources that could be deployed elsewhere. Second, large enterprises use CSVs for a number of security or compliance reasons, as well as other functional issues such as version control or downstream data sharing limitations. However, these efforts often require large teams of data analysts to manage.

In a May 2024 interview with Contrary Research, OneSchema CEO Andrew Luo shared how OneSchema’s target customer has evolved:

“We started with SMB and mid-market up to 1K employees. Product managers use OneSchema to avoid years of cycles building importers. As we move into enterprises like large consulting firms or supply chain companies, what we see is that, rather than being owned by a product manager, these problems are owned by data analysts and professional services teams. They build brittle workflows that scale with headcount.”

Some of OneSchema’s customers include tech companies such as Scale AI, Pave*, Ramp*, Personio, and Airbase. In addition, several of OneSchema’s customers are larger enterprises, such as CreditSafe and Huron Consulting.

Market Size

The market for enterprise data management is estimated to grow from $77 billion in 2020 to $123 billion by 2025. Within OneSchema’s initial target of mid-market software companies there are ~100K potential customers in the US. More broadly, OneSchema has also been able to address use cases in management consulting, ecommerce, insurance, and financial services which broadens the number of potential customers to 3 million.

Competition

As a horizontal tool addressing both mid-market and enterprise customers, OneSchema has several subsets of competitors each with different focuses.

Mid-Market Competitors

Flatfile: Founded in 2018, Flatfile is a “smart API for a file-based data importer.” The company has had similar product set to OneSchema with both an embeddable importer it calls Portal, and collaborative data import workflows, which Flatfile calls Projects. However, Flatfile is reportedly pushing customers to migrate to its new Platform product. This product focuses on a “build-with” strategy, and has become more akin to developer scaffolding now which reflects in the company’s tagline. OneSchema, on the other hand, is meant to be out of the box. Flatfile’s customers are typically mid-market, such as startups like ClickUp. Flatfile has raised nearly $100 million of funding from investors like Tiger Global, Scale Venture Partners, and Workday Ventures.

Osmos: Founded in 2019, Osmos is a data ingestion engine reportedly used by customers like Netflix, Rakuten, and Harman. The company has raised $18 million in funding from investors such as Lightspeed Ventures, CRV, and Pear VC. The company reportedly leverages GPT for auditable data transformations.

CSVBox: CSVBox is a CSV import widget primarily focused on smaller startups. According to the company, it differs from OneSchema because it is less feature rich, and therefore less complex to use than OneSchema. As of June 2024, the company had not announced any funding.

Enterprise Competitors

Dell Boomi: Founded in 2000, Boomi is an enterprise data integration platform that was acquired by DELL in 2010 before subsequently being sold to private equity firms Francisco Partners and TPG in 2021. Boomi’s product provides an integration capability for enterprises to weave together different applications with workflows including data integration, API management, data quality, and workflow automation. The company serves primarily enterprise customers across verticals such as agriculture, government, financial services, retail, utilities, and more. Some example customers include 23andMe, Cornell University, Dropbox, and Linkedin.

MuleSoft: Founded in 2006, MuleSoft is an enterprise data integration platform that was acquired by Salesforce in 2018 for $6.5 billion. Salesforce was focused on leveraging MuleSoft to more effectively integrate Salesforce’s disparate cloud products. The company serves customers such as AT&T, BP, and the Royal Bank of Canada.

Trifacta: Founded in 2012, Trifacta specializes in data wrangling, which involves cleaning, structuring, and enriching raw data into a desired format for analysis. The company reportedly attempts to focus on serving both technical and non-technical users with products across data wrangling, visual data profiling, and data governance. Trifacta was acquired by Alteryx in 2022 for $400 million.

Talend: Founded in 2005, Talend provides a suite of data integration and management tools, including Talend Data Fabric, Talend Data Integration, Talend Data Management, Talend Data Preparation, and Talend Data Stewardship. These products facilitate data integration, transformation, and governance across various sources and environments, supporting both on-prem and cloud deployments. After private equity firm Thoma Bravo acquired business intelligence tool, Qlik, for $3 billion in 2016 the firm then acquired Talend for $2.4 billion in 2021 and merged the two businesses. At the time, Talend had $350 million of revenue and 7K customers, such as ABInBev, AstraZeneca, and Domino’s.

Informatica: Founded in 1993, Informatica offers a range of data integration and management products, including Informatica PowerCenter, Informatica Intelligent Cloud Services, Informatica Data Quality, Informatica MDM, and Informatica Data Integration Hub. These tools support data integration, transformation, quality, governance, and master data management across on-prem and cloud environments. As of June 2024, Informatica had a market cap of $8.6 billion and in 2023 generated revenue of $1.2 billion.

Embedded iPaaS

Workato: Founded in 2013, Workato is an Integration Platform as a Service (iPaaS) provider. The company offers over 1K integrations with business applications, such as Salesforce, Netsuite, Slack, Workday, etc. Workato has customers such as HP, Atlassian, and Adobe and, as of January 2021, Workato claimed to have over 7K customers. As of June 2024, the company had raised $415 million in total funding from investors such as Insight Partners, Altimeter, Battery Ventures, Redpoint, and others.

Tray: Founded in 2012, Tray is a unified iPaaS focused on automating business processes and activating ecosystem integrations. Tray’s product offers both a low-code workflow builder and a Connector Developer Kit to use APIs and SDKs to make integrations. The company has customers such as IBM, Eventbrite, and ThoughtSpot. As of June 2024, the company had raised $150 million of total funding from investors such as True Ventures, Notable Capital, and Spark Capital.

Adjacent Unified API

Beyond companies that are specifically managing comparable data transformations, OneSchema also has adjacent competitors that offer similar, but different offerings. In a May 2024 interview with Contrary Research, OneSchema co-founder Christina Gilbert explained the difference this way:

“With product and engineering teams, the primary companies we run into are unified APIs. Though, often these relationships make more sense as partnerships. We often land with them. They’re better at integrations, and we’re better at spreadsheets.”

Merge: Founded in 2020, Merge is a unified API with a focus on offering specific integrations to enterprise tools across HR, CRM, ticketing, ATS, etc. Where OneSchema focuses on normalizing full data exports via CSV, Merge integrates platforms directly for incremental updates to particular data. As of June 2024, Merge had raised $75 million in funding from investors such as Accel, Addition, NEA, and others.

Finch: Founded in 2020, Finch is an API for connecting to employment systems, such as HR, payroll, and benefits platforms. The company has customers like Secureframe* and Carta across use cases such as tax and compliance, financial insights, and 401k management. As of June 2024, Finch had raised $62 million in total funding from investors such as General Catalyst, Y Combinator, and Menlo Ventures.

Differentiation

Across all of these different competitors there are a number of different stage and sector focuses. For OneSchema, differentiation typically comes down to several key elements, as Christina Gilbert described in a May 2024 interview with Contrary Research:

“In terms of differentiation, there are three main drivers: (1) our validation layer; other players haven’t invested in addressing every edge case that can come up in a CSV, (2) the way you build integrations with OneSchema is truly non-technical users recording transformations on a spreadsheet instead of hiring analysts to build connections, and (3) errors in a file feed can be recovered by a non-technical user.”

Business Model

Source: OneSchema

OneSchema doesn’t publicly disclose its pricing, but in June 2024, the company shared with Contrary Research that the product is priced with a platform fee and consumption-based model. OneSchema’s usage-based metric is per successful file import. On its website, the company outlines three customer segments that vary based on the amount of support, analytics, and compliance robustness.

Traction

As of November 2022, OneSchema had grown to having 50 customers. In 2023, OneSchema was featured in Forbes Rising Stars list as part of the Cloud 100. As of June 2024, OneSchema shared with Contrary Research that the company has 130+ customers across North America, Europe, and Asia. These customers ranged from cloud 100 startups to publicly traded enterprises.

Valuation

As of June 2024, OneSchema has a raised a total of $6.4 million in funding. After participating in Y Combinator in 2021, OneSchema raised a $6.3 million funding round led by General Catalyst with participation from other investors such as Sequoia, Contrary, and Elad Gil. In addition, other investors included angels such as Wade Foster, the CEO of Zapier, Julianna Lamb, the CEO of Stytch*, and Steven Bartel, the CEO of Gem.

Key Opportunities

Enterprise Opportunity

As previously stated, OneSchema’s target customer has evolved from initially only focusing on mid-market companies to then expanding to address enterprise customers. In a May 2024 interview with Contrary Research, Christina Gilbert explained the difference this way:

“For mid-market tech companies, OneSchema saves them a few engineers’ time. But enterprises don’t have engineers. So these massive enterprise companies may have 30 people manually cleaning spreadsheets that OneSchema can replace.”

As OneSchema continues to penetrate into larger enterprises it will be addressing problems that, for some of the largest companies, represent tens of millions of dollars of spend and dozens of people to address manually. As a result, OneSchema will be able to increase the average contract size it can command, and be an even more valuable part of a larger company’s technology stack.

AI Transformations

The data normalization engine that represents the foundation of OneSchema’s product is a universal rules engine that is improved as a result of every CSV OneSchema processes. There is a network effect in that process that enables OneSchema’s product to continue to improve. As that data moat grows, it represents the potential for the company to broaden its use of AI in servicing customers. Andrew Luo, in a May 2024 interview with Contrary Research, explained the opportunity this way:

“We have a great set of labeled data to build an AI transformation set. Because we have these validation capabilities and a really robust set of validation elements, we also have the workflow engine upon which an AI transformation engine on top of spreadsheets would be powerful.”

Key Risks

Executing on Messaging

One of the complexities that OneSchema faces is the ability to address a number of different use cases as a horizontal tool. With use cases across management consulting, ecommerce, insurance, healthcare, and financial services, OneSchema needs to manage the process of balancing unique messaging that is specific enough to resonate for a specific vertical, but not too specific that it brands OneSchema to a specific vertical, like “spreadsheet transformations for healthcare.”

Competitive Landscape

Given the number of different segments and industries OneSchema is able to address, it opens the company up to competition with a number of different competitors. In particular, as OneSchema pushes more deeply into enterprise customers it will run into several 20+ year old companies that have been deeply entrenched in these larger businesses. OneSchema will have to demonstrate enough value to displace these legacy tools.

Weekly Newsletter

Subscribe to the Research Rundown

Summary

OneSchema is an embeddable spreadsheet importer and validator that can automate the process of moving CSV files between organizations. The company was started to solve the complexity that comes from attempting to build an in-house CSV importer. In particular, the product addresses the difficulties in dealing with data edge cases. As a result of being a horizontal data tool, OneSchema finds itself addressing a number of different use cases across management consulting, ecommerce, insurance, healthcare, financial services, and more. While this creates a data advantage for OneSchema’s product to continue to improve, it also opens up the company to a wider variety of competitors. Going forward, OneSchema has the opportunity to establish itself as an enterprise-grade product with a sizable data advantage. As part of that advantage, OneSchema could more effectively leverage AI to service its customers.

*Contrary is an investor in OneSchema, Pave, Ramp, Secureframe, and Stytch through one or more affiliates

Disclosure: Nothing presented within this article is intended to constitute legal, business, investment or tax advice, and under no circumstances should any information provided herein be used or considered as an offer to sell or a solicitation of an offer to buy an interest in any investment fund managed by Contrary LLC (“Contrary”) nor does such information constitute an offer to provide investment advisory services. Information provided reflects Contrary’s views as of a time, whereby such views are subject to change at any point and Contrary shall not be obligated to provide notice of any change. Companies mentioned in this article may be a representative sample of portfolio companies in which Contrary has invested in which the author believes such companies fit the objective criteria stated in commentary, which do not reflect all investments made by Contrary. No assumptions should be made that investments listed above were or will be profitable. Due to various risks and uncertainties, actual events, results or the actual experience may differ materially from those reflected or contemplated in these statements. Nothing contained in this article may be relied upon as a guarantee or assurance as to the future success of any particular company. Past performance is not indicative of future results. A list of investments made by Contrary (excluding investments for which the issuer has not provided permission for Contrary to disclose publicly, Fund of Fund investments and investments in which total invested capital is no more than $50,000) is available at www.contrary.com/investments.

Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by Contrary. While taken from sources believed to be reliable, Contrary has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Please see www.contrary.com/legal for additional important information.

Authors

Sachin Maini

Editor

See articles

© 2024 Contrary Research · All rights reserved

Privacy Policy