Monte Carlo

Thesis

Data infrastructure has grown more complex alongside the adoption of cloud systems, which offer greater scalability and performance in data processing and storage. As of 2023, 67% of enterprise infrastructure was cloud-based. Compute performance per dollar doubled every 2.5 years between 2006 and 2021, and storage costs fell 800x from 2000 to 2023. As a result, the volume of data processed by businesses has increased and is expected to reach 612 zettabytes by 2030. This shift underscores the critical role of data in driving business decisions and processes.

However, the data companies rely on is not always accurate. As of 2019, one in five companies had lost customers due to erroneous data. In 2023, data and AI teams spent twice as much time on data downtime as they had the year before. As of 2021, poor data quality cost organizations an average of $12.9 million per year. For example, in 2022 Unity Software reported a loss of $110 million in revenue due to “ingesting bad data from a large customer.” For years, data employees have spent half their time fixing data issues. A new wave of data monitoring tools has emerged to help users trust the quality and reliability of their data.

Monte Carlo is a platform for end-to-end data observability, a term coined by its CEO, that helps organizations detect abnormal patterns in their data. Data observability refers to an organization’s ability to understand its data by monitoring its volume and quality as it moves through data pipelines. Just as observability helps DevOps teams monitor system health, data observability helps DataOps teams automate monitoring, alerting, and issue handling to keep track of data health. This approach flags inaccurate data before it flows downstream into systems like data warehouses and AI models.

Founding Story

Barr Moses (CEO) founded Monte Carlo in 2019 with Lior Gavish (CTO).

Moses met Gavish at Stanford, where she studied computational science while he studied for his MBA in 2010. Before college, she worked in data analysis in the Israeli Air Force, which sparked an early interest in automating data tasks. After Stanford, she worked at Bain & Company and later joined Gainsight, where she became VP of Customer Operations. During her tenure, she saw how inaccurate data eroded the trust of key stakeholders and customers. In 2012, Gavish co-founded Sookasa, a cloud security provider. Sookasa was acquired by Barracuda in 2016, and Gavish went on to work at Barracuda as SVP of Engineering on ML products for fraud protection.

At Gainsight, Moses observed that her engineering counterparts had application performance monitoring (APM) products, while data teams lacked equivalent tools to validate their work. She left Gainsight in 2018 to start her own company and explored three different company ideas. After talking to hundreds of data leaders about their pain points with data quality, she settled on Monte Carlo. While working on data dashboards at Gainsight, she had dealt with data reliability challenges firsthand: ensuring data accuracy and pinpointing the source of a problem to determine a path to resolution. Her user research with potential customers reinforced the need for a solution to data quality issues across an organization.

Moses and Gavish first focused on data problems closest to the end user, such as those that surface when looking at reports, interpreting the output of models, or viewing a website. Another early priority was tracking schema changes for data analysts. The data leaders they spoke with identified data downtime and data quality issues as among their top three pain points.

Product

Monte Carlo is a platform that data teams use to ensure the reliability and accuracy of their data pipelines through continuous monitoring and testing. The product provides out-of-the-box coverage for a customer’s data stack, integrating end-to-end across cloud warehouses, lakes, ETL, and business intelligence tools. Monte Carlo’s system is API-based and connects to key data systems to collect metadata and statistics. It then reconstructs data lineage to track the flow of data over time. Customers use field-level lineage to track how upstream changes in one system affect downstream dependencies in another.
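To make the idea of field-level lineage concrete, here is a minimal Python sketch of how downstream impact could be computed once lineage is known. It is an illustration only and does not reflect Monte Carlo's implementation or API: the `LINEAGE` mapping, the field names, and the `downstream_of` helper are all hypothetical. The sketch treats lineage as a directed graph and walks it breadth-first from a changed field.

```python
from collections import deque

# Hypothetical field-level lineage: each key feeds the fields listed in its value.
# All names are made up for illustration and do not come from a real system.
LINEAGE = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": [
        "marts.revenue.daily_total",
        "marts.revenue.avg_order",
    ],
    "marts.revenue.daily_total": ["bi.revenue_dashboard.total"],
    "marts.revenue.avg_order": [],
    "bi.revenue_dashboard.total": [],
}


def downstream_of(field, lineage):
    """Return every field reachable from `field`, in breadth-first order."""
    seen = {field}  # guards against cycles and repeated visits
    order = []
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which fields could be affected if the upstream `amount` field changes?
impacted = downstream_of("raw.orders.amount", LINEAGE)
print(impacted)
```

In this toy graph, a change to `raw.orders.amount` reaches the staging field, both revenue mart fields, and the dashboard total, which is the kind of upstream-to-downstream dependency the paragraph above describes.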

Monte Carlo defines data observability with five main pillars:

Data Freshness: Whether the data arrived at an expected time
Data Volume: Whether tables are the right size
Data Schema: Whether the schema changed
Data Quality: Whether the values fall within a known range
Data Lineage: Which upstream sources and downstream consumers were impacted when the data broke
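The first four pillars can each be read as a simple check against table metadata. The Python sketch below shows one way such checks might look. It is an assumption-laden illustration, not Monte Carlo's method: the function names, the fixed thresholds, and the sample values are all hypothetical, and a production system would more likely derive its thresholds from historical patterns than hard-code them. Lineage is a graph problem rather than a per-table check, so it is left out here.

```python
from datetime import datetime, timedelta


def check_freshness(last_loaded_at, now, max_age):
    """Freshness: did the data arrive within the expected window?"""
    return now - last_loaded_at <= max_age


def check_volume(row_count, expected_rows, tolerance=0.2):
    """Volume: is the row count within +/- tolerance of the expected size?"""
    return abs(row_count - expected_rows) <= tolerance * expected_rows


def check_schema(columns, expected_columns):
    """Schema: report which columns were added or removed."""
    return {
        "added": sorted(set(columns) - set(expected_columns)),
        "removed": sorted(set(expected_columns) - set(columns)),
    }


def check_quality(values, low, high):
    """Quality: return the values that fall outside the known range."""
    return [v for v in values if not (low <= v <= high)]


# Hypothetical snapshot of one table, invented for this example.
now = datetime(2024, 1, 2, 12, 0)
fresh = check_freshness(datetime(2024, 1, 1, 6, 0), now, timedelta(hours=24))
volume_ok = check_volume(row_count=7_000, expected_rows=10_000)
schema_diff = check_schema(
    ["id", "amount", "currency"], ["id", "amount", "created_at"]
)
out_of_range = check_quality([12.5, -3.0, 40.0, 10_000.0], low=0.0, high=5_000.0)

print(fresh, volume_ok, schema_diff, out_of_range)
```

Here the table was last loaded 30 hours ago against a 24-hour window, holds 30% fewer rows than expected, gained a `currency` column while losing `created_at`, and contains two values outside the 0 to 5,000 range, so all four checks flag a problem.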

Monte Carlo’s platform addresses each of these data observability components through modules for (1) data assets, (2) data alerts and incident management, (3) data monitoring, (4) dashboards that give broad data overviews, and (5) root cause analysis to drill down into specific issues.
