Thesis
There are over 200 million content creators in the world as of 2023. 58% of creators produce 2–4 types of content, such as long-form videos and podcasts, and there were more than 5 million podcasts in 2022. The US had 100 million active podcast listeners as of January 2023. In the US, podcast advertising revenue reached $1.4 billion in 2021, a 72% increase over the prior year and the first time it surpassed $1 billion. Moreover, podcast revenue is projected to reach $4 billion by 2024.
Meanwhile, the video content market is also continuing to grow. In 2023, 3.6 billion people globally are expected to watch digital video content. On average, people watch 17 hours of online videos per week. YouTube alone has ~2.7 billion active users as of April 2023 and over a billion hours of YouTube content is consumed daily. The democratization of content creation tools has made it easier for creators to produce high-quality content and reach their audiences directly. Creators are aware of how better tools benefit them: 74% of podcast creators believe AI can enhance their content by enabling more efficient workflows. But although capturing and sharing podcasts and videos has never been easier and production has never been more prolific, editing remains difficult, slow, and non-collaborative.
Descript offers tools to record, edit, transcribe, collaborate, and share videos and podcasts. It started as an audio editing tool that took recorded audio and gave the user a transcript that they could easily edit. Deleting a word from the text transcript would remove that same word from the audio file. Descript later expanded into video editing. The company believes the division between audio and video editing tools should become obsolete because content creators don't neatly fall within the traditional boundaries between content mediums. For instance, YouTubers make podcasts, and podcasters distribute their content on YouTube, signaling the need for a comprehensive tool like Descript that makes both audio and video editing simpler.
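Descript has not published how its transcript editing works internally. As a minimal sketch, assume each transcript word carries start and end timestamps from an alignment step; deleting words in the text then reduces to computing which audio ranges to keep. `Word` and `keep_ranges` are hypothetical names, and the timestamps are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word aligned to the audio (times in seconds; hypothetical schema)."""
    text: str
    start: float
    end: float


def keep_ranges(words: list[Word], deleted: set[int]) -> list[tuple[float, float]]:
    """Return the (start, end) audio ranges that survive after deleting words by index.

    Runs of consecutive kept words are merged into one range, so the natural
    pauses between them are preserved; a deletion starts a new range.
    """
    ranges: list[tuple[float, float]] = []
    last_kept = -2  # sentinel so the first kept word always opens a new range
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if last_kept == i - 1:
            # Previous word was kept: extend the current range to this word's end.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
        last_kept = i
    return ranges


if __name__ == "__main__":
    words = [
        Word("So", 0.0, 0.3),
        Word("um", 0.3, 0.6),
        Word("welcome", 0.6, 1.1),
        Word("back", 1.1, 1.4),
    ]
    # Deleting "um" from the text removes 0.3-0.6 s from the audio.
    print(keep_ranges(words, {1}))  # [(0.0, 0.3), (0.6, 1.4)]
```

A real editor would then render only these ranges, usually smoothing each cut with a short crossfade; that step is omitted here.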
Founding Story
Descript was founded in 2017 by serial entrepreneur Andrew Mason (CEO). Mason, a Northwestern University graduate, has a long history in the startup world. In 2008, he founded Groupon, which quickly became one of the fastest-growing startups in history. Groupon went public in a high-profile IPO, and Mason remained CEO until 2013. He then co-founded Detour, an augmented reality, location-based audio walk app that Bose acquired in 2017. Mason also served as a part-time partner at Y Combinator until 2015.
At Detour, Mason wanted to let users record and edit audio for walking tours. However, the widely used audio editing programs of the time were built for music production and required substantial technical proficiency. Surveying the options, Mason realized that editing audio through text would be faster and more efficient than manipulating waveforms. After further research, the Detour team found that automated transcription and text/audio alignment were advancing rapidly, which made their concept feasible. They worked on the audio production tool for two and a half years at Detour before spinning it off as a separate startup called Descript.
Product
Initially, Descript focused on audio content. The platform’s core capability was a document-style interface for professionally enhancing and editing audio recordings by interacting with auto-generated transcripts. Through its 2019 acquisition of Lyrebird, Descript introduced Overdub, which lets users clone their voices and synthetically generate audio by adding or replacing words in the transcript. In 2020, Descript extended the offering to video, with tools to take screen recordings and to import, edit, and publish video content. The company launched a revamped video editor, Descript’s “Storyboard,” in 2022. Descript lets users split video content into scenes by typing a forward slash (“/”) in the transcript, then replace and edit the content piecemeal.
Capture
Descript offers the following tools for creating, capturing and recording multi-form content:
Remote recording: For podcast recordings and interviews, Descript offers integrations with Zoom, Skype, and other conferencing tools. It also provides high-fidelity audio, live multi-track transcription, filler word removal, audio editing, and other capabilities for collaboration and publishing.
Screen recording: Users can record, edit, and share screen recordings with a webcam overlay. They can choose the size of the webcam overlay and capture any part of the screen in high resolution.
Transcription: Transcription is the core capability underpinning the Descript platform. Users can generate and edit content by adding and removing words and sentences. Descript’s transcription product offers automatic speaker labels, support for 22 languages, cloud sync, and human-assisted transcription for professional jobs at $2 per minute.
Source: Descript
Edit
Podcasting and Video Editing: Descript offers a full workflow for recording and publishing podcasts and multi-modal content. This includes multi-track recording, transcription, collaboration, media promotion (integrations with social networks), sharing and publishing (integrations with hosting providers), and video-specific editing capabilities like titles and captions, transitions, keyframe animations, and visual elements.
Source: Descript
Storyboard: Users can use the “/” key to break transcripts into “scenes” and arrange and edit video content in a slide-like format.
Source: Descript
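The scene-splitting behavior described for Storyboard can be illustrated with a minimal sketch that treats every “/” in the transcript text as a scene boundary. This models the user-facing behavior only and is not Descript’s implementation; `split_scenes` is a hypothetical name. A real editor would presumably store scene breaks as structured markers rather than scanning raw text, since a plain string split would also break on slashes inside URLs.

```python
from __future__ import annotations


def split_scenes(transcript: str, marker: str = "/") -> list[str]:
    """Split transcript text into scenes at each marker, dropping empty scenes."""
    scenes = (part.strip() for part in transcript.split(marker))
    return [scene for scene in scenes if scene]


if __name__ == "__main__":
    text = "Welcome to the show. / Today: pricing. / Thanks for watching."
    for number, scene in enumerate(split_scenes(text), start=1):
        print(f"Scene {number}: {scene}")
```

Each returned scene can then be edited or replaced independently, which is the piecemeal workflow the Storyboard view exposes.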
Multi-track product engine: Descript’s recorder is integrated into the editor, with separate tracks for the screen and camera. Users can record tracks simultaneously and add intros, outros, music, and effects. Each track is transcribed separately and combined into a single transcript with dynamically inserted speaker labels.
Source: Descript
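Combining separately transcribed tracks into one labeled transcript can be sketched as a merge by timestamp. The sketch assumes each track yields timestamped segments and that the track name serves as the speaker label; `Segment` and `merge_tracks` are hypothetical, and Descript’s actual speaker-labeling logic is not public.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A transcribed stretch of speech on one track (start time in seconds)."""
    start: float
    text: str


def merge_tracks(tracks: dict[str, list[Segment]]) -> list[str]:
    """Interleave per-track segments by start time into one labeled transcript.

    A speaker label is inserted only when the speaker changes, so consecutive
    segments from the same track are joined under one label.
    """
    events = sorted(
        ((seg.start, speaker, seg.text) for speaker, segs in tracks.items() for seg in segs),
        key=lambda event: event[0],  # stable sort keeps track order on ties
    )
    lines: list[str] = []
    last_speaker: str | None = None
    for _, speaker, text in events:
        if speaker == last_speaker:
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            last_speaker = speaker
    return lines


if __name__ == "__main__":
    tracks = {
        "Host": [Segment(0.0, "Welcome back."), Segment(5.0, "Great point.")],
        "Guest": [Segment(1.5, "Thanks for having me."), Segment(3.0, "Happy to be here.")],
    }
    print("\n".join(merge_tracks(tracks)))
```

Transcribing each track on its own means speaker identity comes from the track rather than from diarization, which is one plausible reason to record speakers separately.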
Other features of Descript’s editing offering include:
Filler word removal: Users can remove ‘ums’, ‘uhs’, ‘you knows’, and other filler words by editing the transcript.
Overdub (text-to-speech): Descript can create a voice model of the user and add synthetic voice to content (text-to-speech) by editing the transcript. It has professional voice blending and multiple voices for different contexts.
Studio quality sound: Descript supports professional audio quality without requiring expensive hardware. It has noise removal, speech enhancement, acoustic echo cancellation, and sound effects.
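Because edits happen in the transcript, filler word removal can be framed as finding which words to delete. A naive keyword-matching sketch follows; the `FILLERS` list and the `normalize` and `filler_indices` functions are hypothetical. Matching a phrase like “you know” out of context would over-delete in a sentence such as “Do you know why?”, so a production tool would need context-aware detection; Descript’s actual detector is not public.

```python
from __future__ import annotations

import re

# Hypothetical filler vocabulary; each entry is a sequence of lowercase words.
FILLERS: tuple[tuple[str, ...], ...] = (("um",), ("uh",), ("you", "know"))


def normalize(word: str) -> str:
    """Lowercase a word and strip punctuation (apostrophes kept)."""
    return re.sub(r"[^\w']", "", word.lower())


def filler_indices(words: list[str]) -> set[int]:
    """Return the indices of words that belong to a filler phrase."""
    norm = [normalize(w) for w in words]
    hits: set[int] = set()
    for i in range(len(norm)):
        for phrase in FILLERS:
            if tuple(norm[i : i + len(phrase)]) == phrase:
                hits.update(range(i, i + len(phrase)))
    return hits


if __name__ == "__main__":
    words = "So, um, you know, the launch went well.".split()
    drop = filler_indices(words)
    print(" ".join(w for i, w in enumerate(words) if i not in drop))
```

With word-level timestamps, the returned indices identify exactly which stretches of audio to cut.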
Share
Team collaboration: Descript offers the following features to support team collaboration: a recorder, embeddable video player, real-time editing, brand-specific templates, stock media (images, videos, GIFs, effects, music), and cloud sync.
Source: Descript
Publishing: Descript creates shareable pages for audio and video in a single click, making it easier to share content once it is produced.
Source: Descript
Social video: Users can use Descript’s transcription product to clip video and customize it to their brand style with images, captions, waveforms, animations, or progress bars, and share to social media channels.
Market
Customer
Descript’s target audience includes any individual or organization that needs to edit and publish high-quality video and audio content: social media influencers, podcasters, vloggers, authors, journalists, publishers, and corporate teams in marketing, sales, customer support, and internal communications. Notable customers include NPR, VICE, The Washington Post, The New York Times, Shopify, HubSpot, and MasterClass. It also works with many YouTube and TikTok channels.
Market Size
Descript is the beneficiary of several key trends: the democratization of audio and video content creation, the growth of the creator economy, and the corporate workflow migrating from the meeting room to the cloud. The global video editing software market was valued at $2 billion in 2021 and is projected to reach $3.3 billion by 2030, registering a CAGR of 5.6%.
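The growth figures can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. Using the rounded endpoints above ($2 billion in 2021, $3.3 billion in 2030, so nine years), the implied rate is about 5.7%, close to the cited 5.6%; the small gap is consistent with rounding of the published endpoints. `cagr` is a hypothetical helper name.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Video editing software market: $2.0B (2021) -> $3.3B (2030), per the figures above.
    rate = cagr(2.0, 3.3, 2030 - 2021)
    print(f"{rate:.1%}")  # 5.7%
```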
There are over 200 million content creators in the world as of 2023, and 58% of creators produce 2–4 types of content, such as long-form videos and podcasts. There were more than 5 million podcasts as of 2022, and the US had 100 million active podcast listeners as of January 2023.
Competition
Source: Google Trends
Audio Editing
Podcastle: Founded in 2020, Podcastle is an AI-powered audio content creation platform that helps podcasters, bloggers, journalists, marketers, educators, and other content creators to convert their text content to audio and edit audio content. It has transcription capabilities, an AI-powered sound quality tool called Magic Dust, and AI voices. It has raised $8.8 million in total funding.
Riverside.fm: Founded in 2019, Riverside.fm offers an audio and video recording platform allowing podcasters and media companies to record studio-quality remote interviews. Like Descript, it offers transcriptions, text-based audio and video editing, and social media sharing features. It has raised $47 million in total funding. Notable customers include Marvel, the New York Times, and TED.
Adobe Podcast: Adobe offers AI-powered audio recording and editing on the web, powered by Adobe Premiere Pro’s speech-to-text technology. Its Mic Check feature tells users whether they are too close to the microphone, and its Enhance Speech feature removes background noise.
Video Editing
Reduct: Founded in 2018, Reduct is a collaborative transcript-based video platform which allows users to review, search, highlight, and edit videos. It has raised $4 million in total funding. It is used across law, marketing, and film production.
InVideo: Founded in 2017, InVideo is a video editing platform. It has millions of users globally and offers 5,000 professionally made templates. Its text-to-video editor converts scripts, articles, or blog posts into videos. It has raised $52 million in total funding.
Pictory: Founded in 2019, Pictory is an AI-powered video generator and editor that can create videos from scripts and blog posts. Users can also edit existing videos with a Descript-style text-based editor. It has raised $4.7 million.
VEED.io: Founded in 2018, VEED is a web-based video editor that can do screen recording, automatic subtitles, AI voiceovers, image generation, and transcriptions. It has raised $35 million.
Business Model
Descript operates a seat-based subscription model with a usage-based component tied to monthly transcription hours per editor. The company counts a “seat” as an editor and allows an unlimited number of “basic” seats irrespective of plan choice.
The free plan limits transcription time, the number of screen recordings, output resolution, the number of projects, and usage of AI features like Overdub, AI Green Screen, and Studio Sound. The “Pro” plan, priced at $288 per year, removes every limit except the monthly transcription cap. Descript also offers an enterprise plan with custom pricing, bundled with a dedicated account representative, single sign-on, service agreements for cloned voices, and onboarding and training for more complex platform features.
Source: Descript
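The seat-plus-usage structure can be made concrete with a small model. It assumes that only editor seats are billed at the Pro price cited above ($288 per editor per year), that basic seats are free, and that the monthly transcription allowance scales with the number of editors. The 30-hour allowance and the `Plan`, `annual_cost`, and `within_transcription_quota` names are hypothetical placeholders, not Descript’s published terms.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price_per_editor_per_year: int
    transcription_hours_per_editor_per_month: float  # hypothetical allowance


def annual_cost(plan: Plan, editors: int, basic_seats: int = 0) -> int:
    """Seat-based price: only editor seats are billed.

    Basic seats are accepted but not billed, reflecting the assumption that
    they are unlimited and free on every plan.
    """
    if editors < 0 or basic_seats < 0:
        raise ValueError("seat counts must be non-negative")
    return plan.price_per_editor_per_year * editors


def within_transcription_quota(plan: Plan, editors: int, hours_used: float) -> bool:
    """Usage-based limit: assumes the team's monthly allowance scales with editor seats."""
    return hours_used <= plan.transcription_hours_per_editor_per_month * editors


if __name__ == "__main__":
    pro = Plan("Pro", 288, 30.0)
    print(annual_cost(pro, editors=3, basic_seats=10))  # 864
    print(within_transcription_quota(pro, editors=3, hours_used=75.0))  # True
```

Under this model, revenue grows with the number of editors, while the transcription cap is the lever that pushes heavy users toward more seats or a custom enterprise plan.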
Traction
Descript has not shared much explicit information on user adoption or the business scale. Descript counted NPR, VICE, The Washington Post, and The New York Times among its customers as of 2021. Descript’s founder has said that Descript has expanded to “major universities and nonprofits,” as well as organizations in the public sector as of November 2022.
Valuation
OpenAI led Descript’s $50 million Series C at a $550 million valuation in 2022, with participation from a16z, Spark, Redpoint, and Daniel Gross. Other notable individual investors include Casey Neistat, Tobi Lutke (Shopify), Shishir Mehrotra (Coda), Lenny Rachitsky, Naval Ravikant, and Rahul Vohra (Superhuman).
Key Opportunities
Generative Audio and Video
Descript’s value proposition is to reduce friction in content creation, editing, and publishing. It has an opportunity to leverage generative models and automate creative workflows to, for example, write entire podcast segments and supplement recordings with generated audio and synthetic media (avatars, video clips, etc.). Descript’s partnership with OpenAI positions the company to experiment with novel ways to use generative AI for popular media consumption formats.
Live Content
63% of people ages 18–34 watch live streaming content regularly. Descript offers integrations with tools like Restream, Twitch Studio, YouTube Live, and Wirecast. As Descript captures more of the time its users spend editing and publishing, there is an opportunity to move into real-time audio and video editing. Descript could develop features that let creators enhance and edit live content on the fly, such as integrated Q&A, live chat, surveys, polls, instant audience feedback, translation support in Overdub, and real-time special effects (e.g., AR lenses).
Vertical Integration
Descript’s core audience (video and podcast creators) aims to publish high-quality content that resonates with an audience, grows and retains it, and can be monetized. Descript can identify adjacent pain points and build in-house capabilities, such as podcast and video recording tools or distribution arms, that help creators build, grow, and manage their audience reach and brand relationships.
Key Risks
Commodification
Several of the Descript platform’s core capabilities, like transcription (automatic speech recognition), Overdub (text-to-speech), filler word removal, and AI green screen, are increasingly becoming table stakes as AI advances make such features easier for competitors to add. As a result, it could become harder for Descript to differentiate on features alone, putting more weight on its UX. That said, Descript has built AI into its platform since its inception and has been among the first to ship such features.
Privacy
Transcription and voice cloning services will likely always have to manage perceptions around security and privacy. Descript is aware of this risk and has committed not to look at users’ data without explicit user opt-in. Descript does rely on a handful of third-party services to process transcripts, such as Google Cloud Speech-to-Text for automatic transcription. Google deletes the data from its servers once transcription is complete.
Customer Churn
ConvertKit’s 2022 survey found that 61% of creators experienced burnout in 2021 due to prolonged stress and mental exhaustion from the need to create and publish content consistently. Additionally, 60% of creators earn less than $50K and 30% earn less than $10K. These dynamics could create volatility in the solo-creator and small-team segments of Descript’s customer base. Descript can mitigate this risk by continuing to move upmarket to serve content studios and enterprises.
Summary
The obstacles to creating professional-quality media continue to diminish, democratizing production and expanding the addressable market for enabling technologies like Descript. By offering an easier way to capture, edit, publish, and share video and audio, Descript provides a valuable service to non-technical creators and to businesses looking to create and distribute media rapidly. Descript faces competition both from startups and from entrenched creative platforms like Adobe, all aiming to capture the next wave of creators with the best possible editing experience. As platforms look for the next point of leverage for differentiation, Descript’s OpenAI partnership could prove valuable by enabling knowledge-sharing and access to AI expertise, helping Descript stay ahead of the curve.