Thesis
Creators and advertisers are constantly in need of image content, and supply has grown to meet that demand; the volume of image content has steadily climbed for more than a decade. The number of photos shared over social media has tripled since 2013, with 14.1 billion images shared per day across WhatsApp, Snapchat, Facebook, and Instagram as of October 2023. Even on video-first platforms like YouTube and TikTok, creators and advertisers rely on thumbnail images to get users to click on the video.
Stock photography is one commonly used solution for creators and advertisers looking for images to help attract traffic or views. The stock photography market was valued at $3.3 billion in 2020. Free stock image sites host millions of photos — for example, Unsplash hosted 3 million photos as of October 2023. Meanwhile, Shutterstock, one of the largest paid stock photography sites, hosted over 1 billion images as of October 2023. With the average cost of a stock photo reaching $1.11 as of April 2023 and the risk of the same photos being overused, generative AI for commercial use becomes increasingly attractive.
Not everyone can take breathtaking photos or draw photorealistic art, but anyone can type an idea for an image that AI can then generate. In its early days, this ability quickly attracted the interest of social media users; for example, Hank Green’s “cat” thread on Twitter attracted over 100K likes. Outside of social media, “AI art generator” received an average of 864K searches per month globally from August 2022 to August 2023, while “stock photo” only achieved an average of 94K searches per month globally in the same period. Generative image AI is already being used to quickly produce concept art, make movie trailers, and even win art competitions.
Midjourney describes itself as an “independent research lab” known for producing a generative AI program that creates images based on text prompts. As the demand for online imagery continues to increase, Midjourney could offer a cheaper and more customized alternative to stock photography and reduce time spent on the brainstorming phase for contractors, game designers, and other artists alike. As of October 2023, Midjourney is a self-funded team with 11 full-time employees, and its product remains in the open beta it entered in July 2022.
Founding Story
Midjourney was founded in 2022 by David Holz (CEO). Prior to founding Midjourney, Holz described himself as having been a serial entrepreneur since high school, when he ran a design business while still a student. He studied physics and math in college before going on to pursue a PhD in applied math while simultaneously working at NASA and the Max Planck Institute from 2009 through 2011.
Holz eventually became overwhelmed and took a leave of absence from his PhD program to move to San Francisco and start his first startup, Leap Motion, around 2011. The company’s hardware was utilized by over 300K developers during its twelve years of operating under Holz’s leadership. In May 2019, the company was acquired by Ultrahaptics, now Ultraleap, for roughly $30 million, or about 10% of Leap Motion’s $306 million peak valuation in 2013.
In August 2021, Holz left Leap Motion to start Midjourney, seeking a change from large venture-backed companies. After recruiting a team of ten engineers, the independent lab launched its first private demo in September 2021. As Holz watched the first users test Midjourney, he began to realize the tool’s potential, wondering:
“What does it mean when computers are better at visual imagination than 99 percent of humans?”
By February 2022, Midjourney was released publicly in the form of a Discord bot within the Midjourney community. Hosting the product in Discord gave people the space to imagine together while also giving newcomers examples to explore. As Holz described in a November 2022 conversation with Stratechery’s Ben Thompson:
“I think that the Midjourney experience would not work at all if it was just talking to a chatbot in [a] room by yourself, but the second that it’s in a room with lots of people, it becomes really interesting.”
By August 2022, the Midjourney community had roughly one million members, whom Holz described as a “hive mind of people, super-powered with technology”.
Product
How Generative AI Works
Training Data
Training data plays a vital role in generative AI. Generative models need examples of objects like cars and cats before they are capable of generating new images of cats driving cars. As a result, Midjourney turned to the largest database of images it could find — the internet. Holz described Midjourney’s training data collection process as “a big scrape of the internet”.
Once images to be used as training data are collected, the data must be cleaned (removing duplicate, irrelevant, or bad data) and labeled. Although Midjourney performed its own “big scrape” to collect its own training data, it also utilized open-source training datasets, such as the 2 billion image-text pairs in the English subset of the CLIP-filtered open dataset created by the German non-profit LAION.
Diffusion Models
Since the release of Midjourney V4 in August 2022, Midjourney has been using a diffusion-based generative AI model. Diffusion models work by gradually adding noise to a sample image and then learning how to reverse that process, allowing them to create new images that resemble the original ones. After receiving a sample or training image, diffusion models go through a step-by-step process described below:
Forward Diffusion Process: The diffusion model starts by taking an input image and gradually adding Gaussian noise.
Noise Accumulation: The model continues to add noise over many steps. Each addition obscures more of the image until the original is transformed into a noisy, grain-covered version. The more noise that is added, the less the result resembles the original image; the less noise, the closer it stays to the original.
Denoising Process: After adding the desired amount of noise, which Midjourney users can partially control with prompt weighting, the model learns to recover the original image by reversing the noising process.
Iterative Refinement: Denoising is performed iteratively, gradually reducing the noise level in the image. At each step, the model removes a little more noise, so image quality improves over successive steps.
Training and Predictive Learning: The steps above repeat for as many images in the training dataset as possible. The model eventually learns to predict the original image from the noisy image.
Generating New Data: Once the model is trained, it creates new images by starting from random noise and denoising it step by step, forming colors and shapes from the patterns it picked up during training. This creates unique images that are similar to the training data but not identical, so many different outputs are possible.
If a diffusion model is given enough sample images of objects such as cats or cars, the model eventually learns what noise patterns shape a cat's face or car's wheel. Once prompted, the diffusion model will combine the necessary noise patterns and generate an image of a cat driving a car. Since the noise patterns can be “denoised” or read differently, the same prompt naturally creates similar variations.
Source: Stable Diffusion Art
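The noising half of the process described above is simple enough to sketch in code. The example below is a minimal illustration, assuming the widely used DDPM-style formulation with a linear noise schedule. Midjourney has not published its model, so the function names, the 1,000-step schedule, and the toy five-pixel "image" are all assumptions made for illustration. In a real system, a trained neural network would supply the noise estimate that `recover_image` receives here directly.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def signal_fraction(betas):
    """Running product of (1 - beta): the share of the original image left after each step."""
    kept, fractions = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        fractions.append(kept)
    return fractions


def add_noise(pixels, kept, rng):
    """Forward diffusion in one jump: blend the image with Gaussian noise.

    Returns the noisy pixels and the noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(kept) * p + math.sqrt(1.0 - kept) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise


def recover_image(noisy, noise_estimate, kept):
    """Undo add_noise given a noise estimate (a trained network's job in practice)."""
    return [(x - math.sqrt(1.0 - kept) * n) / math.sqrt(kept)
            for x, n in zip(noisy, noise_estimate)]


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy five-pixel "image" (hypothetical data)
alpha_bars = signal_fraction(linear_beta_schedule(1000))

# Noise the image to the halfway step, then recover it using the true noise,
# which stands in for a perfect prediction.
noisy, noise = add_noise(image, alpha_bars[499], rng)
restored = recover_image(noisy, noise, alpha_bars[499])

print(f"signal kept after step 1:    {alpha_bars[0]:.4f}")
print(f"signal kept after step 1000: {alpha_bars[-1]:.6f}")
```

Early in the schedule almost all of the image survives, and by the final step almost none does, which is why generation can begin from pure noise. Training amounts to teaching a network to produce the noise estimate well enough that the recovery step works without access to the true noise.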
Using Midjourney
Users can join Midjourney’s public beta on Discord to start generating images. The user can begin prompting Midjourney by typing /imagine in Discord. Each prompt generates four similar variations. The process can take anywhere between a few seconds and many minutes, based on popularity, subscription plan, and rate limits.
After this, users can upscale one of the four results or re-prompt the program. Midjourney’s detailed documentation also walks users through more advanced prompting to control aspect ratio, styles, and image editing post-generation. Until March 2023, when Midjourney closed its free trial program to reduce server outages and misuse, the first 25 image generations were free.
Source: Midjourney
Other Capabilities
In addition to letting users generate images based on text prompts, Midjourney offers a “describe” command as well as a “blend” command. “describe” lets users upload an image. Instead of returning four images in response, Midjourney returns four incredibly detailed prompts, often supplying sample ratios, styles, and other advanced prompt parameters. With the “blend” functionality, users can upload two to five images and blend them together, letting creators see unique combinations of any images they have in their camera roll.
Source: Absolutely AI
Image Editing
Beyond standard variation options after each generation, Midjourney’s “remix” and “zoom out” commands let users edit images. Midjourney’s “remix” command will reuse the original generated image as a base and then alter the image based on the user’s new prompt. If a user does not get the desired result the first time, remix gives them more specific control over the second attempt. In August 2023, Midjourney also introduced an “inpainting” feature, which allows subscribers to modify sections of an image that it has generated without starting over.
Source: Bootcamp
Market
Customer
In September 2022, Holz estimated that only 30% to 50% of Midjourney’s users were using the tool as professionals, with the rest using it in a personal capacity. In March 2023 the company stopped offering free trials, which may have reduced the number of casual users. As of August 2023, users must subscribe to Midjourney to generate their own images, but people can join the community and export other people’s creations without subscribing.
51% of Midjourney’s traffic is direct while 39% is organic, demonstrating that the company is benefiting from unpaid customer acquisition; this helped the company achieve profitability just six months after launch. Although the company has not released any updated information about its customer base as of October 2023, Midjourney’s customers are likely to fall into two buckets based on its use cases: advertisers and artists.
Advertisers: Instead of paying for generic stock imagery, which can be particularly limiting to companies in niche markets, advertisers could turn to Midjourney to generate customized images that can easily be revised without paying for a stock photo license or photographer. Midjourney’s aspect ratio remix and inpainting options may be particularly helpful for teams who seek to reuse content across different social platforms and devices by making modifications or edits.
Source: Contrary Research on Midjourney
Artists: Although some believe AI art exists in competition with artists, Holz said that artists use Midjourney to be more “explorative in the beginning, coming up with a lot of ideas in a short amount of time.” Game designers and concept artists could also use Midjourney creations as a base before jumping into modeling and rigging. In addition to helping artists settle on an idea before spending hours drawing, molding, or photographing, others have used Midjourney as an input to their work. A graphic designer could generate background textures while a photographer could generate new skies.
Before the advent of generative AI tools like Midjourney, artists relied on sites like Pinterest, Dribbble, or stock photo sites to gain inspiration. Although such methods may give artists all the pieces, only generative AI has the potential to help artists combine pieces during the inspiration phase. Adoption varies from creator to creator, with some suing AI companies over AI art and others embracing it, but Midjourney's value proposition is likely to be clear to creators or artists who are early adopters.
Market Size
Overall, the generative AI market is both large and fast-growing. It was valued at $10.5 billion in 2022 and is projected to grow at a 34.1% CAGR until 2032, when it will potentially reach $191.8 billion. Within this, the generative AI art market in particular was valued at $212 million in 2022 and was projected to grow even faster than generative AI in general, at a 40.5% CAGR. If that trend holds, generative AI art will reach a market size of $5.8 billion by 2032.
The addressable market of generative AI for images may also be compared to the stock photography industry since it may absorb some or even most of this market. Before the release of mainstream AI image services, the global stock photography market was valued at $3.3 billion in 2020 and was expected to reach $4.8 billion by 2028. Across the many use cases for generative image AI, there are millions of potential customers. In 2020, there were 32 million bloggers in the United States representing a large audience of individual creators who may consistently need cover art. Although statistics on total social advertising agencies are limited, Facebook Data reported seven million active advertising partners in February 2019.
Competition
OpenAI
Founded in 2015, OpenAI develops foundational AI models as infrastructure for AI-driven applications. OpenAI’s core products include developer APIs for GPT (text generation) and DALL-E (image generation), as well as its flagship consumer product, ChatGPT. Originally founded as a non-profit, OpenAI received $1 billion in donations from angel investors including Sam Altman, Greg Brockman, Elon Musk, and Peter Thiel as well as funds including Amazon Web Services, Infosys, and YC Research. OpenAI converted to a “capped-profit” company in March 2019. In October 2023, OpenAI was reported to be in talks for a deal that would value the company at up to $80 billion, which would make it the third most valuable startup in the world.
OpenAI’s breakout product was ChatGPT, which reached 100 million users within two months of its launch. Even though consumers largely associate OpenAI with ChatGPT because of this, OpenAI’s generative image model, DALL-E 2, is a powerful AI image generator in its own right, and perhaps Midjourney’s biggest competitor. Compared to Midjourney, DALL-E 2’s complete web app UI may be easier for consumers who are not already on Discord. Although its prompting quality is less reliable than that of Midjourney’s v5, DALL-E 2 is accessible through an API, unlike Midjourney. OpenAI’s significant funding, attention from its other AI models, and off-platform use make it a formidable competitor.
Runway
Founded in 2018, Runway offers collaborative video and image editing software. In addition to real-time editing, Runway Research's AI tools offer powerful prompting, editing, and image-to-video software. As of October 2023, Runway has raised $236.5 million in funding. Its most recent round was a $141 million Series C extension which valued the company at $1.5 billion.
Runway offers a free trial with limited functionality, and its cheapest subscription plan starts at $15/month per user. Even though Midjourney users can collaborate via Discord, Runway's real-time collaboration is available throughout the full design process. Runway focuses on professional and enterprise use, while Midjourney remains more targeted towards individuals. Even though Runway offers its own AI image generation, some users still use Midjourney for image generation, for example combining both tools to create a movie trailer.
Stable Diffusion
Stable Diffusion was released publicly in August 2022; its development was reportedly funded by Stability AI, a startup that was founded in 2019. Stability AI has raised a total of $123.8 million in funding as of October 2023, with its latest round of funding, $25 million, having been raised via convertible note in June 2023. This occurred after the startup reportedly failed to raise equity funding at a $4 billion valuation and after the publication of a Forbes article that asserted that Stability AI founder Emad Mostaque had exaggerated Stability AI’s role in Stable Diffusion’s origin and that its source code was written by a different group of researchers.
Although diffusion-based image models have existed since 2015, Stable Diffusion remains a popular alternative to both DALL-E 2 and Midjourney. Stable Diffusion is an open-source model, which means that anyone can download and use it for free. It is also more customizable than Midjourney, with a wider range of features and settings. For example, Stable Diffusion supports text-to-image, image-to-image, inpainting, outpainting, and editing. Midjourney only supports text-to-image, inpainting, and editing. Midjourney, however, is designed to be easier to use.
Adobe
Founded in 1982, Adobe has developed over 60 software applications in its Creative and Acrobat Suite to power the daily workflows of artists and enterprise customers. As of October 2023, Adobe has a market cap of $237.3 billion. Although AI has powered Adobe tools like Content-Aware Fill since 2019, Adobe officially joined the AI race with the release of Firefly in March 2023. Firefly, a generative AI tool, was added to the Creative Cloud suite, giving users the ability to generate images, vectors, videos, and even 3D.
Adobe's Firefly is designed to incorporate AI into the established workflows of users who are already accustomed to using Adobe’s product suite. This means that Adobe users can leverage Firefly with familiar tools like the “Pen” tool in Photoshop, enabling users to define specific areas for AI-driven editing. Unlike Firefly with its integration across various Adobe applications, Midjourney’s primary purpose is to generate new images rather than precisely edit or modify existing ones, although it has broadened its functionality to include editing since it launched. Since Firefly is wrapped into Creative Cloud pricing, Creative Cloud’s $54/mo price tag may make Midjourney a more affordable alternative for individual creators or designers not looking for functionality beyond basic image generation and editing.
Shutterstock
Founded in 2003, Shutterstock is a stock media site, offering photos, videos, audio, graphic design templates, and 3D assets as well as some image editing and media planning tools. In May 2023, Shutterstock released its own AI image generator and revealed a waitlist for a smart design assistant. Shutterstock offers more media types compared to Midjourney, which only offers images. Shutterstock had a market cap of ~$1.3 billion as of October 2023.
Although Midjourney has a general advantage over stock images in terms of user control and flexibility, Shutterstock’s new generative AI tool and AI design assistant may satisfy existing Shutterstock users; it may even be able to leverage a distribution advantage with its vast SEO reach. It’s easy to find a Shutterstock image after a quick Google Search, while by contrast, a new Midjourney generation is much more difficult to discover. Midjourney’s lack of off-platform access may also put it at a disadvantage compared to Shutterstock’s web-based generator.
Business Model
Midjourney primarily generates revenue through tiered subscriptions, with plans ranging from $10/month for the most basic plan to $120/month for the highest tier. The tiers differ in features such as Fast GPU Time, concurrent jobs, and privacy control.
All plans offer unlimited commercially licensed generations depending on processing availability, but they differ in generation speed. While default "relaxed" generation times can vary based on usage, usually taking no longer than a few minutes, Fast GPU Time aims to achieve generation in under a minute or two. The distinction in Fast GPU Time, concurrency, and privacy control across different plans creates an incentive for users to opt for higher-tier subscriptions.
Source: Midjourney
Even as a software company using cloud services, Midjourney relies on an asset-heavy model given the significant investment in infrastructure required to support an actively training AI service used by millions of people. In addition to supporting its small team, Midjourney’s greatest expenses likely fall into three categories: data collection, data cleansing & training, and active server costs.
Data Collection (Web Scraping): Building and running an efficient web scraper can be expensive, with some services charging around $3.33/hour. Let’s say Midjourney scraped only a week’s worth of internet photos (roughly 20 billion photos), with each photo taking only 10 milliseconds to scrape. This would amount to about 55.6K hours of scraping and, at $3.33 per hour, cost around $185K just to collect one week’s worth of photos. This does not include the cost of proxies to prevent IP blocking (many sites auto-block mass web scraping) or the server costs for running the data collection process.
Data Cleansing & Training Neural Networks: Once data is collected, it still needs to be cleansed. Server costs for training a diffusion model with a large data set also need to be considered. For example, training a small-scale Generative Adversarial Network (GAN), an older approach to generative AI images, on Google Cloud can cost between $2.5K and $3.1K per month. Stable Diffusion itself was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150K GPU hours, at a cost of $600K.
Active Server Costs: In an interview with the Verge in August 2022, Holz described the computing power required for generating images like this:
"Every image is taking petaops. So 1000s of trillions of operations. I don't know exactly whether it's five or 10 or 50. But it's 1000s of trillions of operations to make an image. It's probably the most expensive … if you call Midjourney, a service – like you'd call it a service or a product – without a doubt, there has never been a service before where a regular person is using this much compute.”
A petaop is 10^15 operations, so by Holz’s description each generated image takes some multiple of that. Let’s assume that generating one image requires five petaops (5 * 10^15 operations). Modern GPUs can perform trillions of operations per second. For example, the NVIDIA Tesla V100, widely used for deep learning, offers about 15 teraFLOPS for certain types of calculations.
Source: Contrary Research
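The back-of-the-envelope figures in the three cost categories above can be reproduced in a few lines. The sketch below only uses numbers quoted in the text; the function and constant names are hypothetical, and treating the V100's roughly 15 teraFLOPS as a sustained rate is an optimistic assumption, since real throughput depends on precision, batching, and hardware utilization.

```python
# Figures quoted in the text; everything else is derived arithmetic.
PHOTOS_SCRAPED = 20_000_000_000    # "roughly 20 billion photos"
SECONDS_PER_PHOTO = 0.010          # 10 milliseconds per photo
SCRAPER_USD_PER_HOUR = 3.33

SD_GPU_HOURS = 150_000             # Stable Diffusion training run
SD_TRAINING_COST_USD = 600_000

OPS_PER_IMAGE = 5e15               # assumed: five petaops per generated image
V100_OPS_PER_SECOND = 15e12        # ~15 teraFLOPS, assumed to be sustained (optimistic)


def scraping_estimate(photos, seconds_per_photo, usd_per_hour):
    """Return (hours of scraping, total cost in USD)."""
    hours = photos * seconds_per_photo / 3600
    return hours, hours * usd_per_hour


def gpu_seconds_per_image(ops_per_image, ops_per_second):
    """Seconds one GPU needs for one image at the given throughput."""
    return ops_per_image / ops_per_second


scrape_hours, scrape_cost = scraping_estimate(
    PHOTOS_SCRAPED, SECONDS_PER_PHOTO, SCRAPER_USD_PER_HOUR
)
gpu_hour_rate = SD_TRAINING_COST_USD / SD_GPU_HOURS
gpu_seconds = gpu_seconds_per_image(OPS_PER_IMAGE, V100_OPS_PER_SECOND)

print(f"Scraping: {scrape_hours:,.0f} hours, ${scrape_cost:,.0f}")
print(f"Implied training rate: ${gpu_hour_rate:.2f} per GPU hour")
print(f"One image: about {gpu_seconds:.0f} GPU-seconds on a single V100")
```

Under these assumptions, scraping comes to roughly 55.6K hours and $185K, the Stable Diffusion training run works out to $4 per GPU hour, and a five-petaop image would occupy a single V100 for a little over five minutes. The last figure is an upper-bound style illustration of Holz's point about compute intensity, not a measurement of Midjourney's actual serving cost.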
Profitability: Even with looming server costs, Holz told the Verge that the company had become profitable in August 2022, just six months after its public launch.
Traction
By August 2022, the Midjourney community had roughly one million members. Midjourney was used to win an art competition in September 2022. As of October 2023, Midjourney’s community had reached 16 million members. This made it Discord's most popular server by a significant margin at the time, with the next-largest communities at around a million members each. Midjourney is entirely self-funded and operates with eleven full-time employees as of October 2023.
As far as revenue goes, if 5% of Midjourney’s 13 million server members are on the Basic $10/mo plan, 650K customers at $10/mo represents $6.5 million in MRR and $78 million in ARR. If Holz’s estimate that 30% to 50% of customers are “professionals” is taken as a proxy for users who need commercial licensing, and therefore a subscribed plan, then 30% of total server members on the Basic $10/mo plan would be 3.9 million customers, representing $39 million in MRR and $468 million in ARR. Meanwhile, others have estimated that Midjourney was earning ~$750K in monthly sales by May 2023.
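The two revenue scenarios above follow from the same formula, which the sketch below makes explicit. It uses the 13 million member count and the $10/mo Basic price from the text's own calculation; the function names are hypothetical, and the final step, which backs out the paying share implied by the ~$750K monthly estimate, is an added sanity check that assumes every subscriber is on the Basic plan.

```python
SERVER_MEMBERS = 13_000_000     # member count used in the text's calculation
BASIC_PRICE_USD = 10            # Basic plan, per month
THIRD_PARTY_MONTHLY_USD = 750_000


def subscription_revenue(members, paying_share, price_per_month):
    """Return (subscribers, monthly recurring revenue, annual recurring revenue)."""
    subscribers = members * paying_share
    mrr = subscribers * price_per_month
    return subscribers, mrr, mrr * 12


def implied_paying_share(monthly_revenue, members, price_per_month):
    """Share of members that would need to pay to produce the given monthly revenue."""
    return monthly_revenue / price_per_month / members


scenarios = {
    share: subscription_revenue(SERVER_MEMBERS, share, BASIC_PRICE_USD)
    for share in (0.05, 0.30)
}
for share, (subscribers, mrr, arr) in scenarios.items():
    print(f"{share:.0%} paying: {subscribers:,.0f} subscribers, "
          f"${mrr:,.0f} MRR, ${arr:,.0f} ARR")

share_from_estimate = implied_paying_share(
    THIRD_PARTY_MONTHLY_USD, SERVER_MEMBERS, BASIC_PRICE_USD
)
print(f"Share implied by the $750K estimate: {share_from_estimate:.2%}")
```

Read this way, the third-party estimate corresponds to well under 1% of server members paying for the Basic plan, which shows how sensitive the ARR scenarios are to the assumed conversion rate.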
Valuation
Midjourney has yet to accept any venture funding and may not plan to in the future. In August 2022, Holz explained the reasoning behind this approach:
"We're like a self-funded research lab. We can lose some amount of money. We don't have like $100 million of somebody else's money to lose. To be honest, we're already profitable, and we're fine.”
Key Opportunities
Expanding Functionalities
Although Midjourney remains in beta as of October 2023, it has room to keep expanding its functionality. Midjourney added an “inpainting” feature in August 2023, and it could go further, for example by offering specific export tools or even vector generations, which could help Midjourney aid its customers throughout their design processes instead of just at the beginning. Runway’s image-to-video tool, for example, has helped users create cinematic movie trailers, all without shooting a second of real video. Midjourney could likewise offer its own in-house image-to-video support or explore a direct partnership with Runway, such as syncing a user’s Midjourney gallery to their Runway folder assets.
API Access
Midjourney does not have an official API, and its subscribers are individuals, not developers or other startups. By releasing its model for developers and charging per use, just like OpenAI’s DALL-E 2, which charges $0.016 to $0.020 per image, Midjourney could capture more customers without significant internal development strain. Offering APIs could potentially let Midjourney’s community handle core features, like image editing or even website-hosted versions, while still generating revenue for Midjourney.
Key Risks
Intense Competition Within Existing Workflows
With nearly 30 million customers using Adobe Creative Cloud, Midjourney could be viewed as just another tab or application to boot up during the design process. As Adobe Firefly expands early access, creators may find it more convenient to use the tools within their main design interfaces, instead of constantly switching back and forth. Midjourney’s existence within Discord and lack of off-platform use create friction for customers even if they do not use services from Adobe Creative Cloud. Searching “AI” in Figma’s community plugin center returns more than one hundred results already, as of August 2023, all offering quick, in-app AI solutions.
Regulatory Decisions
As writers and artists grapple with AI’s relationship with their work, many have begun to sue AI companies. Some lawsuits have also originated from stock imaging companies, like Getty Images, which sued Stability AI for the alleged use of 12 million images. As the industry awaits potential regulatory decisions, AI companies may be fined and encouraged to compensate artists and/or adjust their web scraping models. Although some judges have dismissed claims against generative AI companies, the uncertainty poses potential short-term and long-term expenses, if not development changes.
Dependence on Discord
Midjourney's product is primarily accessed through a Discord interface. This dependence on a third-party platform could pose a risk if Discord changes its policies or terms of service, or if it experiences technical issues or outages. As Midjourney becomes more profitable, Discord may seek to gain revenue share in exchange for hosting so many members.
Summary
Midjourney sits in an incredibly competitive market, with other generative AI image products emerging from enterprise creative software companies that have robust suites of tools and millions in funding to integrate AI seamlessly into their already popular software. Midjourney grew to 16 million members as of October 2023, making it the largest community on Discord. The company’s high-quality visuals have impressed mainstream users. The high-quality image generations and fast-growing membership allowed Midjourney to become profitable just six months after launch. Midjourney’s small team has made promising progress so far, but it will need to rapidly improve and expand its product offerings to stay ahead of larger, better-resourced generative AI companies.