Thesis
AI adoption has reached an inflection point, with 78% of organizations using AI in at least one business function, up from 72% in early 2024 and 55% a year earlier. However, this rapid deployment has exposed fundamental problems: 47% of organizations in 2025 have experienced at least one negative consequence from generative AI use, and 74% of companies in 2024 have yet to show tangible value from their AI investments, stemming in part from the “black-box” nature of most AI models.
Such problems are not unprecedented. The genetics field faced a similar situation before 1953, when scientists could observe clear inheritance patterns and even develop mathematical laws, but the underlying mechanism remained a complete black box. The breakthrough in understanding DNA structure in 1953 by Watson and Crick didn't just solve safety concerns about genetic prediction. It birthed entirely new fields: molecular biology, genetic engineering, and synthetic biology. These fields have created massive economic value, with the global biotechnology market being valued at $1.5 trillion in 2023.
The 2025 AI landscape mirrors that pre-interpretability phase in genetics. The core issue driving both the negative consequences and the lack of tangible value is the black box problem plaguing modern AI systems. High-profile failures have demonstrated the financial risks of opaque AI systems: Google's Bard chatbot error in early 2023 wiped out over $100 billion in market value when it provided incorrect information, while Character AI faces wrongful death lawsuits after its chatbot allegedly encouraged a 14-year-old's suicide in late 2024.
Unfortunately, even leading researchers have little idea of how neural networks truly function. AI models are grown, not coded, exhibiting emergent behavior that defies traditional programming logic. Beyond safety concerns, this opacity prevents us from understanding not just what can go wrong, but what new forms of intelligence, reasoning, and problem-solving might be possible.
Goodfire, founded in 2024, has emerged as the leading AI interpretability research company. The company's differentiation lies in mechanistic interpretability by decoding the neurons inside an AI model to understand its internal thoughts. Just as genetic engineering was inconceivable before understanding DNA, Goodfire may position itself at the discovery of new paradigms of science, reasoning, and human capability.
Founding Story
Goodfire was founded in June 2024 by Eric Ho (CEO), Dan Balsam (CTO), and Tom McGrath (Chief Scientist). Ho and Balsam began working together as CEO and founding engineer, respectively, at RippleMatch, a venture they formed in 2016 to reimagine the future of work with AI.
Ho, frustrated with the broken, unfair, and inefficient hiring system after his personal experience of job searching during his senior year at Yale, founded RippleMatch in 2016 and spent the next seven years increasing job accessibility with AI. Eventually, he was named to Forbes's 30 under 30 list for his work in 2022.
In 2023, both Ho and Balsam left RippleMatch, united by their conviction that mitigating AI risk was the most critical challenge to tackle. Balsam described himself as a “serial early employee at startups” on a podcast with Cognitive Revolution. In 2023, he turned his focus to the accelerating rate of progress in AI, becoming the first to integrate AI into RippleMatch products while realizing that the black box nature of AI made it hard to engineer with them.
After taking a work hiatus to catch up on AI research, Balsam approached Ho about how startups offered an uncharted terrain to expand AI research, following which Ho posted an article on LessWrong, a community of rationalist thinkers and effective altruists concerned with existential global problems, in December 2023. He articulated his belief that “for-profit organizations could potentially scale far more quickly than a non-profit,” resulting in “far more organizations working on safety in the ecosystem tapping into a different pool of resources,” which would be important since “startups that tackled short-term safety concerns would inadvertently position themselves to tackle long-term risks over time”.
McGrath, a long-time leader of the AI safety space, connected with Ho and Balsam after he had written a document that mirrored many of their beliefs. While completing his PhD in 2016, he speculated that AI was going to become a big deal — a view he claims as one of the better bets he’s made in his career — and went on to co-found the Interpretability team at DeepMind. In March 2024, he left Google to join South Park Commons, a community for technologists to help them figure out what to work on next. He set out to make interpretability “useful” by starting a company and has since been providing his complementary research skillset to engineer-entrepreneurs Ho and Balsam.
Product
Goodfire’s product is a set of tools designed to augment existing AI models to help developers and researchers better understand their models’ sensitivities, dependencies, errors, and biases. The company’s flagship product is Ember, an interpretability research tool, which has been equipped over time with additional research tooling and creative functionality to serve a wider breadth of customer use cases.
Ember
Goodfire launched Ember, the world’s first product for conducting mechanistic interpretability research, in December 2024. Ember is a hosted API/SDK that lets users shape AI model behavior by directly controlling a model’s internal units of computation, or “features,” to precisely control model outputs or use them as building blocks for tasks like classification.
In the status quo, computing power and efficient inference code are bottlenecks to extracting which neurons activate in a certain model, a process known as activation harvesting. Goodfire harvests activations from Llama 3.3 70B and uses the data to train Sparse Auto Encoders (SAEs). This artificial neural network method emerged in late 2022 as the first step toward understanding the black box of Artificial Intelligence.
These SAEs enable Ember to identify features that can capture how a model processes information, providing insight into its inner workings. They represent meaningful concepts from these interactions, like a model’s understanding of “conciseness” or “technical explanation.” This allows Goodfire to tell causal stories about why the models behave as they do.
Ember enables researchers to adjust a model's internal activations directly through a dynamic sidebar. This method provides precise control over how AI models think and respond—an increasingly difficult challenge as models scale.
There are three ways to find features a user may want to modify.
Auto Steer
Auto steering automatically finds and adjusts feature weights to achieve the desired behavior. By providing a short prompt, Autosteering finds relevant features, sets appropriate feature weights, and returns a FeatureEdits object that you can set directly.
Feature Search
Feature search helps users explore and discover what capabilities AI models have by browsing through available features. To find similar features, Feature Search compares the selected features to either individual features or groups of features. This helps users understand feature relationships beyond just their labels and reveals which features might work best for desired model adjustments.
Contrastive Search
Contrastive search lets users discover relevant features in a data-driven way. Given two datasets of chat examples, such as an example of behavior a user would like to avoid versus encourage, examples are paired such that the first example in dataset_1 contrasts with the first example in dataset_2, and so on. Combined with re-ranking, contrastive search sorts the features that distinguish between your datasets using a description of the desired behavior. This two-step process ensures features that are both mechanistically useful and aligned with the user's goals.
Ember Applications

Source: Goodfire
Once a user is familiar with Ember, it can be applied in a wide range of use cases.
Decision Trees
Instead of relying on resource-intensive approaches like training separate classifiers, e.g., feeding 1K news articles into a model for stock sentiment classification, Ember enables researchers to extract meaningful features directly from training data. Researchers can identify features that distinguish between positive and negative financial sentiment by comparing datasets where the assistant responds "good" vs "bad" to investor sentiment questions. It then finds the most relevant features for "bull market" and "bear market" contexts, for each news sample, then systematically grid searches through combinations of features to find the optimal feature set. Finally, it trains regularized decision trees on the feature activation values to classify sentiment, producing interpretable models where each decision node corresponds to a specific semantic feature threshold.
Knowledge Removal
To avoid generating potentially harmful content about real people while maintaining general conversational abilities, Ember can be used to selectively remove knowledge. In their celebrity-finding example, Ember searches for "celebrities" features, identifies specific features like "Celebrity names and fame-indicating context" and "Documentation of celebrity relationship status and history," then sets them to negative values. This creates variants that recognize celebrities such as Brad Pitt as "a very interesting character" and "a human who designs and creates things," but strip away biographical details, fame context, and personal information about the celebrity.
Dynamic Prompts
To adapt the AI models’ behavior and instructions based on the type of request without requiring explicit user specification, Ember can employ contrastive search to identify specific features. In their programming-related example, Ember finds features that distinguish programming requests, then uses feature inspections to detect when users are requesting code in real-time. When the "user is requesting code or programming examples" feature activates above an assigned threshold, the system dynamically switches from a basic "helpful assistant" prompt to a detailed coding prompt that instructs the model to write extensive, fully functional code with proper formatting and no placeholders.
Sorting by Features
Ember can filter or rank large amounts of text data based on semantic properties for content curation, moderation, and automated classification tasks. In their classifying-tone example, the code searches for sarcasm-related features and applies feature inspection to calculate sarcasm scores for each tweet by summing activations across multiple sarcasm features like "Casual discourse markers signaling sarcasm or irony" and "Dark humor or sarcasm about unpleasant situations." Tweets with combined sarcasm scores above 1.0 are filtered and collected, enabling content curation based on semantic feature patterns.
On-Demand RAG
To reduce latency and costs while ensuring relevant context is available for specialized topics, Ember can retrieve external information only when needed. By implementing conditional inference interruption to trigger external data retrieval mid-generation, Ember can, for example, set up an abort condition when the "References to the Coca-Cola brand" feature exceeds some threshold activation and then use exception handling to catch the aborted inference. When triggered, it simulates RAG retrieval, injects brand partnership context into the system prompt, and resumes generation with the partial tokens already generated plus the new contextual information, creating seamless on-demand knowledge augmentation.
Feature Steering
AI models cannot be edited or debugged without either tuning their prompts or curating training data to fine-tune the model. This can create unreliable or hard-to-trust software.
Feature represents a tangible form of giving reliable controls back to AI developers by providing them an expressive programming language of neurons that leverages the model's own intelligence.
By allowing fine-grained tuning of model outputs beyond general prompts such as “friendly and concise,” researchers can specify nuanced preferences such as “70% friendly, 50% concise, and 80% professional.”
This is one of the first breakthroughs in precisely quantitatively aligning qualitative human preferences with AI, a stark alternative to methods such as Reinforcement Learning with Human Feedback (RLHF), which often introduces unintended side effects, such as false refusals, where a model mistakenly refuses tasks it can perform.
Adversarial Robustness

Source: Haize Labs
Probing black-box AI systems for harmful, unexpected, and out-of-distribution behavior has historically been very difficult. Canonically, the only way to test models for unexpected behaviors, a process known as red-teaming, has been operating in the prompt domain using jailbreak prompts.
Traditional fine-tuning methods struggle to address jailbreaks without impacting performance. Using activation steering enabled by Ember, users can granularly and efficiently provoke unwanted behaviors. This facilitates not only the discovery of where harmful and refusal knowledge exists in a model, but can also help push a language model towards or away from those harmful and refusal regions.
Interpretability methods can also serve as a useful objective for traditional red-teaming algorithms, namely, prompt search and optimization. Instead of editing a model to be more harmful, interpretability tools can locate harmful features and use existing red-teaming methods to optimize for prompts that increase harmful behavior. This is primarily useful for red-teaming black-box models and APIs. In addition to this, Ember allows developers to steer features to enable models to be less harmful, for example, by increasing refusal features and decreasing harmful features.
Paint with Ember

Source: Goodfire
Released in May 2025, Paint with Ember is a tool for generating and editing images by directly manipulating the neural activations of AI models. This method relies on Goodfire's BatchTopK SAE architecture, which identifies the most important features active in each spatial region of a model's computation. For image models, this means understanding what visual concepts the AI recognizes in each 16x16 patch of an image.
The challenge with mechanistic interpretability has historically been scale and granularity. Early attempts at neural network interpretation focused on individual neurons that could only observe correlations, but modern networks contain billions of parameters with complex, distributed representations. Goodfire's approach enables direct causal intervention, meaning users can paint with individual neural features or higher-level concept clusters.
Goodfire enables this through a hierarchical approach combining features, or universal concepts the model has learned, such as "Bitcoin logo" or "top border positioning, with factors, or contextual clusterings of related features, creating human-scale abstractions like "galloping horse" or "background trees”.
Paint With Ember interprets Stable Diffusion XL-Turbo's internal representations and enables direct manipulation through a visual canvas interface. Decomposing each of the 256 spatial patches in SDXL-Turbo's intermediate representations into interpretable features, Paint with Ember identifies 10,240 distinct features, creating a 16x16 grid of concept activations that users can directly manipulate.
Market
Customer
Prior to 2024, Goodfire had entered partnerships with Rakuten, Apollo Research, and Haize Labs in which these companies use Ember to interpret proprietary models for model fine-tuning, improving scientific conclusions, and bolstering model security. Its most significant collaboration has been with the Arc Institute, a nonprofit research organization pioneering long-context biological foundation models. This partnership was announced publicly in February 2025, though when the partnership began has not been shared.
Intelligently composing new biological systems requires a deep understanding of the immense complexity encoded by genomes. Unlike language models that process readable text, humans struggle to directly read and understand DNA sequences due to their randomness, subtle differences, and limited annotations.
Researchers at Arc created Evo 2, a biological foundation model from a highly curated genomic atlas spanning all domains of life, that processes multiple layers of biological information: from raw DNA sequences to the proteins they encode and the intricate RNA structures they form.
The research team at Goodfire trained BatchTopK sparse autoencoders (SAEs) on layer 26 of Evo 2, learning a breadth of biological features such as exon-intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions without task-specific fine-tuning.

Source: Goodfire
To validate the relevance of these features, they created a visualizer that displays significant features corresponding to known genomic concepts. The majority of features achieved high F1 scores when tested on a comprehensive genome dataset.
In August 2025, Goodfire announced a partnership with Radical AI, a materials science research organization, furthering the company’s “commitment to interpretability as a tool for scientific discovery”, Goodfire said.
Market Size
The market surrounding AI interpretability and related areas is growing quickly, driven by both the proliferation of AI systems and escalating AI governance demands.
As an Explainable AI software, Goodfire represents a platform for AI explainability and transparency, enabling developers to audit fairness and decision tracing. The explainable AI market is on track to reach approximately $10 billion in 2025 and about $25 billion by 2030.
If governments, instead of corporations, want to ensure systems are auditable, fair, and compliant with regulations, Goodfire’s platform could be a core enabler for governance. In Europe, the EU AI Act entered into force in August 2024, with fines up to 20 million euros, mandating transparency for high-risk AI systems. 77% of companies in 2022 viewed AI compliance as a top priority, driving demand for interpretability tools. Goodfire’s approach also creates new possibilities that are not fully captured in traditional market segments.
Goodfire’s approach also creates new business opportunities that are not fully captured in traditional market segments. As Ember allows knowledge discovery from AI models, it can be used as a tool for unlocking novel value in scientific R&D. Having uncovered new DNA knowledge with Evo-2, Goodfire’s interpretability research could assist pharmaceutical companies in drug discovery, a market that is projected to reach approximately $8 to 10 billion by 2030. If Goodfire becomes part of the standard toolkit for scientific AI labs to find latent genomic features driving disease risk or unlock biologically meaningful patterns, this could represent an adjacent market.
Competition
Tilde: Tilde, founded in 2024, is an applied interpretability company that opens up the black-box nature of AI by giving users a window to understand and improve their models. Having raised $8M from Kholsa Ventures, Tilde’s agent intervenes across modalities of domains, from improving Llama 3’s problem-solving capabilities to enhancing video generation. One week before Goodfire announced Ember, Tile released Sieve: an end-to-end pipeline for applying SAEs for fine-grained control. While Goodfire has scaled up its SAE experiments into bigger models and is investigating non-industry use-cases, Tilde’s focus has shifted to building infrastructure that amortizes the costs of training interpretability models, such as activation data engines.
Reticular AI: Reticular, founded in 2024, is a specialized AI interpretability company focused on protein language models and structure prediction. Reticular uses Matryoshka Sparse Autoencoders for interpreting ESM2-3B, the base model for ESMFold, enabling researchers to understand how protein language models translate amino acid sequences into 3D structural predictions. Having built web-based visualization tools, they’ve demonstrated feature steering capabilities that control structural properties like solvent accessibility while maintaining input sequences. While Reticular specializes only in protein-level interpretability so far, Goodfire's collaboration with Arc Institute on Evo 2 operates at the fundamental DNA/nucleotide level, capturing more foundational biological patterns and demonstrating greater scalability by decoding sequences up to 1 million base pairs.
Weights & Biases: Weights & Biases, founded in 2017 and acquired by CoreWeave in March 2025, offers model monitoring dashboards. As enterprises increasingly need to understand why their models make certain decisions, Weights and Biases helps companies experiment with their hyperparameter tuning. Goodfire's feature steering lets developers tune model internals to shape how an AI model thinks and responds, offering a similar output with a fundamentally different optimization approach.
Wisent AI: Founded in 2025, Wisent builds representation-engineering guardrails that let developers inspect a model’s hidden-state activations and rewrite model behaviors, delivering a lightweight brain-editor API. Though a formal valuation has not been disclosed as of August 2025, Wisent has raised between £250K - £500K from Entrepreneur First and Transpose. Whereas Goodfire trains large Sparse Autoencoders to create high-fidelity circuit maps, Wisent focuses on rapid, contrastive activation filters that can be tuned on small datasets and pushed to production in minutes, trading the depth of interpretability for speed and ease of integration.
Business Model
Goodfire operates on a usage-based API business model, with revenue generated through token consumption across multiple service tiers. The company charges per million tokens processed (including both input and output), with pricing tiered by model size: $0.35/million tokens for smaller models like Llama 3.1 8B, scaling to $1.90/million tokens for larger models like Llama 3.3 70B.
The cost structure remains compute-bound, evidenced by strict rate limits. Organizations face a 50K token/minute global cap shared across all API methods, with further per-feature throttling. Advanced interpretability functions such as AutoSteer and AutoConditional are limited to 30 requests/minute, while simpler utilities such as Search or Rerank permit 1K requests/minute. This hierarchy implies exponentially higher computational costs for core interpretability features, such as activation analysis, adversarial testing, versus auxiliary tools.
Traction
In August 2024, Goodfire’s $7 million seed round, led by Lightspeed Venture Partners, signalled an early investor conviction that the team’s research could become core infrastructure for safer model deployment, as partner Nnamdi Iregbulem called interpretability “a fundamental primitive in the AI stack”.
By November 2024, the startup’s tools powered the Reprogramming AI Models hackathon, where more than 200 researchers in 15 countries used Goodfire’s API to detect adversarial attacks and unlearn hazardous behaviors and prototype safety features.
In December 2024, Goodfire shipped Ember, the first hosted mechanistic-interpretability API, with support for Llama-3.3 70B and 3.1 8B. Early enterprise and research partners Rakuten, Apollo Research, and Haize Labs reported benefits such as higher safety-benchmark scores, faster PII audits, and new scientific insights. Ember represents underpinning internal evaluation pipelines at multiple frontier-model developers, and Goodfire has open-sourced its Sparse Autoencoder interpreters to spur wider adoption.
Valuation
Goodfire secured a $50 million Series A in April 2025 at a $200M valuation led by Menlo Ventures and including funding from Anthropic, marking Anthropic’s first direct investment in another company. As of August 2025, Goodfire has raised a total of $57.2 million. In August 2024, Goodfire raised a $7 million seed round led by Lightspeed Venture Partners, with participation from Menlo Ventures, South Park Commons, Work-Bench, and other notable investors.
Goodfire’s backing from Anthropic is particularly noteworthy, with CEO Dario Amodei commenting that its investment in Goodfire reflects their belief that mechanistic interpretability is among the best bets to help us transform black-box neural networks into understandable, steerable systems.
Key Opportunities
Pre-Training
As of August 2025, pre-training feeds massive, diverse datasets without precisely understanding how specific data shapes internal model representation. It's a "more data is better" approach that requires enormous computing resources. However, Goodfire could identify which data points contribute to developing specific interpretable features, reduce wasteful training, and intentionally shape the model's internal representations.
This would allow developers to create distinct neural modules responsible for specific types of reasoning or knowledge representation, build activation patterns that tend toward interpretable sparsity, or even observe points throughout the architecture where activations can be easily monitored and understood.
Finally, Ember represents a first step in full-stack scientific discovery for customers with rich data but low machine learning knowledge. In high-stakes industries such as the medical domain, Goodfire’s platform could allow experts to provide feedback on feature quality during the training process, democratizing the ability to create and customize domain-specific models.
Post-Training
Traditionally, models that perform better on specific tasks run a separate training process on a particular dataset that updates their model's weights. Harnessing the power of the latent knowledge present in an already powerful base model to edit features directly enables a new feedback loop that doesn’t require labeled data. This allows Ember to adjust the model's behavior on the fly during use by directly manipulating internal parameters instead of having to rebuild the model from scratch, a process known as zero-data, inference-time fine-tuning.
The number of tokens Ember is processing is nearly tripling monthly, as hundreds of researchers use Ember to do fundamental interpretability research. Goodfire partners with nonprofits such as Apart Research to conduct hackathons, which could be a source of facilitating bottom-up adoption among interpretability research.
If Ember becomes the standard for this type of model refinement, every lab introducing a new science foundation model will be able to publish in more detail what the model has learned, which can only be unlocked through an NPI such as Ember.
Financial Technology
The London School of Economics researchers’ use of Sparse Autoencoders (SAEs) to cluster companies by fundamental similarity offers an alternative to traditional industry codes (SIC or GICS), which often overlook nuances in financial performance, cultural behavior, or operational characteristics. The post-2008 regulatory environment, with its focus on stricter stress-testing and financial stability, creates a strong case for Ember’s feature extraction being used to audit finance. Financial markets researchers have used SAE-powered similarity graphs to achieve a higher return in pairs trading strategies, pointing towards use cases for banks, investment firms, and hedge funds.
Key Risks
In-House Teams
The very organizations that need interpretability, such as leading AI labs, often have the resources to develop their solutions. Although Anthropic has invested in Goodfire, CEO Dario Amodei has publicly stated that Anthropic will be investing significantly in reliably detecting AI model problems by 2027. Open-source communities such as Eleuther AI also produce free interpretability frameworks, while InterpretML provides unified APIs and visualization platforms for both glass-box and black-box interpretability at no cost. Though Goodfire’s competitive advantage lies in the deep expertise of its team, there is a well-documented trend of major technology companies poaching talent from AI startups, sometimes through unconventional deals without formal acquisition. This risk incentivizes Goodfire to foster a community or standard around its tools to become the default, rather than a proprietary option that could be bypassed by a well-funded internal effort.
Research Crossroads
Given that interpretability is a new field, there is still strong debate within the community about its effectiveness. Researchers like Neel Nanda have noted that interpretability alone may not reliably catch deception in the worst-case AI scenarios due to the lack of ground truth for what concepts AI models actually use. This makes it difficult to validate interpretability claims, favoring alternative methods such as linear probes. There is also debate over whether the extended research effort in interpretability relative to other technical agendas will be sufficient for ensuring safety in more powerful future systems.
The Philosophy of Semantics
Sometimes, a model’s feature representation doesn’t have a clear name or meaning to humans. This can lead to semantic confusion, with mislabeling or over-interpreting features posing the danger of wrong decisions. Goodfire partly addresses this with automated labeling using language models, but if users lose trust because some features are misleadingly labeled or inconsistent, they might doubt the whole system. Goodfire’s trustworthiness relies on continually validating and communicating the uncertainty around interpretations (perhaps providing evidence for each feature’s effect).
Summary
The AI interpretability field is emerging as a critical foundation for safe and reliable AI deployment. Tools like GoodFire's Ember platform are becoming essential for understanding and controlling advanced AI systems. To ensure both accuracy and safety, developers and enterprises should move beyond black-box AI models by gaining direct access to neural network internals to ensure responsible deployment. Goodfire has the potential to become a leader in this rapidly growing and essential market. With $50 million in Series A funding less than one year after founding, the key question remains whether Goodfire can continue to advance the science of mechanistic interpretability while scaling its platform to meet enterprise demand in an increasingly competitive AI infrastructure landscape.