
In Depth: Steve Crossan, DCVC, on AI

For 13 years, deep tech has been at the heart of DCVC’s investments — and at the heart of deep tech has been artificial intelligence. From the beginning, we have backed companies that use AI’s power to open new solutions to hard, often old problems. AI has helped Capella Space provide clear imagery of the entire globe, 24/7, in any weather; Relation Therapeutics develop a novel approach to drug discovery; and Pivot Bio create a clean, energy-efficient replacement for synthetic nitrogen fertilizer.

But while AI is not new, the recent dramatic improvements in large language models (LLMs) are indeed a giant step forward — and have made AI’s power viscerally comprehensible for millions of people for whom it was largely a distant abstraction. Since the release of ChatGPT in November 2022, there has been a surge in both the usage of generative AI technologies and the number of platforms leveraging them. Powered by LLMs, these technologies have surprised even their creators with what they can do.

To make sense of the profound implications of this advance, we spoke with DCVC Operating Partner Steve Crossan. Before joining DCVC, Steve was VP of Artificial Intelligence and Machine Learning at pharmaceutical company GlaxoSmithKline. He also spent a number of years at Google; after Google acquired British AI research laboratory DeepMind in 2014, Steve led the team tasked with bringing DeepMind’s technology into Google products.

In this interview, Steve shares his perspectives on interpretability, the theoretical and practical limits of large language models, and the potential impact of these advancements on venture capital, company creation, and innovation.

DCVC: Google’s AI learned Bengali from just a few queries, without being told to do so, and no one knows how. We also recently learned about a provocative paper from researchers at Microsoft in which they claim AI is showing signs of human reasoning. Do we actually know what these transformers are doing?

Crossan: That is the question of the hour, and I think the answer is, “only partially.” Indeed, many of their capabilities have surprised even their creators. They can perform chain-of-thought reasoning, few-shot learning, and translation; pass chemistry benchmarks; and both explain and write jokes. Initially, they were designed to predict the next word, but now they are trained as question-answering and instruction-following tools.
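To make that shift concrete, here is a minimal sketch of the next-word-prediction primitive Crossan describes. The Hugging Face transformers library and the small GPT-2 checkpoint are our illustrative choices, not details from the interview; instruction-following chat models are trained on top of this same objective.

```python
# Next-token prediction: the base objective of large language models.
# Assumes the Hugging Face transformers library and the small GPT-2 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob:.3f}")
```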

The field of interpretability is actively studying how transformers achieve their tasks, and we are gaining insights into their workings. At scale, transformers build representations of the world based on the data they’ve processed, incorporating knowledge structure and concept relationships. These representations serve as effective tools for reasoning and knowledge retrieval, enabling quick learning in specific domains from fewer examples. Ongoing research aims to probe the layers of stacked representations in transformers, leading to valuable insights for alignment and guiding future advancements in algorithms and architectures.
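As a sketch of what such probing can look like in practice, one common technique trains a simple linear classifier on a layer’s hidden states to test whether that layer encodes a concept. The model, layer choice, and toy labels below are our assumptions, not details from the interview.

```python
# Linear probing: does a middle layer's hidden state encode a concept?
# Assumes Hugging Face transformers, scikit-learn, and the GPT-2 model.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

texts = ["Paris is lovely in spring", "The stock fell sharply today"]
labels = [0, 1]  # toy stand-in for a concept, e.g. travel vs. finance

features = []
for text in texts:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[6]  # one middle layer
    features.append(hidden[0].mean(dim=0).numpy())  # mean-pool over tokens

# If a probe like this generalizes to held-out text, the layer plausibly
# encodes the concept; real studies use many examples and a held-out set.
probe = LogisticRegression().fit(features, labels)
```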

DCVC: Is there a theoretical limit to the number of parameters in an LLM, and why does that question matter?

Crossan: While I’m not enough of an expert to say for certain, it seems that we don’t have a definitive answer regarding a theoretical limit to how capabilities scale with parameters and data. However, practical limits are crucial in the real world. Speculating on whether there are inherent limits to the models’ intelligence is challenging. Some believe there must be limits, but we lack a conclusive answer. The engineering challenges and costs associated with training these models are significant factors. OpenAI’s success can be attributed, in part, to their engineering focus. The cost of training GPT-4 alone is said to have exceeded $100 million, and scaling further poses cost and engineering challenges. However, as the ever-decreasing cost of compute makes it cheaper to train models of a given size, it becomes essential to contemplate the theoretical limits and the possibility of leveraging this trend to develop even larger models.

DCVC: Might a transformer develop a theory of mind? If it did, what would that allow one to do?

Crossan: This is a complex and philosophical question without a definitive answer. Some researchers argue that these models can pass tests indicating a theory of mind, but theory-of-mind researchers tend to disagree. The lack of a widely agreed-upon definition or test for theory of mind makes it challenging to assess. In the past, the Turing test was considered significant, but now it may be insufficient, as newer AI models can pass it. There are alternative tests, like the Garland Test [also known as the Ex Machina Test], named after novelist and filmmaker Alex Garland, which focuses on whether a machine can persuade someone it is conscious rather than merely pretending to be human. Overall, it remains unclear whether we can definitively determine if these models possess a theory of mind due to the absence of widely accepted tests in this domain.

DCVC: But could it be said that these transformers have something close to a model of the world, at least in certain narrow contexts?

Crossan: It seems reasonably clear that the larger models do have some kind of internal representation of the world. An example is the ability to reason over questions like, “Given a book, some eggs, and a pen, how would you build a platform to stand the pen on its end?”

DCVC: We’ve seen deep learning programs invent surprising winning moves in chess or Go. Could one invent an experiment with a surprising result?

Crossan: Deep learning programs have demonstrated surprising winning moves in games like Go and chess, which are perfect information games with known rule spaces. However, designing an experiment with a surprising result that advances science is a different challenge. There is no a priori reason to rule it out, and it’s possible that future models, including those supplemented by physics models, as Stephen Wolfram has proposed, could suggest previously unexplored experiments that contribute to human knowledge.

DCVC: If summoning the demon of a superhuman intelligence is a misplaced fear, what should we worry about?

Crossan: The debate around safety is divided between theoretical scenarios of a realized superintelligence with its own agency and internal goals, and the risks posed by bad actors who have access to powerful technology. The latter, the misuse of technology by bad actors for misinformation, election interference, and similar operations at scale, is a very real and significant danger. These risks have become more pronounced with the increasing power and affordability of these tools. While the long-term concerns about superintelligence should not be dismissed, it is important to pay attention to the immediate dangers and the potential unintended consequences of powerful technology. Safety and alignment research, which is related to interpretability, is crucial but often underfunded compared with the push to achieve the next milestone.

DCVC: What do transformers mean for VC, company creation, or innovation more generally?

Crossan: Transformers and the advancements in AI technology will reshape the tech landscape, creating opportunities and threats. This will lead to the creation of new companies and the potential for significant value generation. However, distinguishing valuable ventures from the vast number of companies incorporating generative AI will be a challenge for venture capitalists. There are already real business opportunities and money being made, especially in areas like integrating proprietary data into language models and fine-tuning models for specific purposes. Tools for non-experts to fine-tune models will emerge, affecting many software-as-a-service (SaaS) sectors. Scientific software, in particular, will undergo exciting transformations. While transformers will be a core component, the coupling of transformers with other systems, such as physics models or knowledge graphs, will drive interesting developments in the coming years.
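One pattern Crossan alludes to, integrating proprietary data into a language model’s context, can be sketched in a few lines. The embedding model, documents, and query below are our illustrative assumptions; a production system would use a vector database and a hosted LLM API.

```python
# A minimal sketch of retrieval over proprietary data: embed documents,
# find the one most similar to a query, and ground the LLM prompt in it.
# Assumes the sentence-transformers library; documents are toy examples.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our Q3 churn rate fell to 2.1% after the pricing change.",
    "The assay pipeline now processes 400 samples per day.",
    "Field trials of the nitrogen-fixing microbe begin in April.",
]
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

query = "How is the field trial schedule looking?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
best = documents[int(np.argmax(doc_vecs @ query_vec))]

prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
# `prompt` would then be sent to a general-purpose LLM for generation.
print(prompt)
```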

DCVC: Many startups have been using branches of AI other than LLMs (such as classical machine learning or computer vision) for years. Will their work be affected by advances in LLMs?

Crossan: One interesting thing about the transformer architecture is that it seems to work across many different modalities, not just text. So, I think that other domains such as speech and vision are at least going to find something to learn in these architectures. And we’ve recently seen great examples of multimodal models that can generate both text and images.
