CZI Annual Letter 2023
Dear friends,
Looking forward to 2024, we see an incredible opportunity to use artificial intelligence (AI) to accelerate the progress of our work at the Chan Zuckerberg Initiative.
In the early days of CZI, we didn't know the exact path we were going to take, but we had a sense of where to start. We listened to our partners in the field and learned about the challenges they faced. Then, we explored the problems that CZI — as a philanthropic organization with a builder mentality staffed with incredible scientists, educators, full-stack technology teams, and mission-driven people who power our work — was uniquely positioned to tackle. We call these CZI-shaped problems.
In science, we've spent the last seven years exploring and refining the CZI-shaped problems we will tackle. When we launched our science work in 2016, we set a goal to help scientists cure, prevent, or manage all diseases by the end of the century. We knew accomplishing this goal would require a better understanding of the cell, the fundamental unit of life in which many diseases originate, and that much about the cell remained unknown. To fill those gaps in knowledge, we needed to strengthen the field's understanding of basic cellular biology.
To put it another way: how can you debug the code if you can't examine all the lines of code? We want to equip scientists with that same ability to step through the code of our bodies, which requires new tools to observe, measure and analyze the processes that keep us healthy and the errors that cause disease.
So, we started supporting researchers in gathering new information about our cells, the tissues and organs they form, and what changes occur between health and disease. We built open source software tools that helped scientists better access, explore and annotate that data. We also built scientific institutions to help address some of the biggest challenges in cell biology and imaging. And now, we are building a high-performance AI computing system to power predictive models of healthy and diseased human cells.
We believe this convergence across data, tools and AI will unlock new scientific understanding and discoveries about how cells behave and function. This is the next step toward our goal to help end disease as we know it.
When people first hear this goal, they think it's either impossible or inevitable. We think the truth lies in the middle: it can happen, but only if we build the tools and fund the science to make new discoveries that help change the way scientists see the world.
Take a look at how we've been building for this moment by following the path of two of our earliest bets in science: single-cell biology and our multidisciplinary Biohub model.
CZI Science Launches
We launch our science work, focusing on building comprehensive datasets and computational tools to quickly access and analyze data, while spurring collaboration across interdisciplinary fields to share insights. We build tools, fund research worldwide, and do science at our institutes through multidisciplinary collaboration to make discoveries about health and disease.
Chan Zuckerberg Biohub San Francisco
We launch the San Francisco Biohub, where the Bay Area's leading academic institutions (UC San Francisco, Stanford University, and UC Berkeley) join forces to pursue the toughest, riskiest scientific challenges in cell biology and infectious disease.
Single-Cell Biology
One of the first single-cell research efforts we support is the Human Cell Atlas — a global, scientist-led collaboration to map and characterize all cell types in the healthy human body. Additionally, the generation of cell atlases becomes a major focus of our partnership with the new San Francisco Biohub. As it evolves, the Human Cell Atlas continues to be an essential and foundational resource helping scientists better understand how healthy cells work. Data from the Human Cell Atlas will feed into one of the large-scale datasets that we use in building our AI model to predict cell types and states.
Pilot Projects for a Human Cell Atlas
We fund our first grants to help researchers evaluate protocols and best practices for collecting single-cell molecular data, and the resulting data are compiled for the Human Cell Atlas. One important result of this work: it lays the foundation for generating the data behind the Heart Cell Atlas, which increases the number of known cells in the heart by an order of magnitude and identifies hundreds of marker genes for cardiac cell types and states.
Collaborative Computational Tools for a Human Cell Atlas
We invest in developing more interoperable data and the advanced computational methods needed to integrate datasets and build models. Bringing cutting-edge machine learning, AI, statistics and other modeling approaches to single-cell data is critical to extracting insights from such complex biological datasets.
CELLxGENE Annotate
We build CELLxGENE Annotate, an open source tool that allows scientists to collaborate to analyze and annotate large single-cell datasets.
Seed Networks for the Human Cell Atlas
We launch the Seed Networks for the Human Cell Atlas grant to bring together experimental scientists, computational biologists, software engineers, and physicians to create a high volume of standardized datasets across different tissues and organs. This project funds 38 collaborative science teams — representing 20 countries and over 200 labs — and has supported the delivery of foundational data from more than 15 human organs.
CELLxGENE Discover
We roll out CELLxGENE Discover, a data platform that democratizes access to, and enables easy exploration of, the growing amount of single-cell data being generated by researchers worldwide, including data from the Human Cell Atlas. Many of the datasets from grants we've funded and from the San Francisco Biohub, including Tabula Sapiens and the COVID-19 Tissue Atlas, are available in Discover.
COVID-19 Response
We draw on our infectious disease expertise to support efforts to curb the pandemic, from standing up a testing facility in just eight days, to working with 22 California public health departments to track the spread of COVID-19 statewide, to generating data on SARS-CoV-2 through the COVID-19 Tissue Atlas. This dataset uses single-cell techniques to document the human body's response to SARS-CoV-2 infection in six different organs.
Single-Cell Biology Data Insights
Our work to build and support single-cell tools continues with the launch of the first cycle of Data Insights grants, which support researchers and computational experts who are advancing tools and resources to process large volumes of single-cell data using machine learning methods. These projects use large collections of data and employ advanced modeling to make sense of multiple datasets.
Pediatric and Ancestry Networks
Our Pediatric Networks and Ancestry Networks grants aim to make data for the Human Cell Atlas more representative of different ancestries and age groups — which addresses knowledge gaps and helps ensure the data being used to train large language models is more inclusive of the diversity of human biology.
Chan Zuckerberg Biohub Network
Building on the success of the San Francisco Biohub's unique collaborative model, we announce the creation of the Biohub Network — a group of nonprofit research institutes bringing together scientists, engineers and physicians with the goal of pursuing grand scientific challenges on 10- to 15-year time horizons.
Chan Zuckerberg Institute for Advanced Biomedical Imaging
We bring together diverse disciplines at the Imaging Institute to develop revolutionary new imaging hardware and software tools that will help researchers see and understand cellular dynamics in living systems within a broader biological context.
OpenCell
The San Francisco Biohub develops OpenCell, which provides protein localization data for more than 1,300 functionally critical human proteins, along with a map of more than 30,000 interactions among proteins within cells.
Tabula Sapiens
Building on early collaboration with our Seed Networks research, scientists at the San Francisco Biohub generate data about the cells within our organs and tissues and publish Tabula Sapiens. The team uses unique methods to map human cell types by measuring gene expression in nearly 500,000 cells from 24 different tissues and organs, most of them obtained from the same donor.
Biohub Network Expands
We welcome two new institutes to our Biohub Network:
Chicago Biohub, which is embarking on work to embed miniaturized sensors into tissues that will allow us to monitor molecular and cellular signals with the goal of understanding and treating the inflammatory states underlying many diseases.
New York Biohub, which focuses on bioengineering immune cells — the only cells in our bodies that come in contact with virtually all of our organs — so scientists can identify diseases early and potentially treat them. Initial disease targets include cancers such as ovarian and pancreatic cancer and neurodegenerative diseases like Parkinson's and Alzheimer's.
CryoET Data Portal
The Imaging Institute uses cryo-electron tomography (cryoET) to generate 3D images of cell structures and of biomolecules, like proteins, inside cells. Together with researchers worldwide, the institute is compiling these imaging datasets into the cryoET data portal, a cloud-based community database that aims to give machine learning experts access to training data, helping researchers find insights faster. An easy-to-use portal will enable more researchers to deposit data, develop new and faster processing methods, and interpret the biological significance of their results.
Zebrahub
The San Francisco Biohub publishes the Zebrahub project, a dynamic, open source atlas of zebrafish development that integrates advanced microscopy and single-cell biology. Zebrahub gives scientists a detailed understanding of how tissues, organs and whole organisms develop from a single cell.
Human Breast Cell Atlas
A team of researchers from The University of Texas MD Anderson Cancer Center, UC Irvine, and Baylor College of Medicine publishes the Human Breast Cell Atlas. Through work we helped fund with our Seed Networks grants, researchers now have a stronger understanding of immune involvement in breast biology, which will help inform how scientists develop more targeted treatments. All the data from this research and several key atlas efforts are available on CELLxGENE Discover, where other scientists can use them to accelerate their own research.
CELLxGENE Census
We announce Census, which provides scalable access to the largest collection of single-cell data, spanning more than 500 individual datasets and over 33 million unique cells, so researchers can readily use it for their own analyses, such as developing AI models. Census's scalable infrastructure is critical to accelerating researchers' work in the modern era of big data. Today, nearly 1,000 researchers access it weekly, including a lab that's developing large foundation models like scGPT.
The Path Forward With AI
Looking back, it's easy to see how each step in our journey was a building block that laid the groundwork for CZI to take advantage of recent AI advancements, so we can make progress faster. And these milestones are just a part of the work happening across our science teams that have led us to where we are today.
Right now, we are building an AI computing system that will be used to train models that describe cell systems based on datasets CZI and the scientific community have been assembling since some of our earliest grants. Once it's up and running, it will be one of the world's largest AI clusters for nonprofit scientific research, and eventually it will power predictive virtual models of our cells. As we move deeper into understanding our cells and their interactions, the data will become increasingly complex. Combining these predictive virtual cells with the power and promise of large language models will enable our teams and collaborators to continue working toward our north star of supporting the science to end disease. Scientists will be able to use these models to see how cells may respond to specific conditions: for example, how an immune cell responds to an infection, what happens at the cellular level when a child is born with a rare disease, or even how a patient's body will respond to a new medication. By showing us what happens to cells when a change is introduced, these virtual cell models will allow researchers to quickly answer questions and prioritize the highest-yield research directions and clinical trials.
Co-Building Tech Tools With Teachers
Beyond science, we've also focused on the CZI-shaped problems that we are uniquely positioned to take on in the education field to help unlock the potential of every child, no matter who they are or where they live. Our work is now at an important inflection point as we reflect on the impact of eight years of grantmaking and building technology to advance research on student learning and human development — and its translation into classroom practice.
Despite important progress, many research-backed resources and tools still aren't reaching students and teachers. That's why we're exploring new ways to leverage our grantmaking and technology capabilities to co-build tools with teachers and students that are grounded in underutilized research and that support whole-child approaches to education.
This work will expand on what we've learned from developing technology — such as Along and Summit Learning — so we can equip educators with tools that address key education challenges. As we continue to build, we will thoughtfully consider opportunities to accelerate research-backed practices by using AI. As always, our approach will be informed by feedback from researchers, experts and educators.
Our Collective Impact
When we started CZI, we couldn't have predicted exactly where we are now. But at every turn, our goal has been to leverage and advance technology to accelerate impact across science and education.
So, thank you to our team and our family of partners. We are proud to have done important work in 2023 — and we are ready for the opportunities 2024 has to offer.