
BioGRID Provides Interaction Data to the Research Community

Each organism’s cells work through a complicated network of molecular interactions controlled at multiple levels. Understanding how these complex networks regulate each other to control phenotypes, human health, and disease is a long-standing problem in biology. The Biological General Repository for Interaction Datasets (BioGRID) is a database that collects information available in the scientific literature about such interactions and organizes those data into formats that are accessible to the research community.

Figure 1. Data analytics using AI/ML technologies. Image courtesy of NicoElNino/Shutterstock.

ORIP has been supporting BioGRID since 2007 (R01RR024031), including providing supplements that allow BioGRID to temporarily hire more staff to curate the influx of interaction data generated by scientists working to address urgent issues. BioGRID allows researchers to explore deeper questions more efficiently and cost effectively (Figure 1). Having data available means researchers do not have to generate their own data from scratch or repeat experiments conducted by others, and sharing these data aligns with NIH’s focus on new approach methodologies. Many researchers also use BioGRID data to validate their studies, increasing rigor and reproducibility across biomedical research. In 2024, BioGRID recorded 151,907 page views and 17,753 unique visitors on average per month, and BioGRID’s Open Repository of CRISPR Screens (ORCS) had an average of 12,429 page views and 2,380 unique visitors per month. BioGRID’s data files also were downloaded an average of 13,525 times per month, and these downloaded files can be disseminated broadly by BioGRID’s partners and collaborators.

Although more researchers are making their data available to the scientific community, the massive quantity of data, often in different formats, means that finding the exact data a researcher needs can be difficult. “With the way that data are now being generated, a huge amount of data are there—we just need to figure out how to efficiently collect them, annotate them, and get them into the hands of researchers,” said Dr. Kara Dolinski, Co–Principal Investigator of the BioGRID project along with Dr. Mike Tyers. Information also is frequently presented within scientific papers themselves, which are time-consuming to search. “Technology has rapidly changed the way people do biology, but we still write papers like Charles Darwin did,” Dr. Dolinski said. “Collecting data from a freeform narrative format and putting them into standardized formats that researchers can actually use is a non-trivial problem.”

When BioGRID first began, the team would scan new publications every month without much filtering and note any potentially relevant data. “We’d pull in thousands of papers, and we’d have to manually sift through them,” explained Dr. Rose Oughtred, Lead Curator of BioGRID. With the advent of new technologies in computing, sophisticated text-mining algorithms are used to search and prioritize papers for curation. BioGRID is continually improving its curation pipeline by incorporating more automated methods for triaging the vast biomedical literature. The BioGRID team then can focus its efforts on the most relevant papers and collect data that will be useful across science. “The curators are all Ph.D.-level trained biologists, and they work closely with the developers on designing web interfaces, searches, and the way a biologist would want to access the data,” Dr. Dolinski explained. Making the data findable and accessible in ways researchers can use is a critical priority for BioGRID. The curators also ensure the data are in standardized formats so they can be interoperable among systems and reusable by the broad community of researchers, including computer scientists applying the latest artificial intelligence (AI) and machine learning methods. “Extracting the information is what makes this resource very valuable to researchers. If they had to sift through all of the data and all of the literature, they wouldn’t have much time left over to carry out their research,” Dr. Oughtred said.

Figure 2. Beta-amyloid plaques and tau in the brain. Image courtesy of the National Institute on Aging, NIH.

In addition to broadly curating new publications, BioGRID staff take on themed projects to expand the data available to researchers on a particular disease, condition, or topic. The curation team reviews all papers relevant to that topic, which allows the team to make an impact in one area more rapidly. The BioGRID team collaborates with experts in the field to prioritize the gene sets that are most important for that biological area. As the curators read papers, they learn more about the field, so they can search each subsequent paper and identify relevant data more quickly. These themes may be chosen based on relevant public health issues or requests from researchers for help with expert curation in their field. Highly conserved cellular pathways also may be chosen for a focus project because these pathways have the potential to affect multiple areas of human health and disease. Recent topic areas include Alzheimer’s disease (Figure 2); autism (Figure 3); synthetic protein interactions; and virally induced cancers, such as those caused by human papillomavirus. “With these more focused projects, we take a targeted approach,” Dr. Oughtred explained. “If we have to go back in time when we start a project, like the virally induced cancers project, we go back to the earlier literature and extract the information that we can find and then progress through the years. Over the years, of course, research became a little more sophisticated, and we have to extract a lot of the high-throughput data, often from supplementary files that are not easily accessible online for researchers, and make it presentable and, critically, computable.”

Figure 3. Multicolor image of the whole brain. Image courtesy of the National Institute of Mental Health, NIH.

The project focusing on virally induced cancers is part of an effort to develop an AI framework to identify potential drug targets for these cancers. “A huge part of that is understanding what proteins you can safely target, and you really can’t look at a protein in isolation,” Dr. Dolinski said. “You need to understand the larger context of all the proteins each protein interacts with, under what conditions, and in which tissues, which affects how effective that target will be and whether it’s tolerable as a drug target,” she explained. The BioGRID team is generating interaction data and collecting cellular and tissue context around those interactions. The curators then will work with computer scientists to generate an AI framework that incorporates all the data and predicts which proteins might be the safest and most effective targets for these cancers. BioGRID also is beginning work on a cutting-edge project to curate interaction data for synthetic proteins. “The new machine learning methods coming online allow researchers to design de novo synthetic proteins that are not found in nature, and these can be targeted to interact with a protein of interest—this might be a disease target, such as a key protein involved in cancer or even viral infections,” Dr. Oughtred explained. “We’re the only database that provides this synthetic protein interaction curation. Because these AI/large language model [LLM]–based methods are so new, we’re mostly curating from preprints and hot-off-the-press papers. We tailor the curation to whatever the project requires,” she added.

Expanded resources are occasionally required when such urgent or new fields emerge, but the robust infrastructure BioGRID has developed allows the team to move rapidly in new directions when necessary. For example, since BioGRID was created, CRISPR technology has been developed, so the ability to collect CRISPR data via BioGRID’s ORCS project was added to the existing BioGRID framework. By ensuring the curators are able to work on new and urgent projects, BioGRID provides a vital, foundational resource that many researchers depend on to keep their studies at the cutting edge of science. The meticulous curation and AI approaches used by BioGRID support both validation of experimental results and identification of new targets for treatment. For example, researchers at Tulane University recently used BioGRID data to identify risk genes for autism spectrum disorder.1 In another recent study, researchers at The University of North Carolina at Chapel Hill used BioGRID’s interaction data to provide context to genes identified by CRISPR screens as potential therapeutic targets for a type of highly lethal cancer.2 BioGRID’s data also have been used to train AI models and LLMs to predict protein–protein interactions and CRISPR screen results.3,4,5

BioGRID collaborates with other databases to minimize any duplication of effort, and as part of the Alliance of Genome Resources, BioGRID’s interaction data and human CRISPR phenotype data are incorporated with data from other public databases to provide comprehensive information to researchers. In addition to providing an expansive collection of data from past and current studies, BioGRID is looking ahead in a few areas, such as synthetic protein interactions. These synthetic proteins are generated computationally but validated experimentally as a final step, saving significant time, cost, and effort in the pursuit of new treatments. “Instead of trying to isolate antibodies that might treat a viral infection—or even design those antibodies using more conventional methods that are quite time consuming and involve many experimental steps—the new LLM-based methods and diffusion methods are allowing de novo design of synthetic proteins that can focus on targets of interest,” Dr. Oughtred explained.

“It’s really rewarding to see how many people use the resource, how many people depend on it and are appreciative of it,” Dr. Dolinski said. “I love going to meetings because people run up to you after your talk or poster and they say, ‘Oh my God, I love BioGRID, I use it every day, and it’s so important.’” Dr. Oughtred agreed, noting, “For that reason, I feel like the curators, all of us, are making a difference. We’re contributing to science and really helping to facilitate research.” Both also emphasized that ORIP’s support has been vital in keeping the BioGRID database current with cutting-edge science across disciplines. “ORIP does a great job connecting different types of resources together, providing general support for these types of projects, and helping us spread the word about our resources,” Dr. Dolinski said. Dr. Oughtred added, “I think the support we got from ORIP really made it possible for us to make a difference.”

References

1 Rajan KC, Tiemroth AS, Thurmon AN, et al. Zmiz1 is a novel regulator of brain development associated with autism and intellectual disability. Front Psychiatry. 2024;15:1375492. doi:10.3389/fpsyt.2024.1375492.

2 Goodwin CM, Waters AM, Klomp JE, et al. Combination therapies with CDK4/6 inhibitors to treat KRAS-mutant pancreatic cancer. Cancer Res. 2023;83(1):141–157. doi:10.1158/0008-5472.CAN-22-0391.

3 Schmid EW, Walter JC. Predictomes, a classifier-curated database of AlphaFold-modeled protein-protein interactions. Mol Cell. 2025;85(6):1216–1232.e5. doi:10.1016/j.molcel.2025.01.034. 

4 Chandak P, Huang K, Zitnik M. Building a knowledge graph to enable precision medicine. Sci Data. 2023;10(1):67. doi:10.1038/s41597-023-01960-3. 

5 Song S, Abdrabou A, Dabholkar A, et al. Virtual CRISPR: Can LLMs predict CRISPR screen results? Proceedings of the 24th Workshop on Biomedical Language Processing (BioNLP 2025). 2025:354–364. https://aclanthology.org/2025.bionlp-1.30.pdf