Engineering, Research & Development
The Defense Advanced Research Projects Agency (DARPA) works with outside Tier I research teams on advanced research, but had no system for storing, sharing, or analyzing the data those teams generate. We created BIFROST to address this gap and advance science through collaboration. The system is cloud-based and supports storage of data files together with their metadata (which is vital for searching and making sense of files full of numeric data), as well as cloud-based analysis. Research teams on the THoR and Prometheus projects use the system.
While some researchers (like those in the Human Genome Project) effectively share data, other medical research communities do not. Trapped in a “publish or perish” cycle, they tend to keep their data private. Our challenge was to create a system to support data privacy, sharing, and analysis for research teams who typically don’t share data or analytical results until after they have fully published their findings. For some researchers, using cloud-based servers for analysis is also a departure from working on a laptop or university system. There were several pitfalls: high-speed internet is not a given, university networks interfere with VPNs, and using US-based cloud systems from overseas can be problematic.
We identified four key features, ‘publish, discover, analyze, and collaborate’, and implemented them using Amazon Web Services. BTS had already built a data-sharing prototype for DARPA involving neuroscience data, which gave us a head start. The key innovation behind BIFROST is the use of metadata in almost everything. Biological data is rarely self-describing, which makes it hard or impossible to search for and share. We required metadata for every experiment, dataset, or document uploaded and indexed that metadata in our search engine, so results can be filtered by metadata and the needed information is easier to find. Metadata is always public, so users can discover that private data exists before they have access to it. BIFROST lets users request access to private data and tracks who has been granted access to that data. DARPA and end users alike have given excellent feedback throughout the life cycle of BIFROST.
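The metadata-first approach described above can be illustrated with a minimal in-memory sketch. This is not BIFROST's actual implementation: the class names (Dataset, Catalog), the required metadata keys, and the method names are all hypothetical, chosen only to show the three ideas in the text, namely rejecting uploads without metadata, searching over metadata that is always public, and tracking who has requested and been granted access to private data.

```python
from dataclasses import dataclass, field

# Assumed set of required metadata keys; the real required fields are not stated.
REQUIRED_FIELDS = {"title", "project", "data_type"}


@dataclass
class Dataset:
    dataset_id: str
    owner: str
    metadata: dict
    payload: bytes
    private: bool = True
    granted: set = field(default_factory=set)  # users the owner has approved


class Catalog:
    """Hypothetical in-memory stand-in for a metadata-indexed data store."""

    def __init__(self):
        self.datasets = {}
        self.pending = set()  # (dataset_id, requester) pairs awaiting approval

    def publish(self, dataset):
        # Enforce metadata on every upload.
        missing = REQUIRED_FIELDS - set(dataset.metadata)
        if missing:
            raise ValueError(f"missing metadata: {sorted(missing)}")
        self.datasets[dataset.dataset_id] = dataset

    def search(self, **filters):
        # Metadata is always visible, even for private datasets.
        return [
            {"dataset_id": d.dataset_id, "private": d.private, **d.metadata}
            for d in self.datasets.values()
            if all(d.metadata.get(k) == v for k, v in filters.items())
        ]

    def request_access(self, dataset_id, requester):
        self.pending.add((dataset_id, requester))

    def grant(self, dataset_id, requester):
        # Only a previously recorded request can be approved.
        if (dataset_id, requester) not in self.pending:
            raise KeyError("no pending request for this user and dataset")
        self.pending.discard((dataset_id, requester))
        self.datasets[dataset_id].granted.add(requester)

    def read(self, dataset_id, user):
        d = self.datasets[dataset_id]
        if d.private and user != d.owner and user not in d.granted:
            raise PermissionError(f"{user} has no access to {dataset_id}")
        return d.payload


if __name__ == "__main__":
    catalog = Catalog()
    catalog.publish(Dataset(
        "d1", "alice",
        {"title": "demo run", "project": "demo", "data_type": "timeseries"},
        b"1,2,3",
    ))
    print(catalog.search(project="demo"))   # metadata is discoverable by anyone
    catalog.request_access("d1", "bob")
    catalog.grant("d1", "bob")
    print(catalog.read("d1", "bob"))        # payload readable once access is granted
```

In this sketch the granted set on each dataset doubles as the record of who has been given permission, which mirrors the tracking behavior described in the text. A production system would add authentication, persistent storage, and an audit log.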
The BIFROST system was originally designed to support a single DARPA project. It has since expanded to additional projects and to BARDA. There are now about 60 users from 15 different leading research institutions. The system stores several terabytes of data. While this is not a huge amount of data, most of the files could not be shared through email and would otherwise require an FTP server or something similar. That approach is sub-optimal because of poor security and the need for multiple systems to support sharing. BIFROST provides a single, central cloud system for sharing data.
While we cannot disclose details of models developed with the support of this system, it’s safe to say that sharing data through BIFROST has enabled the creation and validation of new machine learning models with predictive capabilities not seen before.
BIFROST has also received attention from clinical research organizations, the FDA, and the University of Maryland BioPark.