Document 5: Existing databases funded by consortiums

Drafted by ASAPbio


arXiv is the most directly comparable model in terms of the database content (preprints). Important elements of arXiv’s success are:

  1. single point of ingestion and one-stop shopping for viewing (everyone in the physics community wakes up and searches arXiv),
  2. high visibility and quality (a reason why scientists submit to arXiv to establish priority),
  3. operation for the community good on a not-for-profit basis by a trusted academic institution (Cornell), which has been operating for more than a century,
  4. funding by a consortium (a major private foundation, the Simons Foundation, and institutions), and
  5. governance by scientists (not just a passive advisory board).

From the arXiv web site:

            In January 2010, Cornell University Library (CUL) undertook a three-year planning effort to establish a long-term sustainable support model for arXiv, one that reduced arXiv’s financial burden and dependence on a single institution and transitioned it to a collaboratively governed, community-supported resource. CUL identified institutions worldwide where the use of arXiv was most active and worked collaboratively with them to develop a membership and governance model based on voluntary institutional contributions. A formal long-term plan took effect in January 2013. In this new model, arXiv is supported by libraries and research laboratories worldwide that represent arXiv’s heaviest users, as well as by CUL and generous matching funds from the Simons Foundation.

Protein Data Bank / Worldwide Protein Data Bank

The Protein Data Bank is a worldwide cooperative of independently supported databases. The wwPDB is thus a multiple-server model organized by geography and by regionally based funding agencies. A common archive of structures is updated and mirrored on all sites, although each site maintains its independence in terms of ingestion and its own web site (researchers can choose the site from which they download). The incoming data are more complex than those handled by preprint servers, since different types of data are deposited (e.g. X-ray, NMR). Quality control is a more important issue for structural data than for preprints. Deposition in the PDB involves validation (which is not true for preprints), and validation constitutes a major part of the PDB mission. Overall, these databases are viewed as very successful and are reasonably well funded (RCSB alone receives $6.5 million in funding from US government agencies). Arguably, elements of the worldwide collaboration might be subject to inefficiencies and difficulties in governance, but overall the system is a reasonable model for organizing and distributing information as a public good.

The following was prepared by Stephen K. Burley (Director, RCSB Protein Data Bank)

Protein Data Bank Archive and the Worldwide Protein Data Bank Organization:

The Protein Data Bank (PDB) is the single global archive for experimentally determined, atomic-level structures of biological macromolecules. The PDB archive is managed by the Worldwide Protein Data Bank organization (wwPDB) [Berman et al. 2003], which currently includes three founding regional data centers, located in the US (RCSB Protein Data Bank or RCSB PDB), Japan (Protein Data Bank Japan or PDBj), and Europe (Protein Data Bank in Europe or PDBe), plus a global NMR specialist data repository, BioMagResBank, composed of deposition sites in the US (BMRB) and Japan (PDBj-BMRB). Together, these wwPDB partners collect, annotate, validate, and disseminate standardized PDB data to the public without any limitations on its use. The wwPDB collaboration is governed by an agreement signed by all four partners (last revised in 2013). The activities of the wwPDB partners are overseen by the wwPDB Advisory Committee, currently chaired by Dr. Andrew Byrd (NCI).

PDB Archive Data Contents:

The PDB archive contains information about structural models derived from three experimental methods: X-ray/neutron/electron crystallography, NMR spectroscopy, and 3D electron microscopy (3DEM). In addition to the 3D coordinates, the details of the chemistry of the polymers and small molecules are archived, as are metadata describing the experimental conditions, data-processing statistics, and structural features such as the secondary and quaternary structure. The structure-factor amplitudes (or intensities) used to determine X-ray structures, and the chemical shifts and restraints used to determine NMR structures, are also archived. The electron density maps used to derive 3DEM models are archived in EMDB [Lawson et al. 2016], and the experimental data underpinning them can be archived in EMPIAR [Iudin et al. 2016].

wwPDB Partner Responsibilities:

The RCSB PDB provides Data In services for all depositions coming from the Americas (North and South) and Oceania. PDBe provides Data In services for all depositions coming from Europe and Africa. PDBj provides Data In services for all depositions coming from Asia. BMRB archives additional NMR data that are not captured by the other three wwPDB partners during archival data depositions. The RCSB PDB serves as the global Archive Keeper, coordinating weekly updates of the PDB archive with PDBe, PDBj, and BMRB. wwPDB partners distribute identical copies of PDB data from redundant, regional FTP sites at no charge and with no limitations on utilization. All four wwPDB partners also distribute PDB data at no charge and with no limitations on utilization from their own value-added websites, in healthy competition with one another.

wwPDB Partner Funding:

RCSB PDB is supported by NSF [DBI-1338415], NIH, and DOE; PDBe by EMBL-EBI, Wellcome Trust [104948], BBSRC [BB/J007471/1, BB/K016970/1, BB/K020013/1, BB/M013146/1, BB/M011674/1, BB/M020347/1, BB/M020428/1], EU [284209, 675858], and MRC [MR/L007835/1]; PDBj by JST-NBDC; and BMRB by NIGMS [1R01 GM109046].

Governance of the RCSB PDB:

Excerpted from “RCSB Protein Data Bank Advisory Committee Terms of Reference”

The RCSB PDB is managed by two members of the RCSB: Rutgers, The State University of New Jersey and University of California, San Diego, and is funded by the National Science Foundation, the National Institutes of Health, and the Department of Energy through a cooperative agreement. The current Director is Dr. Stephen Burley and the Associate Director is Dr. Helen Berman, who was previously the Director. Both are located at Rutgers. The site head at UCSD is Dr. Peter Rose. In addition, there is a leadership team in charge of key aspects of the RCSB mission including operations, application development, biocuration, data architecture, education and outreach.

The RCSB PDB Protein Data Bank Advisory Committee (RCSB PDBAC) is responsible for providing independent advice to the RCSB PDB Director and staff on current and pending issues of policy, operations, technical implementation, and project performance. The Advisory Committee consists of members chosen from the scientific community, who are recognized experts in their fields, including but not limited to, structural biology, cell and molecular biology, computational biology, information technology, and education. These scientists will be drawn from academia and industry. The AC is appointed by the Director in consultation with other members of the RCSB PDB, the AC Chair, and others. The 3-year term of membership is renewable.

The RCSB PDBAC meets once a year. The Director is responsible for developing the meeting agenda in consultation with the Chair and, where deemed appropriate, funding agency staff. Meetings typically last a full working day. At the conclusion of each meeting, a written report is prepared by the members of the RCSB PDBAC describing its discussions, including any specific conclusions or recommendations with respect to changes in management and policies of the RCSB PDB. As specified by the cooperative agreement, this report is provided to the Director within 30 days of the AC meeting. The Director formulates a response to the report, addressing recommendations made, issues raised for further consideration, etc., and provides the Chair with the response. The report and the attendant responses are incorporated in the Annual Progress Report submitted to the National Science Foundation.

Europe PMC

The cooperative funding of Europe PMC is an interesting model for a consortium. In this case, each funder supports Europe PMC in proportion to its annual research spend. One funder (the Wellcome Trust) takes the lead role in organizing the consortium. The system of governance involves both the funders and the scientific community.

Prepared by Robert Kiley, the Wellcome Trust

Europe PMC is run, managed and developed by the EMBL-EBI (European Bioinformatics Institute) on behalf of the 26 Europe PMC Funders, which include the Wellcome Trust, Medical Research Council, Cancer Research UK, the European Research Council and the World Health Organization.

A grant of £5.7M ($8.3M, €7.2M) has been awarded to Dr Jo McEntyre by the Wellcome Trust, on behalf of the Europe PMC Funders.  This grant runs from 2016 to 2021.


Europe PMC has three governing bodies: the Funders’ Group, Funder Committee and Scientific Advisory Board.

  • The Funders’ Group is made up of research funders who both mandate the deposition in this repository of research papers arising from their funding, and provide funding to facilitate this. It is responsible for setting the overall direction of travel for Europe PMC, and meets annually.
  • The Funder Committee is a subset of the Funders’ Group which meets twice a year to review completed developments, comment on future development and approve the release of funds on behalf of the Funders’ Group.
  • The Scientific Advisory Board meets annually to review progress on the development of the service over the past year, and the plans for development for the forthcoming year. The Board ensures that development is sensitive to the needs of the scientific community. It also advises the Europe PMC Funder Committee on the overall effective use of funds from the Europe PMC grant.


The Wellcome Trust – on behalf of the Europe PMC Funders’ Group – provides grant funding to EBI to cover the cost of supporting, maintaining and developing the Europe PMC repository.

In turn, all the Europe PMC funders (with the exception of the European Research Council, ERC) reimburse the Wellcome Trust according to the payment schedule detailed in the Collaboration Agreement. ERC’s contribution to funding Europe PMC is made via a grant to the Wellcome Trust.

Each funder supports Europe PMC in proportion to their annual research spend.  This was deemed to be the most equitable way of spreading the costs across all funders.
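
As a rough illustration of proportional cost sharing, the Python sketch below splits a total cost across funders according to their annual research spend. The funder names, spend figures, total cost, and the function name `proportional_shares` are hypothetical placeholders chosen for the example; they are not taken from the Collaboration Agreement, whose actual payment schedule may be calculated differently.

```python
from fractions import Fraction


def proportional_shares(total_cost, research_spend):
    """Split total_cost across funders in proportion to research spend.

    research_spend maps a funder name to its annual research spend.
    Exact fractions are used so that the shares sum to total_cost
    without rounding drift.
    """
    total_spend = sum(research_spend.values())
    if total_spend <= 0:
        raise ValueError("total research spend must be positive")
    return {
        funder: Fraction(total_cost) * Fraction(spend) / Fraction(total_spend)
        for funder, spend in research_spend.items()
    }


# Hypothetical figures, for illustration only.
spend = {"Funder A": 600, "Funder B": 300, "Funder C": 100}
shares = proportional_shares(1_000_000, spend)

for funder, share in shares.items():
    print(f"{funder}: {float(share):,.0f}")
```

Under these assumed figures, a funder responsible for 60% of the combined research spend would cover 60% of the cost, and the shares always add up to the full amount.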

Additional funders can join the Funders’ Group during the course of the grant by signing an addendum to the Collaboration Agreement.  In turn, EBI can submit development proposals to apply for the additional funds provided by new funders. Such applications are considered by the Funder Committee.