The Protein Data Bank (PDB) was established as the first open access repository for biological data, and the datasets it hosts have been invaluable to research in fundamental biology and to our understanding of health and disease. Just this month, we witnessed the announcement of the AlphaFold2 structure prediction results, which were made possible by the more than 170,000 freely accessible structures in the PDB that provided “training data” for the prediction software.
It was not always the case that structural biology data were freely available, even upon journal publication. From the founding of the PDB in 1971 until the late 1980s, most journals did not require deposition of structures in a public database. A key moment came in 1987, when a group of leading structural biologists circulated a petition demanding that the resulting data be made openly available upon journal publication. The petition led major journals to adopt data deposition standards, and in the early 1990s the National Institute of General Medical Sciences (NIGMS) imposed similar requirements on all of its grantees.
The revolution in publishing made possible by preprints calls for a re-evaluation of data disclosure practices in structural biology. Whereas journal review takes weeks, months, or even years, preprints allow researchers to communicate their findings to the community rapidly. Withholding the PDB files that accompany preprints, however, undercuts the acceleration of scientific discovery that preprints make possible.
We pledge to publicly release our PDB files (and associated structure factor, restraint, and map files) with deposition of our preprints.
We encourage all structural biologists to also deposit raw data in appropriate resources (e.g., EMPIAR, proteindiffraction.org, https://data.sbgrid.org/).
|#||Name||Position||Institution|
|1||James Fraser||Professor||Department of Bioengineering and Therapeutic Sciences, UCSF|
|2||Cynthia Wolberger||Professor of Biophysics and Biophysical Chemistry||The Johns Hopkins University|
|3||Phil Bourne||Dean & Professor||School of Data Science & Department of Biomedical Engineering, UVA|
|4||Aled Edwards||Director and Chief Executive||Structural Genomics Consortium|
|5||David Agard||Professor||Department of Biophysics and Biochemistry, UCSF|
|6||Gira Bhabha||Assistant Professor||Skirball Institute, NYU|
|7||Damian Ekiert||Assistant Professor||Skirball Institute, NYU|
|8||Kliment Verba||QBI Fellow||Department of Pharmaceutical Chemistry, UCSF|
|9||Brian Kelch||Associate Professor||Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School|
|10||Gabriel Lander||Professor||Department of Structural and Computational Biology, Scripps Research|
|11||Jinrong Min||PI||Structural Genomics Consortium, University of Toronto|
|12||Robert Stroud||Professor||Department of Biophysics and Biochemistry, UCSF|
|13||Seemay Chou||Assistant Professor||Department of Biophysics and Biochemistry, UCSF|
|14||Frank von Delft||Professor of Structural Chemical Biology||University of Oxford and Diamond Light Source|
|15||Chun Tang||Professor of Biophysical Chemistry||Peking University|
|16||Margaret Stratton||Assistant Professor||Department of Biochemistry and Molecular Biology, Univ of Massachusetts, Amherst|
|17||Michael Thompson||Assistant Professor||Department of Chemistry and Chemical Biology, UC Merced|
|18||Peter Kim||Professor||Department of Biochemistry, Stanford University|
|19||Adam Frost||Associate Professor||Department of Biophysics and Biochemistry, UCSF|
|20||Jamie Cate||Professor||University of California, Berkeley|
|21||Jon Marles-Wright||Senior Lecturer||Newcastle University, UK|
|23||Paul Robustelli||Assistant Professor||Dartmouth College|
|24||Marko Hyvonen||Reader in Protein Biochemistry||University of Cambridge|
|25||Daren Fearon||Beamline scientist||Diamond Light Source|
|26||Roger Shek||Research Scientist||University of Washington|
|27||Alex J. Vecchio||Assistant Professor, Department of Biochemistry||University of Nebraska-Lincoln|
|28||Edward C Twomey||Assistant Professor of Biophysics and Biophysical Chemistry||The Johns Hopkins University School of Medicine|
|29||Cameron Mackereth||Group Leader (Inserm DR2)||Inserm / ARNA Laboratory (U1212)|
|30||Tom Terwilliger||Senior Scientist||New Mexico Consortium|
|31||Michael Cianfrocco||Assistant Professor||University of Michigan|
|32||Danielle A Grotjahn||Scripps Fellow||The Scripps Research Institute|
|33||Charles Brenner||Chair, Dept of Diabetes & Cancer Metabolism||City of Hope National Medical Center|
|34||Roberto Chica||Professor||University of Ottawa|
|35||Rommie E Amaro||Professor||UC San Diego|
|36||Nicolas Lux Fawzi||Associate Professor||Brown University|
|37||Stephen Brohawn||Assistant Professor||University of California, Berkeley|
|38||Debnath Ghosal||Assistant Professor||University of Melbourne|
|40||Stephanie Wankowicz||Graduate Student||UCSF|
|41||Scott Horowitz||Assistant Professor||University of Denver|
|42||Chen Sun||||Purdue University|
|43||Dmitry Lyumkis||Assistant Professor||Salk Institute for Biological Studies|
|44||Joe A Kaczmarski||Postdoctoral Researcher||Australian National University|
|45||Daniel Keedy||Assistant Professor||Structural Biology Initiative, CUNY Advanced Science Research Center|
|46||Benjamin Barad||Postdoc||Scripps Research|
|47||Douglas Kojetin||Associate Professor||Department of Integrative Structural and Computational Biology, Scripps Research|
|48||Roberto Efrain Diaz||Graduate Student||Department of Bioengineering and Therapeutic Sciences, UCSF|
|49||Gavin Knott||Research Fellow||UC Berkeley/Monash University|
|50||Faisal Koua||||Deutsches Elektronen-Synchrotron|
|51||Wladek Minor||Professor||University of Virginia|
|52||Marcel Conrady||PhD student||University of Mainz|
|53||Chrisostomos Prodromou||Senior Lecturer||University of Sussex|
|54||Loes Kroon-Batenburg||Assistant Professor||Utrecht University|
|55||Matthew Bowler||Beamline Scientist||European Molecular Biology Laboratory|
|56||Mahesh Lingaraju||Postdoctoral researcher||Max Planck Institute of Biochemistry|
|57||Yair Gat||Postdoc||Max Planck Institute|
|58||Doriano Lamba||Scientific Associate - Retired Fellow||Istituto di Cristallografia - Consiglio Nazionale delle Ricerche, Trieste (Italy)|
|59||Ashley Buckle||||Monash University|
|60||Dhaval Patel||Assistant Professor||Institute of Advanced Research|
|61||Colin Jackson||Professor||Australian National University|
|62||Eugene Sun||Postdoc||Bristol U|
|63||Simon Fromm||Postdoc||UC Berkeley|
|64||Guillaume Gaullier||Researcher||Uppsala University|
|65||Aashish Manglik||Assistant Professor||Department of Pharmaceutical Chemistry, UCSF|
|66||Thomas Tomasiak||Assistant Professor, Department of Chemistry and Biochemistry||University of Arizona|
|67||Ivan G. Shabalin||Research Scientist||University of Virginia|
|68||Walter Chazin||Professor and Director, Center for Structural Biology||Vanderbilt University|
|69||Arun Malhotra||Associate Professor||University of Miami School of Medicine|
|70||Wolf-Dieter Schubert||Professor of Biochemistry||University of Pretoria|
|79||Jyh-Yeuan (Eric) Lee||PI / Assistant Professor||University of Ottawa Faculty of Medicine|
|80||David Cooper||Instructor of Research||University of Virginia|
|81||Natalia Jura||Associate Professor||UCSF|
|83||Mark MacRae||Graduate Student||NYU School of Medicine|
|84||Bridget Carragher||Co-director, SEMC||New York Structural Biology Center|
|85||Michelle Moritz||Research Scientist||University of California, San Francisco|
|86||Somaye Badieyan||Postdoc||University of Michigan|
|87||Nigel W. Moriarty||Scientist||Lawrence Berkeley Lab|
|88||Eric R Greene||Postdoctoral Scholar||University of California San Francisco|
|89||Gerlind Sulzenbacher||Research engineer||AFMB-CNRS-AMU|
|90||Joel Tyndall||Associate Professor||University of Otago|
|91||Joost Snijder||Assistant Professor||Utrecht University|
|92||Karla Satchell||Professor||Northwestern University Feinberg School of Medicine|
Funders could also encourage data deposition through their guidance to grantees and applicants. Preprint servers could likewise encourage authors to share their data during submission (with appropriate citation in accordance with the FORCE11 data citation principles) and ask those who screen submissions to check whether such data are available. ASAPbio will share this letter and its signatories with these entities to advance the conversation about other ways to encourage data availability.
Although this letter focuses on structural data, we hope other communities will follow suit in supporting data sharing upon preprint deposition, particularly those with a strong culture of data sharing and established dedicated repositories, for example for gene sequences (GenBank), gene expression (GEO), electron microscopy maps (EMDB), NMR assignments (BMRB), and similar datasets.
We invite these communities to develop their own call for support for data sharing with preprints and we encourage them to contact us if they would like to pursue a similar call.
Frequently asked questions about preprints and structural biology
What is a preprint?
A preprint is a scientific manuscript that the authors upload to a public server. The preprint contains data and methods but has not yet been accepted by a journal. While some servers perform brief quality-control checks (for details on the practices of individual servers, see asapbio.org/preprint-servers), the manuscript is typically posted online within a day or so, without peer review, and can be viewed (and possibly translated, reposted, or used in other ways, depending on the license) free of charge by anyone in the world. Most preprint servers support versioning, that is, posting updated versions of your paper based on feedback and/or new data. However, to preserve the scholarly record, most servers also retain prior versions, which typically cannot be removed. Preprints allow scientists to directly control the dissemination of their work to the worldwide scientific community.
Are preprints compatible with journals?
Yes. While both preprints and journal articles enable researchers to disseminate their findings to the research community, they are complementary in that preprints represent an opportunity to disseminate at an early stage.
In most cases, the same work posted as a preprint is also submitted for peer review at a journal. Thus, preprints (rapid, but not validated by peer review) and journal publication (slower, but validated by peer review) work in parallel as a communication system for scientific research.
In many fields, the majority of journals allow submission and citation of preprints. To get a sense of journal preprint policies, you can check SHERPA/RoMEO, Transpose, or Wikipedia’s List of academic journals by preprint policy. However, before submitting a manuscript, always check the journal’s website for recent changes to, or nuances of, its policy.
How does the PDB interact with preprint servers?
The PDB considers papers posted on a preprint server to be publications (https://www.wwpdb.org/documentation/policy#toc_release) and will release the PDB data associated with a preprint once the preprint is posted.
Will my preprint be rejected from a preprint server if posted without PDB data?
We are not aware of preprint servers that screen on this basis at this time, but we hope that preprint servers or community projects might highlight preprints that contain complete data.
How will a preprint affect my patent application?
Preprints, like journal articles, are considered public disclosures, which can affect a patent application. If you intend to file an application to patent work disclosed in your paper, discuss the situation with your technology transfer office before posting your preprint.
Can I still link my PDB record to the journal version?
Yes. Versioning in the PDB makes it possible for authors to update their files and citation information, including the journal reference, while preserving the entry's unique PDB identifier. This ensures that the public always has access to the most up-to-date version.
What about CASP, which relies on embargoes?
There are three easy ways to share your protein information with CASP: directly submit the sequence of your protein and related information through the web form; mark your PDB deposition as a ‘CASP target’ (check box) within the PDB deposition system; or send CASP an email (casp AT predictioncenter.org). All of these steps can be done before preprint disclosure.
Should I use “REL” or “HPUB” as the author-requested status code for my PDB entries?
REL entries are released as soon as the authors have approved the processed files, whereas HPUB (Hold until PUBlication) entries are placed on hold until publication or until one year from the date of deposition, whichever comes first. In both cases, the authors need to approve the final validated structure before release. Choosing REL brings the release of the data closest to the posting of the preprint. Alternatively, you can use the HPUB status code and proactively associate the PDB entry with the preprint DOI (and then subsequently create an updated version of the entry with the journal DOI).
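The REL/HPUB rules above amount to a small decision rule. The Python sketch below is a simplified, hypothetical model of that rule, not a wwPDB tool or API: the function names are illustrative, a posted preprint is treated as the publication that ends an HPUB hold, “one year” is taken to mean the same calendar date a year later, and the end of the hold is treated as the release date. It does not model the weekly wwPDB release schedule or any other processing steps.

```python
from datetime import date
from typing import Optional


def one_year_after(d: date) -> date:
    """Same calendar date one year later (assumption: Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # d is Feb 29 and the next year is not a leap year
        return d.replace(year=d.year + 1, day=28)


def earliest_release(status: str, deposited: date, approved: date,
                     published: Optional[date] = None) -> date:
    """Hypothetical, simplified model of the REL/HPUB rules.

    REL:  eligible for release once the authors approve the processed files.
    HPUB: held until publication (here, a posted preprint counts) or one
          year after deposition, whichever comes first.
    In both cases, release cannot precede author approval.
    """
    if status == "REL":
        return approved
    if status == "HPUB":
        hold_end = one_year_after(deposited)
        if published is not None:
            hold_end = min(hold_end, published)
        return max(hold_end, approved)
    raise ValueError(f"unsupported status code: {status!r}")


if __name__ == "__main__":
    dep, ok = date(2021, 3, 1), date(2021, 3, 8)  # illustrative dates only
    print(earliest_release("REL", dep, ok))                      # 2021-03-08
    print(earliest_release("HPUB", dep, ok, date(2021, 3, 15)))  # 2021-03-15
    print(earliest_release("HPUB", dep, ok))                     # 2022-03-01
```

In this model, REL ties release to author approval alone, while HPUB with a preprint DOI attached releases the entry when the preprint appears, which is why either choice can keep the data alongside the preprint.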
What should I do if I read a preprint that does not include the underlying structural data?
You can contact the authors to query the availability of the dataset and encourage them to deposit and release the data to the PDB. You can also share this letter and resources with the authors and invite them to join the commitment to release their PDB file with their future preprints.
Header image: The structure of the SARS-CoV-2 macrodomain bound to its substrate ADP-ribose (PDB ID: 7KQP, https://www.biorxiv.org/content/10.1101/2020.11.24.393405v1.full)