#ASAPpdb: Structural biologists commit to releasing data with preprints

The Protein Data Bank (PDB) was established as the first open access repository for biological data, and the datasets it hosts have been invaluable to research in fundamental biology and the understanding of health and disease. Just this month, we witnessed the announcement of the AlphaFold2 results toward structure prediction, made possible thanks to the more than 170,000 freely accessible structures in the PDB which provided “training data” for the structure prediction software.

It was not always the case that such structural biology data were freely available, even upon journal publication. From the founding of the PDB in 1971 until the late 1980s, most journals did not require deposition of structures in a public database. A key moment was a petition, circulated in 1987 by a group of leading structural biologists, demanding that the data created be made openly available upon journal publication. This petition led to major journals adopting data deposition standards. In the early 1990s, the National Institute of General Medical Sciences (NIGMS) imposed similar requirements on all grantees.

The revolution in publishing made possible by preprints calls for a re-evaluation of data disclosure practices in structural biology. While journal review processes take weeks, months, or even years, preprints allow researchers to rapidly communicate their findings to the community. However, withholding the PDB files that accompany preprints inhibits the very progress towards scientific discovery that preprints can enable.

Commitment

We pledge to publicly release our PDB files (and associated structure factor, restraint, and map files) with deposition of our preprints.

We encourage all structural biologists to also deposit raw data in appropriate resources (e.g., EMPIAR, proteindiffraction.org, or https://data.sbgrid.org/).

ASAPpdb Signatories

Signatory	Name	Position	Institution
1	James Fraser	Professor	Department of Bioengineering and Therapeutic Sciences, UCSF
2	Cynthia Wolberger	Professor of Biophysics and Biophysical Chemistry	The Johns Hopkins University
3	Phil Bourne	Dean & Professor	School of Data Science & Department of Biomedical Engineering, UVA
4	Aled Edwards	Director and Chief Executive	Structural Genomics Consortium
5	David Agard	Professor	Department of Biophysics and Biochemistry, UCSF
6	Gira Bhabha	Assistant Professor	Skirball Institute, NYU
7	Damian Ekiert	Assistant Professor	Skirball Institute, NYU
8	Kliment Verba	QBI Fellow	Department of Pharmaceutical Chemistry, UCSF
9	Brian Kelch	Associate Professor	Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School
10	Gabriel Lander	Professor	Department of Structural and Computational Bio, Scripps Research
11	Jinrong Min	PI	Structural Genomics Consortium, University of Toronto
12	Robert Stroud	Professor	Department of Biophysics and Biochemistry, UCSF
13	Seemay Chou	Assistant Professor	Department of Biophysics and Biochemistry, UCSF
14	Frank von Delft	Professor of Structural Chemical Biology	University of Oxford and Diamond Light Source
15	Chun Tang	Professor of Biophysical Chemistry	Peking University
16	Margaret Stratton	Assistant Professor	Department of Biochemistry and Molecular Biology, Univ of Massachusetts, Amherst
17	Michael Thompson	Assistant Professor	Department of Chemistry and Chemical Biology, UC Merced
18	Peter Kim	Professor	Department of Biochemistry, Stanford University
19	Adam Frost	Associate Professor	Department of Biophysics and Biochemistry, UCSF
20	Jamie Cate	Professor	University of California, Berkeley
21	Jon Marles-Wright	Senior Lecturer	Newcastle University, UK
22	Iris Young	Postdoc	UCSF
23	Paul Robustelli	Assistant Professor	Dartmouth College
24	Marko Hyvonen	Reader in Protein Biochemistry	University of Cambridge
25	Daren Fearon	Beamline scientist	Diamond Light Source
26	Roger Shek	Research Scientist	University of Washington
27	Alex J. Vecchio	Assistant Professor, Department of Biochemistry	University of Nebraska-Lincoln
28	Edward C Twomey	Assistant Professor of Biophysics and Biophysical Chemistry	The Johns Hopkins University School of Medicine
29	Cameron Mackereth	Group Leader (Inserm DR2)	Inserm / ARNA Laboratory (U1212)
30	Tom Terwilliger	Senior Scientist	New Mexico Consortium
31	Michael Cianfrocco	Assistant Professor	University of Michigan
32	Danielle A Grotjahn	Scripps Fellow	The Scripps Research Institute
33	Charles Brenner	Chair, Dept of Diabetes & Cancer Metabolism	City of Hope National Medical Center
34	Roberto Chica	Professor	University of Ottawa
35	Rommie E Amaro	Professor	UC San Diego
36	Nicolas Lux Fawzi	Associate Professor	Brown University
37	Stephen Brohawn	Assistant Professor	University of California, Berkeley
38	Debnath Ghosal	Assistant Professor	University of Melbourne
39	Kate Kim	Postdoc	UCSF
40	Stephanie Wankowicz	Graduate Student	UCSF
41	Scott Horowitz	Assistant Professor	University of Denver
42	Chen Sun		Purdue University
43	Dmitry Lyumkis	Assistant Professor	Salk Institute for Biological Studies
44	Joe A Kaczmarski	Postdoctoral Researcher	Australian National University
45	Daniel Keedy	Assistant Professor	Structural Biology Initiative, CUNY Advanced Science Research Center
46	Benjamin Barad	Postdoc	Scripps Research
47	Douglas Kojetin	Associate Professor	Department of Integrative Structural and Computational Biology, Scripps Research
48	Roberto Efrain Diaz	Graduate Student	Department of Bioengineering and Therapeutic Sciences, UCSF
49	Gavin Knott	Research Fellow	UC Berkeley/Monash University
50	Faisal Koua		Deutsches Elektronen-Synchrotron
51	Wladek Minor	Professor	University of Virginia
52	Marcel Conrady	PhD student	University of Mainz
53	Chrisostomos Prodromou	Senior Lecturer	University of Sussex
54	Loes Kroon-Batenburg	Assistant Professor	Utrecht University
55	Matthew Bowler	Beamline Scientist	European Molecular Biology Laboratory
56	Mahesh Lingaraju	Postdoctoral researcher	Max-Planck Institute of Biochemistry
57	Yair Gat	Postdoc	Max Planck Institute
58	Doriano Lamba	Scientific Associate - Retired Fellow	Istituto di Cristallografia - Consiglio Nazionale delle Ricerche, Trieste (Italy)
59	Ashley Buckle		Monash University
60	Dhaval Patel	Assistant Professor	Institute of Advanced Research
61	Colin Jackson	Professor	Australian National University
62	Eugene Sun	Postdoc	Bristol U
63	Simon Fromm	Postdoc	UC Berkeley
64	Guillaume Gaullier	Researcher	Uppsala University
65	Aashish Manglik	Assistant Professor	Department of Pharmaceutical Chemistry, UCSF
66	Thomas Tomasiak	Assistant Professor, Department of Chemistry and Biochemistry	University of Arizona
67	Ivan G. Shabalin	Research Scientist	University of Virginia
68	Walter Chazin	Professor and Director, Center for Structural Biology	Vanderbilt University
69	Arun Malhotra	Associate Professor	University of Miami School of Medicine
70	Wolf-Dieter Schubert	Professor of Biochemistry	University of Pretoria
79	Jyh-Yeuan (Eric) Lee	PI / Assistant Professor	University of Ottawa Faculty of Medicine
80	David Cooper	Instructor of Research	University of Virginia
81	Natalia Jura	Associate Professor	UCSF
82	Pavel Afonine	Scientist	LBNL
83	Mark MacRae	Graduate Student	NYU School of Medicine
84	Bridget Carragher	Co-director, SEMC	New York Structural Biology Center
85	Michelle Moritz	Research Scientist	University of California, San Francisco
86	Somaye Badieyan	Postdoc	University of Michigan
87	Nigel W. Moriarty	Scientist	Lawrence Berkeley Lab
88	Eric R Greene	Postdoctoral Scholar	University of California San Francisco
89	Gerlind Sulzenbacher	Research engineer	AFMB-CNRS-AMU
90	Joel Tyndall	Associate Professor	University of Otago
91	Joost Snijder	Assistant Professor	Utrecht University
92	Karla Satchell	Professor	Northwestern University Feinberg School of Medicine
93	Tushar R.	PhD	University of Hamburg
94	Dr. Shailendra Shivaji Gurav	Associate Professor	Goa College of Pharmacy, Panaji, Goa University, Goa, India- 403 001
95	Blake Riley	Postdoc	Structural Biology Initiative, CUNY Advanced Science Research Center
96	Frank von Delft	Principal Beamline Scientist	Diamond Light Source

Next steps

Funders could play a role in encouraging data deposition through their guidance to grantees and applicants. Preprint servers could also encourage users to share their data during the submission process (with appropriate citation in accordance with the FORCE11 data citation principles) and encourage affiliates to check for the availability of such data during the screening process. ASAPbio will share this letter and its signatories with these entities to advance the conversation about other ways to encourage data availability.

While this letter is focused on structural data, we hope other communities will follow in their support for data sharing upon preprint deposition, particularly those with a strong culture of data sharing and established dedicated repositories, for example in relation to gene sequences (GenBank), gene expression (GEO), microscopy data (EMDB), NMR assignment (BMRB) and similar datasets.

We invite these communities to develop their own call for support for data sharing with preprints and we encourage them to contact us if they would like to pursue a similar call.

Frequently asked questions about preprints and structural biology

For more information about preprints, including additional FAQ, check the info center.

What is a preprint?

A preprint is a scientific manuscript that is uploaded by the authors to a public server. The preprint contains data and methods, but has not yet been accepted by a journal. While some servers perform brief quality-control inspections (for more details on the practices of individual servers, see asapbio.org/preprint-servers), the author’s manuscript is typically posted online within a day or so without peer review and can be viewed (and possibly translated, reposted, or used in other ways, depending on the license) without charge by anyone in the world. Most preprint servers support versioning, that is, the posting of updated versions of a paper based on feedback and/or new data. However, most servers also retain prior versions, which typically cannot be removed, in order to preserve the scholarly record. Preprints allow scientists to directly control the dissemination of their work to the worldwide scientific community.

Are preprints compatible with journals?

Yes. While both preprints and journal articles enable researchers to disseminate their findings to the research community, they are complementary in that preprints represent an opportunity to disseminate at an early stage.

In most cases, the same work posted as a preprint is also submitted for peer review at a journal. Thus, preprints (rapid, but not validated through peer review) and journal publication (slow, but providing validation through peer review) work in parallel as a communication system for scientific research.

In many fields, the majority of journals allow submission and citation of preprints. To get a sense for preprint policies, you can check SHERPA/RoMEO, Transpose, or Wikipedia’s List of academic journals by preprint policy. However, before submitting a manuscript, always check the journal’s website for recent changes or any nuances of their policy.

How does the PDB interact with preprint servers?

The PDB considers papers posted on a preprint server to be publications (https://www.wwpdb.org/documentation/policy#toc_release) and will release the PDB data associated with a preprint once the preprint is posted.

Will my preprint be rejected from a preprint server if posted without PDB data?

We are not aware of preprint servers that screen on this basis at this time, but we hope that preprint servers or community projects might highlight preprints that contain complete data.

How will a preprint affect my patent application?

Preprints, like journal articles, are considered public disclosures, which can affect a patent application. If you intend to file an application to patent work disclosed in your paper, discuss the situation with your technology transfer office before posting your preprint.

Can I still link my PDB record to the journal version?

The advent of versioning in the PDB makes it possible for the authors to update their files and journal information while preserving the unique PDB identifiers. This system will ensure that the public always has the “up to date” version.

What about CASP, which relies on embargoes?

There are three easy ways to share your protein information with CASP. You can either directly submit the sequence of your protein and the related information through the web form; mark your PDB deposition as ‘CASP target’ (check box) within the PDB deposition system; or send CASP an email (casp AT predictioncenter.org). All of these steps can be done before preprint disclosure.

Should I use “REL” or “HPUB” as my author-requested status codes for PDB entries?

REL entries are released as soon as the authors have approved the processed files, whereas HPUB (Hold until PUBlication) entries are placed on hold until publication or until one year from the date of deposition, whichever comes first. In both cases, the authors need to approve the final validated structure before release. Choosing REL will bring the release of the data closest to the posting of the preprint. It is also possible to proactively associate the PDB entry with the preprint DOI as a way to use the HPUB status code (and then subsequently create an updated version with the journal DOI).
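
The difference between the two status codes can be sketched as a small date calculation. This is a simplified illustration of the rule described above, not the PDB's actual release procedure: the function names, the treatment of a preprint posting date as the "publication" date, and the handling of 29 February are all assumptions made for the example, and the sketch ignores the PDB's own processing and release schedule.

```python
from datetime import date
from typing import Optional


def one_year_after(d: date) -> date:
    """Return the same calendar day one year later.

    Assumption for this sketch: 29 February maps to 28 February.
    """
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def eligible_release_date(status: str, deposited: date, approved: date,
                          published: Optional[date] = None) -> date:
    """Earliest date an entry could be released under this simplified model."""
    status = status.upper()
    if status == "REL":
        # REL: eligible as soon as the authors approve the processed files.
        return approved
    if status == "HPUB":
        # HPUB: held until publication or one year after deposition,
        # whichever comes first.
        limit = one_year_after(deposited)
        hold_end = limit if published is None else min(published, limit)
        # Author approval is required before release in both cases.
        return max(hold_end, approved)
    raise ValueError(f"unsupported status code: {status!r}")


# Hypothetical dates, chosen only to illustrate the rule.
deposited = date(2020, 11, 24)
approved = date(2020, 12, 1)
preprint_posted = date(2020, 12, 5)

print(eligible_release_date("REL", deposited, approved))
print(eligible_release_date("HPUB", deposited, approved, preprint_posted))
print(eligible_release_date("HPUB", deposited, approved))
```

With these hypothetical dates, REL makes the entry eligible on the approval date, HPUB tied to the preprint makes it eligible on the posting date, and HPUB with no associated publication holds the entry until a year after deposition.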

What should I do if I read a preprint that does not include the underlying structural data?

You can contact the authors to query the availability of the dataset and encourage them to deposit and release the data to the PDB. You can also share this letter and resources with the authors and invite them to join the commitment to release their PDB file with their future preprints.

Header image: The structure of the SARS-CoV-2 macrodomain bound to its substrate ADP-ribose (PDB ID: 7KQP, https://www.biorxiv.org/content/10.1101/2020.11.24.393405v1.full)