Author Archives: Jessica Polka

New developments and plans for the Central Service RFA and Governing Body


Funding agencies are crucial to the development of the preprint movement. The adoption of preprints by the life sciences community has been accelerated by grant-making policies that recognize these manuscripts as a valid form of scholarly communication. Investments in preprint infrastructure, services, and technologies are thus necessary to build capacity for growth.

The latest strong support for preprints comes from the Chan Zuckerberg Initiative (CZI), which recently announced financial support for bioRxiv, the leading preprint server in the life sciences, along with resources to develop new open source software, including tools for manuscript conversion to XML. ASAPbio applauds the decision by CZI to provide funding for bioRxiv and further technological development, both of which will advance the growth and readability of preprints.

ASAPbio has also advocated for funder investment in these areas. Together with a group of funders, ASAPbio developed plans for a Central Service, an aggregation site of preprint content meeting certain standards, new search tools, and software for XML document conversion. Seven applications were received by the April 30 RFA deadline. In parallel, ASAPbio also commissioned a 30-person task force to develop bylaws for a community-elected Governing Body, which have been released for public comment.

In light of the CZI/bioRxiv partnership that was announced on April 26, ASAPbio and the Funders Consortium have jointly decided to suspend the RFA and Governing Body for a four-month period in order to reassess the needs of the scientific community. Since some objectives of the RFA are now being pursued by CZI/bioRxiv, we do not wish to duplicate or compete with their efforts. The need, role and mandate of any Governing Body may also require re-evaluation. During this four-month period, we will gather more information from CZI/bioRxiv and engage the broad community of scientists, funders, scientific societies, and publishers to learn more about their opinions and needs. This represents an exciting opportunity to further advance scholarly communication by building upon the CZI/bioRxiv initiative and thus better serve the scientific community. Consistent with our mission, we will aim to bring together various stakeholders for conversations, identify and debate opportunities, and encourage input from open discussions with the community. We will release more information in the next few weeks regarding this planning process. Feel free to contact us with your input, suggestions or questions now or in the future.

We appreciate the support and patience of everyone who has provided feedback on the Central Service governance and RFA process thus far, including the funders who have articulated principles for supporting preprint infrastructure, the RFA respondents who have written thoughtful and in many cases highly collaborative applications, attendees of our technical workshop and other meetings, members of our governance task force, our external reviewers, and many other individuals who have shared their feedback on our draft proposals for infrastructure and governance. We will continue to engage the broader community as we work to advance and accelerate scientific communication.

Preprint journal clubs


For authors, one of the most exciting potential benefits of preprints is the ability to attract early feedback from broad and diverse sources during the preparation of a scientific manuscript. Preprint journal clubs can provide this input – and a more meaningful review experience for their own members as well. Here are some examples and resources for setting them up.

Prachee Avasthi’s preprint journal club

Prachee Avasthi, an Assistant Professor at the University of Kansas Medical Center, runs a course for graduate students called “Analysis of Scientific Papers.” The class takes the shape of a journal club in which students learn how to critically evaluate scientific manuscripts.

What makes Prachee’s course unique is that the papers under evaluation are drawn exclusively from preprints. As she explains in the video above, this has several benefits:

  • Students’ feedback is actually useful to authors since it’s created while a manuscript is under revision, instead of after it has been published.
  • Since students are expected to share their reviews, they must pay more attention to maintaining high quality commentary and a productive tone.
  • Posting these reviews publicly helps to demonstrate the review process to other students and to scientists interested in the evolution of the paper in question.

Prachee has generously shared her syllabus and introductory slide deck, and the students’ reviews can be found on the Winnower. Other platforms are described in detail in this crowdsourced spreadsheet.

Platforms for preprint journal clubs

More platforms for preprint journal clubs and other venues for commentary on preprints and open peer review can be found in this spreadsheet.

Preprint Journal Clubs

Know of other preprint journal clubs? Please help us build the list above by adding to the 2nd tab of the spreadsheet here. If you are a researcher willing to provide feedback on others’ preprints, please add your information to this spreadsheet.

Whether in a course or on a blog, feedback from journal clubs can have a positive impact on authors and their science.

10 ways to support preprints (besides posting one)


Preprinting in biology is gaining steam, but the process is still far from normal: the upload rate to all preprint servers is about 1% that of PubMed. The most obvious way for individual scientists to help turn the tide is, of course, to preprint their own work. But given that it now takes longer to accumulate data for a paper, this opportunity might not come up as often as we’d like.

So, what else can we do to promote the productive use of preprints in biology?

1. Cite preprints

Many biologists, especially early career researchers, are concerned that their preprints won’t be properly acknowledged.

If you’d like to see some anecdata, here’s a word cloud generated by digitally polling the audience at the EMBO Long Term (postdoctoral) Fellowship retreat in November 2016 (39 devices responded). The prompt was, “What is your biggest concern about preprints?”

[Figure: word cloud of audience responses]

While we have yet to hear an example of a preprint author getting scooped, the concern remains very real. To counter this fear, we need to set an expectation that work disclosed in preprints will be cited fairly when relevant to other preprints and journal articles. A commitment to fairly cite relevant preprints was included in a draft statement from our first meeting, and it was widely endorsed.

2. Comment on a preprint

One of the greatest opportunities preprinting presents is the chance to receive more feedback on a paper. For example, Nikolai Slavov describes how thoughtful, constructive feedback helped his paper improve:

By using this feedback mechanism, we can strengthen one another’s science.

3. Set up email alerts

With an increasing number of preprint servers, it can be difficult to manually visit each one to stay on top of the literature. ASAPbio is working to facilitate an aggregation tool that will make this easier, but for now, there are several ways to get automatic email alerts on preprints of interest to you, including PrePubMed’s RSS tool. More details and instructions for different preprint alert options are described here.

4. Review a preprint in your journal club

Reviewing preprints may be even more rewarding than reviewing papers: you have the option to share your opinions with the authors, publicly or privately.

It’s a great educational experience for students, too. Prachee Avasthi at the University of Kansas Medical Center draws material for her “Analysis of Scientific Papers” course exclusively from preprint servers. She’s generously shared her syllabus and introductory slide deck, and the students’ reviews can be found on the Winnower.

See more examples of preprint journal clubs here.

5. Stickers

There are probably researchers in your department who aren’t aware that someone they know has posted a preprint. You can spark conversations around your lab or at conferences by affixing a sticker to your laptop, water bottle, or office door. See examples below for inspiration.

@mcdawg
@pollyp
@sciezgin
@BrianKelch
@clathrin
@aidarodrigo
@JonathanPDrury
@vinjlynch

Fill out this simple form to request some free stickers.

6. Add a message to your email signature

You can raise awareness about preprints with every message sent. Here’s an example:

7. Tell your preprint story

We’re collecting stories about researchers’ experience with preprints. You can tweet them @jessicapolka or using the #ASAPbio hashtag. You can also make a video (similar to Nikolai’s, above), or email jessica.polka@asapbio.org if you’d like to share something in longer written format.

8. Become an ambassador

ASAPbio ambassadors have agreed to act as local points of contact for discussions regarding preprints. They are listed on asapbio.org and have a private discussion group and access to shared presentation materials like slides and posters.

Sign up here.

9. Promote policy change

Journal, funder, and university policies are critical to making preprinting a viable option in biology. If journals with restrictive preprint policies operate in your field, you could write to editors to request that they reconsider. Requests could be made spontaneously or by using an ongoing correspondence to bring attention to the matter.

An anonymous researcher includes a statement like this one in their peer reviews:

I encourage authors to post future manuscripts to preprint servers and archive data and strains in public repositories.

If you’re on a faculty search committee, consider working to insert a call for preprints into the job ad:

We’re keeping a list of university, funder, and journal policies to provide examples of existing progressive policies that you can reference.

10. Add a slide about preprints to the end of your talks

Of course, this works best if customized to fit your own experience (e.g., a screenshot of the preprint you’ve been discussing in the talk). You can download a template in pptx here.

Note: the original version of this post encouraged responses to the NIH’s RFI on preprints as item #10.

Please share more ideas below!

Update on development of a Central Service Request for Applications (RFA)


At the ASAPbio Funders’ Workshop in May of 2016, representatives of funding agencies requested that ASAPbio “develop a proposal describing the governance, infrastructure and standards desired for a preprint service that represents the views of the broadest number of stakeholders.” Toward this end, we proposed a model for a “Central Service” (CS) that would aggregate content from multiple preprint servers, facilitating human and machine access to preprints via a search tool and an API.

Three separate processes are now ongoing to define this service:

ASAPbio newsletter vol 5 – Tell the NIH what you think about preprints, Crossref service launches today, new resources


Dear ASAPbio subscriber,

Tell the NIH what you think about preprints

The NIH has recently released a request for information (RFI) on the use of preprints and other interim research products. We encourage all interested parties to respond to the RFI using the submission website by the deadline of December 9th (extended from November 29th).

ASAPbio’s draft response is posted here. Even if you completely agree with our draft, we encourage you to submit your own responses as well. A large number of responses will be critical in conveying a strong message of community interest in preprints and other interim research products to the NIH. Responses from individual scientists at all career stages are encouraged. You do not have to respond to all questions, and the responses can be short. If you would like to share comments or your own response to the RFI, please use the comment section below the post.

Crossref launches preprint service

Today, Crossref, the organization that assigns DOIs for journal articles, launches its preprint service! The service will offer a specialized content type for preprints, enabling them to be linked to their corresponding journal article. This development will make it easier for preprint servers and journals to display links (backwards and forwards) between different versions of the same article, and it will facilitate pooling of metrics, citations, etc. between the versions. This is a landmark in the development of preprints as an integral part of the scholarly literature.

New resources at ASAPbio.org

How many life sciences preprints were posted in September 2016? Which journal now has Preprint Editors? Which funder is requiring preprint deposition? And which med school accepts preprints in tenure packages?

We’re now tracking the growth of preprints in the life sciences as well as new developments in funder, university, and journal practices and policies regarding preprints. You also can now view all of these newsletter posts (including this one) on the web. Finally, we printed stickers (below) to help create visibility and spark conversations about preprints. Just fill in the form at asapbio.org/stickers to request some!  

Best,
Jessica Polka
Director, ASAPbio

ASAPbio’s response to the NIH RFI on preprints


Note: the RFI is now closed. The NIH has announced a policy that encourages the use of preprints.

The NIH has recently released a request for information (RFI) on the use of preprints and other interim research products. We encourage all interested parties to respond to the RFI using the submission website by the deadline of December 9th 2016 (extended from November 29th).

ASAPbio’s draft response is posted below. Even if you completely agree with our draft, we encourage you to submit your own responses as well. A large number of responses will be critical in conveying a strong message of community interest in preprints and other interim research products to the NIH. Responses from individual scientists at all career stages are encouraged. You do not have to respond to all questions, and the responses can be short. If you would like to share comments or your own response to the RFI, please use the comment section below the post.

ASAPbio newsletter vol 4 – Technical workshop, new website features, ambassadors


Dear ASAPbio subscriber,

Here’s what’s new:

  • We held a successful Technical Workshop to discuss the feasibility of creating a central preprint service. All the notes are online, and you can also view the archived video stream.
    • We’re working on a request for information to identify potential suppliers, their implementation strategies, and their predicted costs and development timescales. We will present all reasonable responses to a group of funders as part of our response to a request that emerged from the ASAPbio Funders’ Workshop. More details about our planned process can be found here.
  • We’ve added some new features to the website.
  • The ambassador program kicked off in earnest.
    • Check out the map to see who’s near you.
    • It’s also not too late to sign up – we’re looking for people willing to act as local points of contact about preprints. We’re also providing resources to help ambassadors give talks about preprints at their home institutions or while traveling to conferences and other meetings.

Please let us know if you hear of any exciting developments in preprints in life sciences!

Jessica

Jessica Polka, PhD

Director, ASAPbio

ASAPbio newsletter vol 3 – REQUESTING FEEDBACK on a central preprint service for biology


Dear ASAPbio subscriber,

It’s been an exciting few months at ASAPbio! Here’s what’s happened:

  • The report of our February meeting at HHMI was published in Science, and Ron Vale and Tony Hyman recently published an article about priority of discovery & preprints in eLife.
  • ASAPbio was awarded grants totalling $400,000 in provisional funding from the Arnold, Sloan, Simons, and Moore foundations for a period of 18 months.
  • We held a Funders’ Workshop at the NIH on May 24th.
  • As an output of this, representatives from funding agencies called for ASAPbio to develop a proposal for a preprint service for biology.
  • In response to this request, we’re now seeking feedback from the community on a draft proposal for a central preprint service that could aggregate content from multiple servers. Please consider leaving a public comment on the web and sharing the link with your networks. After future iterations, we will present several variations to funders in the fall.
  • To develop the technical aspects of this proposal, we’re hosting a Technical Workshop in Cambridge, MA on 8/30. We’re aiming to provide a video stream so that anyone can follow along.

Finally, effective 8/1, I’m now serving as full-time director of ASAPbio! Please don’t hesitate to contact me with any comments, questions, or ideas on how we can work together to advance the productive use of preprints in biology.

Best,

Jessica Polka, PhD

Director, ASAPbio

Appendix 2: Current feedback on Central Service features


Central Service model documents

Current discussions with the community on proposed features of the Central Service

Surveys and information from scientists

We will continue to engage the scientific community on what services they want to see in a next generation of preprints. However, based upon a survey that ASAPbio conducted in May 2016 (Results summary (pdf) and Anonymized responses (xls)) and other resources (e.g. Preprint user stories compiled by Jennifer Lin at Crossref and ASAPbio survey #1 (early 2016)), we believe that biologists want:

  • High visibility and discoverability of preprints
    • A single recognized website
    • Good search tool
    • Email notifications
  • Web-readable xml format
    • Click on link to figures to display them
    • Ability to click on links to references
    • Export to more readable and compact pdfs
  • A system for cross-referencing versions of the same work
    • Linking the final journal publication to preprint versions (and vice versa), so that the history of the work is transparent and preserved

Input from servers, publishers, funders and data management experts

In July-August 2016, ASAPbio conducted informal interviews with preprint servers, funders, scientists and developers. We originally presented a variety of Central Preprint Service models of increasing complexity and centralization, ranging from a PubMed-like metadata search tool (Model 1) to a PubMed Central-like database that hosts well-formed XML content (JATS) and makes it available through a web display tool and an API (Model 4). One version also included a central submission tool (Model 5).

While responses to the creation of a central tool were generally very positive, opinions on the best implementation varied. Below is a summary of some of the critical feedback we received.

  • Models 1 & 2 provide little benefit over the current state of affairs. These models generated less interest among funding agencies. There are already multiple ways to search preprints (search.bioPreprint, PrePubMed, Google Scholar) and existing preprint servers already preserve their own content.
  • Models without an open API and common licensing will stifle innovation. Without free access to content, third parties will have difficulty in implementing new services (such as peer review, data mining, or aggregation).
  • Providing central submission and full-text display would be undesirable for some existing servers. These tools would directly compete with existing servers for traffic and recognition in the community. Also, display in multiple locations could disrupt download/view metrics and commenting systems. However, some funders felt that the CS should have the ability for full display as well as drive traffic to server sites. Some funders have expressed an interest in allowing submission directly to the CS (Model 5), but most favor a practical solution that embraces the needs of the ecosystem.
  • Many of the original models are complicated and development of any system with many moving parts will take a long time. Therefore, “perfection must not be the enemy of the good.” There will be a need to generate a CS that will work “out of the box” and improve on it over time. The CS needs to take into account realistic development of technologies.
  • Technological limitations make the use of JATS impractical. No good unsupervised .doc -> JATS converters currently exist. Thus, the conversion process requires human intervention.
  • Document conversion is costly. Server-side conversion to a structured format (such as JATS) is expensive (on the order of ~$20+); therefore, it doesn’t make sense for preprint servers to provide this, especially when preprints generate no revenue. The CS should be close to cost-neutral to servers and other publishing entities.
  • Licensing has generated a diversity of opinion. Some parties favor author or publisher choice in licensing, arguing that scientists will have concerns about the re-use of their material. Our own surveys and interactions with scientists suggest that most do not understand licensing options and their associated benefits/disadvantages. Most funders favor a uniform licensing policy for the material in the CS in order to allow re-use in innovative ways and avoid complicated restrictions for data mining. The license most favored at the moment is CC-BY, although this may require research and engagement with the scientific community.
  • The servers, platforms, and publishers consulted were generally interested in working with the CS. However, alignment and preference for models varied between model 2 and model 4.
  • A major topic in which opinion varies is ‘display’. Some funders and publishers feel that the CS should have the capability of displaying its archived content. Others feel that the CS should not display content to readers (other than abstracts) and that display should reside with servers and publishers.
  • Use existing technologies whenever possible. Don’t reinvent the wheel, and carefully evaluate existing software/infrastructure.
  • Balance immediate concerns against opportunities for future development. Expressed by many, this sentiment emphasizes the need for a governance body that can continuously weigh these issues over time and make adjustments. In addition, inter-operability between preprint systems in biology, physics and other disciplines may need to be considered in the future.

We have drafted a provisional model of the Central Service (Summary) that takes into account the various input received above. The model emphasizes the development of document conversion services and the provision of web-ready full-text outputs to the input server. Providing full-text display via intake server/publishers will deliver to scientists many of the benefits they want while providing intake servers with incentives to participate in the program.

Possible benefits of the proposed service

To scientists

  • Preservation
  • Ease of use and readability (through web display at intake server)
  • Adherence to standards of author identity and ethical guidelines for research and disclosure
  • Potential for innovative reuse (with appropriate attribution)

To intake servers/platforms

  • No-cost document conversion into web-readable format
  • No-cost preservation
  • Improved exposure through a search portal that links exclusively to the intake server for display
  • “Accreditation” of servers or individual preprints through central screening process

To funders

  • Uniform standards of quality
  • Access to entire corpus via API
  • Ability to search/filter by funding source

Desired technical features for discussion

We welcome comments on the list of desired features below, which could become an agenda for discussion at the Technical Workshop (August 30, 2016).

Input (collected from the author)

  • Original manuscript file (.doc so that reference metadata can be extracted)
  • Supplementary files
  • ORCID
  • License (note: the Governance Body task force will also address this issue)
  • User authentication
  • Metadata (if extracted from .doc, get the user to check)
  • Grant support
  • Ethical statements (note: a separate task will also address this issue)
    • Self-ID COI
    • All authors agree on submission
    • Methods needed to reproduce this work are contained within the work
    • The work has been conducted in agreement with human & animal research guidelines

Document conversion

  • Extraction of text from source file
  • Extraction of metadata (such as title, authors, affiliations, keywords, and abstract)
  • Extraction of references
  • Insertion of figures, or recognition of existing in-line figures

Screening and moderation

  • Automated plagiarism detection
  • Automated detection of non-scientific content (via arXiv-like algorithm)
  • Interface for human-supervised screening/curation/moderation

Versions and identifiers

  • Unique, persistent ID for each version
  • All versions linked to one another (and to published journal article)
  • Linked to datasets
  • Tombstone pages for retracted content
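
The linking requirements above can be sketched as a small data structure. This is only an illustration under stated assumptions, not a design that was proposed for the Central Service: the class names, field names, and ID strings are all hypothetical, and a real service would presumably use persistent identifiers such as DOIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PreprintVersion:
    version_id: str          # hypothetical unique, persistent ID for this version
    retracted: bool = False  # True would mean readers see a tombstone page


@dataclass
class PreprintRecord:
    """All versions of one work, plus links to its article and datasets."""
    versions: List[PreprintVersion] = field(default_factory=list)
    journal_article_id: Optional[str] = None  # set once a journal version exists
    dataset_ids: List[str] = field(default_factory=list)

    def add_version(self, version_id: str) -> PreprintVersion:
        # Each version needs its own ID, so duplicates are rejected.
        if any(v.version_id == version_id for v in self.versions):
            raise ValueError(f"duplicate version ID: {version_id}")
        version = PreprintVersion(version_id)
        self.versions.append(version)
        return version

    def linked_ids(self, version_id: str) -> List[str]:
        """IDs of every other version, plus the journal article if there is one."""
        if all(v.version_id != version_id for v in self.versions):
            raise KeyError(version_id)
        linked = [v.version_id for v in self.versions if v.version_id != version_id]
        if self.journal_article_id is not None:
            linked.append(self.journal_article_id)
        return linked
```

Keeping all versions in one record means that a lookup from any single version reaches its siblings and the published article, which is one simple way to keep the history of a work transparent.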

Archiving

  • Stable archiving of source file (.doc) and also derivatives
  • Permission to display content if intake server reaches end of life

API

  • Bulk download of all content (.doc) and also derivatives
  • Filtering by metadata

Discovery tool

  • Full-text indexing of all content in the central database
  • Advanced search (boolean operators, search fields such as author, keyword, funding support)
  • Alerts (RSS/email)
  • Display of abstracts, etc, but exclusive link to intake server for full-text display
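
As a rough illustration of the advanced search and metadata filtering listed above, the sketch below combines per-field matches with boolean operators. It is a minimal in-memory example under stated assumptions: the function names, metadata field names, and sample records are invented for illustration, and a production discovery tool would more likely rely on a full-text search index.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]              # hypothetical flat metadata record
Predicate = Callable[[Record], bool]


def field_contains(field_name: str, term: str) -> Predicate:
    """Match records whose metadata field contains the term (case-insensitive)."""
    return lambda record: term.lower() in record.get(field_name, "").lower()


def all_of(*predicates: Predicate) -> Predicate:
    """Boolean AND over any number of predicates."""
    return lambda record: all(p(record) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Boolean OR over any number of predicates."""
    return lambda record: any(p(record) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """Boolean NOT."""
    return lambda record: not predicate(record)


def search(records: List[Record], query: Predicate) -> List[Record]:
    """Return the records that satisfy the query, in their original order."""
    return [record for record in records if query(record)]


# Invented sample metadata, for illustration only.
SAMPLE = [
    {"author": "A. Researcher", "keyword": "microtubules", "funding": "Funder X"},
    {"author": "B. Scientist", "keyword": "genomics", "funding": "Funder Y"},
    {"author": "A. Researcher", "keyword": "genomics", "funding": "Funder Y"},
]

query = all_of(field_contains("author", "researcher"),
               negate(field_contains("funding", "funder x")))
matches = search(SAMPLE, query)
```

Building queries from small composable predicates keeps each search field independent, so filtering by a new field such as funding source needs no change to the search function itself.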

Proposal development process

The output of the Technical Workshop will be announced in a Request For Information (RFI), in response to which any interested party can provide information on the development and approximate costs of developing a CS. The responses to the RFI will be shared by ASAPbio with major international funding agencies for potential consortium support. Pending their collective interest in financially supporting a plan for a CS and refining its method of operation and governance, a formal RFA may follow the RFI to which interested parties could apply for funding.

Appendix 1: Rationale for a Central Service


Do we need new infrastructure and governance for preprints?

Because the preprint server arXiv was born very early in the history of the internet and served its community well, it has become the de facto repository for preprints in the physical, mathematics and computer sciences without any major competitors. During the past two decades, various scientific disciplines decided to join arXiv rather than start their own servers. Thus, arXiv has become a “central server” for the physical science community and has achieved high visibility.

The success of preprints in physics was aided by the coalescence of a large body of content in one highly visible site (arXiv) that had a high standard for quality, attracted outstanding work, and had a scientist-led governance model. Biologists could attempt to replicate this single server model. However, biologists already deposit work in several existing preprint servers, notably bioRxiv (established 2013), PeerJ Preprints (established 2013), and the q-bio section of arXiv (established 2003). In addition, PLOS has had a long-standing interest in posting pre-peer review manuscripts as an option for submitted papers, which could add considerable content in the near future. F1000 has developed a publishing platform that provides access to manuscripts before formal peer review (effectively a preprint; see definition below) and the Wellcome Trust has adopted the F1000 platform to launch Wellcome Open Research. Other journals or funding agencies may also decide to develop similar dissemination mechanisms for pre-peer review content. Thus, the concept of disseminating “pre-peer review” manuscripts is broadening beyond a traditional “arXiv”-like server.

The future of preprints in biology is now poised at a moment that is both exciting and fragile. If organized and thought through properly, preprints could accelerate scientific communication, serve the public good, clarify priority of discovery, and help career transitions of young scientists (see Preprints for the Life Sciences, Science). The development of preprints could be governed by the scientific community, in partnership with publishers and other service providers, leading to exciting and innovative possibilities for science communication.

However, the future of preprints could be less bright. Preprints could become fragmented among the efforts of multiple competing parties, lack overall visibility and critical mass, fail to harness modern possibilities for dissemination and use, and lack clear governance. Preprints may fail to achieve the level of respectability needed to convince scientists, funders, and universities that the disclosure of work by scientists plays a valuable role in the ecosystem of science communication, along with post-peer review journal publications. If preprints fail to grow substantially in submissions and readership in the next five years (e.g. to the level of arXiv), scientists and funders will view them as a failed experiment. Because the future of preprints is poised in a critical time window, it is important to think through issues of execution that will maximize the chance of preprint adoption by the community.

If the scientific community does not act, continued fragmentation of preprint sites could undermine the potential of this communication system by generating:

  • Ambiguity about what qualifies as an acceptable preprint and a recognized content provider. Currently, funding agencies and universities are considering whether preprints or other “pre-peer review” publications should be included in applications. However, what is defined as a “recognized preprint server” is ambiguous at the moment. Every server or publisher may define their own screening protocol, causing uncertainty about whether a preprint has been screened for plagiarism or adheres to ethical standards. In this current system, each journal, funding agency, and hiring or promoting committee must define a list of approved preprint sources based on their own assessments of preprint servers. This practice, which is already occurring at certain journals, will create a situation that is confusing and discouraging for researchers.
  • Lack of visibility and difficulty of discovery. If preprints are spread across multiple sites, they will become more difficult to find. Maximizing discoverability, visibility, and respectability are key to adoption and widespread use by scientists, as is suggested by the success of arXiv.
  • Variable and potentially limited access to data. In the current system, each server sets its own licensing policies and is responsible for archiving its own content. This puts content in danger of being held under restrictive licenses or lost altogether.
  • Limited potential for technology development. If each server must create or outsource IT infrastructure, overall costs of the preprint system will be high, many servers will not have funds for more advanced IT development, and the potential for using and disseminating information may be limited.

Value of a Central Service

To overcome the deficiencies described above, we believe it would be in the best interest of the scientific community to create a Central Service (name subject to change at a later date) that will aggregate “pre-peer review” manuscripts from several sources, maintain standards of quality for its intake, preserve content for posterity, and disseminate information in a manner that advances scientific progress. The Central Preprint Service would, in essence, function as a database that serves the public good, analogous to the Protein Data Bank or PubMed Central. We envision that a Central Preprint Service will be supported by a consortium of funding agencies for a minimum five-year term of operation. It will be overseen by a governance body that will be 1) international, 2) led by highly respected members of the scientific community, and 3) transparent in all of its proceedings, actions, and recommendations.

Partnerships with journals and servers

The Central Service will host manuscripts that contain 1) data, 2) the methods needed by other scientists to replicate that data, and 3) an interpretation of that data. The governing body will determine how manuscripts are screened for entry into the Service (for example, to exclude content that is plagiarized, non-scientific, or in violation of ethical guidelines). However, the Service will not engage in validation or judgment of the work as is performed by traditional peer review. Thus, the Central Service will work as a partner, and not a competitor, with existing journals.

The Service also seeks to act as a partner with preprint servers and publishers that ingest manuscripts from authors. Partners who can deposit their content into the Central Preprint Service will benefit from additional infrastructure support (e.g. plagiarism detection, conversion tools, etc) and most importantly will have greater appeal to scientists who will want their preprint broadly viewed and recognized by grant and promotion committees.

Creation of a Central Preprint Service for the Life Sciences


ASAPbio is iteratively seeking community feedback on a draft model for a Central Preprint Service. We will integrate community and stakeholder feedback into a proposal, containing several model variants, to funders this fall. Please leave your feedback on utility of the Central Service, its features, and the model described in the Summary in the comment section at the bottom of the page, or email it privately to jessica.polka at gmail.com. More comments are posted on hypothes.is (follow this link and expand the menu at right)

Central Service model documents

Summary

At the ASAPbio Funders’ Workshop (May 24, 2016, NIH), representatives from 16 funding agencies requested that ASAPbio “develop a proposal describing the governance, infrastructure and standards desired for a preprint service that represents the views of the broadest number of stakeholders.” We are now holding a Technical Workshop to advise on the infrastructure and standards for a Central Service (CS) for preprints. ASAPbio will integrate the output of the meeting and community and stakeholder feedback into a proposal to funding agencies this fall. The funders may issue a formal RFA to which any interested parties could apply for funding. More details on this process are found at the end of Appendix 2.

Background

The preprint ecosystem in biology is already diverse; major players include bioRxiv, PeerJ Preprints, the q-bio section of arXiv, and others. In addition, platforms such as F1000Research and Wellcome Open Research are producing increasing volumes of pre-peer reviewed content. PLOS has a stated commitment to exploring posting of manuscripts before peer review, and other services may be developed in the future.

Increasing the number of intake mechanisms for the dissemination of pre-peer reviewed manuscripts has several advantages, for example: 1) generating more choices for scientists, 2) promoting innovative author services, and 3) increasing the overall volume of manuscripts, thus helping to establish a system of scientist-driven disclosure of their research. However, an increasing number of intake mechanisms also may lead to confusion and difficulty in finding preprints, heterogeneous standards of ethical disclosure, duplication of effort in creation of infrastructure, and uncertainty of long-term preservation. (See a more complete discussion of why we think it is essential to aggregate content in Appendix 1.)

Based upon funder interest from the May 24th Workshop, ASAPbio will propose that funding agencies support the creation of a Central Service (CS) that will aggregate preprint content from multiple entities. This service will have features of PubMed (indexing/search) and PubMed Central (collection, storage, and output of manuscripts and other data).

The advantages of this system for the scientific community would be:

  1. Oversight by a Governance Body. The content, performance, and services of the CS would be overseen by a Governance Body composed of highly respected scientists and technical experts. The formation of the Governance Body, which will have international representation and be transparent in its operation, will be addressed by a separate ASAPbio task force and will not be discussed in the Technical Workshop. The connection between the CS and a community-led Governance Body will ensure that preprints continue to serve the public good and develop in ways that benefit the scientific community, beyond the needs of individual publishers and servers. The formation of a central, well-functioning Governance Body has been repeatedly described by funders and scientists as an essential element in gaining respectability for preprints and guiding the system in the future.
  2. Guaranteed stable preservation. Archiving content through a CS better assures permanence of the scientific record, even if a preprint server/publisher decides to discontinue its services. This is a key feature for both scientists and funders.
  3. Greater discoverability and visibility for scientists. The CS would become the location for scientists to search for all new pre-peer reviewed content. Lessons from arXiv indicate that a highly visible, highly respected single site for searching for new findings is essential for the scientific community.
  4. Clarity on what qualifies as a respected preprint. Scientists want their preprint to “count” for hiring, promotions, and grant applications. However, universities and funding agencies are concerned about quality control for preprints and how they can guide their scientists and reviewers on what qualifies as a credible preprint or preprint server. The CS/Governance Body will work with universities and funders to apply uniform standards for author identity, plagiarism checks, and moderation of problems, and to create ethical guidelines for research and disclosure. Thus, content on the CS, coming from several sources, will meet uniform guidelines acceptable to funders and universities.
  5. Better services for scientists. Scientists, as consumers, want better ways of viewing content. They want to read manuscripts in an xml format on the web or as a PDF download, more easily link to references, and more easily view figures and movies. The CS would perform document conversion to ease viewing and searching for material, thereby accelerating new discoveries. The CS would have an API to enable innovative reuse by other parties to provide services that could be valuable for scientists beyond the scope of the CS (e.g. evaluations of work, journal clubs, additional search engines).
  6. Reduced overall cost. The central service can efficiently provide services (such as archiving, automated screening, and document conversion) that otherwise would be provided redundantly by each intake server/publisher.

We discussed various models for the CS with stakeholders (see Appendix 2 for types of models and the feedback that we received). This document describes the current iteration of the model, which is still in draft form. We will present several variations to funders this fall, based on feedback received, including the comments here. If you prefer, you may email comments privately to jessica.polka at gmail.com.

The CS would undertake several functions including centralized document conversion, accrediting (via setting guidelines for intake), archiving, search, and an API for third-party use. We are currently considering that the CS would not display full-text, but instead would send back the converted full-text to the intake server for display.

In this draft model:

  • Servers would facilitate the submission of a .doc or .tex file and a standardized set of metadata (e.g. author names, potentially ORCID numbers, etc) to the CS. From this file, the CS could extract an html or xml file (possibly including links to references, figures, etc).
  • If this file passes CS screening (including plagiarism detection, and potentially human moderation etc), it would be admitted into the central database, assigned a unique ID, and be sent back to the intake provider for display.
  • The CS would archive the original .doc file and other associated files, and also make these available via an API; as reference extraction technology improves, etc, new html/xml derivatives can be prepared. The CS would reserve the right to display content if the intake provider is not able to do so or if required by the funders or governance body.
  • Readers could search for preprints (or receive alerts) through CS-hosted tools that would display metadata (including abstracts); readers would be sent to the intake server for full-text display of preprints.
  • All aspects of the central service would be under the control of a governing body, which would have international representation from the scientific community and could develop over time.
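
The flow described in this draft model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed implementation: the class and function names (Submission, Record, CentralService, placeholder_convert, require_authors), the "CS-0000001" ID format, and the in-memory archive are all hypothetical, and the converter and screen are trivial stand-ins for real document conversion and plagiarism or moderation checks.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict, List, Optional, Tuple
from xml.sax.saxutils import escape


@dataclass
class Submission:
    """What an intake server would send to the CS (hypothetical shape)."""
    filename: str              # e.g. a .doc or .tex file
    content: bytes             # the original file, archived unchanged
    metadata: Dict[str, str]   # e.g. title, authors, ORCID iDs
    intake_server: str         # where readers are sent for full text


@dataclass
class Record:
    """What the CS would store and return to the intake server."""
    cs_id: str
    submission: Submission
    derivative: str            # converted html/xml


# A screen returns None to pass, or a human-readable rejection reason.
Screen = Callable[[Submission, str], Optional[str]]


def placeholder_convert(sub: Submission) -> str:
    """Stand-in for real .doc/.tex conversion: wraps the title in XML."""
    title = escape(sub.metadata.get("title", ""))
    return f"<article><title>{title}</title></article>"


def require_authors(sub: Submission, derivative: str) -> Optional[str]:
    """Stand-in screen: reject submissions with no author metadata."""
    return None if sub.metadata.get("authors") else "missing authors"


class CentralService:
    def __init__(self, convert: Callable[[Submission], str],
                 screens: List[Screen]) -> None:
        self._convert = convert
        self._screens = screens
        self._ids = count(1)
        self._archive: Dict[str, Record] = {}

    def submit(self, sub: Submission) -> Tuple[Optional[Record], Optional[str]]:
        """Convert, screen, then archive and assign an ID if all screens pass."""
        derivative = self._convert(sub)
        for screen in self._screens:
            reason = screen(sub, derivative)
            if reason is not None:
                return None, reason
        cs_id = f"CS-{next(self._ids):07d}"
        record = Record(cs_id, sub, derivative)
        self._archive[cs_id] = record      # original file kept for reconversion
        return record, None                # record goes back to the intake server

    def search(self, term: str) -> List[Dict[str, str]]:
        """Search metadata only; full text stays with the intake server."""
        term = term.lower()
        hits = []
        for rec in self._archive.values():
            meta = rec.submission.metadata
            if any(term in value.lower() for value in meta.values()):
                hits.append({"cs_id": rec.cs_id,
                             "intake_server": rec.submission.intake_server,
                             **meta})
        return hits
```

Because the original file is archived alongside the derivative, improved converters could later be rerun over the archive, which is the reason the model keeps the .doc or .tex source rather than only the html/xml output.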

The Technical Workshop will discuss the features, mechanisms, existing infrastructure, potential concerns and challenges, and timelines for implementation for the elements in orange on the diagram below. 

CS model v2

(previous version)

ASAPbio will continue to modify the model before and after the Technical Workshop before presenting several variations to funders in the fall.

Below: possible early-stage implementation

CS model v2 initial

(previous version)

Four foundations announce support for ASAPbio


This announcement was originally posted on the Simons Foundation website.

On June 20, four foundations announced their support for ASAPbio (Accelerating Science and Publication in Biology), a scientist-driven effort with a mission to promote the use of preprints in the life sciences. The combined total provisional funding — from the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, the Laura and John Arnold Foundation and the Simons Foundation — is $400,000 for work to be conducted over the next 18 months.

The hope is that use of preprints will catalyze scientific discovery, facilitate career advancement and improve the culture of communication within the biology community.

Vale & Hyman publish eLife article on preprints & priority


Tony Hyman and ASAPbio founder Ron Vale have just published a Point of View in eLife building on their earlier blog post.

ABSTRACT: The job of a scientist is to make a discovery and then communicate this new knowledge to others. For a scientist to be successful, he or she needs to be able to claim credit or priority for discoveries throughout their career. However, despite being fundamental to the reward system of science, the principles for establishing the “priority of discovery” are rarely discussed. Here we break down priority into two steps: disclosure, in which the discovery is released to the world-wide community; and validation, in which other scientists assess the accuracy, quality and importance of the work. Currently, in biology, disclosure and an initial validation are combined in a journal publication. Here, we discuss the advantages of separating these steps into disclosure via a preprint, and validation via a combination of peer review at a journal and additional evaluation by the wider scientific community.

Summary of the ASAPbio Funders’ Workshop


The following is a message from funding agency representatives who attended our recent Funders’ Workshop.

As research funders who attended the ASAPbio Funders’ Workshop for Preprints held at the National Institutes of Health (NIH) on May 23-24, 2016, we wish to provide a brief summary of the meeting. This follows the initial Funder’s Perspective drawn from the first ASAPbio Workshop held on February 16-17, 2016, and continues our desire to be transparent while the community continues to explore the value of preprints to the biomedical research enterprise.

At this workshop, the funders were presented with a summary from the first workshop and the results of a survey conducted by ASAPbio. This was followed by an open discussion of the scholarly and technical goals of a preprint service. The agenda then moved to a discussion of two exemplary models of shared governance of a resource in an international setting, Europe PubMed Central (Europe PMC) and the Worldwide Protein Data Bank (wwPDB). The final context setting for the funders’ discussion was provided by representatives of existing and anticipated preprint services: arXiv, bioRxiv, PeerJ, F1000Research, and PLOS. What followed was an open session with all stakeholders present and a closed session involving only the funders.

The consensus of the workshop attendees reflected high enthusiasm about further development of a preprint service for the life sciences. At the end of the day, it was agreed by all in attendance that:

  1. A preprint policy that is as homogeneous as possible across funders is desired, especially in the way that preprints are considered as part of grant proposal submission and review. A subgroup of funders will draft a concept paper addressing some of the policy issues that might arise when implementing such a preprint policy. This draft will be shared with other funders for their input.
  2. The funders asked ASAPbio to develop a proposal describing the governance, infrastructure and standards desired for a preprint service that represents the views of the broadest number of stakeholders. The proposal should include a budget, goals, milestones and implementation timeline to bring an appropriate community-defined preprint service into operation.
  3. This letter be distributed as widely as possible to inform all stakeholders of the continued interest by funders in expanding the use of preprints by the life sciences community.

Philip Bourne, The National Institutes of Health
Maryrose Franko, Health Research Alliance
Michele Garfinkel, European Molecular Biology Organization
Judith Glaven, Howard Hughes Medical Institute
Eric Green, The National Institutes of Health
Josh Greenberg, The Alfred P Sloan Foundation
Jennifer Hansen, Bill and Melinda Gates Foundation
Robert Kiley, The Wellcome Trust
Cecy Marden, The Wellcome Trust
Paul Lasko, Canadian Institutes of Health Research
Maria Leptin, European Molecular Biology Organization
Tony Peatfield, Medical Research Council, UK
Brooke Rosenzweig, The Helmsley Trust
Jane Silverthorne, The National Science Foundation
John Spiro, The Simons Foundation
Michael Stebbins, The Arnold Foundation
Nils Stenseth, European Research Council
Carly Strasser, Gordon and Betty Moore Foundation
Neil Thakur, The National Institutes of Health
K. VijayRaghavan, Department of Biotechnology, India

CC-BY-SA Thomas Ulrich, Flickr

Moore Foundation requests grantee feedback on preprint policy


The Data-Driven Discovery group at the Gordon and Betty Moore Foundation released a post on Medium today soliciting feedback on proposed changes to their policies on a variety of open access practices. Preprints are discussed as follows:

Ideally, all journal articles would first be available as preprints. Preprints are versions of your manuscript that are not yet peer reviewed. Many journals allow you to submit articles that have been available as preprints (see this list for more information). Read more about the benefits of preprints here. Typical places where preprints are deposited for free (read more from Jabberwocky Ecology blog):

  • arXiv (for physics, mathematics, computer science, quantitative biology)
  • bioRxiv (for any biology research)
  • PeerJ Preprints (for biology, medical/health sciences, computer sciences)
  • figshare (for any research)

You can read more and provide input at the post.

Image CC-BY-SA Thomas Ulrich, Flickr

Simons Foundation supports preprints in grants


On May 20, 2016, a Simons Foundation initiative, SFARI, announced that it has changed its policies to support and encourage the use of preprints.

The Simons Foundation Autism Research Initiative (SFARI) recently made two important changes that we hope will help to accelerate the pace of autism research. First, we changed our grant award letter to strongly encourage all SFARI Investigators to post preprints on recognized servers in parallel with (or even before) submission to a peer-reviewed journal. Second, our biosketch form was updated to include space for SFARI grant applicants to list manuscripts deposited in preprint servers; we and our outside peer reviewers will take these manuscripts into consideration when making funding decisions.

Read more on the SFARI website here.

ASAPbio attendees’ commentary in Science


A group of ASAPbio attendees published a commentary in the “Policy Forum” section of the journal Science on May 20, 2016. Written by scientists and representatives from journals and funding agencies, the paper serves as a meeting report and summary of opinions on the use of preprints in the life sciences.

Correction: This paper contains a sentence stating that “the median review time at journals has grown from 85 days to >150 days during the past decade.” This is true of Nature, but not journals as a whole. Daniel Himmelstein’s analysis shows that delays across all journals have remained stable.

Photo by N. Cary/Science

Document 4: What does IT infrastructure for a next generation preprint service look like?


Authored by Jo McEntyre and Phil Bourne

Goal: To satisfy the fundamental requirements of establishing scientific priority rapidly and cheaply through providing the ability to publish and access open preprints, balanced with the desire to support open science innovation around publishing workflows.

Approach: An internationally supported, open, archive (or platform) for preprints as infrastructure is ideal because (a) should the use of preprints become widespread, there is potential to reap long-term open science benefits, as is the case for public data resources and (b) some core functions only need to be done once not over and over (think: CrossRef, ORCID, INSDC, PDB, PMC/EuropePMC). Ideally this would involve working with existing preprint servers to provide a core platform and archival support.

Some assumptions

  • No point-of-service cost to post a preprint for the author.
  • Licenses that support reuse (ie CC-BY) of posted articles.
  • Preprints will be citable (have DOIs).
  • Should be embedded with related infrastructures such as Europe PMC/PMC, ORCID, CrossRef and public data resources.
  • Reuse and integration as core values – by various stakeholder groups including publishers, algorithm developers, text miners, other service providers
  • Standard implementations of key requirements across multiple stakeholders e.g. version control, events notification (such as publication in a journal or preprint citation), article format standards (JATS)
  • All preprints basically discoverable and minable through a single search portal.
  • Transparent reporting/management builds trust and authority around priority
  • International and representative governance
  • Metrics to provide data on meaningful use of the content
  • Tools to manage submissions e.g. triage, communication etc. in keeping with existing manuscript submission systems
  • Public commentary on submissions
  • Linkage with final published version of the article (when it exists)

Data Ingest

The preprint server should be considered an active archive. This means that all content can be accessed at any time and certain core services are provided to enable access by both people and machines.

  1. Basic submission support and support for standard automated screening.
  2. Possible limited branding on submission portals.
  3. Competition on screening methods, or other author services by existing preprint servers (or others) is possible.
  4. Advantages: simplified content flow, standards implementation, content in one place for future use.

Basic Services

  • A stand-alone archive. Initial submission needs to be very quick for the author: i.e. basic metadata plus files establishes priority.
  • Files rapidly published as PDFs with DOI and posted after screening/author services.
  • Ingest mechanisms could be diversified through existing preprint servers, but some basic criteria would always need to be met, and these would be automated to retain speed. For example, the service could require an ORCID for the submitting author as a simple trust mechanism, with further validation against grant IDs. Algorithms working on content (plagiarism, detection of poor animal welfare, scope, obscenity) could operate. There is scope for automated screening to be phased in and improved over time.
  • This model provides the opportunity for innovation around screening algorithms by the platform as well as third parties. It also provides business opportunities around author services.
  • Importantly, it also provides opportunities for innovation around coordinated submission for other materials relevant to the article, for example, data or software. But any integration of this nature would need to be lightweight for submitting authors as the speed of publication is a non negotiable feature of the preprint service.
  • Core version management would be required, both regarding new versions of the same article and linking with any future published versions of the article in journals.
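
As one concrete instance of a fast automated ingest criterion, the sketch below checks that a submitted ORCID iD is well formed, using the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. The function names are illustrative. Note the limits of what it shows: a passing check only means the string is a structurally valid iD, not that it is registered or that it belongs to the submitting author, which would require a lookup against ORCID itself.

```python
import re

# Four groups of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has ORCID layout and a correct check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A check like this costs microseconds per submission, so it fits the requirement that basic screening must not slow down posting; slower checks (grant ID validation, plagiarism detection) could then run only on submissions that pass it.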

Authenticated Content

  • After basic services get content in and published, the preprint service could generate JATS XML for authenticated content. Authenticated content could be defined by a number of criteria but could be e.g. PIs funded by funding organisations that support the infrastructure, popular preprints.
  • There is a cost to generating JATS XML. Limiting this added value to authenticated content could help control costs and give some confidence around that content for promoting discoverability via existing infrastructures.
  • Conversion to JATS XML will take some time, and would require input from the submitter to sign off on the resulting converted article. However it has the bonus of being integrity checked (e.g. all the figures are present), available for deep indexing, integration, and more widely discoverable via Europe PMC/PMC. Wider discoverability could be an incentive to authors to take the modest amount of extra time required to provide this data quality.
  • Note this could be an ingest point into the archive for XML content from other services/platforms.
  • In the future this more rigorous treatment may be extended to basic services, as methods that directly convert Word to Scholarly HTML and JATS XML mature and improve and as costs fall. However, it is likely that publishing speed will be an issue for some time and a degree of submitter involvement will always be required.
  • The availability of content in JATS XML provides many opportunities for innovation around the provision of more structure in articles for integration purposes (e.g. tagging reagents, data citations and other deep linking mechanisms).
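
The integrity check mentioned above ("all the figures are present") is the kind of test that becomes straightforward once an article is in JATS XML. The sketch below is a minimal illustration, not a full validator: it uses the JATS fig and xref elements and Python's standard ElementTree parser to list figure references that point at no figure in the document. The function name and the sample article are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List


def missing_figures(jats_xml: str) -> List[str]:
    """Return ids cited by <xref ref-type="fig"> that match no <fig id=...>."""
    root = ET.fromstring(jats_xml)
    defined = {fig.get("id") for fig in root.iter("fig")}
    missing: List[str] = []
    for xref in root.iter("xref"):
        if xref.get("ref-type") != "fig":
            continue
        # The rid attribute may hold several space-separated ids.
        for rid in (xref.get("rid") or "").split():
            if rid not in defined and rid not in missing:
                missing.append(rid)
    return missing


SAMPLE = """<article>
  <body>
    <p>See <xref ref-type="fig" rid="f1">Figure 1</xref> and
       <xref ref-type="fig" rid="f2">Figure 2</xref>.</p>
    <fig id="f1"><label>Figure 1</label></fig>
  </body>
</article>"""

if __name__ == "__main__":
    print(missing_figures(SAMPLE))  # prints ['f2']
```

A report like this could be shown to the submitter at the sign-off step, so that the extra time spent on conversion is spent fixing real problems rather than rereading the whole article.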

Post publication screening, filtering and sorting on the preprint platform

  • All content would be available for post-publication human screening such as user reporting of problematic content, commenting and so on.
  • More sophisticated algorithms that rank, sort and filter search results based on trust, content or other criteria could be developed by the platform and most importantly, by 3rd parties.

Data Out

  • All content available for bulk download (PDFs and XML), and via APIs as well as through website search & browse
  • Authenticated content could be made available via established archives (e.g. Europe PMC (PMC)) as a clear subset.
  • Core services managed centrally for example, content sharing with journals (this could be in collaboration with e.g. CrossRef since they already have some infrastructure around this)
  • There are possibilities for sharing article XML across publication workflows, comments/reviews, with journals/other platforms thus saving processing costs.
  • There are countless opportunities to support further innovation on the content by both commercial and academic parties with an open platform approach.

Document 6: Additional Questions for Possible Consideration


Drafted by ASAPbio

How can funders help to validate preprints as a mechanism for communication?

In a Commentary in Science published on May 20, 2016, co-authors representing several funding agencies recommended:

1) Publishing an explicit statement encouraging researchers to make early versions of their manuscripts available through acceptable preprint repositories.

2) Permitting the citation of preprints in acceptable repositories in grant proposals as evidence of productivity, research progress and/or preliminary work.

3) Providing guidance to reviewers on how to assess preprints in grant proposals.

How do funders envision taking these recommendations forward within their own agencies? Can ASAPbio assist in those efforts by working with scientific societies, institutions, journals, and advocacy groups?

Special considerations for human research?

Are there special limitations or concerns regarding preprints and human research that should be taken into consideration for a funder-supported core preprint service?

Gathering data on preprint usage?

Currently, views on the effectiveness and potential pitfalls of how we communicate and evaluate scientific findings are mostly opinion rather than data-driven. Might funders wish to gather data concerning preprint servers (or compare work going to preprint servers and journals)? Do preprint servers facilitate the transmission of irreproducible work? Or do preprints reduce the appearance of problematic journal publications? Do scientists submit lower quality work to preprint servers, or is the content similar to journal submissions? Is transmission of “pseudo-science” a problem in reality? Do grant committees find preprints useful or burdensome? Are there additional questions that could be informed by data?

Managing Quality Control?

What kind of quality control would funders like to see (e.g. preventing pseudoscience)? How possible is it to ensure uniform quality control on multiple servers? Must all screening be done manually by members of the community, or could algorithms be useful as a mechanism of prioritization of human screening (based upon ORCID numbers, grant support, prior publications, etc.)? Do funders want additional quality control provisions (e.g. e-signing by the submitting author of agreements of authorship and ethical standards of data gathering)? Would acknowledgment and linkages of submitted work to grant support help to solidify the credibility of submitted work? Should the service remove or flag a preprint that had been shown through subsequent review to contain incorrect or falsified data? How important is QC for preprints? Which features of QC should be implemented now, and which could be approached in the future?