Thomas Lemberger, EMBO
Website or social media links
Current stage of development
EEB is an experimental platform under development and is used as a sandbox to test ideas about the aggregation and mining of refereed preprints.
How has your project changed?
In view of the feedback received, we have decided to fuse our two proposals “Early Evidence Base…” and “Towards Principled Metrics…” into a single project. We feel that presenting the Early Evidence Base (EEB) as a single resource that combines aggregation of refereed preprints, rendering and summarization of peer reviews, and automatic mining of the scientific content of preprints will provide a more concrete view of our ideas on how to increase the engagement of authors, readers and reviewers with refereed preprints.
Have you integrated any feedback received?
- One point of discussion was whether it was premature to build advanced platforms such as EEB while the number of peer-reviewed preprints remains low. In our view, it is key to increase the engagement and trust not only of reviewers but above all of authors and readers. Readers should have an easier time finding preprints that they can trust and that interest them, and authors should be convinced that posting preprints and their reviews is an efficient and visible way of sharing findings. In view of this feedback, we will integrate more preprint reviewing services into the EEB platform to further raise awareness of peer-reviewed preprints across a broader range of disciplines.
- On the idea of finding ‘principled metrics’ related to novelty, depth and significance, one of the major issues raised during the discussion was the need to motivate such metrics and to be mindful of their potential misuse. To get a better sense of, and some data on, whether ranking metrics might help users filter and prioritize content, we have already included ranking mechanisms based on the automated analysis of the knowledge graph that supports Early Evidence Base, and we will add further ones. These rankings are not presented to users as ‘metrics’ (no scores are displayed) to avoid overinterpretation and misuse, while still allowing us to analyze their utility in filtering large numbers of preprints.
- Following positive feedback on the idea of identifying studies that potentially bridge fields, we have developed methods that automatically identify fields of research in an unsupervised way, based exclusively on the scientific content of preprints. These methods are successful in identifying emerging fields, such as research on COVID-19/SARS-CoV-2, and open the door to finding studies that belong to more than one field of research.
- The suggestion was made that different sections of the referee reports might be used to guide readers in selecting preprints and in identifying studies in specific fields or with a multi- or cross-disciplinary scope. We are therefore starting to integrate powerful automatic summarization methods to expose specific statements from referee reports, for example to highlight the expertise of the reviewers as a proxy for the depth of the review and for the fields covered by a study.
Have you started any collaborations?
- We are collaborating with Peer Community In (PCI) to integrate PCI into the EEB platform. This will allow us to develop the necessary interface with CrossRef which has just started to support registration of peer review material linked to preprints.
Background information on current practices
With the increased popularity of transparent peer review, where reviews are made publicly available next to a preprint or journal article, the target audience of formal referee reports includes not only authors and journal editors but also readers. As such, the online presentation of referee reports may have to evolve so that it enriches the experience of expert and non-expert readers. In the context of peer reviews linked to preprints, this aspect is particularly important, as in-depth reviews represent an invaluable resource that provides context and expert analysis. The time is therefore right to take the next step and use the reports on refereed preprints to highlight specific preprints and to guide readers through the otherwise non-navigable volumes of non-curated scientific information on preprints.
Important initiatives are currently underway in defining technical aspects of how to link reviews to preprints in a general way, how to standardize machine-portability of transparent reviews and how to leverage refereed preprints in journal-independent peer review or publish-review-curate workflows. For most users, however, the concept of refereed preprints remains rather new, and little is known about reader engagement with such preprints. How would readers search and browse preprints that have been reviewed by various entities? What section or what aspect of a highly technical, detailed formal review is the most important when selecting which preprint to read or to trust? In what form should this information be presented to users? How do expert and non-expert readers use referee reports linked to preprints?
In addition to the information related to the provenance and other metadata of peer reviews, it is of particular interest to delineate features that can be derived from the content of the reviews and that are worth extracting, summarizing or visualizing for readers. Such features could include, but are not limited to, the expertise of the reviewers; a summary of key points; highlights of different types of statements (e.g. critical, supportive, literature-supported, linked to requests for additional experiments or textual changes, related to data presentation or to novelty); the presence of unsupported negative statements; and the tone of the review.
Making refereed preprints attractive and useful to a wide spectrum of readers will be a major driving force for the adoption of this way of rapid scientific communication by the community. Increasing the utility of refereed preprints will lead to higher visibility, which is an important incentive for authors to engage with preprint peer review platforms.
Overview of the challenge to overcome
We have built the experimental platform Early Evidence Base (EEB, https://eeb.embo.org) to start experimenting with the aggregation of refereed preprints produced by various peer review platforms that handle author-driven preprint submissions, including Review Commons and eLife’s Preprint Reviews, and with the integration of peer reviews and their summaries next to preprints. EEB explores how human curation, through peer review, could be combined with machine curation, through text mining, to aggregate and filter refereed preprints. We intend to use this platform as a sandbox to experiment with various implementations and learn how to improve readers’ user experience when searching and interacting with refereed preprints.
Reviews are typically semi-structured with no universally applied format. The challenge is to identify conserved structural and semantic patterns that can be extracted as salient features that help readers find, filter and understand preprints.
The ideal outcome or output of the project
Demonstration of the impact of exposing key features and attributes from referee reports on readers’ engagement with refereed preprints.
Description of the intervention
The scope of this project would a priori be restricted to preprints linked to formal in-depth reviews, typically organized within an author-driven submission process.
1. Prioritization of features to extract from peer reviews based on user survey and user testing.
2. Development of tools to capture or extract some of the features identified in #1.
3. Implementation of feasible solutions developed in #2.
4. Testing and evaluating implementations.
Plan for monitoring project outcome
- Identification of feasible feature extraction strategies.
- Benchmarking of feature extractions.
- Beta-testing and A/B testing of various implementations, and analysis of the impact on traffic, search and attention.
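As a minimal sketch of how the A/B testing mentioned above could be analysed, the following compares an engagement rate between two display variants with a two-proportion z-test. The function name, the use of click-through rate as the engagement measure and the counts in the usage lines are assumptions made for illustration only; they do not describe the EEB platform or any collected data.

```python
from math import sqrt, erfc


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rates.

    Variant A is the baseline display and variant B the new display.
    Uses the pooled-variance normal approximation, which is only
    reasonable when both variants have a fairly large number of views.
    """
    if views_a <= 0 or views_b <= 0:
        raise ValueError("each variant needs at least one view")
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if standard_error == 0:
        raise ValueError("rates are identical and degenerate (all 0 or all 1)")
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = two_proportion_z_test(clicks_a=100, views_a=1000,
                                 clicks_b=150, views_b=1000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A test of this kind only indicates whether a difference in one engagement measure is likely to be due to chance. Which measures of traffic, search and attention are actually meaningful would still have to be decided from the user surveys and testing.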
What’s needed for success
Additional technology development
- Additional preprint review services should be integrated into the EEB site to provide a broader diversity of refereed preprints and referee reports.
- Training and benchmarking sets should be assembled for machine learning by labelling relevant sections and statements in referee reports.
- AI tools should be developed to parse, summarize and classify features extracted from referee reports.
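To make the labelling task above more concrete, the sketch below splits a referee report into sentences and tags each one with a statement type using simple cue phrases. The label names, the cue phrases and the example report are all hypothetical and chosen only for illustration. A keyword matcher of this kind could at most serve as a naive baseline against which trained models are benchmarked; it is not the method used in EEB.

```python
import re

# Hypothetical label set and cue phrases, chosen only for illustration.
CUES = {
    "request_experiment": ("additional experiment", "should perform", "control experiment"),
    "request_text_change": ("should clarify", "rephrase", "typo"),
    "supportive": ("convincing", "well designed", "important contribution"),
    "critical": ("not supported", "unclear", "overstated"),
}


def split_sentences(text):
    """Naively split text on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_statement(sentence):
    """Return every label whose cue phrases occur in the sentence."""
    lowered = sentence.lower()
    labels = [
        label
        for label, cues in CUES.items()
        if any(cue in lowered for cue in cues)
    ]
    return labels or ["unlabelled"]


def label_report(text):
    """Return a list of (sentence, labels) pairs for a referee report."""
    return [(sentence, label_statement(sentence)) for sentence in split_sentences(text)]


if __name__ == "__main__":
    # Invented example text, not taken from a real referee report.
    report = (
        "The study is convincing. "
        "The authors should perform an additional experiment with a control. "
        "The conclusion drawn from Figure 2 is overstated."
    )
    for sentence, labels in label_report(report):
        print(labels, "-", sentence)
```

Manually labelled sentences in the same (sentence, labels) form could double as a training and benchmarking set for the machine learning tools listed above.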
Feedback, beta testing, collaboration, endorsement
- Collaboration with review services to enable integration of, and access to, the content of referee reports and to survey various audiences.
- Public UI/UX recommendations based on user surveys/testing.
- Beta-testing of display and rendering solutions.
- Open source release of machine learning models and extraction tools.
- User survey and testing: UI/UX specialist.
- Feature extraction: machine learning specialist.
- Display and rendering: web developer.