To preprint or not to preprint? Research for a more transparent publishing system

This blog post is cross-posted from the ScholCommLab blog (Alice Fleerackers, July 29, 2019) and provides an update on a current research project by two visiting scholars supported by ASAPbio.

A man in an academic library — *“Academic life” by uonottingham is licensed under CC BY-NC-SA 2.0*

“For researchers, there is immense pressure to publish in journals that are highly competitive,” says Naomi Penfold, associate director of the scientist-driven nonprofit ASAPbio. “[This, in turn,] means that the process of sharing what you have found, evaluating whether claims are valid or not, and gaining recognition and visibility is all wrapped up in the long, arduous, and mostly opaque process of publishing at these few journals.”

Improving this “long, arduous” process is core to ASAPbio’s mission of advancing “innovation and transparency in life sciences communication.” It’s also the focus of a new research collaboration between ASAPbio and the ScholCommLab exploring the status of preprint adoption and impact in different research communities.

In this post, we’re shining a spotlight on the Preprints Uptake and Use research team, and offering a glimpse of their findings so far.

Why study preprints?

Academic peer review is often seen as a cornerstone of science, and remains one of the most trusted ways of assessing research quality. But despite its status within academia, there’s little evidence of its effectiveness. (In fact, some research suggests peer review may actually prevent high-quality science from being published.)

Whether or not this is the case is still unclear, but scholars are exploring new avenues for disseminating their work. Preprinting — openly publishing research findings before submitting them for peer review — is one such avenue. By allowing researchers to circumvent the lengthy peer review process, preprints have the potential to catalyze research collaboration and innovation — months before the final journal article is published.

“I see preprints as an online-first tool that allows anyone to discuss the latest findings while they’re still fresh,” says Naomi. “Preprints could help increase the likelihood and speed that science is seen, understood, tested, and built upon, which would be beneficial for individual researchers and society at large.”

But despite these potential benefits and substantial growth in preprinting in the past few years, little is known about the use of preprints in individual academic communities. While we can track preprint numbers in major subject categories and on individual servers, we currently do not have data to understand who is preprinting and whether there are nuances between individual research communities: Which researchers preprint more than others in their network? In which research fields is preprinting growing in popularity, and in which fields is adoption disproportionately low? These are just some of the questions that the Preprints Uptake and Use research team is exploring this summer. The answers could help inform efforts to raise awareness of preprinting and measure its impact on science.

Meet the Preprints Team: Mario Malički and Janina Sarol

Preprints visiting scholars Mario Malički and Janina Sarol

The ScholCommLab preprints team consists of two visiting scholars: Mario Malički and Janina Sarol. A recent postdoc at AMC and ASUS Amsterdam, Mario has been researching the role of journals in fostering responsible research conduct since 2017. Janina is a PhD student in Informatics at the University of Illinois at Urbana-Champaign with a background in Computer Science and Information Management. Together, they bring a unique mix of skills and interests to the research team.

“I used to work at the university library back in Illinois,” Janina says, when asked how she first became interested in preprints. “I was transforming all of their collections into linked data, and I was surprised by how much dirty data there was.” There were so many authors and contributors, so many different articles and journals. Determining who had published what and where, she explains, turned out to be no small feat.

While others may have been frustrated by the messiness of bibliometric data, Janina was fascinated by it. “That’s what interested me,” she says, “trying to clean the data, so that we could do a better analysis.” She smiles. “Bibliometrics is like a library. All of the books seem to be stacked in the right place. But, when you dig deeper, things are not always so neat.”

Like Janina’s, Mario’s interest in scholarly publishing began when he discovered how flawed the system could be. It was 2011, and he had taken a step back from medical school to join the research department at the University of Split. He was working under the supervision of the editors of the Croatian Medical Journal, the country’s top medical publication.

“They experienced a lot of people trying to bribe them to get into the journal,” he explains. “In most Croatian universities at the time, you needed at least one publication in a journal with an impact factor higher than 1 to get your PhD,” he continues. “In a country as small as ours, there was only one such journal—and that was theirs.”

Witnessing these informal pressures firsthand sparked a deep interest in how the process of authorship works and how it could be improved. He dove into the world of meta-research, balancing his teaching responsibilities with this new passion. “I completely fell in love with meta-research,” he says. “I realized that I would never go back to the hospital.”

A messy today and a bright tomorrow

For the last month, Mario and Janina have been working under the supervision of ScholCommLab co-director Juan Pablo Alperin to collect, consolidate, and analyze data from more than 60 different preprint servers. Eventually, they hope to use the data to answer such questions as:

How quickly do individual preprint servers develop?
Who publishes preprints and how often?
Do disciplinary or geographical factors influence preprint uptake or use?
Are the preprints in any way different from the published papers?
How many preprints end up being published?

But first, the team has to clean the data—a task, it turns out, that’s more complex than expected.

“The most striking thing about the project so far is that we had to go through each aspect of the metadata we were trying to collect—author, date, subject—and ask, ‘Can we trust this or not?’” Janina explains. “For most things, the answer was, no.”

The data the team has analyzed so far is so riddled with missing metadata, duplicate entries, and conflicting information that excluding problematic entries simply isn’t possible. “We’re not talking about 10, 20, or 30 records with errors,” Mario explains. “We’re talking about 1,000 records. If you exclude those, it’s ridiculous.”

But although the messy data poses challenges for Mario and Janina’s study, it also raises important questions for the future of preprints research.

“I think people presume that, because the data is out there, it’s correct,” says Mario. “But preprint servers don’t always have checks in place in the same way that journals do. We’re hoping that, after this, people will be a bit more aware that you just can’t trust those numbers.”

Despite the setbacks, the team is optimistic about the future. They’re still plugging away at the data, and are in touch with some of the preprint servers with suggestions for how to improve their metadata systems.

“We’re excited to start analyzing the data,” says Janina. “If all goes well, we’ll be publishing a preprint of our own soon.”

To stay up to date about the Preprints Uptake and Use project, visit the ScholCommLab’s website or sign up for the lab’s newsletter.