In scientific publishing, the quality of peer review can make or break a journal's reputation. At Prophy, we've spent years tackling one of the most challenging aspects of academic publishing: finding the right reviewers—experts who understand the specific research topic and have no conflicts of interest.
Publishers constantly struggle to find reviewers with expertise in increasingly specialized research areas and to secure acceptance from qualified experts who are already overwhelmed with review requests. They also need to identify and avoid potential conflicts of interest that could compromise review quality, all while ensuring a range of perspectives without sacrificing subject-matter expertise.
As publication volumes grow yearly, these challenges only intensify. Reputable publishers need to be selective with their review invitations to maintain their reputation. As our CTO Vsevolod Solovyov explains: "Publishers who care about their reputation try not to spam people and only send relevant letters. If someone refuses and says 'never write to me again,' they'll never write to them again."
Traditional reviewer selection often relies on editors' personal networks and memory—a system that becomes increasingly untenable as research specializes and publication volumes increase. When editors rely solely on memory and known contacts, they create a closed loop where the same familiar experts face reviewer fatigue, emerging researchers get overlooked, and excessive time gets spent searching for appropriate reviewers. Perhaps most critically, matching specialized manuscripts with precisely the right expertise becomes nearly impossible when working from memory alone.
At Prophy, we've developed a sophisticated approach to reviewer selection. Using semantic similarity and bibliometric analysis, we match manuscripts with potential reviewers based on the content of their published work, not just their keywords or fields.
Our Referee Finder system analyzes content semantically, understanding the meaning and context of research papers across our database of 176 million articles. We evaluate recency, volume, and positioning of authors' contributions while automatically identifying collaboration networks and institutional connections that might create conflicts. This comprehensive approach ensures relevant expertise while providing options from different research groups.
The foundation of our effective reviewer matching lies in how we measure similarity between a manuscript and potential reviewers' previous work. Let's pull back the curtain on how our similarity scoring actually works.
When we evaluate the relevance of potential reviewers, we don't simply count keywords or citations. Instead, we create multidimensional representations of research content that capture semantic relationships. Our similarity score ranges from 0 to 1, representing how closely aligned a reviewer's previous work is with the manuscript in question. A score approaching 1 means extremely high similarity, while 0 indicates no meaningful connection.
As Solovyov explains: "If one paper is very similar to another paper, the score will be close to 1. Similarity is measured across many dimensions, so it's not easy to get a high score."
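The exact model behind these scores is beyond the scope of this post, but the core idea is easy to sketch: represent each document as a vector in a semantic space and score pairs of documents by how closely their vectors align. The function below is a minimal illustration under that assumption; the embedding model, the rescaling, and the additional signals we combine in production are all simplified away.

```python
import numpy as np

def pairwise_similarity(manuscript_vec: np.ndarray, paper_vec: np.ndarray) -> float:
    """Illustrative similarity between two document embeddings, mapped to [0, 1].

    This is a conceptual sketch, not Prophy's production scoring: cosine
    similarity is simply one common way to compare semantic vectors.
    """
    cos = float(np.dot(manuscript_vec, paper_vec) /
                (np.linalg.norm(manuscript_vec) * np.linalg.norm(paper_vec)))
    return (cos + 1.0) / 2.0  # map cosine's [-1, 1] range onto a [0, 1] score
```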
This is where things get interesting. When we look at an author's entire body of work, how do we determine their overall relevance to a specific manuscript? Consider this scenario: Researcher A published one paper with a 0.98 similarity score, while Researcher B published 30 papers with an average similarity score of 0.85. Who's the better reviewer?
We've explored several approaches to this problem and discovered that the obvious solutions don't work well. Simple averaging fails badly because it obscures specialized expertise. Imagine a researcher who's published 100 papers on exactly the same topic as the manuscript (scores of 0.98) and 100 papers on a completely unrelated topic (scores near 0). Their average would be around 0.49, which significantly underrepresents their expertise. Meanwhile, someone with just 5 moderately relevant papers (all scoring 0.8) would have an average of 0.8 and appear more relevant, which isn't accurate.
Taking only the maximum score isn't sufficient either. A single relevant paper might indicate interest but not necessarily deep expertise. A master's student who co-authored one paper under a professor's guidance doesn't have the same expertise as a postdoc with 30 related publications.
And simply summing scores can be misleading too. Someone with 200 somewhat relevant papers would outscore someone with 10 highly relevant ones, even if the latter is the true specialist we need.
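To make these failure modes concrete, here is a toy comparison of the three naive strategies. The score lists are invented for illustration and mirror the scenarios above rather than real data.

```python
# Invented per-paper similarity scores mirroring the scenarios above.
specialist = [0.98] * 100 + [0.02] * 100  # 100 on-topic papers, 100 unrelated ones
moderate = [0.80] * 5                     # 5 moderately relevant papers
one_hit = [0.98]                          # a single co-authored, on-topic paper
prolific = [0.40] * 200                   # 200 somewhat relevant papers
focused = [0.95] * 10                     # 10 highly relevant papers

for name, scores in [("specialist", specialist), ("moderate", moderate),
                     ("one_hit", one_hit), ("prolific", prolific),
                     ("focused", focused)]:
    print(f"{name:10s} mean={sum(scores) / len(scores):.2f} "
          f"max={max(scores):.2f} sum={sum(scores):.1f}")

# The mean ranks 'moderate' above 'specialist', the max cannot tell 'one_hit'
# apart from 'specialist', and the sum ranks 'prolific' above 'focused'.
```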
After testing numerous approaches, we've developed a system that heavily weights the most similar papers while dramatically reducing the contribution of less similar works. Papers with high similarity scores contribute significantly to an author's overall score, while papers with lower similarity have an exponentially diminished impact. For many use cases, we consider only an author's top 3-10 most relevant publications.
This approach identifies both subject-matter specialists and researchers with broader relevant expertise, without being skewed by publication volume or unrelated work.
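The sketch below captures the shape of that aggregation. The cut-off, the decay factor, and the weighting function are illustrative choices, not our production formula.

```python
def author_score(paper_scores: list[float], top_k: int = 5, decay: float = 0.5) -> float:
    """Aggregate per-paper similarities into one author-level score.

    Only the top_k most similar papers count, and each successive paper
    contributes exponentially less than the one before it. Parameter
    values are illustrative, not the ones used in production.
    """
    if not paper_scores:
        return 0.0
    top = sorted(paper_scores, reverse=True)[:top_k]
    weights = [decay ** i for i in range(len(top))]
    return sum(w * s for w, s in zip(weights, top)) / sum(weights)
```

On the toy data above, this scoring gives the dual-topic specialist roughly 0.98 and the author of 200 somewhat relevant papers 0.40, restoring the ranking an editor would expect. A single perfectly matching paper still scores highly on its own, which is one reason the practical filters described later, such as minimum publication counts, remain important.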
In many research fields, author order carries significance. In some disciplines, the first author is the one who conducted most of the research; in others, the last author is the senior researcher who guided the work; and in fields like physics, alphabetical ordering is common.
We've explored incorporating author position into our scoring system, but the results were surprising. As Solovyov notes: "We tried to adjust scores based on author position—giving more weight to first authors, less to tenth position authors. But the results actually got worse."
The challenge is that author ordering conventions vary widely across disciplines and even within research groups. In some papers, special symbols indicate equal contribution among certain authors. In others, technical contributors might be included without having deep domain expertise.
Solovyov explains: "If a biology paper uses specialized software, the researchers might include the software developers as co-authors. These developers understand the software perfectly but may know very little about the biological research itself."
This is why we primarily focus on content similarity rather than authorship position in our core scoring algorithm, though we provide filtering options for those who wish to apply these considerations manually.
At Prophy, our approach to diversity in reviewer selection goes beyond what people typically think of as diversity measures. We focus on ensuring a range of perspectives and avoiding reviewer fatigue.
One feature we've built into our system is what we call 'diversify'. It's not about quotas; it's about preventing situations where all five researchers from the same lab receive identical review requests on the same morning.
This diversity approach serves to avoid reviewer burnout by preventing the same experts from receiving redundant review requests. It helps maintain positive publisher reputation, as publishers who repeatedly send irrelevant invitations or mass-invite entire research groups damage their standing. And it distributes opportunities by ensuring multiple qualified reviewers receive consideration.
Our system achieves this through intelligent filtering that recognizes when multiple authors have collaborated on the same papers. It ensures only one member of a collaborative group is recommended for a particular manuscript, selecting the member whose research most closely matches the manuscript's specific focus. We also allow filtering by geography, career stage, and institutional affiliation when desired.
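A conceptual sketch of that filtering step is shown below. The greedy strategy and the data shapes are simplifications for illustration; real conflict-of-interest handling involves more than shared papers.

```python
def diversify(scores: dict[str, float], coauthors: dict[str, set[str]]) -> list[str]:
    """Keep only the best-matching member of each collaboration group.

    scores maps each candidate to their aggregated similarity score, and
    coauthors maps each candidate to the people they have published with
    (assumed symmetric). Both structures are hypothetical examples.
    """
    kept: list[str] = []
    for author in sorted(scores, key=scores.get, reverse=True):
        # Skip anyone who has co-authored with a candidate we already kept.
        if any(other in coauthors.get(author, set()) for other in kept):
            continue
        kept.append(author)
    return kept

recommended = diversify(
    scores={"alice": 0.93, "bob": 0.90, "carol": 0.88},
    coauthors={"alice": {"bob"}, "bob": {"alice"}, "carol": set()},
)
# -> ["alice", "carol"]: bob is dropped because his co-author alice is the closer match
```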
"When multiple researchers have co-authored papers together," Solovyov explains, "we select based on who's the better match from a knowledge perspective—this is real meritocracy. We don't look at gender, nationality, or other factors. We simply look at who knows best what we need."
Our journey developing these systems has taught us some practical lessons about reviewer selection that go beyond theoretical models. For many publishers, certain baseline criteria save time and improve results. "Publishers often want reviewers with a minimum number of publications," Solovyov notes. "We allow setting thresholds—12 papers, 50 papers, whatever makes sense—to filter out researchers with limited experience."
This practical approach addresses a fundamental reality: while a student with one perfect-match paper might theoretically score well, they likely lack the broader perspective needed for effective peer review.
Different use cases need different approaches too. What works for journals isn't always right for grant agencies. When working with funding organizations that have pre-vetted expert pools, we take a different approach: "With grant agencies, we're often dealing with already-qualified experts," Solovyov explains. "They've already been selected for sufficient experience. So we focus more on finding who best matches this specific proposal, often using just the top single match rather than broader experience metrics."
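The hypothetical configuration below illustrates how those two modes differ; the field names and default values are invented for this example rather than taken from our product.

```python
from dataclasses import dataclass

@dataclass
class MatchingSettings:
    """Hypothetical knobs illustrating the use cases described above."""
    min_publications: int  # filter out researchers with a limited track record
    top_k_papers: int      # how many of an author's best-matching papers count

# A journal might require a minimum level of experience and judge authors on
# several of their most similar papers.
journal_settings = MatchingSettings(min_publications=12, top_k_papers=5)

# A grant agency with a pre-vetted expert pool has already assured experience,
# so matching can focus on the single most similar paper per expert.
grant_agency_settings = MatchingSettings(min_publications=0, top_k_papers=1)
```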
One area we're continuously improving is handling truly interdisciplinary work. When a manuscript spans multiple fields, the ideal reviewer might need expertise in specific combinations of disciplines. "Sometimes you have biology researchers using specialized software," Solovyov points out. "The software developers might be co-authors but understand little about the biology. Identifying truly cross-disciplinary experts requires more sophisticated analysis."
We're constantly refining our approach to reviewer selection. Promising directions include more granular contribution analysis to understand who brought which expertise to multi-author papers, clearer explanations that make similarity scores more intuitive, and enhanced interdisciplinary matching that identifies reviewers with specific combinations of expertise.
The scientific peer review process is evolving, and so are we. By grounding our development in real-world publishing challenges and rigorous testing, we're working toward a future where finding the perfect reviewer is both more precise and less burdensome.
Looking to improve your journal's reviewer selection process? Contact us to learn more about how our Referee Finder can transform your workflow.