Comparative peer review (CoPR)

To inform the valuation of research papers, we propose a “comparative peer review” (CoPR) system that resembles the Elo chess rating system. 

How the system works

In the proposed CoPR system, each research paper would have a numerical rating. Ratings would be updated through peer review. A peer reviewer would be presented with a pair of research papers and would select which of the two papers has greater scientific and socioeconomic value. The comparison would update the papers’ ratings. A large number of comparisons would produce a numerical rating for each paper that represents the scientific community’s current assessment of its value. 
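As a rough sketch, the basic records such a system might keep could look like the following (a minimal Python illustration; the class names, fields, and the 1500-point starting rating are assumptions of this sketch, not part of the proposal):

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Paper:
        paper_id: str
        rating: float = 1500.0          # common starting rating, as in many Elo-style systems

    @dataclass
    class Reviewer:
        reviewer_id: str
        predictive_score: float = 0.0   # updated later from the outcomes of past reviews

    @dataclass
    class Comparison:
        reviewer_id: str
        winner_id: str                  # the paper the reviewer judged to have greater value
        loser_id: str
        timestamp: datetime = field(default_factory=datetime.now)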

Selecting pairs of papers for comparison

When selecting pairs of papers to present to reviewers, the CoPR system would take into account the papers’ topics and the reviewers’ areas of expertise. Papers can be given locations within the space of research topics based on algorithmic analysis of their content. Similarly, reviewers’ locations can be based on the locations of their papers. Using this notion of location, a distance can be assigned between any two papers, or between a paper and a reviewer, that describes their degree of topical similarity.
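One possible realization of these locations and distances, assuming each paper’s content has already been reduced to a numerical topic vector (for example by a topic model or text-embedding method; the choice of cosine distance is an assumption of this sketch):

    import numpy as np

    def reviewer_location(paper_vectors: list) -> np.ndarray:
        """A reviewer's location in topic space: the mean of the topic vectors of her own papers."""
        return np.mean(np.stack(paper_vectors), axis=0)

    def topic_distance(u: np.ndarray, v: np.ndarray) -> float:
        """Cosine distance between two locations (0 means identical topics; larger means less similar)."""
        return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))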

The CoPR system would then use this notion of distance when choosing papers and reviewers. Comparisons would be made between papers on similar topics, because it is difficult to assess the relative value of two papers in dissimilar fields. Furthermore, a reviewer’s expertise would determine which papers she would be best able to evaluate. Additionally, researchers may generally favor their own topic. To reduce the opportunity for such favoritism, peer reviewers would be presented with pairs of papers in which each of the two papers is the same distance from the reviewer’s area of expertise.

Comparisons should also be made between papers with similar ratings, to reduce any inclination for reviewers to simply go along with the existing ratings; when two ratings are close, the existing scores offer little guidance about which paper should win.
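Putting these pairing constraints together with the topic_distance helper from the previous sketch, pair selection might look roughly like this (the thresholds and the combined scoring rule are assumptions made for illustration):

    import itertools

    def select_pair(reviewer_loc, papers, max_topic_gap=0.10, max_reviewer_gap=0.05, max_rating_gap=100.0):
        """Choose a pair of papers that (1) are on similar topics, (2) sit at roughly the same
        topic distance from the reviewer, and (3) have similar current ratings.
        `papers` is a list of (paper_id, topic_vector, rating) tuples; the thresholds are
        illustrative tuning parameters."""
        best_pair, best_score = None, float("inf")
        for (id_a, vec_a, r_a), (id_b, vec_b, r_b) in itertools.combinations(papers, 2):
            topic_gap = topic_distance(vec_a, vec_b)
            reviewer_gap = abs(topic_distance(reviewer_loc, vec_a) - topic_distance(reviewer_loc, vec_b))
            rating_gap = abs(r_a - r_b)
            if topic_gap > max_topic_gap or reviewer_gap > max_reviewer_gap or rating_gap > max_rating_gap:
                continue
            score = topic_gap + reviewer_gap + rating_gap / 400.0    # simple combined criterion
            if score < best_score:
                best_pair, best_score = (id_a, id_b), score
        return best_pair    # None if no admissible pair exists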

Predictive score

Each reviewer would have a predictive score, which represents the reviewer’s skill in predicting papers’ future ratings. This is analogous to treating the reviewer’s past reviews as bets, with the predictive score as the total outcome of those bets. For example, suppose a reviewer compares papers A and B and decides that A is of greater value. If, at some later time, A has a higher rating than B, the prediction increases the reviewer’s predictive score; if A has a lower rating than B, it reduces the score.

A reviewer’s predictive score would be based on their reviews that are more than 1 year old. The time delay encourages reviewers to select papers that other researchers will consider to be valuable in the future, rather than simply estimating current opinion. 
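One way this scoring might be computed from the comparison records sketched earlier (the +1/−1 scoring scale and the handling of ties are assumptions; the proposal only specifies the direction of the adjustment and the one-year delay):

    from datetime import datetime, timedelta

    def predictive_score(comparisons, current_ratings, now=None, delay=timedelta(days=365)):
        """Score a reviewer's past comparisons as settled bets.  A comparison older than
        `delay` counts +1 if the paper the reviewer picked now has the higher rating and
        -1 if it now has the lower rating; more recent comparisons are ignored."""
        now = now or datetime.now()
        score = 0
        for c in comparisons:                       # Comparison records as in the earlier sketch
            if now - c.timestamp < delay:
                continue                            # too recent to count, per the one-year delay
            picked, other = current_ratings[c.winner_id], current_ratings[c.loser_id]
            if picked > other:
                score += 1
            elif picked < other:
                score -= 1
        return score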

Universities and other research institutions may prefer to hire and promote individuals whose predictive score demonstrates an ability to accurately predict valuable research, and for that reason reviewers would be motivated to maximize their predictive score by making careful reviews. 

Measuring variation in community opinion

In chess, the outcome of a match is largely determined by the players’ abilities, resulting in fairly stable player ratings. Player performance is assumed to be approximately described by a logistic distribution. By contrast, the outcome of a comparison between research papers depends on peer opinion, which will have some degree of noise and which may differ significantly from a logistic distribution. Suppose, for example, that we could assign to a paper a distribution of ratings that takes into account the opinion of every researcher. For some papers this may be a logistic distribution, and the CoPR rating would converge to approximately its mean value after enough comparisons. For other papers the distribution of reviewer opinions may be bimodal, with researchers in two different fields having differing perspectives on the value of the paper. In the bimodal case, the CoPR rating would likely settle between the two modes of the distribution, but convergence would be slow. While bimodal distributions may present challenges for valuing papers, they also provide useful information about how the research community perceives certain papers and topics. The CoPR system would have the opportunity to continually explore and discover such phenomena, and that exploration may uncover trends about the research community that would otherwise be difficult to detect.
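To make this concrete, the following toy simulation draws each reviewer’s personal valuation of a paper from either a unimodal or a bimodal distribution and tracks the resulting rating under the Elo-style update described under “Calculation of ratings” below. The distribution parameters, the decision model, and the K value are purely illustrative assumptions, not part of the proposal.

    import random

    def expected_score(r_a, r_b):
        # Elo-style expected score, matching the formula given under "Calculation of ratings" below.
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def simulate(opinion_draw, opponent_rating=1500.0, k=20.0, rounds=5000):
        """Toy model: one paper is repeatedly compared against papers rated near `opponent_rating`.
        Each round a reviewer's personal valuation of the paper is drawn from `opinion_draw`
        (on the rating scale), and the reviewer prefers the paper with probability given by the
        expected-score formula applied to that valuation."""
        rating, history = 1500.0, []
        for _ in range(rounds):
            opinion = opinion_draw()
            s = 1.0 if random.random() < expected_score(opinion, opponent_rating) else 0.0
            rating += k * (s - expected_score(rating, opponent_rating))
            history.append(rating)
        return sum(history[-1000:]) / 1000.0        # average out the residual Elo noise

    # Unimodal community: opinions clustered around a single value near 1600.
    unimodal = lambda: random.gauss(1600, 100)
    # Bimodal community: two fields with sharply different views of the same paper.
    bimodal = lambda: random.gauss(1300, 80) if random.random() < 0.5 else random.gauss(1900, 80)

    print(simulate(unimodal))   # settles near the mean opinion (~1600)
    print(simulate(bimodal))    # settles between the two modes, well above 1300 and well below 1900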

Motivating scientific progress

CoPR samples the community to assess the current perception of research value. It has some advantages over other valuation metrics, but it does not by itself give researchers a significant incentive to change research direction. To motivate flexible adoption of new ideas, we need a system that incorporates not just value but also a form of incentive. This is a feature of the research equity system, in which work can be sold based on its value. In the research equity system, researchers have two ways of voting: through their reviews of papers in the CoPR system, and by “voting with their feet” when choosing their own research direction. At the same time that peer reviewers may be favoring research similar to their own, they will presumably begin new research in topics that they expect to be valuable in the future. Although there may be some lag, promising new areas would become active areas of research and would later become highly rated, as the people who now work on those topics raise the ratings of papers in those topics by favoring them in CoPR comparisons.

Calculation of ratings

As in the Elo rating system, the degree to which a paper is favored to win a comparison determines the magnitude by which its rating changes after the comparison. If paper A has a rating of R_A and paper B a rating of R_B, the probability of A winning the comparison is

E_A = 1 / (1 + 10^((R_B - R_A)/400)).

For every additional 400 rating points that one paper has relative to the other, the odds that the higher-rated paper “wins” the comparison increase by a factor of 10.
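Written as a small Python function (the ratings of 1900 and 1500 below are arbitrary example values):

    def expected_score(rating_a: float, rating_b: float) -> float:
        """Expected score E_A of paper A in a comparison against paper B."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    # With a 400-point advantage, the favored paper's odds of winning are about 10 to 1:
    print(expected_score(1900, 1500))   # ~0.909
    print(expected_score(1500, 1900))   # ~0.091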

The outcome of the comparison is represented by S_A, where

S_A = 1 if paper A wins

S_A = 0 if paper A loses.

The formula for updating R_A is

R_A′ = R_A + K(S_A - E_A)

The factor K determines how much weight a single outcome carries in the ratings. In chess, for example, K typically has a value between 10 and 40. In the CoPR system, the value of K could be determined based on other aspects of the review.
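Using the expected-score function above, a full update for one comparison might look like this. The symmetric update for paper B and the example K value are assumptions of the sketch; the proposal itself gives only the formula for R_A.

    def update_ratings(rating_a: float, rating_b: float, a_wins: bool, k: float = 20.0):
        """Apply one comparison outcome to both papers.  The update for paper B is the
        standard symmetric Elo counterpart of the formula given for paper A."""
        e_a = expected_score(rating_a, rating_b)
        s_a = 1.0 if a_wins else 0.0
        new_a = rating_a + k * (s_a - e_a)
        new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
        return new_a, new_b

    # Example of an upset: the lower-rated paper wins the comparison.
    print(update_ratings(1600, 1500, a_wins=False, k=20))   # ~ (1587.2, 1512.8)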

After a large number of comparisons, a paper’s rating would either converge or reach a point of diminishing returns, where further comparisons change it very little.
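One simple way such a stopping point might be detected, assuming the system keeps the most recent rating changes for each paper (the window size and tolerance are illustrative):

    from collections import deque

    def has_converged(recent_changes: deque, window: int = 50, tolerance: float = 5.0) -> bool:
        """Treat a paper's rating as converged when the mean absolute change over its
        most recent `window` comparisons drops below `tolerance` rating points."""
        if len(recent_changes) < window:
            return False
        return sum(abs(c) for c in recent_changes) / len(recent_changes) < tolerance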