Uncited articles: the dark matter of scientific literature

The story goes like this: when you are doing your research and you get awesome results, you write a paper and publish your results in a scientific journal. With time, other researchers cite your work in their papers as they use your awesome results for the further advancement of science. Like grains of sand forming a sandcastle, small and big contributions have been piling up basically since the foundation of the first scientific societies, linked by a vast network of citations. However, sometimes things don’t quite work that way. Instead, your favorite published paper lingers without garnering a single citation, until it gets buried in the scientific literature, never to be cited.

Now, to be fair, lack of citations does not mean lack of impact. At least in my own limited experience, I have not seen a perfect correlation between the number of times a paper is cited and peer recognition: some papers with tons of citations don’t seem to register, and people have approached me to talk about results that have been cited just a handful of times, if at all. Still, citations feed many of the metrics used to quantify scientific excellence (impact factor, h-index), and they are part of our reality as researchers.

There are four things that I find interesting about uncited papers:

  • They have passed a peer review process and the editor’s filter, which means that they should fit within the scope of the journal and be as free of errors as other papers, and yet no one got around to citing them. Is there anything that makes these papers special?
  • The fact that they don’t have a single citation means that not even their own authors have cited them, something I find remarkable given the prevalence of self-citations.
  • Uncited articles are the equivalent of dark matter in scientific research: they count toward the total tally of published papers, yet they are disconnected from the rest of the scientific literature, at least citation-wise. The difference with dark matter is that we can actually count how prevalent they are.
  • Uncited articles do not seem to fit the current picture of citations in the scientific literature.

In 2008 a nice paper was published in PNAS on the universality of citation distributions across disciplines. Different research fields cite at different rates: some areas are “heavy citers”, where a single paper can contain more than 50 references, and consequently the average paper in that discipline has been cited many times; in other disciplines, the average number of citations can be much lower. What the authors found is that if you normalize each article’s citation count by the average for its discipline, the different disciplines collapse onto a sort of universal curve that follows a lognormal distribution.
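To make that normalization step concrete, here is a minimal sketch in Python; the field names and citation counts are made up for illustration:

    import numpy as np

    # Made-up citation counts for two fields with very different
    # citation habits: a "heavy citer" and a "light citer".
    fields = {
        "field_A": np.array([0, 3, 8, 12, 25, 40, 60]),
        "field_B": np.array([0, 1, 2, 3, 5, 8, 12]),
    }

    # Divide each article's citation count by the field average; after
    # this rescaling, the claim is that the distributions of different
    # disciplines collapse onto the same lognormal curve.
    for name, counts in fields.items():
        rescaled = counts / counts.mean()
        print(name, np.round(rescaled, 2))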

[Figure ../_images/PRB2006.png: number of articles published in Physical Review B in 2006 vs. number of citations received, together with the lognormal and Poisson model curves]

So I got curious, and I decided to check this for myself. I took all the articles published in Physical Review B in 2006 and plotted the number of articles as a function of the number of times they have been cited. That is the data shown in the figure above. Along with the data, I have included the predictions of two models: one is the lognormal distribution mentioned in the PNAS article, and the second is the Poisson distribution.

The Poisson distribution is what one would expect if all articles were equally likely to be cited. It is clear that in reality things work differently, as evidenced by its total disagreement with the data.
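Here is a sketch of how such a comparison can be run with scipy; the citation counts below are simulated stand-ins rather than the actual Web of Science records, and I use the lognormal density at an integer count as a rough approximation to its probability:

    import numpy as np
    from scipy import stats

    # Simulated stand-in for the PRB 2006 citation counts.
    rng = np.random.default_rng(0)
    citations = rng.lognormal(mean=2.0, sigma=1.0, size=5000).astype(int)

    # Null model: every article equally likely to be cited, i.e. a
    # Poisson distribution with the same mean as the data.
    poisson = stats.poisson(mu=citations.mean())

    # Lognormal fit to the nonzero counts (location fixed at zero).
    shape, loc, scale = stats.lognorm.fit(citations[citations > 0], floc=0)
    lognorm = stats.lognorm(shape, loc=loc, scale=scale)

    for c in (0, 1, 5, 20, 100):
        observed = (citations == c).mean()
        print(f"c={c:>3}  observed={observed:.4f}  "
              f"Poisson={poisson.pmf(c):.4f}  lognormal={lognorm.pdf(c):.4f}")

Note that the lognormal column reads exactly zero at c = 0, which already anticipates the problem discussed below.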

The lognormal distribution, however, reproduces the long tail of the data very well. The simplest model consistent with a lognormal distribution is one in which the number of citations grows at a rate proportional to the number of citations already accumulated (equivalently, the percent growth rate, the relative increase in citations with respect to the total already garnered, is independent of how many citations the article has). In other words, citations call more citations. The accumulation rate itself depends on how interesting the paper is, and the lognormal distribution is obtained if one assumes that the “interest” in a given article is normally distributed around an “average interest”. This is a simplified version of Gibrat’s law (a toy simulation of this mechanism is sketched below). I am not saying that this is what happens in real life, simply that the data, taken by itself, is consistent with this simple model.

But more interestingly, the lognormal distribution fails to predict the behavior at low citation counts: it predicts that the number of articles cited zero times should be zero, when the plot clearly shows that this is not the case.
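Coming back to the proportional-growth mechanism, the toy simulation below (every parameter is invented) shows how it generates a lognormal: each article’s citation count is multiplied, year after year, by a random factor centered on that article’s own “interest”:

    import numpy as np

    rng = np.random.default_rng(42)
    n_articles, n_years = 20_000, 100

    # Each article has an intrinsic "interest", normally distributed
    # around an average interest, as in the model described above.
    interest = rng.normal(loc=0.03, scale=0.02, size=n_articles)

    citations = np.ones(n_articles)
    for _ in range(n_years):
        # The *relative* growth does not depend on the current count
        # (Gibrat's law); only its mean depends on the article.
        growth = 1.0 + interest + rng.normal(0.0, 0.01, size=n_articles)
        citations *= np.clip(growth, 1e-9, None)  # guard against negatives

    # log(citations) is a sum of many independent log-factors, so by
    # the central limit theorem it is approximately normal: citations
    # are approximately lognormally distributed.
    log_c = np.log(citations)
    print(f"log(citations): mean={log_c.mean():.2f}, std={log_c.std():.2f}")

Incidentally, this toy model also exposes the zero-citation problem: a purely multiplicative process that starts above zero can never reach exactly zero citations.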

As of early July 2013, 3% of the articles published in Physical Review B in 2006 had not been cited a single time, at least according to Web of Science. That does not seem like much, but Physical Review B is a journal with a reasonable impact factor. So how does this number change across journals? To find out, I decided to settle on a single discipline, in this case Applied Physics as defined by Web of Science.

[Figure ../_images/uncited_if.png: percent of uncited articles in the Applied Physics category vs. the impact factor of the journal in which they were published]

In the figure above, I have plotted the percent of uncited articles in the Applied Physics category as a function of the impact factor of the journal in which they were published (again for articles published in 2006). First of all, the fraction of uncited articles shoots up to more than 20% for low-impact-factor journals. Visually, the correlation between the two quantities is also apparent. Curiously enough, a Pearson coefficient of approximately -0.75 is observed when one correlates the percent of uncited articles with the logarithm of the impact factor, rather than with the impact factor itself. This is consistent with a model in which the impact factor enters exponentially; the impact factor is probably acting here as a proxy for the average “interest” of an article published in a given journal.
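The correlation itself boils down to a couple of scipy calls. The numbers below are invented placeholders for the Web of Science data, chosen only to show the pattern of a stronger correlation against the logarithm of the impact factor:

    import numpy as np
    from scipy import stats

    # Placeholder (impact factor, % uncited) pairs, invented for
    # illustration; the real values came from Web of Science.
    impact_factor = np.array([0.5, 0.8, 1.2, 2.0, 3.5, 5.0, 8.0])
    pct_uncited = np.array([20.0, 17.0, 14.0, 10.0, 6.0, 4.0, 1.0])

    r_linear, _ = stats.pearsonr(impact_factor, pct_uncited)
    r_log, _ = stats.pearsonr(np.log(impact_factor), pct_uncited)

    print(f"Pearson r vs IF:      {r_linear:.2f}")
    print(f"Pearson r vs log(IF): {r_log:.2f}")  # more negative here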

So yes, there are a bunch of uncited articles floating around the scientific literature, waiting to be cited. And if you think that this is a consequence of the recent explosion in the number of published articles, think twice: if you go to the first issue of Physical Review from 1893, you will find that three of the five articles published in that issue have never been cited by any paper in any APS journal. Their titles? “On the relation between the lengths of the yard and the meter”, “The critical current density for copper deposition and the absolute velocity of migration of copper ions”, and “Geometrical proof of the three-ammeter method of measuring power”.