Although it would be a bit dangerous to extend these conclusions to the vast realm of published scientific literature, the book How Experts Fail includes a chapter questioning whether most published research findings are false. It quotes a study by a medical researcher (which we should assume the author has read) that examined several of the most-cited papers in medicine and found that many had been superseded, never independently reproduced, or reported effects much stronger than those observed later.
It also points to the lack of basic statistical and general mathematical skills among scientists, which was basically true, at least in my generation; nowadays it is much more difficult to get a paper published without proper statistical analysis.
There is a more basic problem: there is little to gain in replicating others' results. If you publish a paper saying "that guy who published that algorithm is right", it will be rejected for lack of original content. Besides, people do not usually publish the datasets and code used for their papers; you can often obtain them on request, but nobody can be sure it is the exact same version that produced the published results.
Since we live in a connected world, and virtually all papers end up online sooner or later, my opinion is that it should be compulsory to upload the code and data behind published results, so that anybody can check them, or build on them. My group's code, and my own, is uploaded to a publicly available repository while it is being worked on (for instance, the code for computing an author's h- and g-index). I guess this will happen sooner or later anyway, as soon as the publishing houses' infrastructure is ready for it.
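For readers unfamiliar with the metrics just mentioned, here is a minimal sketch of the standard definitions of the h- and g-index (this is only an illustration of the definitions, not my group's actual code; the function names are my own):

```python
def h_index(citations):
    """h-index: the largest h such that the author has h papers
    with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """g-index: the largest g such that the top g papers together
    have at least g**2 citations (capped here at the paper count)."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g
```

For example, an author whose papers have citation counts [10, 8, 5, 4, 3] would have h = 4 (four papers with at least four citations each) and g = 5 (the top five papers accumulate 30 ≥ 25 citations).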