Saturday, April 14, 2018

How to Write Highly Cited Articles

This blog posting is for all my friends and contacts who write (or wish to write) scientific or technical papers that garner citations.

I recently attended a talk at the UPitt Center for Philosophy of Science given by Simon DeDeo of the Santa Fe Institute and the Carnegie Mellon Philosophy Dept. He described his apparently unpublished work on mini scientific revolutions, as detected by analyzing word clouds in scientific papers on arXiv.org, a moderated repository of electronic preprints in quantitative disciplines.

His hour-long talk covered many patterns he found by crawling the entirety of arXiv and running textual analysis on it. As I've learned from other talks, a bag of words (or word cloud) can fairly accurately characterize what an article is about. So if you follow articles in a given discipline, looking for points where the word distribution shifts, you may detect a mini scientific revolution: many in the field start using some new words and phrases, which then go out of fashion after a while when yet other word clouds come along and displace them.
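
For readers who haven't met the term, here is a minimal Python sketch of what "bag of words" and "word distribution" mean in practice. This is my own illustration, not DeDeo's pipeline; the function names and the crude letters-only tokenizer are assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring case, punctuation, and word order."""
    # Hypothetical tokenizer: runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def word_distribution(texts):
    """Pool the bags of several articles into relative word frequencies."""
    counts = Counter()
    for text in texts:
        counts.update(bag_of_words(text))
    total = sum(counts.values())
    # Each word's share of all the words seen across the articles.
    return {word: n / total for word, n in counts.items()}
```

Calling word_distribution on all the abstracts from one year, and again on the next year's, gives two distributions you can compare to see whether a field's vocabulary has moved.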

A popular and plausible metric for when this has occurred is K-L (Kullback-Leibler) Novelty [citation needed], which detects these word cloud shifts using a simple Shannon-based formula for how "surprising" some set of words is, given what came before.
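
I don't have DeDeo's exact formula, so treat the following as a rough sketch of the general idea rather than his method: compute the standard Kullback-Leibler divergence between the word distribution of a new paper and the distribution of what came before. The function names and the add-one smoothing (needed so that a brand-new word doesn't cause a division by zero) are my own assumptions.

```python
import math
from collections import Counter


def smoothed_distribution(counts, vocabulary, alpha=1.0):
    """Turn raw word counts into probabilities over a shared vocabulary.

    alpha is an assumed additive-smoothing constant, so that words
    missing from one side still get a small nonzero probability.
    """
    total = sum(counts.get(w, 0) for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def novelty(current_counts, past_counts):
    """How surprising the current words are, given the past words."""
    vocabulary = set(current_counts) | set(past_counts)
    p = smoothed_distribution(current_counts, vocabulary)
    q = smoothed_distribution(past_counts, vocabulary)
    return kl_divergence(p, q)


# Toy usage with made-up word counts:
past = Counter("string theory string landscape".split())
current = Counter("holography entanglement string".split())
score = novelty(current, past)
```

A score of zero means the new paper uses words exactly as the past did; the score grows as the vocabulary drifts away from what came before.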

Areas investigated by DeDeo, an astrophysicist by training, included String Theory, whose K-L Novelty he says has flatlined since the 1980s, and whose 3-year lookback has converged to Brownian motion. But leaving aside this tragedy, he also offered up some useful advice.

If you want to write a scientific or technical paper that gets a lot of citations, do the following. Write your articles using a 3-year lagging word cloud, {which will cause readers to think your stuff looks current and well founded, so they will start pondering it.} Then wait patiently, and the papers that cite you will appear around 18 months later.

Strangely, DeDeo didn't manage to phrase his conclusion quite as succinctly as the above (I had to add the words in { } curly braces), but I believe this is a solid inference from his data, in which he obviously looked at tons of well-cited articles and saw this pattern emerge.

Of course this raises a few other issues, such as a) you've got to knuckle down and absorb the current literature, and b) using only a 3-year lookback might make it difficult to write breakthrough work. However, if you're a graduate student or post-doc hungry for citations, and disinclined to foment a scientific revolution, this seems like a straightforward recipe for success in the citation game.