The Chinese-language reports are all reprinted from Guokr; copyright belongs to Guokr (guokr.com).
http://www.guokr.com/article/439343/
One year on, China's paper-selling market remains active
Translated by IvyP
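While using the PubMed search engine to follow the latest research trends, two computational biologists stumbled on signs that China's paper-selling companies are still operating, a year after Science published an article analyzing this intricately structured and highly lucrative industry in depth.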
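Guillaume Filion of the Centre for Genomic Regulation in Barcelona and Lucas Carey of Pompeu Fabra University downloaded the PubMed publication records for papers published between January 2012 and April of this year. Using a technique called natural language processing (NLP), the pair compared the abstracts of those 2 million papers and picked out the terms whose use spiked in 2014.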
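Filion says they had hoped the approach would reveal new research directions about to become hot topics. As expected, they found more papers on cutting-edge topics such as CRISPR (clustered regularly interspaced short palindromic repeats), a gene-editing technique that Science named a runner-up for its 2013 Breakthrough of the Year, and long non-coding RNA (lncRNA), currently a hot topic in genomics.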
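But alongside those expected trends, one term stood out: CISCOM, a little-known database run by the Research Council for Complementary Medicine in London. Filion and Carey note that before 2014 the term “CISCOM” appeared in only two or three papers a year, whereas from February 2014 the database suddenly began turning up about once a week.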
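Looking more closely, the pair found a group of 32 papers on different topics that nonetheless shared the same features: all were review articles built on already-published articles in databases, including not only CISCOM but also more widely used ones such as Google Scholar, PubMed, and Web of Science. In addition, all of the papers came from 28 different research groups spread across several Chinese cities.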
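In a blog post published on 4 October, Filion described how strikingly similar the papers were, and he then set out with Carey to find out what lay behind them. By various means they downloaded the full text of 25 of the papers. When they ran those papers through the plagiarism detection program iThenticate, they found no plagiarism. (So the paper brokers do take some pride in their work.)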
However, the discussion sections of all these papers contain similar statements, with only slight variations in sentence structure. (Overused templates?) For example, one paper reads, “Importantly, the inclusion criteria of cases and controls were not well defined in all included studies and thus might have influenced our results.” Another reads, “Importantly, the inclusion criteria of cases and controls were not well defined in all included studies, which might also have influenced our results.”
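In addition, four of the papers contain the same grammatical error: an extraneous “had” in “our results had lacked sufficient statistical power.” While trying to map the connections among the papers, the pair noticed that the authors' wording seemed to draw on multiple templates, which suggests the authors were deliberately shuffling the text. That technique, used to evade plagiarism detection software, is known as text laundering.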
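Most of the papers were submitted in late 2013, and papers published in the same period could not have copied one another. Filion and Carey therefore hypothesized that the papers might all be the work of a single company. With help from Yao Yu, a geneticist at Fudan University in Shanghai, the pair found a company whose website advertises ghostwritten meta-analysis papers. They contacted the company to ask about its services, and it said it could place a meta-analysis paper in a journal with an impact factor of 2 or 3 for $10,000 per paper.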

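Searching Baidu for the keyword “代发论文” (publishing papers on commission) turns up a large number of companies offering this service, most of them with a complete, well-established process. Image source: website screenshot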
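An article published in Science in 2013 reported on a 5-month undercover investigation that found a great many companies like this one. They offer a range of services aimed at getting papers into journals covered by major indexes, including Thomson Reuters' Science Citation Index and Social Sciences Citation Index and Elsevier's Engineering Index, which many Chinese research institutions treat as key criteria for promotion. Besides writing papers from data supplied by their clients, these companies also fabricate experimental data, add authors to papers already accepted by journals, and sell finished manuscripts.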
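Among finished manuscripts, meta-analyses are the most popular, perhaps because writing them requires no original data. A study published in PLOS ONE in June 2013 found that from 2003 to 2011, meta-analysis papers from China grew 16 times faster than those from the United States. Searching PubMed for other research trends might turn up more evidence of research misconduct, but Filion says he and Carey now plan to turn their attention elsewhere, because “we are not witch-hunters, we are big data analysts.” (Editor: 球藻怪)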
Original article:
http://news.sciencemag.org/asiapacific/2014/10/copycat-papers-flag-continuing-headache-china
Copycat papers flag continuing headache in China
By Mara Hvistendahl
14 October 2014 1:00 pm
SHANGHAI, CHINA: Two computational biologists searching for trends in journals indexed in the search engine PubMed stumbled across signs that China’s paper-selling companies remain active, 1 year after Science published a detailed undercover investigation describing a highly sophisticated and lucrative industry.
Guillaume Filion of the Centre for Genomic Regulation and Lucas Carey from Pompeu Fabra University, both in Barcelona, downloaded all PubMed records for papers published between January 2012 and this past April. Combing over the abstracts for those 2 million papers using a big data technique called natural language processing, they isolated terms that spiked in use in 2014.
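The article does not say how Filion and Carey scored terms, so the following is only a minimal sketch of one way to flag terms that spike between two periods. It compares the share of abstracts containing each word, with add-one smoothing. The function names, thresholds, and toy abstracts are all hypothetical and are not taken from their analysis.

```python
import re
from collections import Counter


def term_counts(abstracts):
    """Count how many abstracts each lowercase word appears in."""
    counts = Counter()
    for text in abstracts:
        # set() so a word is counted once per abstract (document frequency).
        counts.update(set(re.findall(r"[a-z][a-z0-9-]+", text.lower())))
    return counts


def spiking_terms(baseline, recent, min_recent=2, min_ratio=3.0):
    """Return (term, ratio) pairs for terms whose document rate rose sharply.

    The thresholds are illustrative assumptions, not values from the article.
    """
    base, new = term_counts(baseline), term_counts(recent)
    results = []
    for term, n in new.items():
        if n < min_recent:
            continue
        # Add-one smoothing so terms unseen in the baseline do not divide by zero.
        base_rate = (base[term] + 1) / (len(baseline) + 1)
        new_rate = (n + 1) / (len(recent) + 1)
        ratio = new_rate / base_rate
        if ratio >= min_ratio:
            results.append((term, ratio))
    return sorted(results, key=lambda item: -item[1])


# Toy abstracts, invented for illustration only.
baseline = [
    "We searched PubMed and Embase for relevant studies.",
    "A cohort study of diabetes risk.",
    "PubMed was searched for trials of statins.",
    "Gene expression in yeast under stress.",
]
recent = [
    "We searched PubMed, CISCOM and Web of Science.",
    "We searched PubMed, CISCOM and Google Scholar.",
    "We searched CISCOM and PubMed for case-control studies.",
    "Gene expression in yeast under heat stress.",
]

for term, ratio in spiking_terms(baseline, recent):
    print(term, round(ratio, 2))
```

On real PubMed data one would stream the records instead of holding them in lists, and a statistical test would likely be a better choice than a fixed ratio threshold.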
They hoped to find “new topics about to detonate,” Filion says. Not surprisingly, they found an uptick in papers mentioning cutting-edge topics like CRISPR, a gene-editing technique that was named a runner-up for Science’s 2013 Breakthrough of the Year, and lncRNA, or long non-coding RNA, an unusually long form of RNA that is now a hot topic in genomics.
But alongside those more predictable trends, one term stuck out: a little-known database run by the Research Council for Complementary Medicine in London called CISCOM, or the Centralised Information Service for Complementary Medicine. Until 2013, the scholars note, the term “CISCOM” appeared in only two to three papers per year. In February, the database began cropping up once a week.
Looking more closely, Filion and Carey found a group of 32 papers on varying topics that nonetheless shared some curious characteristics. All were meta-analysis or review papers that analyzed already-published data in CISCOM, along with more commonly used databases like Google Scholar, PubMed, and Web of Science. Moreover, all originated in China, from 28 different research groups spread out across several cities.
Filion, who described what he calls the “disturbingly similar” papers in a blog post published on 4 October, set out with Carey to determine what was going on. They downloaded complete versions of the 25 papers to which they had access through various institutional subscriptions or other means. (All but two papers are behind a paywall.) Running the papers through the plagiarism detection program iThenticate turned up no red flags.
But the discussion sections of all the papers contain similar statements, with only minor changes. For example, one paper reads, “Importantly, the inclusion criteria of cases and controls were not well defined in all included studies and thus might have influenced our results.” Another states, “Importantly, the inclusion criteria of cases and controls were not well defined in all included studies, which might also have influenced our results.”
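As an illustration of how close those two sentences are, the sketch below scores word-level overlap with Python's standard `difflib.SequenceMatcher`. It is not the method Filion and Carey used, and it says nothing about how iThenticate works. The helper name `word_similarity` and the unrelated comparison sentence are assumptions made for the example.

```python
from difflib import SequenceMatcher


def word_similarity(a, b):
    """Return a 0-to-1 similarity score based on matching runs of words."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# The two sentences quoted in the article.
s1 = ("Importantly, the inclusion criteria of cases and controls were not "
      "well defined in all included studies and thus might have influenced "
      "our results.")
s2 = ("Importantly, the inclusion criteria of cases and controls were not "
      "well defined in all included studies, which might also have "
      "influenced our results.")
# An invented, unrelated sentence for contrast.
unrelated = "Gene expression in yeast changes under heat stress."

print(round(word_similarity(s1, s2), 2))
print(round(word_similarity(s1, unrelated), 2))
```

A score near 1 for the quoted pair and near 0 for the unrelated sentence shows how template reuse stands out once sentences are compared pairwise across papers.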
Four of the papers include the same grammatical error: the extraneous “had” in “our results had lacked sufficient statistical power.” But in mapping out the relationships among the papers, the duo noticed that the writers seemed to be drawing from multiple templates. That suggests, Filion says, “that the writers actively shuffle the texts,” a method of evading plagiarism detection software known as text laundering.
Most of the papers were submitted in late 2013, making it impossible that some authors plagiarized others after publication. Filion and Carey thus hypothesized that the papers might all be the work of a single company. With help from Yao Yu, a geneticist at Fudan University in Shanghai, the scholars identified an outfit whose website advertises tailored meta-analysis papers and contacted the company to inquire about its services. The company reportedly offers meta-analysis papers for journals with an impact factor of 2 or 3 for about $10,000.
A 5-month investigation published in Science last year found dozens of similar companies offering an array of services aimed at securing publication in journals indexed in Thomson Reuters’ Science Citation Index, Thomson Reuters’ Social Sciences Citation Index, or Elsevier’s Engineering Index, which at many Chinese institutions are critical to securing promotions. In addition to preparing original papers from scratch with data provided by their clients, China’s paper-selling companies fabricate data, arrange to add scientists’ names to already accepted papers, and sell finished manuscripts.
Among the most popular options for finished manuscripts are meta-analyses, perhaps because they require no original data. One legitimate analysis published in PLOS ONE in June 2013 found that from 2003 to 2011, meta-analysis papers from China rose more than 16 times faster than did such papers from the United States. Combing PubMed for other trends might turn up more evidence of malfeasance. But Filion says he and Carey now plan to turn their attention to other topics: “We are not witch-hunters, we are big data analysts.”