A novel procedure on next generation sequencing data analysis using text mining algorithm
Authors: Weizhong Zhao, James J. Chen, Roger Perkins, Yuping Wang, Zhichao Liu, Huixiao Hong, Weida Tong, Wen Zou
Affiliations: 1 Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
2 Xiangtan University, Xiangtan
Journal: BMC Bioinformatics, 2016, Vol. 17 (1)
Source database: Springer Nature Journal
DOI: 10.1186/s12859-016-1075-9
Keywords: Data mining; Topic modeling; Next-generation sequencing (NGS); Genetic diversity; Biomarker
Abstract:
Background: Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large-scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining.
Methods: We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella...
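The four steps listed in the abstract (retrieval, preprocessing, topic modeling, data mining on LDA outputs) can be illustrated with a minimal, stdlib-only sketch. It rests on assumptions that are not taken from the paper: each sample's sequence is treated as a "document", overlapping k-mers serve as "words", and LDA is fitted with a plain collapsed Gibbs sampler. The function names `kmer_tokens` and `lda_gibbs`, the toy sequences, k = 3, and all hyperparameters are hypothetical; the paper's actual preprocessing and LDA settings are not given in this record.

```python
import random

def kmer_tokens(seq, k=3):
    """Split a nucleotide sequence into overlapping k-mers, used here as 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch, not the paper's code).

    Returns the sorted vocabulary and, for each document, its topic
    proportions theta[d][t] = (n_dt + alpha) / (N_d + K * alpha).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    return vocab, theta

# Hypothetical toy "samples" (one sequence each); not data from the paper.
samples = {
    "s1": "ATATATTATAATATTATATA",
    "s2": "TATATAATATATTATATAAT",
    "s3": "GCGCGCCGCGGCGCCGCGCG",
    "s4": "CGCGCGGCGCCGCGGCGCGC",
}
docs = [kmer_tokens(s) for s in samples.values()]
vocab, theta = lda_gibbs(docs, n_topics=2)

# Hypothetical "data mining" step: group samples by their dominant topic.
dominant = {name: max(range(len(row)), key=row.__getitem__)
            for name, row in zip(samples, theta)}
print(dominant)
```

The per-sample topic proportions `theta` are the kind of LDA output a downstream mining step could cluster or compare across samples; grouping by dominant topic is only the simplest such choice. A library implementation of LDA would normally replace the hand-written sampler for real NGS-scale data.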
Impact factor: 3.024 (2012)