• Engineering and Technology •

Text clustering method based on weighted LDA model

LI Guo, ZHANG Chunjie, ZHANG Zhiyuan   

  1. (a. College of Computer Science and Technology; b. Airport Engineering Research Base, CAUC, Tianjin 300300, China)
  • Received: 2015-04-01 Revised: 2015-04-23 Online: 2016-04-23 Published: 2016-06-03

Abstract:

To overcome the limitations of traditional clustering algorithms on large-scale, high-dimensional text, a text clustering method based on a weighted LDA (latent Dirichlet allocation) model is presented. LDA yields two distributions hidden in the text: the topic distribution and the word distribution of the different topics. These are combined as the text feature from which the final text similarity is computed. Experiments using the classic K-Means algorithm on both English and Chinese corpora show that this method achieves better clustering than the plain VSM (vector space model) combined with K-Means.
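
The abstract does not give the weighting formula, so the following is only a minimal sketch of how the two LDA distributions might be combined into one similarity score. It assumes a linear mix of a topic-level cosine similarity and a word-level cosine similarity. The names `weighted_similarity`, `doc_word_dist`, and the weight `lam` are hypothetical, and the values of `phi` and `theta` are toy numbers standing in for the output of a trained LDA model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def doc_word_dist(theta_d, phi):
    """Word distribution implied for one document: sum over k of theta_d[k] * phi[k][w]."""
    vocab_size = len(phi[0])
    return [
        sum(theta_d[k] * phi[k][w] for k in range(len(phi)))
        for w in range(vocab_size)
    ]


def weighted_similarity(theta_a, theta_b, phi, lam=0.5):
    """Assumed weighting (not from the paper): lam * topic-level + (1 - lam) * word-level."""
    topic_sim = cosine(theta_a, theta_b)
    word_sim = cosine(doc_word_dist(theta_a, phi), doc_word_dist(theta_b, phi))
    return lam * topic_sim + (1.0 - lam) * word_sim


# Toy stand-ins for LDA output: 2 topics over a 3-word vocabulary, 3 documents.
phi = [[0.7, 0.2, 0.1],   # word distribution of topic 0
       [0.1, 0.2, 0.7]]   # word distribution of topic 1
theta = [[0.9, 0.1],      # document 0: mostly topic 0
         [0.8, 0.2],      # document 1: mostly topic 0
         [0.1, 0.9]]      # document 2: mostly topic 1

sim_close = weighted_similarity(theta[0], theta[1], phi)
sim_far = weighted_similarity(theta[0], theta[2], phi)
print(sim_close > sim_far)
```

In a full pipeline, a similarity of this kind (or the concatenated feature vectors behind it) would then be passed to K-Means; the choice of `lam` here is arbitrary and would need tuning on the actual corpus.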

Key words: LDA, vector space model, data mining, text clustering, K-means
