Text clustering method based on weighted LDA model



LI Guo, ZHANG Chunjie, ZHANG Zhiyuan

(a. College of Computer Science and Technology; b. Airport Engineering Research Base, CAUC, Tianjin 300300, China)

Received: 2015-04-01 Revised: 2015-04-23 Online: 2016-04-23 Published: 2016-06-03

Abstract


Traditional clustering algorithms have shortcomings and limitations when applied to large-scale, high-dimensional text. To address them, this paper presents a text clustering method based on a weighted LDA (Latent Dirichlet Allocation) model. LDA yields two distributions hidden in the text: the topic distribution of each text and the word distribution of each topic. These two distributions are combined into the text feature from which the final text similarity is computed. Experiments using the classic K-Means algorithm on both English and Chinese corpora show that this method achieves better clustering than K-Means combined with the pure VSM (vector space model).

Key words: LDA, vector space model, data mining, text clustering, K-means
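The abstract does not state how the two LDA distributions are weighted or combined, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It takes an already-estimated document-topic matrix theta and topic-word matrix phi as given (no LDA fitting is done here), derives each document's implied word distribution p(w|d) = sum over k of theta[d][k] * phi[k][w], concatenates lam * theta_d with (1 - lam) * p(.|d) into one feature vector, and clusters those vectors with plain K-Means. The weight `lam`, the concatenation scheme, the Euclidean distance, and the helper names `expected_word_dist`, `weighted_features` and `kmeans` are all hypothetical choices for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def expected_word_dist(theta_d: Vector, phi: Matrix) -> Vector:
    """p(w|d) = sum_k theta_dk * phi_kw: the word distribution LDA implies for one document."""
    n_topics, n_words = len(phi), len(phi[0])
    return [sum(theta_d[k] * phi[k][w] for k in range(n_topics)) for w in range(n_words)]


def weighted_features(theta: Matrix, phi: Matrix, lam: float = 0.5) -> Matrix:
    """Concatenate lam * theta_d with (1 - lam) * p(.|d) for every document.

    Hypothetical weighting: `lam` and the concatenation are assumptions,
    not the weighting scheme from the paper.
    """
    feats = []
    for theta_d in theta:
        p_d = expected_word_dist(theta_d, phi)
        feats.append([lam * x for x in theta_d] + [(1 - lam) * x for x in p_d])
    return feats


def squared_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, init: List[int], max_iter: int = 100) -> List[int]:
    """Plain Lloyd's K-Means; `init` lists the indices of the starting centroids."""
    centroids = [list(points[i]) for i in init]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy, made-up LDA output: 4 documents, 2 topics, 4 vocabulary words.
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
phi = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]

feats = weighted_features(theta, phi, lam=0.5)
labels = kmeans(feats, k=2, init=[0, 2])
print(labels)  # → [0, 0, 1, 1]
```

With lam = 0.5 the two topic-dominated pairs of toy documents fall into separate clusters. Pushing `lam` toward 1 makes the clustering depend only on the topic proportions, while pushing it toward 0 relies on the topic-implied word distribution instead; how the paper actually balances the two is not given in the abstract.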

CLC Number: TP18

LI Guo, ZHANG Chunjie, ZHANG Zhiyuan. Text clustering method based on weighted LDA model[J]. .

