Acoustic model building of radiotelephony communication based on DNN-HMM

Journal of Civil Aviation University of China ›› 2019, Vol. 37 ›› Issue (4): 36-40.

YANG Jinfeng, LI Kaitao, JIA Guimin, SHI Yihua

(Intelligent Signal and Image Processing Key Lab of Tianjin, CAUC, Tianjin 300300, China)

Online: 2019-08-23 Published: 2020-04-01
About the author: YANG Jinfeng (born 1971), male, from Zhoukou, Henan; professor, Ph.D.; research interests: image processing, biometrics, and computer vision.
Funding:
National Natural Science Foundation of China (U1433120, 61502498, 61379102); Fundamental Research Funds for the Central Universities (3122017001)

Abstract

Abstract: Because radiotelephony communication has its own grammatical structure and pronunciation, acoustic models built for general-purpose speech recognition are not suitable for it. A deep-learning-based method for building an acoustic model of civil aviation radiotelephony communication is proposed. Using data from a purpose-built radiotelephony corpus, a DNN-HMM model is used to model the acoustics of radiotelephony speech features, and a speech feature enhancement method is applied to improve the robustness of the model's input features. Experiments compare the influence of different speech features, feature dimensions, and numbers of spliced (concatenated) frames on the acoustic model. The results show that the proposed DNN-HMM acoustic model reduces the phoneme error rate to 5.62%.

Key words: radiotelephony communication, acoustic model, DNN-HMM, feature enhancement
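
The abstract reports its headline result as a phoneme error rate. It does not say how the rate was computed, so the following is only a minimal sketch of the conventional definition: the total edit distance (substitutions, deletions, insertions) between reference and recognized phoneme sequences, divided by the total number of reference phonemes. The function names and the toy phoneme labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis phoneme sequence."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing from hyp
                curr[j - 1] + 1,     # insertion: extra phoneme in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level phoneme error rate in percent (assumed standard definition)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcripts contain no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_ref


# Toy example with made-up phoneme labels: one substitution in six reference phonemes.
refs = [["k", "l", "ay", "m"], ["t", "uw"]]
hyps = [["k", "l", "ay", "n"], ["t", "uw"]]
per = phoneme_error_rate(refs, hyps)
```

Under this definition, a rate of 5.62% would correspond to roughly 5 to 6 phoneme errors per 100 reference phonemes.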

CLC number:

V355.2
TP391

YANG Jinfeng, LI Kaitao, JIA Guimin, SHI Yihua. Acoustic model building of radiotelephony communication based on DNN-HMM[J]. Journal of Civil Aviation University of China, 2019, 37(4): 36-40.
