Journal of Civil Aviation University of China, 2023, Vol. 41, Issue (2): 14-20.

• Civil Aviation •

Detection method for code defect based on LSTM-Seq2Seq model with large perception

WANG Peng 1a,2, YAO Xinpeng 1a,2, WANG Kenian 1a,2, CHEN Wenqi 1b,2, CHEN Xi 1a,2

  (1a. College of Safety Science and Engineering, 1b. Sino-European Institute of Aviation Engineering, CAUC, Tianjin 300300, China; 2. Key Lab of Civil Aircraft Airworthiness Technology, Tianjin 300300, China)
  • Received: 2021-12-21 Revised: 2022-03-14 Online: 2023-10-28 Published: 2023-10-28
  • About the author: WANG Peng (born 1982), male, from Tianjin, research fellow, Ph.D.; research interests: safety design and assessment of civil aircraft systems, and airworthiness certification of airborne electronic hardware.
  • Funding:
    Joint Research Fund for Civil Aviation of the National Natural Science Foundation of China (U1933106)

Abstract:

Existing code defect detection methods based on deep neural networks cannot analyze defect characteristics or output relevant review suggestions. To address this problem, a code defect detection method based on an LSTM-Seq2Seq model with large perception is proposed. First, a long short-term memory (LSTM) network is used to learn the encoded features of defective code and to build a defect identification model. Second, to address the mismatch between the model and the dataset, a code segment length coefficient is introduced into the sequence-to-sequence (Seq2Seq) model to improve its suitability for the code review task; a code analysis model is then built by establishing a mapping between code defect features and review suggestion features, which provides the review output function. Finally, the method is verified on the public SARD dataset. Its accuracy, recall, and F1 score are 92.50%, 87.20%, and 87.60%, respectively, and the similarity between the review text output for typical code defects and the expert review text is 85.99%, so the method can effectively reduce the dependence of the review process on expert experience.
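The reported accuracy, recall, and F1 score can be related with a minimal Python sketch. It assumes the standard binary-classification definitions, which the abstract does not state explicitly; the function names and the confusion-matrix counts are hypothetical and chosen only for illustration. Under that assumption, the reported recall and F1 score would imply a precision of roughly 88%.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, assuming the standard binary F1."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for these recall/F1 values")
    return f1 * recall / denom


# Hypothetical counts, for illustration only (not taken from the paper).
example = classification_metrics(tp=80, fp=10, fn=20, tn=90)

# Recall and F1 as reported in the abstract; precision is inferred, not reported.
p = implied_precision(recall=0.8720, f1=0.8760)
print(f"implied precision: {p:.4f}")
```

If the paper averages its metrics over several defect classes, this back-calculation would not hold exactly, so the implied precision should be read only as a rough consistency check.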

Key words: defect detection, code review, long short-term memory (LSTM), sequence to sequence (Seq2Seq)

CLC number: