
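Analysis of an agricultural question classification model based on BERT character vectors and TextCNN
Citation: BAO Tong, LUO Rui, GUO Ting, GUI Shu-ting, REN Ni. Analysis of an agricultural question classification model based on BERT character vectors and TextCNN[J]. Journal of Southern Agriculture, 2022, 53(7): 2068-2076.
Authors: BAO Tong, LUO Rui, GUO Ting, GUI Shu-ting, REN Ni
Affiliations: 1 Information Center, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu 210014; 2 Institute of Science and Technology Information, Jiangsu University, Zhenjiang, Jiangsu 212013
Funding: National Social Science Fund of China (19BTQ032)
Abstract: 【Objective】To study how different combinations of word vectors and deep learning models affect the classification of agricultural questions, and to provide technical support for building agricultural intelligent question answering systems. 【Method】Question-and-answer data were crawled from websites such as the Agricultural Planting Network, and 20000 questions were manually annotated to build an agricultural question classification corpus. BERT was used to encode the agricultural questions at the character level, and a text convolutional neural network (TextCNN) was used to extract high-dimensional features of the questions and classify them. 【Result】In the word vector comparison experiment, BERT character vectors combined with TextCNN reached an F1 value of 93.32% for agricultural question classification, 2.1 percentage points higher than Word2vec character vectors. In the comparison of deep learning models, TextCNN combined with Word2vec and with BERT character vectors reached F1 values of 91.22% and 93.32%, respectively, both better than the other models. In the fine-grained classification experiment, BERT-TextCNN achieved F1 values of 86.06%, 90.56%, 95.04% and 85.55% for the four categories cultivation technology, field management, soil-fertilizer-water management and others, respectively, all better than the other deep learning models. For hyperparameter settings, the BERT-TextCNN model performed best with convolution kernel sizes of [3,4,5], a learning rate of 5e-5 and the number of iterations set to 5. Even with imbalanced data samples, the model's average classification accuracy on agricultural questions still exceeded 93.00%, which meets the question classification needs of agricultural intelligent question answering systems. 【Suggestion】Improve data annotation quality through open platforms such as Alibaba NLP; add word-frequency and document features during classification to improve model accuracy; and strengthen cooperation among agriculture-related government departments to explore new models for the digital extension and service of agricultural technology.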

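Keywords: agricultural questions; intelligent question answering system; question classification; pre-trained language model (BERT); text convolutional neural network
Received: 2021-11-29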
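To make the Method concrete, here is a minimal pure-Python sketch of a TextCNN forward pass over per-character vectors; it is not the authors' implementation. Assumptions: `char_vectors` is a hypothetical stand-in that returns small random vectors instead of real BERT output (BERT-base produces 768-dimensional vectors), the filter count and embedding size are arbitrary, and all weights are random rather than trained (training, where the reported learning rate of 5e-5 and 5 iterations would apply, is omitted). Only the kernel widths 3, 4 and 5 and the four category labels come from the abstract.

```python
import math
import random

# Four categories named in the abstract.
LABELS = ["cultivation technology", "field management",
          "soil-fertilizer-water management", "others"]
KERNEL_SIZES = (3, 4, 5)  # best kernel sizes reported in the abstract
NUM_FILTERS = 2           # hypothetical; kept tiny for readability
EMBED_DIM = 8             # hypothetical; BERT-base would give 768

rng = random.Random(0)    # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def char_vectors(question):
    """Hypothetical stand-in for BERT: one random vector per character."""
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in question]

# One filter = a (kernel_size x EMBED_DIM) weight window plus a bias.
filters = {k: [(rand_matrix(k, EMBED_DIM), 0.0) for _ in range(NUM_FILTERS)]
           for k in KERNEL_SIZES}
feat_dim = NUM_FILTERS * len(KERNEL_SIZES)
W_out = rand_matrix(len(LABELS), feat_dim)  # untrained output layer
b_out = [0.0] * len(LABELS)

def conv_max_pool(x, weight, bias):
    """Slide one filter over the sequence, apply ReLU, then max-over-time pooling."""
    k = len(weight)
    scores = []
    for start in range(len(x) - k + 1):
        s = bias + sum(weight[i][d] * x[start + i][d]
                       for i in range(k) for d in range(EMBED_DIM))
        scores.append(max(0.0, s))  # ReLU
    return max(scores)

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def textcnn_forward(question):
    x = char_vectors(question)
    widest = max(KERNEL_SIZES)
    if len(x) < widest:
        # Pad short questions with zero vectors so the widest kernel fits.
        x += [[0.0] * EMBED_DIM for _ in range(widest - len(x))]
    # One pooled value per filter, concatenated across all kernel widths.
    features = [conv_max_pool(x, w, b)
                for k in KERNEL_SIZES for (w, b) in filters[k]]
    logits = [b_out[c] + sum(W_out[c][j] * features[j] for j in range(feat_dim))
              for c in range(len(LABELS))]
    return softmax(logits)

# Hypothetical sample question: "How to control rice sheath blight?"
probs = textcnn_forward("水稻纹枯病怎么防治")
print([round(p, 3) for p in probs])
```

Each kernel width scans windows of 3, 4 or 5 consecutive characters, and max-over-time pooling reduces every feature map to a single number, so questions of any length yield a fixed-size feature vector (here 2 filters × 3 widths = 6 values) before the output layer.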

Agricultural question classification model based on BERT word vector and TextCNN
BAO Tong,LUO Rui,GUO Ting,GUI Shu-ting,REN Ni.Agricultural question classification model based on BERT word vector and TextCNN[J].Journal of Southern Agriculture,2022,53(7):2068-2076.
Authors:BAO Tong  LUO Rui  GUO Ting  GUI Shu-ting  REN Ni
Affiliation:1 Information Center, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu 210014, China;2 Institute of Science and Technology Information, Jiangsu University, Zhenjiang, Jiangsu 212013, China
Abstract:【Objective】To study the effects of different combinations of word vectors and deep learning models on the classification of agricultural questions, so as to provide technical support for building agricultural intelligent question answering systems.【Method】Question-and-answer data were crawled from websites such as the Agricultural Planting Network, and 20000 questions were manually annotated to construct an agricultural question classification corpus. Bidirectional Encoder Representations from Transformers (BERT) was used to encode agricultural questions at the character level, and a text convolutional neural network (TextCNN) was used to extract high-dimensional features of the questions and classify them.【Result】In the word vector comparison experiment, BERT character vectors combined with TextCNN reached an F1 value of 93.32% for agricultural question classification, 2.1 percentage points higher than Word2vec character vectors. In the comparison of deep learning models, TextCNN combined with Word2vec and with BERT reached F1 values of 91.22% and 93.32%, respectively, both better than the other models. In the fine-grained classification experiment, BERT-TextCNN achieved F1 values of 86.06%, 90.56%, 95.04% and 85.55% for cultivation technology, field management, soil-fertilizer-water management and others, respectively, all better than the other deep learning models. For hyperparameter settings, the BERT-TextCNN agricultural question classification model performed best with the convolution kernel sizes set to [3, 4, 5], the learning rate set to 5e-5 and the number of iterations set to 5.
Even with imbalanced data samples, the average classification accuracy for agricultural questions still exceeded 93.00%, which meets the question classification requirements of agricultural intelligent question answering systems.【Suggestion】Data annotation quality can be improved through open-source platforms such as Alibaba NLP; model classification accuracy can be improved by adding word-frequency and document features during classification; and agriculture-related government departments should strengthen cooperation to explore new models for the digital extension and service of agricultural technology.
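Keywords: agricultural question; intelligent question answering system; question classification; pre-trained language model (BERT); text convolutional neural network (TextCNN)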
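The abstract reports per-category and overall F1 values without stating how the overall figure is averaged. The sketch below shows one common way to compute per-class precision, recall and F1 and their unweighted (macro) mean; the toy labels are invented for illustration, and micro or support-weighted averaging would give different overall numbers. Its last lines check the BERT versus Word2vec comparison: 93.32% against 91.22% is an absolute gain of 2.10 percentage points, or about 2.30% relative to the Word2vec score.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)

# Invented toy labels: cult = cultivation technology, field = field management,
# sfw = soil-fertilizer-water management, other = others.
y_true = ["cult", "cult", "field", "field", "sfw", "sfw", "other", "other"]
y_pred = ["cult", "field", "field", "field", "sfw", "sfw", "other", "cult"]

for label, (p, r, f1) in per_class_f1(y_true, y_pred).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")

# Reported F1 values (%): BERT-TextCNN vs Word2vec-TextCNN.
abs_gain = 93.32 - 91.22           # percentage points
rel_gain = abs_gain / 91.22 * 100  # percent of the Word2vec score
print(f"absolute gain {abs_gain:.2f} pp, relative gain {rel_gain:.2f}%")
```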