Recognition of Camellia oleifera fruits in natural environment using multi-modal images
Citation: ZHOU Hongping, JIN Shouxiang, ZHOU Lei, GUO Ziliang, SUN Mengmeng. Recognition of Camellia oleifera fruits in natural environment using multi-modal images[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(10): 175-182.
Authors: ZHOU Hongping  JIN Shouxiang  ZHOU Lei  GUO Ziliang  SUN Mengmeng
Institution: College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
Funding: Emergency Science and Technology Project of the National Forestry and Grassland Administration (202202-3)
Abstract: Camellia oleifera fruits grow under complex natural conditions with heavy occlusion and overlap. To address this problem, a dual-backbone network model, YOLO-DBM (YOLO-Dual Backbone Model), based on RGB-D (red green blue-depth) multi-modal images was proposed for the recognition and localization of Camellia oleifera fruits. First, a lightweight feature extraction network was designed on the basis of CSP-Darknet53, the backbone of the YOLOv5s model. Then, two such lightweight networks were used to extract color and depth features separately; an attention-based feature fusion module fused the color and depth features hierarchically, the fused feature layers were fed into a feature pyramid network (FPN), and prediction was performed. Experimental results show that, on the test set, the YOLO-DBM model using RGB-D images achieved a precision P of 94.8%, a recall R of 94.6%, and an average precision AP of 98.4%, with an average detection time of 0.016 s per image. Compared with the YOLOv3, YOLOv5s, and YOLO-IR (YOLO-InceptionRes) models, AP improved by 2.9, 0.1, and 0.3 percentage points, respectively, while the model size was only 6.21 MB, 46% of that of YOLOv5s. In addition, compared with a YOLO-DBM variant using only concatenation fusion, the YOLO-DBM with the attention fusion module improved P, R, and AP by 0.2, 1.6, and 0.1 percentage points, respectively, further verifying the reliability and effectiveness of the proposed method. The results can serve as a reference for developing automatic Camellia oleifera fruit harvesters.

Keywords: image recognition  deep learning  models  Camellia oleifera fruit  multi-modal  multi-scale fusion
Received: 2023-03-09
Revised: 2023-04-10

Recognition of camellia oleifera fruits in natural environment using multi-modal images
ZHOU Hongping, JIN Shouxiang, ZHOU Lei, GUO Ziliang, SUN Mengmeng. Recognition of Camellia oleifera fruits in natural environment using multi-modal images[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(10): 175-182.
Authors:ZHOU Hongping  JIN Shouxiang  ZHOU Lei  GUO Ziliang  SUN Mengmeng
Institution:College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
Abstract: Accurate and rapid fruit recognition is essential for the automated harvesting of Camellia oleifera fruits. However, Camellia oleifera grown in the natural environment has dense branches and leaves, so fruits are severely occluded and frequently overlap, and RGB images alone cannot fully meet the recognition requirements of modern agriculture. In this study, a dual-backbone network model was proposed that combines Red Green Blue-Depth (RGB-D) multi-modal images for the recognition and localization of Camellia oleifera fruits. Firstly, a lightweight improved YOLOv5s model, YOLO-IR (YOLO-InceptionRes), was designed to detect Camellia oleifera fruit targets. An InceptionRes module was introduced into the feature extraction network, fusing multi-scale information through four convolution operations of different kernel sizes followed by concatenation. At the same time, the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) module of YOLOv5s was simplified into an FPN-only module to reduce network complexity, and the depth and width of the model were compressed to limit the number of parameters. Compared with YOLOv5s, YOLO-IR showed a decrease in average precision AP of only 0.2 percentage points, while the model size decreased by 69%, providing a lightweight basis for building a dual-backbone model. Secondly, a dual-backbone detector for Camellia oleifera fruits, YOLO-DBM (YOLO-Dual Backbone Model), was constructed on RGB-D images based on YOLO-IR. Two feature extraction networks, identical to that of YOLO-IR, extract the color and depth features separately, and an attention-based feature fusion module fuses the color and depth features hierarchically at different scales.
The attention module consists of a spatial attention mechanism and a channel attention mechanism. Specifically, the spatial attention mechanism increases the weight of effective regions in the depth feature layer and reduces the interference of depth holes before the depth features are concatenated with the RGB feature layer; the channel attention mechanism then emphasizes the contribution of effective channels in the fused feature layer. Finally, the fused feature layers are passed to the prediction module. Experimental results show that the precision P, recall R, and average precision AP of the YOLO-DBM model using RGB-D images on the test set were 94.8%, 94.6%, and 98.4%, respectively, and the average detection time for a single image was 0.016 s. Compared with the YOLOv3, YOLOv5s, and YOLO-IR models, AP improved by 2.9, 0.1, and 0.3 percentage points, respectively, while the model size was only 6.21 MB, 46% of that of YOLOv5s. In addition, compared with a YOLO-DBM variant using only concatenation fusion, the YOLO-DBM with the attention fusion module improved P, R, and AP by 0.2, 1.6, and 0.1 percentage points, respectively, further verifying the effectiveness of the dual-backbone network and the attention fusion module. The findings can provide a strong reference and a new approach for fruit recognition in automatic Camellia oleifera harvesters.
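The InceptionRes idea described above (four parallel convolutions of different kernel sizes, concatenation, and a residual connection) can be sketched as a PyTorch module. The specific kernel sizes, channel split, and activation below are illustrative assumptions; the abstract states only that four convolutions of different sizes are fused by concatenation.

```python
import torch
import torch.nn as nn


class InceptionRes(nn.Module):
    """Multi-scale feature block: four parallel convolution branches with
    different receptive fields, concatenated and added back residually.
    Branch layout is an assumption; the paper specifies only "four
    convolution operations of different sizes and concatenation"."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 4 == 0, "channels must split evenly across 4 branches"
        c = channels // 4
        self.b1 = nn.Conv2d(channels, c, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(channels, c, 1),
                                nn.Conv2d(c, c, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(channels, c, 1),
                                nn.Conv2d(c, c, 5, padding=2))
        self.b7 = nn.Sequential(nn.Conv2d(channels, c, 1),
                                nn.Conv2d(c, c, 7, padding=3))
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the four scales along channels, then add the residual
        # so the block preserves the input width and gradient flow.
        out = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.b7(x)], dim=1)
        return self.act(out + x)
```

Because input and output widths match, the block can be dropped into a CSP-style backbone without changing the surrounding layer shapes.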
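The spatial-then-channel attention fusion described above can likewise be sketched. This is a minimal sketch assuming CBAM-style spatial attention (channel-pooled statistics through a 7x7 convolution) and squeeze-and-excitation-style channel attention; the abstract fixes only the order of operations (spatial attention on depth features, concatenation with RGB features, channel attention on the fused map), so the internals and the final 1x1 projection are assumptions.

```python
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuses RGB and depth feature maps at one scale. Spatial attention
    re-weights the depth features (suppressing depth holes), the two
    streams are concatenated, and channel attention re-weights the fused
    channels. Internal attention designs are assumptions, not the paper's
    exact layers."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Spatial attention: 7x7 conv over mean/max channel statistics.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())
        fused = 2 * channels
        # Channel attention: squeeze-and-excitation over the fused map.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.SiLU(),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid())
        # 1x1 projection back to the backbone width (assumed, so the FPN
        # sees the same channel count as a single-backbone model).
        self.project = nn.Conv2d(fused, channels, 1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        stats = torch.cat([depth.mean(1, keepdim=True),
                           depth.amax(1, keepdim=True)], dim=1)
        depth = depth * self.spatial(stats)          # down-weight depth holes
        fused = torch.cat([rgb, depth], dim=1)       # merge the two streams
        fused = fused * self.channel(fused)          # emphasize useful channels
        return self.project(fused)
```

Applying one such module per backbone scale gives the hierarchical fusion the abstract describes, with each fused layer feeding the FPN.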
Keywords:image recognition  deep learning  models  camellia oleifera  multi-modal  multi-scale fusion