Multi-layer and regional feature integration models for fine-grained visual classification
Cite this article: Liu Yuze, Sun Han, Li Mingyang, Li Mingxin, Kang Jutao, Wang Enhao. Multi-layer and regional feature integration models for fine-grained visual classification[J]. Journal of Chinese Agricultural Mechanization (中国农机化学报), 2023, 44(1): 199-207.
Authors: Liu Yuze (刘宇泽), Sun Han (孙涵), Li Mingyang (李明洋), Li Mingxin (李明心), Kang Jutao (康巨涛), Wang Enhao (王恩浩)
Affiliations: 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China;
2. Grainger College of Engineering, University of Illinois Urbana-Champaign, Illinois 61801, USA
Funding: Nanjing University of Aeronautics and Astronautics 2021 University-Level Undergraduate Innovation and Entrepreneurship Training Program (2021CX016013)
Abstract: In fine-grained visual classification, images are highly similar overall and differ only in local details, so it is essential to locate the discriminative regions and learn their features. Existing approaches have several drawbacks: manually annotating discriminative regions is costly, and many models rely on a large number of additional network structures that add computational overhead during both training and inference. To address these problems, a multi-layer and regional feature integration model is proposed. The model is built on the attention mechanism, imitating how humans observe an image, to strengthen its focus on informative local details and to improve recognition on classical datasets. It consists of two parts: multi-layer fusion of convolutional neural network features with attention weights, and region fusion based on the dependencies among regional features. During feature extraction the model considers both the detailed and the abstract features of the image, as well as the composition of the different regions and the dependencies among them, so that local details contribute to the prediction while the global context is retained. Experimental results show good accuracy on several classical datasets: 95.69% on the Oxford Flowers dataset and 96.96% on the AID (aerial image) dataset. The authors state that no model had previously been studied or trained on the AID dataset.

Keywords: fine-grained image recognition; attention mechanism; convolutional neural network; feature extraction; feature fusion
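
The abstract names two fusion stages but does not give their formulas. The following is only a minimal sketch of how such stages are commonly built, not the authors' implementation. It assumes that per-layer features have already been pooled to vectors of equal length, that layer weights come from a softmax over scalar scores, and that dependencies between regions are modeled with scaled dot-product self-attention. The function names `fuse_layers` and `fuse_regions` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_layers(layer_features, layer_scores):
    """Attention-weighted sum of equal-length feature vectors from several layers.

    Assumption: layer_scores are learned scalars, one per layer; the paper's
    actual weighting scheme is not described in the abstract.
    """
    weights = softmax(layer_scores)
    dim = len(layer_features[0])
    return [sum(w * f[i] for w, f in zip(weights, layer_features))
            for i in range(dim)]


def fuse_regions(region_features):
    """Let every region attend to all regions, then average the results.

    Assumption: scaled dot-product self-attention stands in for the
    "dependencies among regional features" mentioned in the abstract.
    """
    dim = len(region_features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in region_features:
        weights = softmax([dot(query, key) / scale for key in region_features])
        attended.append([sum(w * v[i] for w, v in zip(weights, region_features))
                         for i in range(dim)])
    count = len(attended)
    return [sum(r[i] for r in attended) / count for i in range(dim)]


if __name__ == "__main__":
    # Toy inputs: two layers and three regions, each a 2-dimensional feature.
    shallow, deep = [1.0, 0.0], [0.0, 1.0]
    print(fuse_layers([shallow, deep], layer_scores=[0.2, 1.3]))
    print(fuse_regions([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
```

In a real model the two outputs would typically be concatenated and passed to a classifier; how the paper combines them is not stated in the abstract.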