Fruit tree 3D reconstruction method based on camera pose recovery and neural radiance field theory

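Cite this article:	GONG Jinliang, LIU Binxiao, WEI Peng, WANG Huiyan, ZHANG Yanfei, LAN Yubin. Fruit tree 3D reconstruction method based on camera pose recovery and neural radiance field theory[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(22): 157-165.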

Authors:	GONG Jinliang (宫金良), LIU Binxiao (刘镔霄), WEI Peng (魏鹏), WANG Huiyan (王辉岩), ZHANG Yanfei (张彦斐), LAN Yubin (兰玉彬)

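Affiliation:	School of Mechanical Engineering, Shandong University of Technology, Zibo 255000; Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193; School of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255000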

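Funding:	Shandong Province "One Case, One Policy" Special Fund for Introducing Top Talent (Lu Zheng Ban Zi [2018] No. 27); Shandong Province Key Research and Development Program (Major Scientific and Technological Innovation Project) (2020CXGC010804); Natural Science Foundation of Shandong Province (ZR2021MC026).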

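Abstract:	Traditional stereo-vision 3D reconstruction has difficulty accurately representing the multi-scale, complex phenotypic details of fruit trees. To address this, the study proposes a fruit tree 3D reconstruction method based on camera pose recovery and neural radiance field theory, and designs fruit tree image acquisition equipment and an acquisition scheme suited to standard orchard environments. First, panoramic video is shot around the fruit tree and multi-view images are obtained by frame extraction. Second, a structure-from-motion algorithm performs sparse reconstruction to compute the image poses. Third, the neural radiance field of the fruit tree is trained: the posed multi-view images undergo ray casting with layered sampling and positional encoding and are then fed to a multi-layer perceptron, with volume rendering supervising training to obtain a converged radiance field that reflects the true morphology of the tree. Finally, a realistic 3D point cloud model of the fruit tree with high accuracy and rich phenotypic detail is exported. Experiments show that the resulting point cloud accurately represents structures from the plant scale (macroscopic structures such as trunk, branches, and canopy) down to the organ scale (microscopic structures such as fruits, twigs, leaves, and even petioles and leaf spots). Overall accuracy reaches the centimeter level, parameters such as diameter at breast height and fruit diameter reach millimeter-level accuracy, and the scale consistency error does not exceed 5%. Compared with the traditional stereo-vision 3D reconstruction method, reconstruction time is shortened by 39.50%, and the scale consistency errors of four tree-shape parameters (tree height, crown width, diameter at breast height, and ground diameter) are reduced by 77.06%, 83.61%, 45.47%, and 62.23%, respectively. The method can build fruit tree point cloud models with high accuracy and rich phenotypic detail, laying a foundation for the application of digital fruit tree technology.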
Keywords:	computer vision; 3D reconstruction; fruit tree; neural radiance fields; pose recovery
Received:	2023/7/29
Revised:	2023/11/7
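
The training step summarized in the abstract (layered sampling along each ray, positional encoding, and volume rendering) can be illustrated with a minimal sketch. It follows the standard NeRF formulation, not the authors' implementation, which the abstract does not specify. Two assumptions are made: "layered sampling" is read as stratified sampling (one random depth per equal-width bin), and `toy_field` is a hypothetical stand-in for the trained multi-layer perceptron. All function names and parameter values are illustrative.

```python
import math
import random


def positional_encoding(p, num_freqs):
    """Map a scalar p to [sin(2^k*pi*p), cos(2^k*pi*p)] for k = 0..num_freqs-1."""
    out = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def stratified_samples(near, far, n_bins, rng):
    """Draw one depth uniformly at random inside each of n_bins equal-width bins."""
    width = (far - near) / n_bins
    return [near + (i + rng.random()) * width for i in range(n_bins)]


def render_ray(depths, sigmas, colors, far):
    """Composite per-sample density and RGB into one pixel colour.

    weight_i = T_i * (1 - exp(-sigma_i * delta_i)), where
    T_i = exp(-sum_{j<i} sigma_j * delta_j) is the accumulated transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0
    weights = []
    for i, (t, sigma, rgb) in enumerate(zip(depths, sigmas, colors)):
        next_t = depths[i + 1] if i + 1 < len(depths) else far
        delta = next_t - t                      # distance to the next sample
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha
        weights.append(w)
        pixel = [p + w * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, weights


def toy_field(t):
    """Hypothetical stand-in for the MLP: a dense green slab at 2.0 <= t < 2.5."""
    if 2.0 <= t < 2.5:
        return 50.0, (0.1, 0.8, 0.1)
    return 0.0, (0.0, 0.0, 0.0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
depths = stratified_samples(near=0.0, far=4.0, n_bins=64, rng=rng)
sigmas, colors = zip(*(toy_field(t) for t in depths))
pixel, weights = render_ray(depths, sigmas, colors, far=4.0)
print("encoded depth 0.5:", positional_encoding(0.5, num_freqs=4))
print("rendered pixel:", [round(c, 3) for c in pixel])
```

In actual training, the rendered pixel would be compared with the corresponding pixel of a posed photograph and the difference used as the loss. That comparison is what "supervising training through volume rendering" refers to.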
3D reconstructing fruit tree using camera pose recovery and neural radiance fields theory

GONG Jinliang,LIU Binxiao,WEI Peng,WANG Huiyan,ZHANG Yanfei,LAN Yubin.3D reconstructing fruit tree using camera pose recovery and neural radiance fields theory[J].Transactions of the Chinese Society of Agricultural Engineering,2023,39(22):157-165.

Authors:	GONG Jinliang, LIU Binxiao, WEI Peng, WANG Huiyan, ZHANG Yanfei, LAN Yubin

Affiliation:	School of Mechanical Engineering, Shandong University of Technology, Zibo 255000, China;Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China;School of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255000, China

Abstract:	Traditional stereo-vision 3D reconstruction struggles to accurately characterize the multi-scale, complex phenotypic details of fruit trees. This study proposes a 3D reconstruction method for fruit trees based on camera pose recovery and neural radiance field (NeRF) theory, together with image acquisition equipment and a collection scheme suited to standard orchard environments. First, a three-axis stabilized gimbal camera captured panoramic video around the fruit tree from multiple heights, and multi-angle images were obtained by frame extraction. Second, structure from motion (SfM) was used for sparse reconstruction to compute the pose of each image. Third, the neural radiance field of the fruit tree was trained: the posed images were processed by ray casting with layered sampling and positional encoding and then fed to a multi-layer perceptron, with volume rendering supervising the training until the radiance field converged and reflected the true morphology of the tree. Finally, a 3D point cloud model of the fruit tree with high accuracy and detailed phenotypic features was exported. Experimental results indicate that the collection scheme obtained multi-angle videos of the fruit tree efficiently. Combining the stabilized gimbal with digital image processing, in hardware and software respectively, improved the stability of the video frames, so that frame extraction yielded high-quality images. Image acquisition was significantly faster than in traditional stereo-vision 3D reconstruction with handheld cameras.
In the SfM sparse reconstruction stage, the average reprojection error was only 0.84766 pixels and the mean track length of the 3D points was 5.85104. Each image observed about 1,601.87 feature points on average, and 600 camera poses were recovered at a 100% success rate. In the NeRF training stage, the neural radiance field was trained for 30,000 steps in 1,109.69 seconds, processing approximately 1.2×10⁵ rays per second. The scene representation stabilized after 10,000 steps, quantities such as the learning rate and training loss gradually converged, and the PSNR stabilized between 22 and 23 dB. The fruit tree point cloud accurately represented macroscopic structures at the plant scale, such as the trunk, branches, and canopy, as well as microscopic structures at the organ scale, including fruits, twigs, leaves, and even petioles and leaf spots. The overall accuracy of the model reached the centimeter level, with scale consistency and color consistency accuracy generally exceeding 97%. Specific indicators such as diameter at breast height and fruit diameter reached millimeter-level precision, with scale consistency errors not exceeding 5% and color consistency accuracy above 95%. Compared with traditional SfM-MVS, reconstruction time was reduced by 39.5%, and the errors in tree height, crown width, diameter at breast height, and ground diameter were reduced by 77.06%, 83.61%, 45.47%, and 62.23%, respectively. Errors in hue, saturation, and brightness were also reduced by 20.88%, 99.85%, and 91.39%, respectively. The method can construct point cloud models of fruit trees with high accuracy and detailed phenotypic features, laying a foundation for applications of digital fruit tree technology.

Keywords:	computer vision; 3D reconstruction; fruit trees; neural radiance fields; camera pose recovery
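
The training and accuracy figures quoted in the abstract can be put in more concrete terms with a little arithmetic. The sketch below is a back-of-envelope reading and not part of the paper. It assumes that PSNR is computed on intensities scaled to the range 0 to 1, that the approximate ray rate can be taken at face value, and that "reduced by X%" means a relative reduction of the baseline error. The helper names are illustrative.

```python
import math


def psnr_db(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_from_psnr(psnr, max_value=1.0):
    """Invert psnr_db: the mean squared error that yields a given PSNR."""
    return max_value ** 2 / (10.0 ** (psnr / 10.0))


# Training figures quoted in the abstract.
steps = 30_000
train_seconds = 1109.69
rays_per_second = 1.2e5  # stated as approximate in the abstract

steps_per_second = steps / train_seconds
rays_per_step = rays_per_second * train_seconds / steps
print(f"{steps_per_second:.2f} steps/s, about {rays_per_step:.0f} rays per step")

# What the reported PSNR band means as a per-pixel error (assumed 0-1 scale).
for psnr in (22.0, 23.0):
    rmse = math.sqrt(mse_from_psnr(psnr))
    print(f"PSNR {psnr:.0f} dB -> RMSE {rmse:.3f}")

# Error reductions versus SfM-MVS, read as relative reductions (assumption).
reductions_pct = {
    "tree height": 77.06,
    "crown width": 83.61,
    "diameter at breast height": 45.47,
    "ground diameter": 62.23,
}
remaining = {name: 1.0 - pct / 100.0 for name, pct in reductions_pct.items()}
for name, frac in remaining.items():
    print(f"{name}: error is {frac:.1%} of the SfM-MVS baseline")
```

Under these assumptions the run works out to roughly 27 steps per second and about 4,400 rays per step. Absolute error values cannot be recovered this way, because the abstract reports only the reductions and not the baseline errors.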
