Research on Agricultural Keyword Extraction Algorithm Combining New Word Discovery and Improved TextRank
Cite this article:Di Xiaokang.Research on Agricultural Keyword Extraction Algorithm Combining New Word Discovery and Improved TextRank[J].Agricultural Engineering (农业工程),2023,13(6).
Authors:Di Xiaokang (邸小康)
Institution:Beijing Digital Agriculture Rural Promotion Center (北京市数字农业农村促进中心)
Abstract:Keywords that are domain-specific technical terms are difficult to extract from agricultural text. To address this, this paper proposes an agricultural keyword extraction method that combines new word discovery with an improved TextRank algorithm. The method uses information entropy to estimate the word-formation probability of candidate words in the text, thereby discovering domain-specific proper nouns and new words, and expands the word segmentation dictionary after manual review. Building on this dictionary, it changes how TextRank computes node values when constructing the word graph by adding word position and part-of-speech weights, and extracts keywords using the resulting combined word weight. In comparative experiments, the proposed algorithm improved the F-value by an average of 7.5% over the traditional TF-IDF algorithm and by an average of 9.8% over the TextRank algorithm, which indicates a degree of practical value.
Keywords:keyword extraction  new word discovery  information entropy  ranking algorithm
Received:2022/11/15
Revised:2023/2/3
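
The abstract does not give the formula for the combined word weight, so the following is only a minimal sketch of one plausible reading: position and part-of-speech weights are multiplied into a per-word prior, and that prior biases the TextRank iteration over a co-occurrence word graph. The weight values, the tag names (`n`, `v`, `a`), the title/body split, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical weights; the paper's actual values are not given in the abstract.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}   # part-of-speech weight per tag
TITLE_WEIGHT = 2.0                            # position weight for title words
BODY_WEIGHT = 1.0                             # position weight for body words


def build_graph(words, window=3):
    """Undirected co-occurrence graph; edge weight counts co-occurrences."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            u = words[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    return graph


def node_priors(title_tokens, body_tokens):
    """Prior per word = POS weight * best position weight, normalized to sum to 1."""
    prior = {}
    for tokens, place_w in ((title_tokens, TITLE_WEIGHT), (body_tokens, BODY_WEIGHT)):
        for word, tag in tokens:
            value = POS_WEIGHT[tag] * place_w
            prior[word] = max(prior.get(word, 0.0), value)
    total = sum(prior.values())
    if total == 0:
        return {}
    return {w: v / total for w, v in prior.items()}


def weighted_textrank(title_tokens, body_tokens, top_k=5, d=0.85,
                      iterations=50, window=3):
    """Rank words with a TextRank iteration biased by the position/POS prior."""
    # Keep only tokens whose tag has a weight (drops particles, prepositions, ...).
    title = [(w, t) for w, t in title_tokens if t in POS_WEIGHT]
    body = [(w, t) for w, t in body_tokens if t in POS_WEIGHT]
    graph = build_graph([w for w, _ in title + body], window)
    prior = node_priors(title, body)
    out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    score = dict(prior)
    for _ in range(iterations):
        new_score = {}
        for w in prior:
            incoming = sum(score[u] * wt / out_weight[u]
                           for u, wt in graph[w].items())
            new_score[w] = (1 - d) * prior[w] + d * incoming
        score = new_score
    ranked = sorted(score.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy input: (word, tag) pairs as a segmenter with POS tagging might produce.
title = [("wheat", "n"), ("rust", "n")]
body = [("wheat", "n"), ("rust", "n"), ("spreads", "v"), ("in", "p"),
        ("humid", "a"), ("fields", "n"), ("wheat", "n"), ("yield", "n"),
        ("drops", "v")]
for word, value in weighted_textrank(title, body, top_k=3):
    print(word, round(value, 3))
```

In this sketch the tokens are assumed to come from a segmenter whose dictionary has already been extended with the reviewed new words, so multi-character domain terms arrive as single graph nodes rather than fragments. Setting every weight to 1.0 reduces the prior to a uniform one, which recovers the usual unbiased TextRank update.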