Intelligent recognition and analysis of concrete material text based on image data
邓旭方,刘乐平,陈正虎,钟恒,吕沅庚,封婧仪
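Abstract:
During concrete dam construction, a large amount of material information is produced as unstructured text, which is important for engineering quality inspection and for further material research and development. Owing to limitations in data management technology, much of this material text is stored as images, which are difficult to edit or use directly and cannot meet the needs of intelligent analysis and management of concrete material data. In addition, there is currently no intelligent information extraction mechanism for the large volume of material text, so key information in the text is hard to obtain efficiently. Therefore, an intelligent interpretation method for concrete material text based on image data is proposed; it recognizes the text in image data and improves the efficiency of detecting and recognizing skewed material text. Based on the interpreted image data, and starting from multi-perspective text feature relationships, an intelligent analysis technique for concrete material text is established that uses the MMR algorithm as its framework, combines the BERT model and the TF-IDF algorithm, and accounts for both text semantics and the importance of domain terminology, in order to extract the key information in concrete material text. On actual concrete material texts, the method achieves a keyword extraction accuracy of 86.67%, outperforming other commonly used keyword extraction models. The results provide a new approach for processing non-editable concrete material text data and help improve the intelligent management of concrete material data.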
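The abstract names MMR (maximal marginal relevance) as the framework, with BERT and TF-IDF supplying semantic and term-importance signals, but it does not give the scoring formula. The following is only a minimal sketch of how such a combination could look, not the authors' implementation. Everything specific in it is an assumption: the function `mmr_keywords`, the weights `lam` and `alpha`, the hand-made 3-dimensional vectors (stand-ins for BERT embeddings), and the TF-IDF values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_keywords(doc_vec, cand_vecs, tfidf, top_k=3, lam=0.5, alpha=0.5):
    """Greedy MMR selection (hypothetical scoring, assumed for illustration).

    relevance(w) = alpha * cos(w, doc) + (1 - alpha) * tfidf(w) / max tfidf
    score(w)     = lam * relevance(w) - (1 - lam) * max cos(w, already selected)
    """
    max_w = max(tfidf.values()) or 1.0
    relevance = {
        w: alpha * cosine(vec, doc_vec) + (1 - alpha) * tfidf[w] / max_w
        for w, vec in cand_vecs.items()
    }
    selected = []
    remaining = set(cand_vecs)
    while remaining and len(selected) < top_k:
        def score(w):
            # Redundancy is the highest similarity to any keyword chosen so far.
            redundancy = max(
                (cosine(cand_vecs[w], cand_vecs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[w] - (1 - lam) * redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy inputs: made-up vectors standing in for BERT embeddings, made-up TF-IDF.
doc_vec = [1.0, 1.0, 0.0]
cand_vecs = {
    "cement": [1.0, 0.0, 0.0],
    "cement paste": [1.0, 0.1, 0.0],  # near-duplicate of "cement"
    "aggregate": [0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0],
}
tfidf = {"cement": 0.9, "cement paste": 0.8, "aggregate": 0.6, "weather": 0.1}

print(mmr_keywords(doc_vec, cand_vecs, tfidf, top_k=2, lam=0.5))
print(mmr_keywords(doc_vec, cand_vecs, tfidf, top_k=2, lam=1.0))
```

With `lam=0.5` the redundancy penalty pushes out the near-duplicate "cement paste" in favor of "aggregate"; with `lam=1.0` the selection reduces to ranking by relevance alone, so both cement terms are returned.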
Keywords: concrete dam; material data; text detection; intelligent recognition; key information
Funding: Research project of China Yangtze Power Co., Ltd. (Z212302036)
DOI: 10.13928/j.cnki.wrahe.2025.S1.014