Mega project safety hazard entity extraction and knowledge mining method based on UIE and an improved Apriori algorithm
LIU Guoping, LI Xin, LIU Donghai, ZHOU Shijie, WU Hongyan
Abstract:
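The construction of mega projects generates massive volumes of safety hazard inspection records. These records contain association knowledge linking multiple types of hazard elements and are an important reference for project safety control. However, manually extracting hazard-source information and mining its inherent associations is time-consuming and labor-intensive, which makes it difficult to feed the results back to on-site control in time. This paper proposes an intelligent method for hazard-source entity extraction and knowledge mining based on the Universal Information Extraction (UIE) framework and an improved Apriori algorithm. First, a hazard-source entity recognition model is built on the UIE framework: prompt labels for entity extraction are defined, and few-shot fine-tuning enables efficient and accurate automatic extraction of hazard-source entities. Then, the Apriori workflow is improved by incorporating constraints on hazard data types, and multi-element association rules are mined and visualized. A case study shows that the proposed entity extraction model achieves F1 scores of 0.892 and 0.886 on the validation and test sets, respectively, substantially higher than the base model's 0.253 and 0.307, and that it raises the overall hazard-source entity recognition rate by 36.66%. In addition, Sankey diagrams and association network graphs are used to visualize the strong multi-element association rules extracted by the improved Apriori algorithm, and these visualizations show good interpretability. The method offers an efficient, intelligent technical means for mining knowledge from the massive safety hazard texts of mega projects and provides data support for formulating targeted safety control measures on construction sites.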
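The abstract does not specify how the hazard data-type constraint enters the Apriori workflow. The sketch below assumes one plausible reading: each extracted hazard element is encoded as a hypothetical `type:value` string (e.g. `location:tunnel`), and any candidate itemset containing two elements of the same type is discarded. Because this constraint is anti-monotone (every subset of a valid itemset is also valid), it can prune candidates before support counting without losing any valid frequent itemset. The encoding, function names (`apriori_typed`, `association_rules`), thresholds and toy records are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def item_type(item: str) -> str:
    """Return the data type of a hypothetical 'type:value' item."""
    return item.split(":", 1)[0]


def type_ok(itemset) -> bool:
    """Assumed constraint: at most one item of each data type per itemset."""
    types = [item_type(i) for i in itemset]
    return len(types) == len(set(types))


def apriori_typed(transactions, min_support):
    """Frequent, type-consistent itemsets as {frozenset: support}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(1 for t in tsets if itemset <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in tsets for i in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            c = a | b
            # Keep only size-k unions that satisfy the type constraint,
            # pruning before any support counting.
            if len(c) != k or not type_ok(c):
                continue
            # Apriori property: every (k-1)-subset must already be frequent.
            if all(frozenset(s) in current for s in combinations(c, k - 1)):
                candidates.add(c)
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """Rules (antecedent, consequent, support, confidence) with conf >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                # Subsets of a frequent itemset are frequent and type-consistent,
                # so their support is already stored in `frequent`.
                conf = sup / frequent[ante]
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, sup, conf))
    return rules


if __name__ == "__main__":
    # Toy hazard records, invented for illustration only.
    records = [
        ["location:tunnel", "hazard:no_guardrail", "cause:management"],
        ["location:tunnel", "hazard:no_guardrail"],
        ["location:slope", "hazard:loose_rock", "cause:geology"],
        ["location:tunnel", "hazard:no_guardrail", "cause:management"],
    ]
    freq = apriori_typed(records, min_support=0.5)
    for ante, cons, sup, conf in association_rules(freq, min_conf=0.8):
        print(sorted(ante), "->", sorted(cons), f"sup={sup:.2f} conf={conf:.2f}")
```

With these toy records, the rule `hazard:no_guardrail -> location:tunnel` reaches confidence 1.0, while `location:tunnel -> cause:management` (confidence 0.5/0.75, about 0.67) is filtered out at `min_conf=0.8`.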
Keywords: mega projects; safety hazards; universal information extraction; knowledge mining; natural language processing
Funding: Enterprise Research Project of China Three Gorges Corporation (202103551)
DOI: 10.13928/j.cnki.wrahe.2025.S1.016