Abstract: The construction scheme is the core of construction organization design. Applying natural language processing to extract construction schemes from unstructured text and then review them can improve review efficiency and preparation quality, and it helps identify potential safety and quality risks so that early warnings can be issued during construction. Extracting construction schemes from unstructured text requires first clarifying the content composition of the different types of construction scheme and then classifying the relevant paragraphs by content. To address this paragraph classification problem, this study builds on an in-depth analysis of construction scheme categories and their content framework, takes paragraphs from the construction organization designs of urban pipe network projects as samples, classifies them by content, and proposes a paragraph text classification model that integrates Albert and TextRCNN. The model uses the Albert pretrained language model for word embedding and feeds the resulting word vectors into a TextRCNN classifier to complete the classification. It outperforms Albert-TextCNN, the best of the other three models compared, improving accuracy by 0.79%. The experiments show that TextRCNN combined with Albert can effectively classify construction organization design paragraphs by content, providing a basis for subsequent construction scheme extraction.
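The pipeline described in the abstract (token embeddings fed into a TextRCNN classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: random vectors stand in for the Albert embeddings, plain tanh recurrent cells replace whatever recurrent unit the authors used, the weights are untrained, and the names `init_params` and `text_rcnn_forward` are hypothetical. It shows only the general TextRCNN pattern of bidirectional context, concatenation with the embedding, max-over-time pooling, and a softmax output, not the authors' implementation.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(embed_dim, hidden_dim, latent_dim, num_classes, seed=0):
    rng = random.Random(seed)
    return {
        "W_in_f": rand_matrix(hidden_dim, embed_dim, rng),
        "W_rec_f": rand_matrix(hidden_dim, hidden_dim, rng),
        "W_in_b": rand_matrix(hidden_dim, embed_dim, rng),
        "W_rec_b": rand_matrix(hidden_dim, hidden_dim, rng),
        "W_proj": rand_matrix(latent_dim, 2 * hidden_dim + embed_dim, rng),
        "W_out": rand_matrix(num_classes, latent_dim, rng),
    }

def recurrent_pass(embeddings, W_in, W_rec):
    """Simple tanh recurrent layer; returns one hidden state per token."""
    h = [0.0] * len(W_rec)
    states = []
    for e in embeddings:
        pre = [a + b for a, b in zip(matvec(W_in, e), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def text_rcnn_forward(embeddings, params):
    """Return class probabilities for one paragraph of token embeddings."""
    # Left-to-right and right-to-left context for every token.
    fwd = recurrent_pass(embeddings, params["W_in_f"], params["W_rec_f"])
    bwd = recurrent_pass(embeddings[::-1], params["W_in_b"], params["W_rec_b"])[::-1]
    # Concatenate [forward context; embedding; backward context], then project.
    latent = [
        [math.tanh(v) for v in matvec(params["W_proj"], f + e + b)]
        for f, e, b in zip(fwd, embeddings, bwd)
    ]
    # Max-over-time pooling keeps the strongest response per latent feature.
    pooled = [max(column) for column in zip(*latent)]
    # Numerically stable softmax over the class logits.
    logits = matvec(params["W_out"], pooled)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

# Stand-in for Albert output: 6 tokens, each an 8-dimensional vector.
rng = random.Random(1)
paragraph = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(6)]
params = init_params(embed_dim=8, hidden_dim=5, latent_dim=4, num_classes=3)
probs = text_rcnn_forward(paragraph, params)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In a real system the `paragraph` matrix would come from a pretrained Albert encoder and the weights would be trained on labeled paragraphs; the three output classes here are placeholders for the paper's content categories.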
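Basic information:
DOI: 10.13928/j.cnki.wrahe.2025.S1.015
CLC number: TP391.1; TU721.1; TU990.3
Citation:
[1] DU Runlong, TAN Kexin, GAO Tian, et al. Research on construction scheme categories and implementation of a text classification model[J]. Water Resources and Hydropower Engineering, 2025, 56(S1): 95-101. DOI: 10.13928/j.cnki.wrahe.2025.S1.015.
Funding:
Research project of China Three Gorges Corporation (202103551)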
2025-03-20