Study on a regional water quality prediction method based on multi-source data machine learning
LI Xueqing, ZHENG Hang, LIU Yueyi, WAN Wenhua
Abstract:
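With rapid socioeconomic development and the increasing complexity of water resource systems, changes in China's water environment quality increasingly show cross-regional characteristics shaped by the coupled influence of multiple factors. Focusing on water quality prediction over large spatial extents, and addressing the insufficient consideration of hydrological, meteorological, and socioeconomic factors in traditional water quality prediction methods, this study used weekly water quality grade data from 31 monitoring stations in Guangdong Province between 2008 and 2016 as training samples. Meteorological indicators (rainfall, evaporation, and air temperature) and socioeconomic indicators (GDP, total population, and population density) were selected as predictors, and machine learning techniques such as support vector machines, decision trees (including random forest), and artificial neural networks were applied to build prediction models of regional water quality grade. The results show that machine learning methods can integrate multi-source data at different spatial and temporal scales, such as meteorological and socioeconomic data, to predict water quality grade. The random forest model performed best, with a prediction accuracy of 77.11%, followed by the support vector machine model at 74.99%. Compared with existing water quality prediction methods, this approach is computationally fast, does not require extracting statistical features from the data, is simple to operate, and can analyze the influence of socioeconomic factors on water quality, making it easier to apply in water environment management.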
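The abstract states that the models fuse data recorded at different temporal and spatial scales: weekly water quality grades at monitoring stations, meteorological variables, and socioeconomic indicators that are usually reported annually per administrative area. The alignment procedure is not described in the abstract, so the following is only a minimal sketch under stated assumptions: daily meteorological records keyed by city, annual socioeconomic records keyed by city and year, and each station mapped to one city. Weekly rainfall and evaporation are summed and air temperature averaged, and station-weeks that cannot be fully matched are dropped. All names (`build_samples`, `weekly_met`, `city_A`, and so on) are hypothetical. The `accuracy` helper shows the metric behind figures such as 77.11%: the share of samples whose predicted grade equals the observed grade.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean

# Hypothetical input layouts (assumptions, not the authors' data schema):
#   daily_met[(city, day)]     -> (rainfall_mm, evaporation_mm, air_temp_c)
#   annual_socio[(city, year)] -> (gdp, total_population, population_density)
#   weekly_wq                  -> list of (station, city, week_start, grade)


@dataclass
class Sample:
    station: str
    week_start: date
    features: list[float]  # 3 weekly meteorological + 3 annual socioeconomic values
    grade: int             # observed water quality grade (class label)


def weekly_met(daily_met, city, week_start):
    """Aggregate the 7 daily records starting at week_start.

    Assumption: rainfall and evaporation are summed, air temperature is averaged,
    and days absent from daily_met are skipped. Returns None if no day is found.
    """
    days = [daily_met.get((city, week_start + timedelta(days=i))) for i in range(7)]
    days = [d for d in days if d is not None]
    if not days:
        return None
    rain = sum(d[0] for d in days)
    evap = sum(d[1] for d in days)
    temp = mean(d[2] for d in days)
    return [rain, evap, temp]


def build_samples(weekly_wq, daily_met, annual_socio):
    """Join each station-week grade with weekly weather and same-year socioeconomic data."""
    samples = []
    for station, city, week_start, grade in weekly_wq:
        met = weekly_met(daily_met, city, week_start)
        # Annual indicators are reused for every week of the same calendar year.
        socio = annual_socio.get((city, week_start.year))
        if met is None or socio is None:
            continue  # drop station-weeks that cannot be fully matched
        samples.append(Sample(station, week_start, met + list(socio), grade))
    return samples


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the observed grade."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Usage with toy values (not data from the study):
start = date(2010, 1, 4)
daily = {("city_A", start + timedelta(days=i)): (2.0, 3.0, 15.0 + i) for i in range(7)}
socio = {("city_A", 2010): (1.0, 2.0, 3.0)}
wq = [("S01", "city_A", start, 3), ("S02", "city_B", start, 2)]
samples = build_samples(wq, daily, socio)
print([s.features for s in samples])  # [[14.0, 21.0, 18.0, 1.0, 2.0, 3.0]]
print(accuracy([1, 2, 3, 3], [1, 2, 2, 3]))  # 0.75
```

Any of the classifiers named above (support vector machine, decision tree, random forest, or neural network) could then be trained on `[s.features for s in samples]` with `[s.grade for s in samples]` as labels; that step is omitted from this sketch.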
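Keywords: regional water quality prediction; meteorological indicators; socioeconomic factors; multi-source data machine learning; water quality; water environment; artificial neural network; machine learning techniques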
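Funding: National Natural Science Foundation of China project "Theory and application of optimal operation of long-distance water transfer projects based on water-energy coupling" (51909035); National Natural Science Foundation of China, Joint Fund for Yangtze River Water Science Research, project "Research on ecological compensation in the Yangtze River Basin" (U2040206)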
DOI: 10.13928/j.cnki.wrahe.2021.11.015