Please use this identifier to cite or link to this item:
https://digital.lib.ueh.edu.vn/handle/UEH/70254
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Rong-Li Ga | - |
dc.contributor.other | Hao Zhang | - |
dc.contributor.other | Dang Ngoc Hoang Thanh | - |
dc.date.accessioned | 2023-11-29T08:44:51Z | - |
dc.date.available | 2023-11-29T08:44:51Z | - |
dc.date.issued | 2023 | - |
dc.identifier.issn | 1678-4324 | - |
dc.identifier.uri | https://digital.lib.ueh.edu.vn/handle/UEH/70254 | - |
dc.description.abstract | A HA_Cart_AdaBoost model is proposed for cleaning drinking-water-quality data. First, data that do not follow the normal distribution are regarded as outliers and eliminated. Next, the optimal control theory of nonlinear partial differential equations (PDEs) is introduced into the CART decision tree, and a CART decision tree with a specified depth is used as the weak classifier of AdaBoost. The HA_Cart_AdaBoost model then compensates for the eliminated data by fitting and predicting the missing values of the data stream, which realizes the cleaning of drinking-water-quality data. Finally, the Hadoop big data architecture is used for real-time storage and analysis of the streaming data. The experimental results show that, compared with the most advanced data cleaning methods, introducing the optimal control theory of nonlinear PDEs into the CART decision tree greatly improves the stability and accuracy of the HA_Cart_AdaBoost model for water quality data cleaning. Taking pH as an example, the HA_Cart_AdaBoost model shows a minimum improvement of 2.25% and a maximum improvement of 53.33% in terms of RMSE, and a minimum improvement of 13.51% and a maximum improvement of 78.08% in terms of MAE. | en |
dc.format | Portable Document Format (PDF) | - |
dc.language.iso | eng | - |
dc.publisher | SciELO | - |
dc.relation.ispartof | BRAZILIAN ARCHIVES OF BIOLOGY AND TECHNOLOGY | - |
dc.relation.ispartofseries | Vol. 66 | - |
dc.rights | SciELO | - |
dc.subject | CART decision tree | en |
dc.subject | Partial differential equation | en |
dc.subject | Water data cleaning | en |
dc.subject | AdaBoost algorithm | en |
dc.subject | Hadoop architecture | en |
dc.title | A Big Data Cleaning Method for Drinking-Water Streaming Data | en |
dc.type | Journal Article | en |
dc.identifier.doi | https://doi.org/10.1590/1678-4324-2023220365 | - |
ueh.JournalRanking | ISI, Scopus | - |
item.fulltext | Only abstracts | - |
item.openairetype | Journal Article | - |
item.grantfulltext | none | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.languageiso639-1 | en | - |
item.cerifentitytype | Publications | - |
Appears in Collections: | INTERNATIONAL PUBLICATIONS |
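
The abstract's first step (discarding readings that do not follow the normal distribution) and its two error metrics (RMSE and MAE) can be illustrated with a minimal sketch. This is not the authors' HA_Cart_AdaBoost implementation: the PDE-based CART tree, the AdaBoost fitting, and the Hadoop storage layer are not reproduced. The three-sigma threshold, the function names, and the relative-improvement formula are assumptions made for illustration only.

```python
import math
import statistics


def flag_outliers(values, k=3.0):
    """Replace readings farther than k standard deviations from the mean with None.

    Assumption: "does not follow the normal distribution" is approximated here
    by the common k-sigma rule (k=3 by default); the record does not state the
    exact criterion used in the paper.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)  # constant series: nothing to flag
    return [v if abs(v - mean) <= k * std else None for v in values]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(statistics.fmean(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return statistics.fmean(absolute)


def improvement_pct(baseline_error, model_error):
    """Relative error reduction in percent (assumed definition of "improvement")."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

For example, `flag_outliers([7.0] * 20 + [14.0])` marks only the last pH reading as missing. In the workflow the abstract describes, those gaps would then be filled by the model's predictions, and `rmse` and `mae` would compare the filled values against held-out true readings.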