Intelligent multi-modal data implementation with CapsuleNet for accurate air quality index prediction
Nappi, Michele
2025
Abstract
The intelligent fusion of multi-modal data through artificial intelligence offers transformative solutions to pressing societal challenges, particularly in the domain of environmental sustainability. Air pollution, driven primarily by anthropogenic activities, remains a critical public health threat, with pollutants such as SO2, O3, PM10, NO2, and CO contributing to respiratory and cardiovascular diseases and premature mortality. Accurate assessment of air quality is therefore vital for informed policy decisions and sustainable urban planning. However, the heterogeneity of data sources, including environmental imagery and sensor-based gas concentration measurements, poses significant challenges to model accuracy and robustness, especially when datasets are incomplete or noisy. To address these problems, the study proposes a two-phase approach that leverages multi-modal data fusion for robust air quality prediction. In the first phase, environmental images are classified with a CapsuleNet-based deep learning model, attaining an accuracy of 98.22% and precision, recall, and F1-score of 97%. In the second phase, structured numerical and categorical sensor data containing missing values are processed with KNN-based imputation and then classified with CapsuleNet, attaining an accuracy of 99.98%, precision of 99.86%, recall of 99.35%, and F1-score of 99.61%, significantly outperforming conventional machine learning and deep learning models. The proposed system exemplifies intelligent multi-modal data integration by combining visual and sensor modalities to enhance air quality monitoring, and it underscores the potential of self-organizing, adaptive systems in supporting sustainable urban development and public health interventions. Empirical comparison with benchmark methods confirms the validity and reliability of the proposed solution.
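As a point of reference for the CapsuleNet classifier used in both phases, the sketch below implements the capsule "squash" nonlinearity from the original CapsuleNet formulation (Sabour et al., 2017). This is only an illustrative assumption about the building block involved; the paper's exact architecture, layer sizes, and routing configuration are not specified in the abstract.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule 'squash' nonlinearity: shrinks short vectors toward zero and
    long vectors toward unit length while preserving their orientation."""
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    # scale = (||s||^2 / (1 + ||s||^2)) / ||s||, with eps guarding division by zero
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s
```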

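The second phase combines KNN-based imputation of missing sensor readings with a downstream classifier. The following is a minimal sketch of that pipeline using scikit-learn; the file name, column names, neighbor count, and the stand-in MLP classifier are illustrative assumptions, since the paper itself classifies the imputed data with CapsuleNet.

```python
# Minimal sketch of the phase-2 pipeline: KNN-based imputation of missing
# pollutant readings followed by classification. File name, column names,
# and the stand-in MLP classifier are assumptions for illustration; the
# paper uses a CapsuleNet classifier whose configuration the abstract
# does not specify.
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Hypothetical sensor table: pollutant concentrations plus an AQI category label.
df = pd.read_csv("air_quality_sensors.csv")            # assumed file name
feature_cols = ["SO2", "O3", "PM10", "NO2", "CO"]       # assumed column names
X = df[feature_cols].to_numpy(dtype=float)
y = LabelEncoder().fit_transform(df["AQI_category"])    # assumed label column

# KNN-based imputation: each missing value is filled from the k nearest
# complete samples (here k=5, distance-weighted).
X_imputed = KNNImputer(n_neighbors=5, weights="distance").fit_transform(X)

# Scale features and train a stand-in classifier on the imputed data.
X_train, X_test, y_train, y_test = train_test_split(
    X_imputed, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
clf.fit(scaler.transform(X_train), y_train)
print(classification_report(y_test, clf.predict(scaler.transform(X_test))))
```

KNN imputation replaces each missing entry with a distance-weighted average of the corresponding feature in the k most similar complete rows, which preserves local structure in the pollutant measurements better than a global mean fill.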

