
# Indoor air quality prediction systems for smart environments: A systematic review

Air quality is a critical matter of concern in terms of the impact on public health and well-being. Although the consequences of poor air quality are more severe in developing countries, they also have a critical impact in developed countries. Every year, healthcare costs due to air pollution reach $150 billion in the USA, and particulate matter causes 412,000 premature deaths in Europe. According to the Environmental Protection Agency (EPA), indoor air pollutant levels can be up to 100 times higher than outdoor levels. Indoor air quality (IAQ) is among the top five environmental risks to global health and well-being. In recent years, the research community has explored the scope of artificial intelligence (AI) to deal with this problem. IAQ prediction systems contribute to smart environments where advanced sensing technologies can create healthy living conditions for building occupants. This paper reviews the applications and potential of AI for the prediction of IAQ to enhance building environment and public health. The results show that most of the studies analyzed incorporate neural network-based models and the preferred evaluation metrics are RMSE, R2 score and error rate. Furthermore, 66.6% of the studies include CO2 sensors for IAQ assessment. Temperature and humidity parameters are also included in 90.47% and 85.71% of the proposed methods, respectively. This study also presents some limitations of the current research activities associated with the evaluation of the impact of different pollutants based on different geographical conditions and living environments. Moreover, the use of reliable and calibrated sensor networks for real-time data collection is also a significant challenge.

## 1.Introduction

Air quality not only plays a material role in human exposure to pollutants but is also crucial for specific groups such as older adults and people with disabilities [121].
Numerous research studies report the adverse health effects associated with poor air quality, such as premature death and respiratory and cardiovascular disease, along with a relevant increase in asthma attacks, dementia, and cancer [112,128,130]. Poor air quality is responsible for 3.2 million deaths worldwide [128,130]. The consequences of poor air quality are most severe in developing countries where there is no regulation to control pollutant emissions. However, air quality levels are also a problem in developed countries. Every year in the USA, approximately 60,000 premature deaths are reported and linked to reduced air quality levels. Moreover, the healthcare costs related to air quality diseases reach $150 billion [87]. According to the European Environment Agency, air pollution was responsible for 400,000 premature deaths in the European Union (EU) in 2016. Particulate matter caused 412,000 premature deaths in 41 European countries, of which 374,000 occurred in the EU [36]. Moreover, the cost related to the air pollutant emissions caused by industrial facilities in the EU was estimated at around €59 to €189 billion in 2012 [51]. The Environmental Protection Agency (EPA) stated that indoor pollutant levels could be up to 100 times higher than outdoor levels. Therefore, indoor air quality (IAQ) is ranked as one of the top five environmental risks to global health and well-being [107]. IAQ is a matter of potential concern for building occupants [26]. As people spend most of their time indoors, poor air quality has a significant impact on overall public health [8,14,17]. In particular, older adults and people with disabilities, who are the most vulnerable groups, commonly spend all of their time inside buildings [82]. Living environments include numerous types of spaces and locations, such as workplaces, clinics, public service centers, faculties, leisure spaces, vehicles, cabins, and outdoor locations [29].
Notably, a significant percentage of indoor environments have a high number of occupants [79]. Even in locations with good air quality, short-term exposure to pollutants can cause health symptoms in sensitive groups such as the elderly and children, especially those suffering from asthma and cardiovascular problems [62,126].

The World Health Organization (WHO) has developed numerous reports on IAQ [19,88,127,130]. These reports state that almost three billion of the most impoverished people in the world rely on solid fuels (crop wastes, charcoal, animal dung, wood and coal) for their everyday cooking and heating needs [129]. These solid fuels produce a considerable level of harmful gases and increase particulate matter concentration levels [72]. Repeated exposure to these pollutants can harm an individual's health. The impact of indoor air pollution (IAP) is equally high in urban buildings due to excessive use of chemical-rich cleaning agents, oil-based paints, fragrant decorations, and other toxic consumer products and building elements [47,72]. Household air pollution caused more than 4.3 million premature deaths in 2012, mostly in middle and low-income countries [18,44,54,58,65,74]. Of these deaths, 6% are attributed to lung cancer, 12% to pneumonia, 22% to chronic obstructive pulmonary disease (COPD), 26% to ischemic heart disease and 34% to stroke [129].

Ventilation arrangements considerably influence the quality of indoor air [59,67,76,93]. Numerous countries have set up new regulations for achieving adequate ventilation and IAQ in buildings [7,35–37]. However, the starting point should be source control and reduction of pollutants in the indoor air [1,7,10,15,52,80,94]. Several studies in the state of the art reveal a considerable change from open fireplaces in residential areas to sealed modern fireplaces [28,34,117]. New buildings are equipped with wireless communication technologies and sensors. Therefore, it becomes easier to monitor environmental factors on a real-time basis [71].

Governments and environmental agencies have also designed new public policies to reduce pollutant exposure for building occupants [27,41,45,118]. Although this is a reasonable response towards IAQ management, monitoring pollutant levels in the building environment on a real-time basis can be a significant step towards efficient source control and management. Furthermore, the latest technologies, such as artificial intelligence (AI) and machine learning (ML), can utilize Big Data related to pollutant levels for forecasting future conditions in the living environment [3,6,73,105]. Several researchers are also exploring the potential of the Internet of Things (IoT) for developing smart environments that could address major challenges related to IAQ, building energy efficiency and occupant comfort [39,43]. The concepts of smart homes, smart factories, smart cities and smart health systems are gaining immense popularity around the world. Moreover, smart environments are mainly influenced by the combination of AI and IoT [43,110,123]. On the one hand, traditional threshold-triggered solutions can provide instant updates about critical IAQ levels. On the other hand, AI-based prediction systems can deliver prior information about upcoming critical changes in IAQ levels. Hence, building occupants can take preventive measures to avoid serious health impacts [5,125]. In recent years, research communities have been exploring the potential of AI to design intelligent environments where building occupants get automatic, real-time updates about changing environmental conditions [24,95,135]. This vision has led them to the concept of ambient intelligence, which further contributes to the development of smart environments for healthy living [30,110,111]. Several researchers have proposed efficient prediction systems for IAQ to improve public health and well-being [70,109,115,139,140].
These studies can enhance the daily activity level while providing better scope for ventilation arrangements. Also, these applications can assist in the development of favorable ambient assisted living systems and improved productivity levels at office premises.
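The difference between a threshold-triggered alert and a prediction-based alert can be illustrated with a minimal sketch. It is not a method from any reviewed study: the 1000 ppm CO2 limit, the function names and the sample readings are assumptions, and a least-squares trend line stands in for the AI models discussed in this review.

```python
from typing import Sequence

# Assumed alert level for illustration only; not taken from the reviewed studies.
CO2_LIMIT_PPM = 1000.0


def threshold_alert(latest_ppm: float, limit: float = CO2_LIMIT_PPM) -> bool:
    """Reactive rule: fires only once the limit is already exceeded."""
    return latest_ppm > limit


def linear_forecast(readings: Sequence[float], steps_ahead: int) -> float:
    """Extrapolate the least-squares trend line through equally spaced readings."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = (n - 1) / 2
    mean_y = sum(readings) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(readings))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    # Evaluate the fitted line at the future time index.
    return mean_y + slope * ((n - 1 + steps_ahead) - mean_t)


def predictive_alert(readings: Sequence[float], steps_ahead: int,
                     limit: float = CO2_LIMIT_PPM) -> bool:
    """Proactive rule: fires when the forecast value exceeds the limit."""
    return linear_forecast(readings, steps_ahead) > limit


# Hypothetical CO2 readings (ppm) taken at a fixed sampling interval.
history = [700.0, 750.0, 800.0, 850.0, 900.0]
print(threshold_alert(history[-1]))   # current level is still below the limit
print(predictive_alert(history, 3))   # the trend crosses the limit three steps ahead
```

With these sample readings the reactive rule stays silent while the predictive rule already warns, which is the advance notice that allows occupants to act before the critical level is reached.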

In sum, IAQ leaves a considerable impact on public health and well-being. Therefore, it is a critical matter of concern for both developed and developing countries [20]. This paper reviews the application of AI for the prediction of IAQ to enhance the building environment and public health. The main objective of this work is to study the potential of AI methods for developing smart IAQ systems to enhance building environments. To achieve this, an in-depth analysis is performed on IAQ prediction systems considering features used for evaluation of pollutant levels in the indoor environment, accuracy rate of the existing systems, and prediction interval for which IAQ condition is predicted by the system. The scope of this systematic review consists of an analysis of the AI-based forecasting approach proposed by researchers from different countries with unique demographic conditions such as domestic environment, IAP variables and socioeconomic status [40,99,103].

This review will help to find answers for potential research questions while highlighting the new problem domain in which future researchers need to put effort. Moreover, this systematic review provides a detailed comparison of existing AI-based IAQ prediction systems for smart environments. It highlights the potential of the specific techniques, along with the impact of several feature extraction methods. In addition, this paper aims to summarize the findings achieved by previous studies regarding the accuracy and methods used.

The rest of this review article is structured as follows: Section 2 provides the methodology with research questions, inclusion and exclusion criteria, search strategy, study selection and risk of bias. Section 3 includes the results and discussions along with answers to the RQs. Finally, Section 4 presents the conclusion.

## 2.Materials and methods

This systematic review is conducted using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. In order to address the challenges associated with IAQ prediction, the process was divided into several steps. In the first step, the relevant research questions were identified, and then a search strategy was developed using specific search keywords and strings. After this, the inclusion and exclusion criteria were defined to ease the selection of the most relevant papers from the existing databases. Next, data extraction was carried out based on the pre-defined research questions. Finally, the answers to these questions were given while highlighting the challenges, opportunities and limitations in the field. These steps are described in the following subsections.

### 2.1.Research questions

The rising number of health problems due to poor building environments is a matter of concern for government agencies and policymakers as well. It is essential to address the challenges by utilizing the latest technologies, and AI shows potential in this direction. However, future research needs to examine the critical aspects to design a more reliable solution for IAQ management. The authors in this systematic review identified essential research questions and tried to find relevant answers through this detailed study. Therefore, the research questions for this systematic review are:

• (RQ1) What are the system architectures used for IAQ data collection and how are the data collected?

• (RQ2) What are the features or input parameters used to process the IAQ data for prediction system design?

• (RQ3) What are the widely used AI methods for IAQ prediction?

• (RQ4) What are the accuracies and prediction times of these methods?

• (RQ5) What application domains are addressed by existing publications?

• (RQ6) How can these systems be integrated into smart building systems?

• (RQ7) How are the results of IAQ systems presented to the end-users?

These questions have been established by the authors to achieve the main contribution of this paper, that is to present a systematic review of AI methods used for IAQ prediction. Moreover, this paper aims to provide a comprehensive review of the main features used for the prediction, the accuracies achieved, the data collection techniques, the period of prediction, and state the future research challenges and opportunities.
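RQ4 concerns accuracy, and the included studies most often report it as RMSE, MAE, MAPE or R2. As a reference for how these measures relate to one another, the sketch below implements their standard definitions in plain Python. The function names and the sample CO2 values are illustrative assumptions and are not drawn from any reviewed study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the unit of the measured quantity."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the measured quantity."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error; undefined if any true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted CO2 concentrations (ppm).
measured = [400.0, 500.0, 600.0, 700.0]
predicted = [410.0, 490.0, 620.0, 680.0]

print(rmse(measured, predicted))
print(mae(measured, predicted))
print(mape(measured, predicted))
print(r2_score(measured, predicted))
```

RMSE and MAE depend on the unit and range of the pollutant, so values for different pollutants are not directly comparable, whereas MAPE and R2 are scale-free.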

### 2.2.Search strategy

To address the research questions, the authors used three different databases: PubMed, IEEE and ACM. The search for relevant publications was initiated on 27th March 2020, and a filter to select studies published after the year 2008 was applied. The initial search query used the following combination of keywords: “indoor air quality AND (prediction OR forecasting)”.

In total, 235 documents were identified, out of which 159 were obtained from PubMed, 41 from IEEE and 35 from ACM database. These studies were further processed as per the inclusion and exclusion criteria.
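The logic of the search string can be sketched as a simple text filter. This is only an illustration of the boolean structure: the actual databases apply their own indexing and matching rules, and the function name and example titles below are hypothetical.

```python
def matches_query(text: str) -> bool:
    """Apply 'indoor air quality AND (prediction OR forecasting)' as a substring test."""
    lowered = text.lower()
    return "indoor air quality" in lowered and (
        "prediction" in lowered or "forecasting" in lowered
    )


# Number of documents returned per database, as reported in this section.
hits_per_database = {"PubMed": 159, "IEEE": 41, "ACM": 35}
total_hits = sum(hits_per_database.values())

# Hypothetical titles used only to exercise the filter.
print(matches_query("Forecasting indoor air quality in office buildings"))
print(matches_query("Outdoor air quality prediction with neural networks"))
print(total_hits)
```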

### 2.3.Inclusion and exclusion criteria

All the authors independently evaluated all papers, which were selected for analysis by the cumulative agreement of all parties. The documents were analyzed to address the different methods related to the implementation of AI methods for IAQ prediction. The selection of the papers for inclusion in this review was made if the research satisfied the following eligibility criteria.

Inclusion criteria: (1) Research studies that include IAQ prediction based on methods related to AI sub-domains; (2) The information about the data used and their origin must be present in the document; (3) The paper must concern an analysis of indoor living environments; (4) The study must present at least one prediction metric; (5) The information of the indoor parameters monitored, or the instruments used must be presented in the document; (6) The research paper must be written in English and published after 2008.

Exclusion criteria: (1) Duplicate papers; (2) Publications that are secondary studies, such as reviews, study paper or demo papers; (3) Papers that do not provide clear insights about the prediction system and performance parameters; (4) Papers that are relevant to outdoor environments only.

### 2.4.Study selection

All publications obtained after applying the initial search query were analyzed as per the PRISMA guidelines. First, the documents were checked for duplicate studies, and at this stage two papers were rejected. The remaining 233 papers were transferred to a second-level screening. The relevance of papers was then assessed by considering the title and abstract, and 193 papers were excluded because they did not meet the specified inclusion and exclusion criteria. Most of the excluded papers were literature reviews in the environmental science field, studies about IAQ exposure and its effects on people’s health, studies on building-related problems, IoT and WSN architectures for IAQ supervision, research on HVAC systems and sensors, computational fluid dynamics IAQ models, or IAQ prediction methods using theoretical and mathematical approaches, and were not related to artificial intelligence methods.

After applying the above-mentioned eligibility criteria, the authors obtained 40 papers for the third stage, which were studied in detail. In this list, two papers focus only on outdoor air quality [64,97], and eight papers do not include any AI-specific prediction algorithms or were based on mathematical approaches [22,31,47,50,84,91,116,122]. Three papers [12,96,137] focused only on thermal comfort (temperature and/or humidity data) or other smart building aspects instead of air quality. Moreover, two studies [89,120] were rejected because they were limited to a monitoring system design and no prediction system was implemented. One paper [9] had a relevant abstract, but the authors did not specify the prediction methods. Similarly, the authors in [108] did not specify the AI method used for prediction. Therefore, these studies did not meet the first inclusion criterion and were thus excluded. Besides this, the document presented in [124] was excluded from the study. The authors used a fuzzy control method, but evaluation parameters such as prediction accuracy or period were not defined. Therefore, this study does not fulfil the fourth inclusion criterion. The researchers in [23] applied advanced fuzzy control theory to control IAQ and reduce energy consumption. They worked on the prediction of the Air Quality Index, and the parameters considered to control indoor environment conditions were indoor temperature, air quality and humidity. The fuzzy prediction system helped to maintain IAQ while optimizing energy consumption. The authors implemented real-time monitoring and concluded that the system is correct and feasible. However, this study does not provide details in terms of prediction accuracy and period. Therefore, it did not fulfil inclusion criterion 4 and was not included in the meta-analysis.
Finally, this systematic review analyzed 21 papers that were relevant to address the research questions identified at the beginning of this section. Clear insights concerning the selection process as per the PRISMA guidelines are presented in Fig. 1.
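The counts reported at each stage of the selection process can be cross-checked with a short tally. The variable names and grouping labels are introduced here for illustration; the numbers and reference lists are those stated in Sections 2.2 and 2.4.

```python
# Stage 1: records identified through database searching.
identified = 235
duplicates = 2

# Stage 2: exclusions after title and abstract screening.
excluded_on_title_abstract = 193

# Stage 3: exclusions after detailed study, grouped by stated reason.
full_text_exclusions = {
    "outdoor air quality only [64,97]": 2,
    "no AI-specific prediction algorithm [22,31,47,50,84,91,116,122]": 8,
    "thermal comfort or other smart building aspects [12,96,137]": 3,
    "monitoring only, no prediction system [89,120]": 2,
    "prediction or AI method not specified [9,108]": 2,
    "prediction accuracy or period not reported [23,124]": 2,
}

screened = identified - duplicates
studied_in_detail = screened - excluded_on_title_abstract
included = studied_in_detail - sum(full_text_exclusions.values())

print(screened, studied_in_detail, included)
```

The tally reproduces the 233 screened papers, the 40 papers studied in detail and the 21 papers finally included.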

##### Fig. 1.

PRISMA flow diagram for studies included in this systematic review.

### 2.5.Extraction of study characteristics

The relevant data was extracted from the selected publications for further analysis. In order to conduct this systematic review, the following information was extracted:

• Author details, titles and abstracts.

• Year of publication and associated database.

• Focused geographical area and application.

• Pollutant type, sensors used, and calibration status of sensors.

• AI method used for IAQ prediction.

After data extraction, the included publications were synthesized and analyzed in detail to obtain answers for the pre-defined research questions.

### 2.6.Risk of bias

The main limitation of conducting a systematic review is that it is influenced by bias. The first risk of bias arises during the selection of the initial keywords and search string used to query the databases. Moreover, the subjectivity of the eligibility criteria defined by the authors increases bias at the screening stage. Furthermore, the search only included three databases (PubMed, IEEE and ACM). However, based on the PRISMA guidelines, the authors tried to follow the best possible criteria and procedures for completing this systematic review. Although earlier researchers have conducted several reviews of AI-based IAQ prediction systems [11,55,90,101], they did not consider all these relevant RQs, especially RQ2, RQ3 and RQ4. The information provided for all these relevant factors makes this review a valuable addition to the scientific community.

## 3.Results and discussion

This systematic review includes 21 studies on AI-based IAQ prediction systems from three different databases out of which eight studies (38.09%) were included from PubMed, seven (33.3%) from IEEE and six (28.57%) from ACM (Table 1).

##### Table 1

Year-wise distribution of papers from the three databases

 Year of publication PubMed IEEE ACM Number of studies 2008 [113] 1 2009 [66,133] 2 2012 [114,136] 2 2013 [132] 1 2014 [24,25] 2 2015 [21] 1 2016 [134] [119] [38] 3 2017 [2,4] 2 2018 [75] [78] 2 2019 [77] [33] [106,131] 4 2020 [49] 1

Table 2 summarizes the studies based on the origin of the selected papers. Out of the included publications, six studies (28.57%) were conducted in China, four (19%) in the USA and three (14.28%) in Korea. One study each was included from the other countries listed in Table 2. However, no study was included from other developing countries such as India, Nepal and Bangladesh, which are greatly affected by IAP due to inadequate ventilation arrangements [13,42,48,53,83,100]. As the majority of the population in these countries uses biomass fuels for cooking and heating, researchers in these locations need to participate actively in the development of IAQ monitoring and prediction systems that can provide more accurate results based on specific geographic conditions and pollutant concentrations [61,85,138].

##### Table 2

Country-wise distribution of included publications

| Country | Reference number | Number of studies |
|---|---|---|
| China | [24,25,75,114,131,133] | 6 |
| USA | [38,113,132,134] | 4 |
| Korea | [4,66,77] | 3 |
| South Africa | [2] | 1 |
| Ireland | [21] | 1 |
| Czech Republic | [119] | 1 |
| Australia | [49] | 1 |
| Taiwan | [136] | 1 |
| Egypt | [33] | 1 |
| Switzerland | [78] | 1 |
| Denmark | [106] | 1 |

Furthermore, the synthesis process that focused on extracting relevant information from the included publications is presented in Tables 3, 4 and 5. Table 3 lists technical insights of the system designed by previous researchers, Table 4 presents details about the focus IAQ parameters, and Table 5 provides an analysis of extracted features and performance parameters of the existing systems.

##### Table 3

List of papers included in the study

| Reference number | Application area | IAQ parameters | Thermal comfort | Sensors used | Calibrated data input | Integrated to smart building | AI methods used | Result display method |
|---|---|---|---|---|---|---|---|---|
| [134] | Commercial building | NH3 | Temp, RH | Innova 1412 multigas model; T-type thermocouples; RH transmitter Model HX92BC; Infrared motion sensors | Yes | Yes | ANFIS | N/A |
| [21] | Office building | NO2, PM2.5 | Temp, RH | EPAM 5000 Haz-Dust Monitor, Teledyne M200 Monitor | Yes | No | ANN | N/A |
| [113] | Commercial building | NH3, SO2, H2S, CO2, PM10 | Temp, RH | Mobile Emission Laboratory Gas Sampling System | Yes | No | RBFNN | N/A |
| [4] | Office building | PM2.5, CO2, VOCs | Temp, RH, light quantity | SH-300-DS, PMS3003, SHT11, GL5537, MICS-VZ-89 | Yes | No | Gated recurrent unit | Web |
| [2] | Residential building | PM2.5 | Not used | Dylos air quality monitor DC1100 Pro and NOVA PM Sensor SDS011 | Yes | Yes | MLP NN | Web |
| [75] | Residential building | CO2, PM2.5 and PM10 | Temp, RH, air velocity | TSI 8520, TSI 7515, TSI 8392A | Yes | No | ANN | N/A |
| [77] | Waiting rooms and underground platforms | PM2.5, PM10, CO2, NO2, CO, NO | Temp, RH | Telemonitoring system | Yes | No | Deep RNN | N/A |
| [132] | Office building | CO2, VOC | Temp, RH | Custom-built measurement equipment using ELT S-100 CO2 sensor & TGS2602 VOC | Yes | No | Bayesian inference | N/A |
| [119] | Residential building | CO2 | Temp, RH | Siemens QPA2062 and QAC22 sensors | Yes | Yes | Decision tree regression method | N/A |
| [66] | Indoor spaces at subway station | CO2, CO, NOx, NO, NO2, PM2.5, PM10 | Temp, RH | - | - | No | RNN | N/A |
| [114] | Office building | Toluene, NO2, CO, benzene, CH2O | Temp, RH | GSBT11, O2-A1, TGS2620, TGS2201, and TGS2602 | Yes | No | GA-based least squares SVM regression | N/A |
| [133] | Office building | CO2, PM2.5, VOC, Airborne bacteria, fungi | Temp, RH, air velocity | - | - | No | MLP NN | N/A |
| [49] | Office building | H2, NH3, ethanol, H2S, toluene, CO, CO2, O2 | Temp, RH | Waspmote sensors from Libellium | Yes | Yes | Extended fractional-order Kalman filter | N/A |
| [136] | Office building | CO2 | Temp, RH | IEEE1451.4 standard-based wireless sensing equipment |  | No | ARIMA | Web |
| [33] | Office building | CO2 | Temp, RH | KNX modules | Yes | Yes | Gated recurrent unit | N/A |
| [24] | Commercial building | PM2.5 | Temp, RH, pressure, wind speed | Dylos DC1700 | No | Yes | ANN-based purification time interface method | Web |
| [25] | Office building | PM2.5 | Temp, RH, pressure, wind speed | Shinyei PPD42NJ, Thermo | Yes | No | Back propagation NN | Smartphone App |
| [38] | Residential building | PM2.5, VOC | Temp, RH | Dylos DC1700, Applied Sensor IAQ Engine, SHT15 | No | No | Machine learning-based non-parametric forecasting | Smartphone App |
| [78] | Office and residential building | O3, CO2, VOC | Temp | MISC-OZ-47 O3 Sensor, CC811 VOC Sensor | Yes | No | Multiple linear regression, non-linear ANN | N/A |
| [106] | Office building | CO2 | - | - | No | Yes | Time slicer method, PAD method | Smartphone App |
| [131] | Residential building | PM2.5, PM10, CO2, tVOC, Formaldehyde | Temp, RH | - | No | No | ARIMA | Smartphone App |

The first research question concerned the types of system architectures used and the methods of data collection. Consequently, the studies can be divided into four parts: 1) Studies that were based on real-time monitoring systems designed by the researchers, 2) Commercial monitoring solutions, 3) Studies that used data obtained from already installed or government-operated systems and 4) Mobile stations or wearable sensors. The results are summarized in Table 6.

Most of the reviewed studies use data acquisition systems that were either developed by the authors or commercially available. In total, 57.14% (N=12) of the analyzed studies used real-time collection systems designed and developed by the authors. These systems are based on IoT or WSN architectures and incorporate low-cost sensors for data acquisition. Moreover, they include popular open-source platforms such as Raspberry Pi and Arduino as processing units. Furthermore, five studies (33%) used commercial monitoring systems that are typically portable and battery-powered. These systems provide a built-in display to allow visualization of the collected data in real time or offer extraction methods for further data analysis. Three papers [21,24,25] used data acquired from previously installed environmental quality supervision systems. One study [113] used a mobile air quality data station for data collection and another study used wearable sensors for sensing IAQ parameters. The studies in [66,133] do not specify the methods used for data collection. In conclusion, the analysis reveals that most of the researchers preferred installing self-designed sensor networks for monitoring IAQ.

To ensure accuracy in real-time data collection, researchers used either expensive, highly calibrated sensor units or low-cost sensors with specific calibration arrangements. Several air pollutants affect the indoor environment. However, different researchers have focused on different sets of pollutants to predict future conditions. An analysis of the main parameters for data collection is provided in Table 4.

##### Table 4

Different IAQ pollutants measured by researchers in different studies

 Reference no. PM2.5 PM10 SO2 CO2 CO NO2 NH3 NO NHx F. Tol. H2 H2S Eth. VOC Ben. Air. Fun. T. R.H. [134] X X X [21] X X X X [113] X X X X X X X [4] X X X X X [2] X [75] X X X X X [77] X X X X X X X X [132] X X X X [119] X X X [66] X X X X X X X X X [114] X X X X X X X [133] X X X X X X X [49] X X X X X X X X X [136] X X X [33] X X X [24] X X X [25] X X X [38] X X X X [78] X X X [106] X [131] X X X X X X X Total 11 5 1 14 4 4 3 2 1 2 2 1 2 1 6 1 1 1 19 18

F.: Formaldehyde; Tol: Toluene; Eth.: Ethanol; Ben.: Benzene; Air.: Airborne bacteria; Fun.: Fungi; T.: Temperature; R.H.: Relative Humidity.
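The share of studies covering each parameter follows directly from the totals in Table 4 and the 21 included studies. The short sketch below recomputes the shares for a subset of parameters; the dictionary and variable names are introduced here for illustration.

```python
N_STUDIES = 21

# Totals from the last row of Table 4 for a subset of parameters.
studies_per_parameter = {
    "PM2.5": 11,
    "PM10": 5,
    "CO2": 14,
    "VOC": 6,
    "Temperature": 19,
    "Relative humidity": 18,
}

# Percentage of the included studies that measured each parameter.
share_percent = {
    name: 100.0 * count / N_STUDIES
    for name, count in studies_per_parameter.items()
}

for name, share in share_percent.items():
    print(f"{name}: {share:.2f}%")
```

Rounded to two decimals, the shares for CO2, temperature and relative humidity are 66.67%, 90.48% and 85.71%; the values of 66.6% and 90.47% quoted in the abstract appear to be truncated rather than rounded.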

##### Table 5

Essential details extracted from all papers

 Authors Methods of dataset collection Features extracted/input parameters Period of prediction Accuracy measures Compared performance Main outcomes [134] Sensor network Summertime: Pit NH3 concentration (PNH3), pit temperature (PT), pig activities (ACT), pit fan-E speed (PFE), pit fan-W speed (PFW), Room fan 14”(F14) and Room fan 20”(F20)Wintertime: Pit ammonia concentration (PNH3), Pit temperature (PT), Room humidity (RH), Pit humidity (PH), Pig activities (ACT), Pit fan-E speed (PFE) and Pit fan-W speed (PFW) - Summertime: MSE=0.002; MAPE=31.599; SD=0.0564; R2=0.6351Wintertime: MSE=0.0047; MAPE=23.6816; SD=0.0802; R2=0.6483 Backpropagation neural network, multiple linear regression model Suitable for input parameters having complex, highly fluctuating and non-linear relationship [21] Sensor network for IAQ parameters, national meteorological monitoring stations for weather data Time of day, barometer level pressure (hPa), sea level pressure (hPa), temperature (°C), relative humidity (%), wind speed (knots), wind direction (knots), Pasquill atmospheric stability class, global solar radiation (j·cm−2) and outdoor pollutant concentrations. - NO2Building 1: R2=0.854; Std. Error = 3.15Building 2: R2=0.870; Std. Error = 4.66Building 3: R2=0.829; Std. Error = 3.91PM2.5Building 1: R2=0.711; Std. Error = 2.17Building 2: R2=0.760; Std. Error = 2.06Building 3: R2=0.770; Std. Error = 1.85 - Stronger predictive abilities for indoor NO2 concentration when compared to PM2.5 using outdoor concentrations of meteorological variables. [113] Sensor network Outdoor temperature and RH; static pressure difference between the inside and outside of the swine building; barn inventory and average mass per pig; building fan revolutions per minute (RPM); indoor, inlet, and exhaust temperatures; and inside RH were considered as preliminary model input variables. 
- NH3Concentration: R=0.9119; MAE=2.712; RMSE=3.489Emission: R=0.879; MAE=0.713; RMSE=0.928H2SConcentration: R=0.809; MAE=68.597; RMSE=85.929Emission: R=0.925; MAE=0.060; RMSE=0.085CO2textitConcentration: R=0.995; MAE=123.692; RMSE=173.60Emission: R=0.926; MAE=98.879; RMSE=133.586PM10Concentration: R=0.741; MAE=123.692; RMSE=173.60Emission: R=0.810; MAE=0.049; RMSE=0.072 - PCA and statistical modelling promises higher prediction performance
##### Table 5

(Continued)

 Authors Methods of dataset collection Features extracted/input parameters Period of prediction Accuracy measures Compared performance Main outcomes [4] Arduino-based sensor network CO2, PM, temperature, humidity, light, VOC - Prediction accuracy = 84.69% LSTM and linear regression The proposed algorithm determines optimal time-step size automatically for deep learning models [2] Raspberry Pi-based sensor network Timestamps, mean for sliding window sequence, class value for mean, class label for the target class 30 min, 1 hour 30 minAccuracy = 0.864; Precision = 0.855; Sensitivity = 0.855; Specificity = 0.871; F-Measure = 0.8551 hourAccuracy = 0.788; Precision = 0.780; Sensitivity = 0.754; Specificity = 0.817; F-Measure = 0.767 Bayesian network, decision table,J48, random forest Network performance tested for variable sliding window length [75] Tester models from TSI Co Ltd. Max, min, range, average, std deviation - Prediction accuracy = 83.33% Support vector machine The proposed method could dramatically reduce the measurement time from days to seconds, avoiding unnecessary costs, time consumption, and labors [77] Telemonitoring system Statistical features 6 h, 12 h, 18 h, 24 h RMSE=21.04μg/m3, MAPE=32.92%, R2=0.65 LSTM, SRNN Provide point-by-point prediction and multiple sequence prediction [132] Hybrid sensor network Standard deviation of VOC concentration and CO2 levels sampled at 0.2 Hz - Simulations indicate that our hybrid sensor network architecture on average is 23.9% more accurate than the mobile-only architecture and 35.8% more accurate than the stationary-only architecture. 
Bayesian inference The proposed framework is composed of an optimal indoor concentration prediction, an error estimation model, and a hybrid sensor network synthesis algorithm [119] BACnet (Building Automation and Control network) technology-based sensor network Date, time, internal RH, external and internal temperature For randomly selected 200 values RMSE=46.25 ppm Performance compared with variable parameter settings Presents close relation of temperature and humidity values with CO2 prediction [66] - Case 1NO, NO2, NOX, CO, CO2, temperature, humidity, PM10, and PM2.5Case 2PM10, PM2.5 and temperature - Case 1PM10: RMSE=29.37PM2.5: RMSE=18.38Case 2PM10: RMSE=28.57PM2.5: RMSE=17.80 Regression and NN Study reveals that several input variables have a bad impact on the prediction model; hence, only sensitive parameters must be considered for design.
##### Table 5

(Continued)

| Authors | Methods of dataset collection | Features extracted/input parameters | Period of prediction | Accuracy measures | Compared performance | Main outcomes |
|---|---|---|---|---|---|---|
| [114] | E-nose system | - | - | Formaldehyde: MAREP = 8.04%, R = 0.9987, σ2 = 0.0081. Benzene: MAREP = 4.33%, R = 0.9961, σ2 = 0.0019. Toluene: MAREP = 3.46%, R = 0.9960, σ2 = 0.0010. CO: MAREP = 7.23%, R = 0.9961, σ2 = 0.0065. NO2: MAREP = 5.44%, R = 0.9998, σ2 = 0.0097 | GA-BPNN | The adaptive genetic algorithm was used for optimizing biases and weights of LSSVM and BPNN |
| [133] | Occupant symptom metric | Mean, median, std deviation, min, max, distribution and parameters | - | R2 = 0.69, RMSE = 8.8 | Multiple linear regression analysis, backpropagation NN | ANOVA test was performed to check significance level of input variables |
| [49] | Libellium sensor motes | - | - | MAPE, R2, RMSE | Extended Kalman filter | The fractional order version of the extended Kalman filter can deal with missing/inaccurate and highly non-linear data |
| [136] | Wireless sensor network | CO2 levels from ten, twenty, thirty, forty, and fifty minutes in the past were used as inputs for prediction | - | Maximum error rate = 7.18%, minimum error rate = 0.06% | ARIMA | An integrated solution to collect and analyse IAQ using sensor nodes and ARIMA prediction models. |
| [33] | Wireless sensor network installed in a domestic house (SML system house) | Outdoor and indoor temperature and humidity, CO2, indoor and outdoor light, rain and wind velocity | 24 hours | Day 1: MAE = 2.562, RMSE = 4.05102. Day 2: MAE = 2.4289, RMSE = 4.0438 | LSTM network | Utilized MIMO method for forecasting h-step ahead multivariate time series; it helps to handle dependencies between future values while avoiding error accumulation |
| [24] | Sensor network and meteorological website | Outdoor IAQ, indoor IAQ, temp, hum, wind speed, pressure | - | Accuracy = 1.00; Purification time = 2 hours | Linear regression, simple ANN | Meteorological features improved system accuracy |
| [25] | Sensor network and meteorological website | GPS coordinates, location-related humidity, temperature, point of interest | - | Accuracy: Raw data = 0.504; Signal reconstruction data = 0.603; ANN calibrated data = 0.641; GP inference data = 0.81 | Raw data, signal reconstructed data, ANN calibrated data, GP inference data | Detailed calibration analysis |
##### Table 5

(Continued)

| Authors | Methods of dataset collection | Features extracted/input parameters | Period of prediction | Accuracy measures | Compared performance | Main outcomes |
|---|---|---|---|---|---|---|
| [38] | Sensor network | PM2.5 increase rate; VOC increase rate; PM2.5 increase magnitude; VOC increase magnitude; PM2.5 decrease rate; VOC decrease rate; PM2.5 decrease magnitude; VOC decrease magnitude; PM2.5 Std; VOC Std; Hum Std; Hum change magnitude; cross sensor change magnitude ratio; cross sensor Std ratio | - | In terms of IAQ forecast, the average NRMSD (normalized root mean square deviation) when starting prediction at two minutes after the peak value is 7.3% for Family 3, 7.9% for Family 4, and 7.5% for Family 5. Average source identification accuracy of 87.0%, 90.7% and 92.2% across all pollution events at three families respectively. | Not compared | Analysis carried out on three homes |
| [78] | Wristband/sensor network | O3, VOC, T | - | O3 prediction: RMSE = 7.4 ppb; R2 = 1.5. CO2 prediction: RMSE = 81 ppb; R2 = 0.88 | - | Discussed calibration in detail |
| [106] | Sensor network | CO2 count, occupancy count using PIR sensors | - | Time slicer method: RMSE = 31.6863; Information loss = 20.2203. PAD method: RMSE = 34.0622; Information loss = 21.5357 | Time slicer vs. PAD method | - |
| [131] | Sensor network | - | - | Akaike Information Criterion (AIC) value of ARIMA (0,2,1) model obtained from previous step is 312, while the AIC value of ARIMA (1,0,1) obtained automatically by the method called Auto-ARIMA in R language is 337.37. So, ARIMA (0,2,1) is better. | Not compared | - |
##### Table 6

Data collection methods

| Domain | Studies | Number of studies |
|---|---|---|
| Real time monitoring systems designed by researchers | [2,4,24,25,38,49,78,108,114,131,132,136] | 12 |
| Commercial monitoring solutions | [2,21,75,119,134] | 5 |
| Data obtained from already installed systems | [33,77,106] | 3 |
| Mobile stations or wearable sensors | [78,113] | 2 |

This analysis reveals that 66.6% (N=14) of the studies include CO2, which is considered the most relevant IAQ measurement parameter. Thermal comfort parameters also play an essential role in IAQ measurement: 90.47% (N=19) of the studies considered temperature, whereas 85.71% (N=18) considered relative humidity. When CO2 levels increase in the indoor environment, the oxygen concentration decreases, which can harm the occupants. Real-time monitoring and prediction systems can provide instant alerts about a possible rise in CO2 so that occupants can take relevant ventilation measures ahead of time.

Besides this, particulate matter (PM2.5 and PM10) is critical since it has a direct connection to respiratory health [36]. PM2.5 sensors are used in 52.3% (N=11) of the analyzed studies, and PM10 sensors are included in 23.8% (N=5) of the papers. Most of these studies were carried out in urban areas. However, in rural homes with inadequate ventilation and biomass fuel as the primary source for cooking, the situation can be even more dangerous [129]. Future researchers need to follow a proactive approach to monitoring IAQ conditions in rural areas and design cost-effective and reliable alert/forecasting systems to prevent a decline in quality of life. The types of sensors used for monitoring different parameters are listed in Table 3. Future research should focus on the utilization of these existing hardware modules or on the design and development of more accurate and calibrated monitoring systems to enhance air quality monitoring. It is critical to analyze the potential of the latest technologies, such as edge computing or accurate state-of-the-art sensors, to develop effective and efficient IAQ monitoring systems [81,104].

The second research question concerned the features or input parameters used for designing a prediction system. Feature extraction and input parameter selection play an essential role in designing an AI-based prediction system, since the performance of the prediction model is highly dependent on the type of features used for network training. The features used in the 21 included papers are listed in Table 5 (column 3).

In total, four studies [24,66,119,133] presented an analysis of the sensitivity of selected features. To ensure higher accuracy for a forecasting system, the network must be trained with the most relevant features, because irrelevant or weakly relevant features can degrade network performance [16,69,86,92,102].

Seven studies used measured input parameters as training parameters [4,21,66,113,119,134], and five studies used statistical analysis of features to ensure that the most relevant features are fed to the network [2,75,77,132,133]. One study [66] provided a clear analysis of the relevance of features and how their inclusion or exclusion affects the performance of the IAQ prediction system. The researchers executed different cases, each with a distinct selection of input parameters, and visualized network performance for those changes. The analysis shows that poorly chosen or irrelevant parameters degrade the performance of the prediction system.
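Several of the reviewed studies summarize raw sensor readings with simple statistics (for example max, min, range, average and standard deviation, as listed in Table 5) before training. The following is a minimal sketch of that idea, not the method of any specific study; the function name `window_features`, the window length and the CO2 readings are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def window_features(readings, window):
    """Summarise each sliding window of `readings` with simple statistics."""
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    features = []
    for start in range(len(readings) - window + 1):
        chunk = readings[start:start + window]
        features.append({
            "mean": mean(chunk),
            "std": pstdev(chunk),  # population standard deviation of the window
            "min": min(chunk),
            "max": max(chunk),
            "range": max(chunk) - min(chunk),
        })
    return features


# Hypothetical CO2 readings in ppm, for illustration only.
co2_ppm = [420, 450, 480, 600, 750, 700]
rows = window_features(co2_ppm, window=3)
print(len(rows), rows[0])
```

Each dictionary in `rows` could serve as one input vector for a prediction model; which statistics are actually relevant would still need the kind of sensitivity analysis described above.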

The third research question focuses on the AI methods used for IAQ prediction systems. As can be seen in Table 3, most of the researchers worked on different versions of neural networks. In total, 10 studies (47.61%) used neural network-based methods. Two studies [131,136] followed ARIMA and two other studies used the GRU method for IAQ prediction. Besides this, one study each focused on ANFIS [134], Kalman filter [49], GA-based SVM [114], time slicer method [106], Bayesian inference [132] and decision tree regression [119]. However, none of these studies included fuzzy logic, which otherwise offers potential scope for forecasting problems [56,57]. Future researchers should focus on the application of fuzzy logic and other relevant machine learning methods for forecasting IAQ conditions [32]. LSTM is another crucial solution, and several researchers used this technique (see Table 5) as a baseline to compare the performance of their proposed methods and to validate the quality of results. It is also possible to create hybrid forecasting techniques by combining these available methods or by utilizing optimization techniques such as PSO, GA and simulated annealing [63,98].

In terms of the accuracies and prediction time of the existing models, essential details are mentioned in Table 4. Researchers focused on common parameters to evaluate the performance of the prediction system, which are listed in Table 7.

As can be seen, 42.85% (N=9) and 28.57% (N=6) of the analyzed studies use RMSE and R2 metrics, respectively, to evaluate the performance of their models. In total, seven studies (33.3%) used the error rate for performance analysis. Other metrics, such as variance (σ2), std error, SD, MSE, and MAREP, are each used in a single study. Moreover, the authors of [132] only provide a comparison between their methods and do not quantitatively describe metrics for their performance. Out of all these studies, four papers [2,33,77,119] provided a detailed performance analysis of the predicted hours. Note that the authors of [2] used a classification method to identify good and bad air quality, so the performance of their prediction model was evaluated in terms of classification parameters, including accuracy, precision, sensitivity and specificity. IAQ is a sensitive issue as it is closely related to human health and well-being. However, prediction systems should not focus on error parameters alone. Instead, a useful IAQ prediction model should also provide a forecast for the coming hours [33]. Such systems can help occupants make appropriate decisions about ventilation and follow preventive measures to avoid serious health complications. Building on the prediction-hour analysis of [2,33,77,119], researchers also need to make efforts to develop cost-effective, reliable and accurate alert-based IAQ prediction systems.
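For classification-based systems such as the one in [2], the reported parameters follow from the confusion counts of a binary good/bad air quality label. The sketch below uses the standard textbook definitions; the function name, the label encoding (1 = bad air quality) and the sample labels are assumptions made for illustration and are not taken from [2].

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute standard binary classification metrics from two label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical labels: 1 = bad air quality, 0 = good air quality.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
result = classification_metrics(actual, predicted)
print(result)
```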

##### Table 7

Evaluation parameters used in different studies

| Accuracy measure | Studies |
|---|---|
| RMSE | [33,49,66,77,78,106,113,119,133] |
| MAPE | [49,77,134] |
| R2 | [21,49,77,78,133,134] |
| MAE | [33,113] |
| MAREP | [114] |
| MSE | [134] |
| SD | [134] |
| Std error | [21] |
| Variance | [114] |
| R | [113,114] |
| Prediction accuracy | [4,24,25,75] |
| Error rate | [33,49,66,77,113,119,133] |
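The most common regression metrics in Table 7 can be computed directly from measured and predicted values. The sketch below uses the usual textbook definitions of MAE, RMSE, MAPE and R2 (as the coefficient of determination); individual studies may define or normalize these differently, and the sample CO2 values are hypothetical.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent) and R2 for two equal-length lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        # MAPE is undefined when a measured value is zero.
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        # R2 is undefined when all measured values are identical (ss_tot == 0).
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical measured and predicted CO2 concentrations in ppm.
measured = [400, 500, 600, 800]
forecast = [420, 480, 630, 770]
metrics = regression_metrics(measured, forecast)
print(metrics)
```

Because RMSE and MAE carry the unit of the pollutant while MAPE and R2 are unit-free, values reported by different studies are only comparable when the metric and the unit match.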

This RQ focuses on the application domains that are addressed by existing publications. As can be seen from Table 3, 11 (52.38%) out of 21 studies were executed on IAQ data collected from office buildings, which were either institute labs, staff rooms or traffic-prone workspaces. The data collected in six studies (28.57%) [2,4,38,78,119,131] is related to residential buildings. Three studies [24,113,134] focused on other commercial buildings such as gyms and shopping malls, and two studies were conducted in indoor spaces such as waiting rooms of subway stations. IAQ has been a considerable challenge for people who spend most of their routine time indoors. A considerable number of health issues among employees in offices and industrial units are reported due to unfavourable environmental conditions. The excessive use of chemical-rich cleaning agents and fragrance solutions poses a further threat to the overall health and well-being of employees [46,71]. Furthermore, the risks are more significant in remote areas where people use traditional sources such as wood, coal and kerosene for cooking and heating purposes [60]. Women, children and elderly members of such poor families are at a higher risk since they spend 80–90% of their routine time indoors [103]. The main concern while designing IAQ monitoring and prediction systems is that the ultimate product must be cost-effective, easy to use and simple to install in rural as well as urban areas [68]. Besides this, future researchers need to address the issues related to battery consumption, type of sensors, communication mechanism and system architecture [104]. An adequate combination of hardware and software is a must to achieve real-time IAQ monitoring and prediction goals. At the same time, policymakers need to raise awareness about the use of real-time monitoring systems so that more people consider installing them.

RQ6 concerns the opportunities for integrating IAQ prediction systems with smart building systems. In total, seven (33.3%) out of 21 studies [2,24,33,49,106,119,134] were based on smart building solutions where the IAQ prediction system was integrated with other smart solutions on the premises for improved lifestyle and well-being. The remaining 14 studies (66.6%) were independent solutions where researchers worked solely on IAQ monitoring and prediction. Ventilation is one of the main concerns in modern as well as traditional houses. New IAQ prediction systems must be integrated into automated ventilation management so that adequate arrangements for the circulation of fresh air can be made on time. The prediction systems can provide updates about future IAQ levels, and the smart building management can be adjusted accordingly to prevent serious health consequences for the building occupants. As can be seen in Table 5, four studies [2,33,77,119] provided information about the number of predicted hours using their proposed method. One study [33] claimed prediction for the next 24 hours, and the authors of [77] proposed prediction for 6, 12, 18 and 24 hours. In contrast, the authors of [2] provided predictions for 30 minutes and one hour only. The number of predicted hours is crucial for real-time systems as it can help occupants make prior arrangements for expected critical changes in the pollutant concentrations [30]. This information could be essential for people with disabilities and those suffering from chronic conditions such as respiratory or cardiovascular disease.
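The prediction horizon determines how training samples are framed. One common framing, similar in spirit to the multi-output (MIMO) approach listed for [33] in Table 5, pairs each window of past readings with all future values up to the horizon, so that a model predicts the whole horizon at once. The sketch below only builds such pairs and is not the implementation of any reviewed study; `make_supervised`, the window length and the sample series are hypothetical.

```python
def make_supervised(series, n_inputs, horizon):
    """Split a time series into (past window, future targets) training pairs."""
    if n_inputs < 1 or horizon < 1:
        raise ValueError("n_inputs and horizon must be positive")
    pairs = []
    # Stop early enough that every pair has a full target horizon.
    for start in range(len(series) - n_inputs - horizon + 1):
        inputs = series[start:start + n_inputs]
        targets = series[start + n_inputs:start + n_inputs + horizon]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical hourly readings, numbered 0..9 so the slicing is easy to follow.
hourly = list(range(10))
pairs = make_supervised(hourly, n_inputs=4, horizon=3)
print(pairs[0])   # first window and its three future targets
print(pairs[-1])  # last window that still has a full horizon
```

A longer horizon gives occupants more lead time but leaves fewer training pairs from the same series, and forecast error typically grows with distance into the future.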

Finally, RQ7 focuses on the methods used by earlier researchers to present IAQ system results to the end-users. The field of research is not restricted to the design and development of the IAQ prediction system; future researchers also need to be careful about how the predictions or ultimate results of monitoring systems are presented to the end-users. As can be seen in Table 3, four studies (19.04%) presented the results of the prediction system on a web-based solution, and four other studies (19.04%) preferred designing a smartphone application. Details about the end-user interface were missing from the remaining studies. The overall effectiveness of an IAQ prediction system depends on how accessible the results are to end-users. The design should not be limited to smartphone applications and web-based platforms. It is equally relevant to provide alerts for predicted critical situations so that building occupants can take immediate actions for ventilation arrangements [30]. The triggers must be further connected to the smart building management systems to control all mechanisms accordingly.
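An alert of this kind can be derived from a forecast by checking when the predicted concentration first reaches a limit. The following is a minimal sketch under stated assumptions: the 1000 ppm CO2 threshold, the 30-minute forecast step and the forecast values are illustrative placeholders, not values taken from the reviewed studies, and `first_alert` is a hypothetical helper.

```python
def first_alert(forecast_ppm, step_minutes, threshold_ppm=1000.0):
    """Return minutes until the forecast first reaches the threshold, or None."""
    for step, value in enumerate(forecast_ppm, start=1):
        if value >= threshold_ppm:
            return step * step_minutes
    return None  # the threshold is not reached within the forecast horizon


# Hypothetical CO2 forecast in ppm, one value per 30-minute step.
forecast = [650, 820, 990, 1040, 1100]
minutes = first_alert(forecast, step_minutes=30)
if minutes is not None:
    print(f"Ventilation recommended within {minutes} minutes")
```

The returned lead time is what a web interface, smartphone application or building management system would act on, for example by scheduling ventilation before the limit is reached.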

## 4.Conclusion

This study conducted a systematic literature review on IAQ prediction systems based on AI methods. The review was performed by studying and analyzing academic papers published in PubMed, IEEE and ACM databases. The most relevant articles were analyzed as per the pre-defined RQs and eligibility criteria, which helped to highlight the potential of AI to address IAQ-related problems.

IAQ monitoring has become a major concern in most developing countries, where a significant part of the population depends on traditional cooking and heating methods and lives with inadequate ventilation [103]. Furthermore, forecasting IAQ conditions ahead of time has become essential for improving public health and well-being and for enhancing ambient intelligence and smart environments. In this study, 47.61% of the reviewed papers (N=10) used neural network-based methods for this purpose. Nine (42.85%) and seven (33.3%) of the analyzed studies used RMSE and error rate metrics, respectively, to evaluate the performance of their models. Furthermore, the features used to train the models have a critical role in the overall performance of the system.

The researchers have shown interest in measuring a variety of IAQ pollutants, and 66.6% (N=14) of the studies include CO2. Moreover, temperature and humidity parameters were included in 90.47% and 85.71% of the studies, respectively. In 57.14% of the analyzed literature, data collection is performed using real-time collection systems based on IoT and WSN architectures designed and developed by the authors. These systems are developed using open-source technologies such as Arduino and Raspberry Pi platforms. Furthermore, 33% of the analyzed papers include the use of commercial monitoring systems for data acquisition purposes.

The analyzed literature presents the potential of deep learning, machine learning and neural networks for enhanced living environments and occupational health in smart environments. Nevertheless, this literature review has limitations. For this study, only papers in English from PubMed, IEEE and ACM were considered. This study may help to outline crucial possibilities in the field of IAQ and public health management. At the same time, this literature review identifies multiple challenges regarding the current state of the art for smart environments. Future research needs to evaluate the impact of different pollutants based on different geographical conditions and variable living arrangements. Another critical area of work is the development of adequate and well-calibrated sensor networks to measure IAQ levels on a real-time basis. Furthermore, future research needs to ensure that developed systems are useful on a real-time basis for rural areas, where people might not be able to afford more expensive solutions.

None to report.