Big data: analysis and nowcasting of urban air pollution
The big idea: Awareness of air pollutants for urban populations
Challenge type: Mini (4 weeks)
Challenge owner: Local tourism info centre
Facilitator: Evaldas Vaiciukynas
Facilitator Contact data: Evaldas.Vaiciukynas@ktu.lt
Challenge is related to the topic of Big Data
Context & relevance: in general relating to SDG / specific to a field of vocational application
The World Health Organization has worked to ensure that health-relevant indicators of household and ambient pollution exposure and disease burden are included in the formal system of SDG indicators.
SDG targets of relevance to ambient and household air pollution include:
SDG 3 – a substantial reduction in deaths and illnesses from air pollution;
SDG 11 – to reduce the environmental impact of cities by improving air quality.
The report on SDG 11 contains data on annual air pollution in cities. However, more granular data (for example, daily statistics) would help urban citizens monitor the local situation and plan activities or trips.
Variants of the challenge
The World Air Quality Index project publishes a dedicated dataset, the “Worldwide COVID-19 dataset”, which is updated 3 times a day and covers about 380 major cities in the world, from January 2022 until now. The data contain daily statistics on the main air pollutant species (so2, pm25, pm10, co, o3) as well as meteorological factors (temperature, humidity, pressure, wind-speed, dew).
Variant 1: Finding dangerous pollution patterns. An analytics dashboard that visually summarizes historical air pollution trends for a selected city. The solution should help to answer which months of the year and which days of the week have the highest pollutant levels.
Variant 2: Forecasting pollution a week ahead. An analytics dashboard that provides an air pollution forecast for a selected city. The solution should allow training a multivariate time-series model (neural network, ARIMAX, Prophet) on historical data. Since pollution levels also depend on meteorological factors, incorporating forecasts of these factors is important.
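The month and weekday question in Variant 1 comes down to grouping daily values by calendar period and comparing the group averages. The following is a minimal sketch of that aggregation using only the Python standard library. The function name average_by_period, the (date, value) input format and the sample values are illustrative assumptions, not part of the dataset or of any required solution.

```python
import calendar
from collections import defaultdict
from datetime import date
from statistics import mean


def average_by_period(records):
    """Average daily pollutant values per month and per day of the week.

    `records` is an iterable of (date, value) pairs for one city and one
    pollutant; this input format is an assumption made for the sketch.
    """
    month_values = defaultdict(list)
    weekday_values = defaultdict(list)
    for day, value in records:
        month_values[day.month].append(value)
        weekday_values[day.weekday()].append(value)  # Monday is 0
    by_month = {calendar.month_name[m]: mean(v)
                for m, v in sorted(month_values.items())}
    by_weekday = {calendar.day_name[d]: mean(v)
                  for d, v in sorted(weekday_values.items())}
    return by_month, by_weekday


# Made-up sample values, for illustration only.
sample = [
    (date(2021, 1, 4), 60.0),   # Monday
    (date(2021, 1, 5), 80.0),   # Tuesday
    (date(2021, 1, 11), 70.0),  # Monday
    (date(2021, 7, 6), 20.0),   # Tuesday
]
by_month, by_weekday = average_by_period(sample)
worst_month = max(by_month, key=by_month.get)
worst_weekday = max(by_weekday, key=by_weekday.get)
```

A dashboard could plot the two dictionaries as bar charts. Using the median instead of the mean may be a more robust choice when a few extreme days dominate a period.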
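Before training one of the multivariate models named in Variant 2, it may help to have a simple reference forecast to compare against. The sketch below is not ARIMAX, Prophet or a neural network, and it ignores meteorological factors. It is a weekday-mean baseline that predicts each of the next seven days as the average of the same weekday over the last few weeks. The function name weekday_mean_forecast and its parameters are hypothetical, and the input is assumed to be a gap-free list of daily values.

```python
from statistics import mean


def weekday_mean_forecast(history, weeks=4, horizon=7):
    """Naive baseline forecast for the next `horizon` days.

    `history` is a list of daily values, oldest first, with no missing
    days (an assumption of this sketch). Each forecast day is the mean of
    the values seen on the same weekday during the last `weeks` weeks.
    """
    if len(history) < 7 * weeks:
        raise ValueError("need at least 7 * weeks daily observations")
    recent = history[-7 * weeks:]
    forecast = []
    for step in range(horizon):
        # recent[0] falls on the same weekday as the first forecast day,
        # so a stride of 7 collects the matching weekdays.
        same_weekday = recent[step % 7::7]
        forecast.append(mean(same_weekday))
    return forecast


# Two made-up weeks of daily values, for illustration only.
history = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0,
           20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
week_ahead = weekday_mean_forecast(history, weeks=2)
```

A multivariate model that uses weather forecasts is only worth its extra runtime if it beats a baseline of this kind on held-out weeks.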
Business partner in an industry or a research field
Local tourism info centres, travel agencies, various environmental organizations.
Prerequisites of the learners
Software: Data Analysis Software
Working Space: Research labs
Open Issues and questions
Downloading the data could take several minutes, so a solution should balance having the most recent data against downloading it too frequently. Caching, a last-download timestamp and background asynchronous download could be considered to address this problem. Similar slow-runtime problems could arise when using time-series forecasting methods in Variant 2 of the challenge.
Another question is the scope of the solution to the challenge: should it cover a specific country/city only or all countries/cities available in the data? Should the solution use only the composite Air Quality Index or its constituents too?
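One way to handle the freshness trade-off is to reuse a local copy of the data until it is older than a chosen limit. The sketch below illustrates this with the file modification time as the last-download timestamp. The function name get_dataset, the caller-supplied download callable and the 8-hour limit are assumptions made for illustration (the limit is chosen only because the dataset is said to be updated 3 times a day). The background asynchronous download is left out.

```python
import tempfile
import time
from pathlib import Path


def get_dataset(cache_path, download, max_age_seconds=8 * 3600, now=None):
    """Return (path, refreshed), downloading only when the cache is stale.

    `download` is a caller-supplied function that writes the dataset to
    the given path. The file's modification time serves as the timestamp
    of the last download.
    """
    cache_path = Path(cache_path)
    now = time.time() if now is None else now
    if cache_path.exists():
        age = now - cache_path.stat().st_mtime
        if age < max_age_seconds:
            return cache_path, False  # cache is fresh enough, reuse it
    download(cache_path)
    return cache_path, True


# Usage with a stand-in downloader that records how often it is called.
calls = []


def fake_download(path):
    calls.append(path)
    path.write_text("placeholder contents\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dataset.csv"
    _, refreshed_first = get_dataset(target, fake_download)
    _, refreshed_second = get_dataset(target, fake_download)
    # Pretend a full day has passed since the last download.
    _, refreshed_later = get_dataset(target, fake_download,
                                     now=time.time() + 24 * 3600)
```

In a dashboard, the refresh could run in a background thread or scheduled job so that users always read the cached file and never wait for a download.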