Making it easier to work with Sentinel 5 Precursor data


By exploitation of space products, we mean developing problem-specific applications from data produced by instruments, either on the ground or on board satellites. At Spascia, we develop such applications mainly with data from two spaceborne instruments: TROPOMI on board Sentinel 5 Precursor, and IASI on board the MetOp satellites. We use both Level 1 and Level 2 data, also called data products.

We also use other datasets as required, such as those of the Copernicus Atmosphere Monitoring Service (CAMS). A distinction must be made, however, between model data, which results from mathematical simulations of atmospheric physics and chemistry based on ground and spaceborne measurements (a.k.a. observations), and data that is as close to a raw observation as can be. In this post I will focus on TROPOMI’s Level 2 data.

At Spascia, we mainly work with nitrogen dioxide (NO2), carbon monoxide (CO), and methane (CH4), while sulfur dioxide (SO2), ozone (O3) and aerosols are getting more and more attention. That is quite a large portion of the Sentinel 5 Precursor (S5P) products. Our work is not bound by national borders: our investigations and the necessary validations make use of the whole Earth coverage provided by S5P/TROPOMI. We also use all the data available since the beginning of the S5P mission, which at the time of writing amounts to over two years of data. This is no luxury when one wants to study variations over time, or perform a disaggregation (i.e. obtain a finer geographical resolution than that of the original product).

Why “making it easier”?

TROPOMI covers the whole Earth daily, as its vehicle (S5P) orbits Earth approximately 14 times a day. The image below is an example of TROPOMI’s Level 2 output. It was produced with Panoply from a single Level 2 data file containing nitrogen dioxide concentrations along the track of one half-orbit.

Example of nitrogen dioxide concentrations along the track of one half orbit of S5P on 5 January 2021 around 3:30 pm UTC.

Each half-orbit file of nitrogen dioxide concentrations weighs approximately 450 MB. A day’s worth of NO2 data takes up about 6 GB. A year represents about 2 TB of disk space spread over more than 5,000 files. Altogether, the six pollutant species we routinely use at Spascia currently represent over 40,000 files and occupy nearly 20 TB of disk space.
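These orders of magnitude can be checked with a few lines of arithmetic. The sketch below assumes one file per orbit, 14 orbits a day and decimal units (1 GB = 1000 MB); the variable names are only illustrative.

```python
# Rough check of the NO2 storage figures quoted above.
# Assumptions: one file per orbit, 14 orbits a day, decimal units.
FILE_SIZE_MB = 450
ORBITS_PER_DAY = 14
DAYS_PER_YEAR = 365

files_per_year = ORBITS_PER_DAY * DAYS_PER_YEAR
gb_per_day = FILE_SIZE_MB * ORBITS_PER_DAY / 1000
tb_per_year = gb_per_day * DAYS_PER_YEAR / 1000

print(f"{files_per_year} files a year")
print(f"{gb_per_day:.1f} GB a day")
print(f"{tb_per_year:.1f} TB a year")
```

The results (a little over 5,000 files, about 6 GB a day and a bit more than 2 TB a year) agree with the rounded figures given in the text.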

This sheer volume of data cannot reasonably be downloaded on the fly, and re-downloaded over and over, as we create and refine processing chains. That is why we store it on our premises.

We also manage the processor versions. For NO2, the concentrations contained in the files produced from early December 2020 (processor version 1.4.0) cannot be used along with the data produced earlier (with processor 1.3.0 and earlier versions). Managing processor versions is key to working on consistent data sets. Curation is therefore needed in order to exploit these products properly.
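As an illustration of what such curation can look like, here is a minimal sketch that reads the processor version from a file name and keeps the two series apart. It assumes the usual S5P file naming layout, in which a six-digit field such as 010400 encodes version 1.4.0; that layout is not described in this post, and the function names and example file names (including orbit numbers and timestamps) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed file name layout (not described in this post):
# S5P_<stream>_<product, 10 chars>_<start>_<end>_<orbit>_<collection>_<processor>_<produced>.nc
FILENAME_RE = re.compile(
    r"^S5P_(?P<stream>[A-Z]{4})_(?P<product>.{10})_"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_"
    r"(?P<orbit>\d{5})_(?P<collection>\d{2})_"
    r"(?P<processor>\d{6})_(?P<produced>\d{8}T\d{6})\.nc$"
)


def processor_version(filename):
    """Return the processor version as a tuple, e.g. (1, 4, 0)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    digits = match.group("processor")
    return tuple(int(digits[i:i + 2]) for i in (0, 2, 4))


def split_by_series(filenames, first_new_version=(1, 4, 0)):
    """Separate files produced before and from a given processor version."""
    series = defaultdict(list)
    for name in filenames:
        key = "new" if processor_version(name) >= first_new_version else "old"
        series[key].append(name)
    return dict(series)


def example_name(start, processor):
    """Build a hypothetical file name; orbit and timestamps are invented."""
    parts = ["S5P", "OFFL", "L2__NO2___", start, start, "00001", "01", processor, start]
    return "_".join(parts) + ".nc"


files = [
    example_name("20201101T120000", "010302"),
    example_name("20210105T153000", "010400"),
]
series = split_by_series(files)
print(series)
```

Comparing versions as tuples of integers avoids the pitfalls of comparing version strings, and a processing chain can then be pointed at a single series.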

At Spascia, we pool the tasks that each researcher would otherwise have to carry out to exploit the data: downloads, storage and curation are all performed continuously and for all of us. That makes sense when dealing with 20 TB datasets. And yet we’re going further…

Going further

For more than a year now, we have experimented with setting up databases that hold all the NO2 data available since the beginning of the S5P mission, over the whole Earth. This concept was leveraged to study S5P’s ability to see the effects of the COVID-related lockdowns on urban pollution, which is presented in our paper: “Analysis of the NO2 tropospheric product from S5P TROPOMI for monitoring pollution at city scale”.

Since that paper, we have maintained the whole NO2 dataset since November 2018 for processor 1.3.0 and below, and we routinely export “obs-cores” as CSV files, to suit the precise needs of all the people at Spascia who work with S5P L2 NO2 data. Obs-cores are extracts of geolocated, timestamped NO2 observations centred on given longitude and latitude coordinates, for a given period of time. That database has proven helpful for creating two algorithms. The first one detects and geolocates NO2 emission sources automatically, and will be the topic of a future post. The second one characterises emission sources, i.e. it tells how much nitrogen dioxide is produced by a source. We are waiting for the reprocessed series to be available in order to build the next iteration of this database.
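To make the notion of an obs-core more concrete, here is a minimal sketch of such an extraction. It is not our production code: it assumes the observations are already available as dictionaries, that “centred on” means “within a given radius”, and the field names, radius and sample values are all invented for the example.

```python
import csv
import io
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_obs_core(observations, lon, lat, radius_km, start, end):
    """Keep observations within radius_km of (lon, lat) and within [start, end)."""
    return [
        obs for obs in observations
        if start <= obs["time"] < end
        and haversine_km(lon, lat, obs["lon"], obs["lat"]) <= radius_km
    ]


def write_csv(rows, stream):
    """Write the selected observations as CSV, one line per observation."""
    writer = csv.DictWriter(stream, fieldnames=["time", "lon", "lat", "no2"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "time": row["time"].isoformat()})


# Invented sample observations (placeholder NO2 values, arbitrary units).
observations = [
    {"time": datetime(2021, 1, 5, 15, 30), "lon": 2.35, "lat": 48.86, "no2": 1.0},
    {"time": datetime(2021, 2, 5, 15, 30), "lon": 2.35, "lat": 48.86, "no2": 2.0},
    {"time": datetime(2021, 1, 5, 15, 30), "lon": 4.83, "lat": 45.76, "no2": 3.0},
]

core = extract_obs_core(
    observations, lon=2.35, lat=48.86, radius_km=50,
    start=datetime(2021, 1, 1), end=datetime(2021, 2, 1),
)
buffer = io.StringIO()
write_csv(core, buffer)
print(buffer.getvalue())
```

Only the first sample observation falls inside both the radius and the time window, so the resulting CSV contains a header and a single line.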

In the meantime, we are bringing the six species mentioned earlier into the same database, so that the information related to all of them can be accessed for any point in space and time without parsing thousands of files.
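The idea can be sketched with a single table queried by location and time. The example below uses an in-memory SQLite database purely for illustration: the database engine, the table layout, the column names and the sample rows are assumptions, not a description of our actual setup.

```python
import sqlite3

# Hypothetical single-table layout holding observations of several species.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        species  TEXT NOT NULL,
        time_utc TEXT NOT NULL,  -- ISO 8601 strings sort chronologically
        lon      REAL NOT NULL,
        lat      REAL NOT NULL,
        value    REAL NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_obs ON observation (species, time_utc)")

# Invented sample rows (placeholder values).
rows = [
    ("NO2", "2021-01-05T15:30:00", 2.35, 48.86, 1.0),
    ("CO", "2021-01-05T15:30:00", 2.36, 48.85, 2.0),
    ("CH4", "2021-01-05T15:30:00", 4.83, 45.76, 3.0),
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", rows)


def species_near(conn, lon, lat, half_width_deg, start, end):
    """List the species observed in a lon/lat box during [start, end)."""
    cursor = conn.execute(
        """
        SELECT DISTINCT species FROM observation
        WHERE time_utc >= ? AND time_utc < ?
          AND lon BETWEEN ? AND ?
          AND lat BETWEEN ? AND ?
        ORDER BY species
        """,
        (start, end,
         lon - half_width_deg, lon + half_width_deg,
         lat - half_width_deg, lat + half_width_deg),
    )
    return [row[0] for row in cursor]


found = species_near(
    conn, 2.35, 48.86, 0.1, "2021-01-05T00:00:00", "2021-01-06T00:00:00"
)
print(found)
```

A single query then answers a question that would otherwise require opening every file covering the area and period of interest.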
