
Creating Sentinel 2 (Truly) Cloudless Mosaics with Microsoft Planetary Computer

Mask clouds (and cloud shadows) effectively on Microsoft Planetary Computer using S2 Cloudless layers from GEE with the geeS2Downloader package

Photo by NASA on Unsplash

Introduction

Microsoft Planetary Computer is a strong competitor to the well-established Google Earth Engine (GEE) in the geospatial arena. Being able to access petabytes of satellite imagery without downloading each image, together with the computational power provided by Dask clusters, is a game changer for developing regional and global applications and research. Planetary Computer is still in preview (an access request is required), but being able to use its computing Hub with open-source tools like xarray and STAC is a clear advantage (IMHO) over Google’s proprietary API.

Another advantage is that we don’t have to deal with size constraints like the ones in GEE, which make it hard to get pixel values as a NumPy array. I understand that GEE’s concept is different and that it is meant to push people to run computations server-side (using lazy evaluation), but there are times when it is simply easier to access the data directly. In Planetary Computer it is even possible to download full assets directly from Microsoft’s catalog using a "homemade" downloader (more on that in a future post).

On the downside, the dataset catalog is not as extensive as GEE’s. And there is one thing that really makes a difference when working with optical (Sentinel 2) imagery: cloud masks.

In Microsoft’s official Cloudless Mosaic tutorial (here) they state the following: "Under the assumption that clouds are transient, the composite shouldn’t contain (many) clouds, since they shouldn’t be the median pixel value at that point over many images." That’s… ughhh… not exactly what we are expecting. There is always the Scene Classification Layer (SCL) in L2A S2 images, but those who have used Sen2Cor before know that its accuracy is not the best. Baetens et al. (2019) provide a comprehensive comparison on this subject.
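To see why the "transient clouds" assumption can fail, here is a minimal sketch with made-up reflectance values (the numbers and array layout are hypothetical, not taken from any real scene). The median only rejects clouds at a pixel when that pixel is clear in more than half of the acquisitions.

```python
import numpy as np

# Hypothetical reflectance values for a clear and a cloudy observation.
clear, cloud = 0.10, 0.80

# Stack of 5 acquisitions of a 1 x 2 pixel scene, shape (time, y, x).
# Pixel 0 is cloudy on 1 of 5 dates; pixel 1 is cloudy on 3 of 5 dates.
stack = np.array([
    [[clear, cloud]],
    [[cloud, cloud]],
    [[clear, clear]],
    [[clear, cloud]],
    [[clear, clear]],
])

# Median over the time axis, as in a median composite.
composite = np.median(stack, axis=0)
print(composite)
```

Pixel 0 comes out clear, while pixel 1 keeps the cloud value because clouds dominate its time series. In persistently cloudy regions this is exactly the case where an explicit cloud mask is needed.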

And that brings us back to our story: how can we effectively remove clouds from Sentinel 2 images if we are running on Microsoft’s Planetary Computer? As mentioned in the Planetary Computer Examples GitHub repository (here), this is something they are working on, but until then…

The S2 Cloudless Algorithm

The S2 Cloudless package is a machine learning cloud detection algorithm developed by Sinergise and available on GitHub (here). The algorithm is used by ESA and was recently added to GEE’s catalog as a cloud probability map. So, instead of running the trained algorithm for each scene, we can access the probability maps directly from GEE. The problems are:

  • We want to use Planetary Computer, and we are not in the GEE environment! And,
  • What about the cloud shadows?

To solve these two points, I’ve combined the geeS2Downloader package (more about it in this story, here) with the tutorial Sentinel-2 Cloud Masking with s2cloudless available on GEE’s site ([here](https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless)). The tutorial shows how to project the clouds along the direction given by the solar azimuth angle to find the actual ground shadows, and the geeS2Downloader package makes it easier to download an asset from GEE by working around its size limitations. If it seems too complicated, let’s take a look at the full solution…
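The shadow-projection idea can be sketched locally with NumPy. This is only an illustrative analogue under simple assumptions (north-up image, azimuth measured clockwise from north, a fixed maximum search distance in pixels), not the server-side code from the GEE tutorial. The function names `shadow_offsets`, `shift` and `project_shadows` are hypothetical.

```python
import numpy as np

def shadow_offsets(sun_azimuth_deg, max_dist_px):
    """(row, col) steps from a cloud pixel to its candidate shadow pixels."""
    # Shadows fall away from the sun, i.e. toward azimuth + 180 degrees.
    shadow_az = np.deg2rad((sun_azimuth_deg + 180.0) % 360.0)
    dist = np.arange(1, max_dist_px + 1)
    d_row = np.rint(-dist * np.cos(shadow_az)).astype(int)  # north is up (-row)
    d_col = np.rint(dist * np.sin(shadow_az)).astype(int)   # east is right (+col)
    return list(zip(d_row.tolist(), d_col.tolist()))

def _span(n, d):
    """Source and destination slices for shifting by d along an axis of length n."""
    src = slice(max(0, -d), max(0, min(n, n - d)))
    dst = slice(max(0, d), max(0, min(n, n + d)))
    return src, dst

def shift(mask, d_row, d_col):
    """Shift a boolean mask without wrap-around; vacated pixels become False."""
    out = np.zeros_like(mask)
    src_r, dst_r = _span(mask.shape[0], d_row)
    src_c, dst_c = _span(mask.shape[1], d_col)
    out[dst_r, dst_c] = mask[src_r, src_c]
    return out

def project_shadows(cloud, dark, sun_azimuth_deg, max_dist_px):
    """Dark pixels that lie along the projected path of any cloud pixel."""
    candidate = np.zeros_like(cloud)
    for d_row, d_col in shadow_offsets(sun_azimuth_deg, max_dist_px):
        candidate |= shift(cloud, d_row, d_col)
    return candidate & dark

# Toy scene: one cloud pixel, sun in the east (azimuth 90), so shadows fall west.
cloud = np.zeros((5, 6), dtype=bool)
cloud[2, 4] = True
dark = np.zeros((5, 6), dtype=bool)
dark[2, 2] = True   # dark pixel on the projected path
dark[0, 0] = True   # dark pixel elsewhere (e.g. water or terrain)

shadows = project_shadows(cloud, dark, sun_azimuth_deg=90.0, max_dist_px=3)
print(np.argwhere(shadows))
```

Only the dark pixel that sits on the projected path is flagged as shadow; the unrelated dark pixel is ignored. That is the reason for combining the projection with a dark-pixel test instead of thresholding dark pixels alone.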

The Solution

Let’s start with an empty notebook in the Planetary Computer Hub and install the dependencies. We will need two packages that are not pre-installed in PC (Planetary Computer), plus an optional one for quicker visualization:

  • earthengine-api: to access the assets from Google Earth Engine.
  • geeS2Downloader: to download assets from GEE.
  • geemap: a great package from Professor Qiusheng Wu that, in its latest version, also works in Microsoft’s PC.

Once initialized, we will search for a specific tile by TILE_ID and date. We will use a function called search_tiles for that.
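The original listing is not reproduced here, so the following is only a minimal sketch of what such a function could look like. The signature of `search_tiles` and the helper `build_search_kwargs` are assumptions, and the sketch presumes a recent `pystac-client` release (where `ItemSearch.items()` is available) and the `s2:mgrs_tile` property on Planetary Computer's `sentinel-2-l2a` items.

```python
from datetime import date

# Public STAC endpoint of the Planetary Computer.
STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"

def build_search_kwargs(tile_id, day):
    """Search parameters for one MGRS tile on one calendar day."""
    return {
        "collections": ["sentinel-2-l2a"],
        # A bare date is expanded by the client to cover the whole day.
        "datetime": day.isoformat(),
        # Filter on the MGRS tile property stored in each item.
        "query": {"s2:mgrs_tile": {"eq": tile_id}},
    }

def search_tiles(tile_id, day):
    """Return the STAC items found for the tile and day (needs network access)."""
    # Imported lazily so the helper above can be used without the client installed.
    from pystac_client import Client

    catalog = Client.open(STAC_URL)
    search = catalog.search(**build_search_kwargs(tile_id, day))
    return list(search.items())

# Parameters for the tile and date used in this story.
kwargs = build_search_kwargs("22KFV", date(2018, 12, 14))
print(kwargs)
```

From the Hub, `search_tiles("22KFV", date(2018, 12, 14))` would be the corresponding call. Reading pixel data from the returned items additionally requires signing the asset URLs with the `planetary_computer` package.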

<Item id=S2B_MSIL2A_20181214T133219_R081_T22KFV_20201008T100849>

Then, we need to get the corresponding Cloud Probability map in GEE. Additionally, to project the shadows we will need the full S2 image in GEE as well. For that we will create a new function called get_gee_img that will be responsible for locating the image in GEE’s catalog given the STAC Item.
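One possible way to write such a lookup is sketched below. It is not the original function: the signature, the helper `parse_item_id`, and the choice of the `COPERNICUS/S2_SR` collection are assumptions. It relies on the tile and sensing time encoded in the STAC Item id, on the `MGRS_TILE` property of GEE's Sentinel-2 images, and on the cloud probability collection sharing the same `system:index` as the surface reflectance collection (the GEE tutorial joins the two collections on that field).

```python
from datetime import datetime, timedelta

def parse_item_id(item_id):
    """Extract the MGRS tile and sensing time from a Sentinel-2 item id."""
    parts = item_id.split("_")
    sensing = datetime.strptime(parts[2], "%Y%m%dT%H%M%S")
    tile = parts[4][1:]  # drop the leading "T" of e.g. "T22KFV"
    return tile, sensing

def get_gee_img(item_id):
    """Return (S2 image, cloud probability image) from GEE for a STAC item id.

    Requires ee.Authenticate() / ee.Initialize() to have been called already.
    """
    import ee  # imported lazily; only needed when GEE is actually queried

    tile, sensing = parse_item_id(item_id)
    start = sensing.date().isoformat()
    end = (sensing.date() + timedelta(days=1)).isoformat()

    # First image of that tile on that day (assumes a single acquisition).
    s2_img = (
        ee.ImageCollection("COPERNICUS/S2_SR")
        .filterDate(start, end)
        .filter(ee.Filter.eq("MGRS_TILE", tile))
        .first()
    )
    # The cloud probability asset is assumed to share the same index.
    index = s2_img.get("system:index").getInfo()
    cloud_prob = ee.Image(f"COPERNICUS/S2_CLOUD_PROBABILITY/{index}")
    return s2_img, cloud_prob

tile, sensing = parse_item_id(
    "S2B_MSIL2A_20181214T133219_R081_T22KFV_20201008T100849"
)
print(tile, sensing)
```

If a tile has more than one acquisition on the same day, `.first()` may pick the wrong one, so a stricter match on the sensing time would be a sensible refinement.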

Now it’s time to see the images that we’ve retrieved, using geemap:

Code output. Image by author.

As we can see, the clouds are correctly identified, but we still need to get rid of the shadows. For that, we will create a function create_cloud_mask, inspired by GEE’s tutorial. It dilates the final mask with a 50 m buffer and rescales it to 20 m resolution.
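The buffer-and-rescale step can be illustrated locally with NumPy. This is a hypothetical analogue that runs on arrays, not the GEE function itself: it assumes a 10 m input mask, coarsens it first and dilates afterwards, rounds the 50 m buffer up to whole 20 m pixels, and uses a square window where GEE's focal operations default to a circular kernel. The names `downsample_any`, `dilate` and `finalize_mask` are made up for this sketch.

```python
import numpy as np

def downsample_any(mask, factor):
    """Coarsen a boolean mask; a block is flagged if any of its pixels is set."""
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask shape must be divisible by factor")
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.any(axis=(1, 3))

def dilate(mask, radius_px):
    """Binary dilation with a square window of half-width radius_px."""
    h, w = mask.shape
    padded = np.pad(mask, radius_px, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask inside the window.
    for dr in range(2 * radius_px + 1):
        for dc in range(2 * radius_px + 1):
            out |= padded[dr:dr + h, dc:dc + w]
    return out

def finalize_mask(mask_10m, buffer_m=50, in_res_m=10, out_res_m=20):
    """Rescale a 10 m cloud/shadow mask to 20 m and grow it by the buffer."""
    coarse = downsample_any(mask_10m, out_res_m // in_res_m)
    radius_px = int(np.ceil(buffer_m / out_res_m))  # 50 m -> 3 pixels at 20 m
    return dilate(coarse, radius_px)

# Toy 10 m mask (20 x 20 pixels) with a single flagged pixel.
mask_10m = np.zeros((20, 20), dtype=bool)
mask_10m[10, 10] = True

mask_20m = finalize_mask(mask_10m)
print(mask_20m.shape, int(mask_20m.sum()))
```

The same factor of two explains the real dimensions: a Sentinel-2 tile of 10980 x 10980 pixels at 10 m becomes 5490 x 5490 at 20 m.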

{'type': 'Image',
 'bands': [{'id': 'cloudmask',
   'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 1},
   'dimensions': [5490, 5490],
   'crs': 'EPSG:32722',
   'crs_transform': [20, 0, 600000, 0, -20, 7500040]}]}

As the code’s output shows, the mask was created at 20 m resolution, as expected. To display it on the geemap map, we can just add this layer with the following command:

Map.addLayer(mask.selfMask(), {'min': 0, 'max': 1, 'palette': ['orange']}, 'Final Mask', True, 0.5)

And here we have the final mask output. Note that we have not only the clouds but also the shadows masked (in orange).

Code output. Image by author.

Downloading the mask to PC

Now that we have processed the mask in GEE, it’s time to download it and see if it fits correctly on our original image. For that, we will use the geeS2Downloader package. This package slices the mask automatically according to GEE’s download limits and reconstructs the final matrix.
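The slice-and-reassemble idea can be sketched independently of the package. The code below is not geeS2Downloader's implementation or API; `fetch` is a hypothetical stand-in for a single size-limited download request, and `max_pixels` is an assumed per-request limit.

```python
import math
import numpy as np

def tile_windows(height, width, max_pixels):
    """Yield (row_slice, col_slice) windows holding at most max_pixels pixels."""
    side = max(1, math.isqrt(max_pixels))  # largest square that fits the limit
    for r0 in range(0, height, side):
        for c0 in range(0, width, side):
            yield (
                slice(r0, min(r0 + side, height)),
                slice(c0, min(c0 + side, width)),
            )

def download_in_tiles(fetch, height, width, max_pixels, dtype=np.uint8):
    """Fetch a large raster window by window and rebuild the full matrix."""
    out = np.zeros((height, width), dtype=dtype)
    for rows, cols in tile_windows(height, width, max_pixels):
        out[rows, cols] = fetch(rows, cols)
    return out

# Stand-in for the remote mask; in practice each fetch would be one request.
remote = (np.arange(7 * 9).reshape(7, 9) % 2).astype(np.uint8)

def fetch(rows, cols):
    return remote[rows, cols]

local = download_in_tiles(fetch, 7, 9, max_pixels=16)
n_requests = len(list(tile_windows(7, 9, 16)))
print(n_requests, np.array_equal(local, remote))
```

Here a 7 x 9 raster with a 16-pixel limit is fetched in six requests and reassembled without gaps or overlaps. A real downloader must also keep each window's georeferencing aligned with the full grid.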

Code output. Image by author.

Displaying the results

Now, let’s display the final results to compare with the image in PC. Instead of geemap, we will use plain old Matplotlib for this task.
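A minimal plotting sketch for this comparison is shown below. It assumes that `rgb` is a `(rows, cols, 3)` array of Sentinel-2 digital numbers and that `mask` is on the same grid (for example both at 20 m). The display stretch of 0 to 3000, the orange color and the helper names are arbitrary choices made for this sketch.

```python
import numpy as np

def stretch(rgb, low=0.0, high=3000.0):
    """Linearly scale raw digital numbers to [0, 1] for display."""
    scaled = (rgb.astype("float32") - low) / (high - low)
    return np.clip(scaled, 0.0, 1.0)

def mask_overlay(mask, color=(1.0, 0.65, 0.0), alpha=0.5):
    """RGBA layer: semi-transparent color where mask is set, transparent elsewhere."""
    overlay = np.zeros(mask.shape + (4,), dtype="float32")
    overlay[mask.astype(bool)] = (*color, alpha)
    return overlay

def show_comparison(rgb, mask):
    """Plot the RGB image next to the same image with the mask on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, axes = plt.subplots(1, 2, figsize=(12, 6))
    axes[0].imshow(stretch(rgb))
    axes[0].set_title("RGB")
    axes[1].imshow(stretch(rgb))
    axes[1].imshow(mask_overlay(mask))
    axes[1].set_title("RGB + cloud/shadow mask")
    for ax in axes:
        ax.axis("off")
    plt.show()

# Tiny synthetic inputs to exercise the helpers.
rgb = np.full((2, 2, 3), 1500, dtype="uint16")
mask = np.array([[True, False], [False, False]])
scaled = stretch(rgb)
overlay = mask_overlay(mask)
print(scaled[0, 0], overlay[0, 0], overlay[1, 1])
```

With real data, `show_comparison(rgb, mask)` draws the two panels. Zooming in amounts to slicing both arrays with the same row and column ranges before plotting.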

Code output. Image by author.
Zoomed in output. Image by author.

Conclusion

As we have seen in this story, it is possible to combine assets from Google Earth Engine and Microsoft Planetary Computer to get the best from both platforms. While Microsoft’s PC doesn’t include a reliable cloud layer, this is a workaround I created to move on with my projects. In the future, I expect that this kind of workaround will no longer be necessary. Until then, I hope it helps some of us!

Thanks and see you in the next story!

Stay Connected

If you liked this article and want to continue reading/learning these and other stories without limits, consider becoming a Medium member. You can also check out my portfolio at https://cordmaur.carrd.co/.

Join Medium with my referral link – Maurício Cordeiro

Reference

Baetens, L., Desjardins, C., Hagolle, O., 2019. Validation of Copernicus Sentinel-2 cloud masks obtained from MAJA, Sen2Cor, and FMask processors using reference cloud masks generated with a supervised active learning procedure. Remote Sensing 11, 433. https://doi.org/10.3390/rs11040433

