Scaling Spherical Deep Learning to High-Resolution Input Data
Scattering networks on the sphere for scalable and rotationally equivariant spherical CNNs
Conventional spherical CNNs do not scale to high-resolution classification tasks. In this post we present spherical scattering layers: a novel spherical layer that reduces the dimensionality of the input data while retaining relevant information and preserving rotational equivariance. Scattering networks employ predefined convolutional filters from wavelet analysis rather than learning convolutional filters from scratch. Because the weights of scattering layers are designed rather than learned, scattering layers can be used as a one-time preprocessing step that reduces the resolution of the input data. We demonstrate empirically that spherical CNNs equipped with an initial scattering layer can scale to tens-of-megapixel resolutions, a feat that was previously intractable with conventional spherical CNN layers.
This blog post was co-authored with Augustine Mavor-Parker.
Previous Spherical Deep Learning Approaches are Computationally Demanding
Spherical CNNs [1, 2, 3] are extremely useful for a variety of problems in machine learning as many data sources cannot naturally be represented on a flat plane (see our previous article for an introduction). A key property of spherical CNNs is that they are equivariant to rotations of spherical data (we focus on rotationally equivariant approaches in this article). In practice, this means spherical CNNs have impressive generalisation properties that allow them to do things like classify 3D object meshes regardless of how they are rotated (and whether they have seen different rotations of the meshes during training).
We recently described a series of advances developed at Kagenova to improve the computational efficiency of spherical CNNs. Our approach, efficient generalised spherical CNNs, preserves the equivariance properties of previous spherical CNNs while being considerably more computationally efficient [1]. However, despite these advances in computational efficiency, spherical CNNs are still limited to relatively low-resolution data, meaning they cannot be applied to exciting applications that typically involve higher-resolution data, such as cosmological data analysis and 360° computer vision for virtual reality. In a recent paper we introduced spherical scattering layers to scale efficient generalised spherical CNNs to higher resolutions [4], which we review in the current post.
Hybrid Approaches to Support High-Resolution Input Data
In developing efficient generalised spherical CNNs [1], we found that a hybrid approach to building spherical CNN architectures was very effective. Hybrid spherical CNNs use different flavors of spherical CNN layers in the same network, allowing a practitioner to get the benefits of different types of layers at different stages of processing.
Scattering networks on the sphere continue with this hybrid approach and introduce a new kind of spherical CNN layer that can be plugged into existing spherical architectures. To scale efficient generalised spherical CNNs to higher resolutions, this new layer must:
- Be computationally scalable
- Mix information to low frequencies, allowing subsequent layers to operate at low resolution
- Be rotationally equivariant
- Provide a stable and locally invariant representation (i.e. an effective representational space)
We identified scattering network layers as having the potential to satisfy all of these properties.
Scattering Networks on the Sphere
Scattering networks, first proposed in the Euclidean setting by Mallat [5], can be thought of as CNNs with fixed convolutional filters derived from wavelet analysis. Scattering networks have proven to be very useful for conventional (Euclidean) computer vision — especially in cases where data is limited and therefore learning convolutional filters is difficult. Here we briefly discuss the inner workings of scattering network layers, how they satisfy the requirements defined in the previous section, and how they can be developed for spherical data analysis.
Data processing within a scattering layer is performed by three basic operations. The first building block is a fixed wavelet convolution, which is similar to a normal learned convolution used in Euclidean CNNs. After the wavelet convolution, scattering networks apply a modulus non-linearity to the resulting representation. Lastly, scattering networks make use of a scaling function, which performs a form of local averaging and has some similarities to pooling layers in vanilla CNNs. Repeated application of these three building blocks scatters input data down a computational tree, with the resulting representations (analogous to CNN channels) being pulled out of the tree at different stages of processing. A schematic diagram of these operations is shown below.
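To make these three building blocks concrete, the following toy sketch applies them to a one-dimensional Euclidean signal with NumPy. The filter bank, scales, and averaging width here are illustrative assumptions; the spherical implementation in [4] uses scale-discretised wavelets on the sphere instead.

```python
import numpy as np

def bandpass_filters(n, scales):
    """Fixed band-pass filter bank in frequency space (a crude stand-in for
    the scale-discretised wavelets used on the sphere)."""
    freqs = np.fft.fftfreq(n)
    return [np.exp(-0.5 * ((np.abs(freqs) - 1.0 / s) * 4.0 * s) ** 2) for s in scales]

def scattering_layer(x, filters, width=16):
    """One scattering layer: wavelet convolution -> modulus -> local averaging."""
    X = np.fft.fft(x)
    kernel = np.ones(width) / width               # scaling function ~ local average
    outputs = []
    for h in filters:
        u = np.abs(np.fft.ifft(X * h))            # wavelet convolution + modulus
        outputs.append(np.convolve(u, kernel, mode="same"))
    return outputs

x = np.random.default_rng(0).standard_normal(256)
coeffs = scattering_layer(x, bandpass_filters(256, scales=[4, 8, 16]))
```

Repeating `scattering_layer` on each output channel and collecting the averaged signals at every stage yields the computational tree described above.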
The operations of a scattering network may seem slightly obscure from a traditional deep learning point of view. However, each of the computational operations described has a specific purpose, designed to exploit solid theoretical results from wavelet analysis.
The wavelet convolutions in scattering networks have been carefully derived to extract relevant information from the input data. For example, in the case of natural images, wavelets are defined that specialise in extracting information related to edges at high frequencies and the general shapes of objects at lower frequencies. As a result, in the planar setting scattering network filters can have some similarity to traditional CNN filters. The same can apply in the spherical setting, where we use scale-discretised wavelets (see [4] for details).
As the wavelet filters are fixed, the initial scattering layers only need to be applied once, rather than repeatedly throughout training (like the initial layers in a traditional CNN). This makes scattering networks computationally scalable, satisfying requirement #1 above. Furthermore, scattering layers reduce the dimensionality of their input data, meaning only a limited amount of storage is required to cache scattering representations while training downstream CNN layers.
The modulus non-linearity is applied after the wavelet convolutions. Firstly, this injects non-linearity into the network. Secondly, the modulus mixes high-frequency information in the input signal down to low frequencies, satisfying requirement #2 above. This is illustrated in the figure below, which shows the frequency distribution of wavelet representations of data before and after the modulus non-linearity.
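The frequency-mixing effect of the modulus is easy to verify numerically. In the toy 1D sketch below (with an arbitrarily chosen centre frequency and envelope, not taken from [4]), a narrow-band high-frequency signal has almost no low-frequency energy, while its modulus concentrates most of its energy at low frequencies:

```python
import numpy as np

n = 256
t = np.arange(n)
# Narrow-band signal centred on a high frequency (bin 60 of 256),
# modulated by a Gaussian envelope.
x = np.cos(2 * np.pi * 60 * t / n) * np.exp(-0.5 * ((t - n / 2) / 20) ** 2)

def energy_below(sig, k):
    """Fraction of spectral energy in frequency bins below k."""
    power = np.abs(np.fft.rfft(sig)) ** 2
    return power[:k].sum() / power.sum()

low_before = energy_below(x, 20)          # almost no low-frequency energy
low_after = energy_below(np.abs(x), 20)   # most energy mixed down to low frequencies
```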
After the application of the modulus, the resulting signal is projected onto the scaling function. Scaling functions pick out low frequency information from the representation, similar to the operation of a pooling function in a traditional CNN.
We tested empirically the theoretical equivariance properties of spherical scattering networks, by rotating input signals before feeding them through the scattering network and comparing the resulting representations to those obtained by feeding the original signals through the network and rotating the output afterwards. In the table below we demonstrate that the equivariance error at a given depth is low, thus satisfying requirement #3 (in practice one typically does not go beyond a depth of two, since most of the signal energy is already captured by then).
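This rotate-then-process versus process-then-rotate comparison can be sketched in miniature with a 1D analogue, where circular shifts play the role of rotations (a toy illustration of the test procedure, not our spherical implementation):

```python
import numpy as np

def layer(x, h):
    """Fixed circular convolution followed by a modulus: equivariant to
    circular shifts, the 1D analogue of rotations on the sphere."""
    return np.abs(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
h = rng.standard_normal(8)
shift = 17

a = layer(np.roll(x, shift), h)                    # shift first, then process
b = np.roll(layer(x, h), shift)                    # process first, then shift
err = np.linalg.norm(a - b) / np.linalg.norm(b)    # relative equivariance error
```

For an exactly equivariant layer the error is at the level of floating-point round-off; on the sphere, discretisation introduces the small but non-zero errors reported in the table.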
Lastly, it has been proved theoretically that Euclidean scattering networks are stable to small diffeomorphisms or distortions [5]. This result has been extended to scattering networks on compact Riemannian manifolds [6] and also specifically the sphere [4]. Stability to diffeomorphisms in practice means that the representation computed by a scattering network will not be dramatically different if a slight change to the input is made (see our previous post for a discussion of the role of stability in geometric deep learning). Consequently, scattering networks provide a well-behaved representational space on which subsequent learning can proceed effectively, satisfying requirement #4 above.
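A toy numerical check of this kind of stability (a 1D sketch under the assumption of a unit-gain band-pass filter; each of the three operations is non-expansive, so the representation moves no further than the input does):

```python
import numpy as np

def scatter(x, H, width=16):
    """Toy scattering map: unit-gain band-pass convolution, modulus,
    local average. Each step is non-expansive (1-Lipschitz)."""
    u = np.abs(np.fft.ifft(np.fft.fft(x) * H))
    return np.convolve(u, np.ones(width) / width, mode="same")

n = 256
freqs = np.fft.fftfreq(n)
H = np.exp(-0.5 * ((np.abs(freqs) - 0.25) * 40.0) ** 2)   # peak gain exactly 1

rng = np.random.default_rng(2)
x = rng.standard_normal(n)
y = x + 0.01 * rng.standard_normal(n)    # small distortion of the input

input_change = np.linalg.norm(x - y)
output_change = np.linalg.norm(scatter(x, H) - scatter(y, H))
```

The change in the output representation is bounded by the change in the input, a (much weaker) numerical echo of the diffeomorphism-stability theorems proved in [4, 5, 6].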
Scalable and Rotationally Equivariant Spherical CNNs
Given that the scattering layers introduced satisfy all of our desired properties, we are now ready to integrate them into our hybrid spherical CNNs. As alluded to previously, scattering layers can be bolted onto existing architectures as an initial preprocessing step, reducing the size of the representations that the following spherical layers must process.
As scattering networks compute fixed representations for a given input, the scattering layer can be applied to the whole dataset once at the beginning of training, with the resulting low-dimensional representations cached for training the subsequent layers. Since the scattering representations have reduced dimensionality, the disk space required to store them is relatively low. Given this new spherical scattering layer, efficient generalised spherical CNNs are ready to be scaled to high-resolution classification problems.
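A minimal sketch of this precompute-and-cache pattern (the function names, cache layout, and the stand-in "scattering" transform are all hypothetical):

```python
import numpy as np
from pathlib import Path

def precompute_scattering(dataset, scattering_fn, cache_dir="scattering_cache"):
    """One-off pass: run the fixed scattering transform over the whole dataset
    and cache the low-dimensional outputs to disk for later training epochs."""
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    paths = []
    for i, x in enumerate(dataset):
        path = cache / f"sample_{i:06d}.npy"
        if not path.exists():            # compute each representation only once
            np.save(path, scattering_fn(x))
        paths.append(path)
    return paths

# Toy stand-in for a scattering transform: keep only the lowest 8 Fourier modes.
def toy_scattering(x):
    return np.fft.rfft(x)[:8]

data = [np.random.default_rng(i).standard_normal(1024) for i in range(4)]
cached = precompute_scattering(data, toy_scattering)
```

Downstream spherical CNN layers then train directly against the cached files, so the expensive high-resolution inputs are touched only once.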
Classifying the Anisotropies of the Cosmic Microwave Background
How is matter distributed throughout the universe? This is a fundamental research question for cosmologists that has major implications for theoretical models of our Universe’s genesis and evolution. Cosmic microwave background (CMB) radiation — remnant energy from the big bang — charts the distribution of matter throughout the universe. Cosmologists observe the CMB on the celestial sphere, which calls for computational methods that can perform cosmological analysis natively on the sphere.
Cosmologists are interested in methods for analysing the CMB that are capable of detecting non-Gaussianity in the distribution of the CMB throughout space, which can have important implications for theories of the very early Universe. Such analysis methods also need to be able to scale to literally astronomical resolutions. We demonstrated our scattering-based networks are able to meet these requirements by classifying CMB simulations as Gaussian or non-Gaussian at a resolution of L=1024. Scattering-based networks successfully classify these simulations at an accuracy of 95.3% — a much better result than the 53.1% of a lower-resolution conventional spherical CNN.
Summary
Spherical scattering layers compress the dimensionality of their input representations while retaining important information for downstream tasks. We have demonstrated this makes scattering layers extremely useful for spherical classification tasks at high resolutions. This opens up a plethora of potential applications that were previously intractable, such as cosmological data analysis and classification of high-resolution 360° images and videos. However, many computer vision problems require dense predictions, such as segmentation or depth estimation, necessitating high-dimensional outputs as well as high-dimensional inputs. Developing tractable spherical CNN layers that can increase the dimensionality of their output representations, while also preserving equivariance, is the subject of current research at Kagenova that will be presented in an upcoming post.
References
[1] Cobb, Wallis, Mavor-Parker, Marignier, Price, d’Avezac, McEwen, Efficient Generalised Spherical CNNs, ICLR (2021), arXiv:2010.11661
[2] Cohen, Geiger, Koehler, Welling, Spherical CNNs, ICLR (2018), arXiv:1801.10130
[3] Esteves, Allen-Blanchette, Makadia, Daniilidis, Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV (2018), arXiv:1711.06721
[4] McEwen, Wallis, Mavor-Parker, Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs, ICLR (2022), arXiv:2102.02828
[5] Bruna, Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (2013)
[6] Perlmutter et al., Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds, Mathematical and Scientific Machine Learning, PMLR (2020), arXiv:1905.10448