Scaling Spherical Deep Learning to High-Resolution Input Data
Scattering networks on the sphere for scalable and rotationally equivariant spherical CNNs
Conventional spherical CNNs do not scale to high-resolution classification tasks. In this post we present spherical scattering layers: a novel spherical layer that reduces the dimensionality of the input data while retaining relevant information and preserving rotational equivariance. Scattering networks employ predefined convolutional filters from wavelet analysis rather than learning convolutional filters from scratch. Because the weights of scattering layers are designed rather than learned, scattering layers can be used as a one-time preprocessing step that reduces the resolution of the input data. We demonstrate empirically that spherical CNNs equipped with an initial scattering layer can scale to tens-of-megapixel resolutions, a feat that was previously intractable with conventional spherical CNN layers.
This blog post was co-authored with Augustine Mavor-Parker.
Previous Spherical Deep Learning Approaches are Computationally Demanding
Spherical CNNs [1, 2, 3] are extremely useful for a variety of problems in machine learning as many data sources cannot naturally be represented on a flat plane (see our previous article for an introduction). A key property of spherical CNNs is that they are equivariant to rotations of spherical data (we focus on rotationally equivariant approaches in this article). In practice, this means spherical CNNs have impressive generalisation properties that allow them to do things like classify 3D object meshes regardless of how they are rotated (and whether they have seen different rotations of the meshes during training).
We recently described a series of advances developed at Kagenova to improve the computational efficiency of spherical CNNs. Our approach, efficient generalised spherical CNNs, preserves the equivariance properties of previous spherical CNNs while being considerably more computationally efficient [1]. However, despite these advances in computational efficiency, spherical CNNs are still limited to relatively low-resolution data, meaning they cannot be applied to exciting applications that typically involve higher-resolution data, such as cosmological data analysis and 360° computer vision for virtual reality. In a recent paper we introduced spherical scattering layers to scale efficient generalised spherical CNNs to higher resolutions [4], which we review in the current post.
Hybrid Approaches to Support High-Resolution Input Data
In developing efficient generalised spherical CNNs [1], we found that a hybrid approach to building spherical CNN architectures was very effective. Hybrid spherical CNNs use different flavors of spherical CNN layers in the same network, allowing a practitioner to get the benefits of different types of layers at different stages of processing.
Scattering networks on the sphere continue with this hybrid approach and introduce a new kind of spherical CNN layer that can be plugged into existing spherical architectures. To scale efficient generalised spherical CNNs to higher resolutions, this new layer must:
- Be computationally scalable
- Mix information to low frequencies, allowing subsequent layers to operate at low resolution
- Be rotationally equivariant
- Provide a stable and locally invariant representation (i.e. an effective representational space)
We identified scattering network layers as having the potential to satisfy all of these properties.
Scattering Networks on the Sphere
Scattering networks, first proposed in the Euclidean setting by Mallat [5], can be thought of as CNNs with fixed convolutional filters derived from wavelet analysis. Scattering networks have proven to be very useful for conventional (Euclidean) computer vision — especially in cases where data is limited and therefore learning convolutional filters is difficult. Here we briefly discuss the inner workings of scattering network layers, how they satisfy the requirements defined in the previous section, and how they can be developed for spherical data analysis.
Data processing within a scattering layer is performed by three basic operations. The first building block is a fixed wavelet convolution, which is similar to a normal learned convolution used in Euclidean CNNs. After the wavelet convolution, scattering networks apply a modulus non-linearity to the resulting representation. Lastly, scattering networks make use of a scaling function, which performs a form of local averaging and has some similarities to pooling layers in vanilla CNNs. Repeated application of these three building blocks scatters input data down a computational tree, with the resulting representations (analogous to CNN channels) being pulled out of the tree at different stages of processing. A schematic diagram of these operations is shown below.
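To make these three building blocks concrete, the following toy sketch applies them to a one-dimensional Euclidean signal with NumPy. The filter bank, scales, and averaging width here are illustrative assumptions; the spherical implementation in [4] uses scale-discretised wavelets on the sphere instead.

```python
import numpy as np

def bandpass_filters(n, scales):
    """Fixed band-pass filter bank in frequency space (a crude stand-in for
    the scale-discretised wavelets used on the sphere)."""
    freqs = np.fft.fftfreq(n)
    return [np.exp(-0.5 * ((np.abs(freqs) - 1.0 / s) * 4.0 * s) ** 2) for s in scales]

def scattering_layer(x, filters, width=16):
    """One scattering layer: wavelet convolution -> modulus -> local averaging."""
    X = np.fft.fft(x)
    kernel = np.ones(width) / width               # scaling function ~ local average
    outputs = []
    for h in filters:
        u = np.abs(np.fft.ifft(X * h))            # wavelet convolution + modulus
        outputs.append(np.convolve(u, kernel, mode="same"))
    return outputs

x = np.random.default_rng(0).standard_normal(256)
coeffs = scattering_layer(x, bandpass_filters(256, scales=[4, 8, 16]))
```

Repeating `scattering_layer` on each output channel and collecting the averaged signals at every stage yields the computational tree described above.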
The operations of a scattering network may seem slightly obscure from a traditional deep learning point of view. However, each of the computational operations described has a specific purpose, designed to exploit solid theoretical results from wavelet analysis.
The wavelet convolutions in scattering networks have been carefully derived to extract relevant information from the input data. For example, in the case of natural images, wavelets are defined that specialise in extracting information related to edges at high frequencies and the general shapes of objects at lower frequencies. As a result, in the planar setting scattering network filters can have some similarity to traditional CNN filters. The same can apply in the spherical setting, where we use scale-discretised wavelets (see [4] for details).
As the wavelet filters are fixed, the initial scattering layers only need to be applied once, rather than repeatedly throughout training (like the initial layers in a traditional CNN). This makes scattering networks computationally scalable, satisfying requirement #1 above. Furthermore, scattering layers reduce the dimensionality of their input data, meaning only a limited amount of storage is required to cache scattering representations while training downstream CNN layers.
The modulus non-linearity is applied after the wavelet convolutions. Firstly, this injects non-linearity into the network. Secondly, the modulus mixes high-frequency information in the input signal down to low frequencies, satisfying requirement #2 above. This is illustrated in the figure below, which shows the frequency distribution of wavelet representations of data before and after the modulus non-linearity.
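The frequency-mixing effect of the modulus is easy to verify numerically. In the toy 1D sketch below (with an arbitrarily chosen centre frequency and envelope, not taken from [4]), a narrow-band high-frequency signal has almost no low-frequency energy, while its modulus concentrates most of its energy at low frequencies:

```python
import numpy as np

n = 256
t = np.arange(n)
# Narrow-band signal centred on a high frequency (bin 60 of 256),
# modulated by a Gaussian envelope.
x = np.cos(2 * np.pi * 60 * t / n) * np.exp(-0.5 * ((t - n / 2) / 20) ** 2)

def energy_below(sig, k):
    """Fraction of spectral energy in frequency bins below k."""
    power = np.abs(np.fft.rfft(sig)) ** 2
    return power[:k].sum() / power.sum()

low_before = energy_below(x, 20)          # almost no low-frequency energy
low_after = energy_below(np.abs(x), 20)   # most energy mixed down to low frequencies
```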
After the application of the modulus, the resulting signal is projected onto the scaling function. Scaling functions pick out low frequency information from the representation, similar to the operation of a pooling function in a traditional CNN.
We tested empirically the theoretical equivariance properties of spherical scattering networks, by rotating input signals before feeding them through the scattering network and comparing the resulting representations to those obtained by feeding the original signals through the network and rotating the output afterwards. In the table below we demonstrate that the equivariance error at a given depth is low, thus satisfying requirement #3 (in practice one typically does not go beyond a depth of two, since most of the signal energy is already captured by then).
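This rotate-then-process versus process-then-rotate comparison can be sketched in miniature with a 1D analogue, where circular shifts play the role of rotations (a toy illustration of the test procedure, not our spherical implementation):

```python
import numpy as np

def layer(x, h):
    """Fixed circular convolution followed by a modulus: equivariant to
    circular shifts, the 1D analogue of rotations on the sphere."""
    return np.abs(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
h = rng.standard_normal(8)
shift = 17

a = layer(np.roll(x, shift), h)                    # shift first, then process
b = np.roll(layer(x, h), shift)                    # process first, then shift
err = np.linalg.norm(a - b) / np.linalg.norm(b)    # relative equivariance error
```

For an exactly equivariant layer the error is at the level of floating-point round-off; on the sphere, discretisation introduces the small but non-zero errors reported in the table.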
Lastly, it has been proved theoretically that Euclidean scattering networks are stable to small diffeomorphisms or distortions [5]. This result has been extended to scattering networks on compact Riemannian manifolds [6] and also specifically the sphere [4]. Stability to diffeomorphisms in practice means that the representation computed by a scattering network will not be dramatically different if a slight change to the input is made (see our previous post for a discussion of the role of stability in geometric deep learning). Consequently, scattering networks provide a well-behaved representational space on which subsequent learning can proceed effectively, satisfying requirement #4 above.
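A toy numerical check of this kind of stability (a 1D sketch under the assumption of a unit-gain band-pass filter; each of the three operations is non-expansive, so the representation moves no further than the input does):

```python
import numpy as np

def scatter(x, H, width=16):
    """Toy scattering map: unit-gain band-pass convolution, modulus,
    local average. Each step is non-expansive (1-Lipschitz)."""
    u = np.abs(np.fft.ifft(np.fft.fft(x) * H))
    return np.convolve(u, np.ones(width) / width, mode="same")

n = 256
freqs = np.fft.fftfreq(n)
H = np.exp(-0.5 * ((np.abs(freqs) - 0.25) * 40.0) ** 2)   # peak gain exactly 1

rng = np.random.default_rng(2)
x = rng.standard_normal(n)
y = x + 0.01 * rng.standard_normal(n)    # small distortion of the input

input_change = np.linalg.norm(x - y)
output_change = np.linalg.norm(scatter(x, H) - scatter(y, H))
```

The change in the output representation is bounded by the change in the input, a (much weaker) numerical echo of the diffeomorphism-stability theorems proved in [4, 5, 6].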
Scalable and Rotationally Equivariant Spherical CNNs
Given that the scattering layers introduced satisfy all of our desired properties, we are now ready to integrate them into our hybrid spherical CNNs. As alluded to previously, scattering layers can be bolted onto existing architectures as an initial preprocessing step, reducing the size of the representations that the following spherical layers must process.
As scattering networks compute fixed representations for a given input, the scattering layer can be applied to the whole dataset once at the beginning of training, with the resulting low-dimensional representations cached for training the subsequent layers. Since the scattering representations have reduced dimensionality, the disk space required to store them is relatively low. Given this new spherical scattering layer, efficient generalised spherical CNNs are ready to be scaled to high-resolution classification problems.
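A minimal sketch of this precompute-and-cache pattern (the function names, cache layout, and the stand-in "scattering" transform are all hypothetical):

```python
import numpy as np
from pathlib import Path

def precompute_scattering(dataset, scattering_fn, cache_dir="scattering_cache"):
    """One-off pass: run the fixed scattering transform over the whole dataset
    and cache the low-dimensional outputs to disk for later training epochs."""
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    paths = []
    for i, x in enumerate(dataset):
        path = cache / f"sample_{i:06d}.npy"
        if not path.exists():            # compute each representation only once
            np.save(path, scattering_fn(x))
        paths.append(path)
    return paths

# Toy stand-in for a scattering transform: keep only the lowest 8 Fourier modes.
def toy_scattering(x):
    return np.fft.rfft(x)[:8]

data = [np.random.default_rng(i).standard_normal(1024) for i in range(4)]
cached = precompute_scattering(data, toy_scattering)
```

Downstream spherical CNN layers then train directly against the cached files, so the expensive high-resolution inputs are touched only once.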
Classifying the Anisotropies of the Cosmic Microwave Background
How is matter distributed throughout the universe? This is a fundamental research question for cosmologists that has major implications for theoretical models of our Universe’s genesis and evolution. Cosmic microwave background (CMB) radiation — remnant energy from the big bang — charts the distribution of matter throughout the universe. Cosmologists observe the CMB on the celestial sphere, which calls for computational methods that can perform cosmological analysis natively on the sphere.
Cosmologists are interested in methods for analysing the CMB that are capable of detecting non-Gaussianity in the distribution of the CMB throughout space, which can have important implications for theories of the very early Universe. Such analysis methods also need to be able to scale to literally astronomical resolutions. We demonstrated our scattering-based networks are able to meet these requirements by classifying CMB simulations as Gaussian or non-Gaussian at a resolution of L=1024. Scattering-based networks successfully classify these simulations at an accuracy of 95.3% — a much better result than the 53.1% of a lower-resolution conventional spherical CNN.
Summary
Spherical scattering layers compress the dimensionality of their input representations while retaining important information for downstream tasks. We have demonstrated this makes scattering layers extremely useful for spherical classification tasks at high resolutions. This opens up a plethora of potential applications that were previously intractable, such as cosmological data analysis and classification of high-resolution 360° images and videos. However, many computer vision problems require dense predictions, such as segmentation or depth estimation, necessitating high-dimensional outputs as well as high-dimensional inputs. Developing tractable spherical CNN layers that can increase the dimensionality of their output representations, while also preserving equivariance, is the subject of current research at Kagenova that will be presented in an upcoming post.
References
[1] Cobb, Wallis, Mavor-Parker, Marignier, Price, d’Avezac, McEwen, Efficient Generalised Spherical CNNs, ICLR (2021), arXiv:2010.11661
[2] Cohen, Geiger, Koehler, Welling, Spherical CNNs, ICLR (2018), arXiv:1801.10130
[3] Esteves, Allen-Blanchette, Makadia, Daniilidis, Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV (2018), arXiv:1711.06721
[4] McEwen, Wallis, Mavor-Parker, Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs, ICLR (2022), arXiv:2102.02828
[5] Bruna, Mallat, Invariant Scattering Convolution Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (2013)
[6] Perlmutter et al., Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds, Mathematical and Scientific Machine Learning, PMLR (2020), arXiv:1905.10448