
In this article I’ll try to summarize this very interesting paper related to Geometric Deep Learning
Original Paper
Gauge Equivariant Convolutional Networks and the Icosahedral CNN
NOTE: In this summary I have minimized the use of math notation (also because Medium is not math-friendly), so if you want to dig a bit deeper into the formalism please refer to the GitHub version (by the way, GitHub does not render math natively either, but at least you can use a Chrome plugin like Tex All The Things)
Intro
A Convolutional Filter is a mapping from a Domain to a Codomain
In the CNN context,
- the filter is based on the operation of convolution
- both the domain and the codomain are called feature maps
Equivariance is a property involving both a filter and its domain of application (see the sketch after this list)
- when a transformation applied to the filter domain is fully reflected in the filter output, then the filter is called equivariant with respect to that transformation
- if the transformation gets fully absorbed by the filter, so that there is no trace of its application in the filter output, then the filter is called invariant with respect to that transformation
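To make these two notions concrete, here is a minimal numpy sketch (the toy sizes and names are mine, not from the paper) checking that a planar convolution is equivariant to translations: convolving a shifted feature map gives the same result as shifting the convolved one. Wrap-around boundaries make the shift an exact symmetry of the (toroidal) plane.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))     # toy input feature map
kernel = rng.standard_normal((3, 3))    # toy convolutional filter

def shift(x, dy, dx):
    """Translate the feature map with wrap-around (an exact plane symmetry here)."""
    return np.roll(np.roll(x, dy, axis=0), dx, axis=1)

# Equivariance: transforming the input fully reflects on the output
shift_then_conv = convolve(shift(image, 2, 3), kernel, mode="wrap")
conv_then_shift = shift(convolve(image, kernel, mode="wrap"), 2, 3)
assert np.allclose(shift_then_conv, conv_then_shift)
```

An invariant filter would instead satisfy convolve(shift(image)) == convolve(image), i.e. the shift would leave no trace at all in the output.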
Symmetries identify a set of global transformations associated with a specific space (they do not depend on any filter) which map the space onto itself without altering its structure: they depend on the Geometry of the specific space.
The underlying idea behind the Classic CNN Layer acting on an Image, or more generally on a Planar Feature Map, is
- to slide the filter over the plane the features belong to
or equivalently
- to slide the plane under the filter
hence, more formally, to use one of the plane symmetries (depending on its geometry) to map the plane onto itself before processing it again and again with the Convolutional Filter
Generalizing this procedure to a generic space, we would like
- something that works the same way, hence transforming the space onto itself for continuous processing
- the Convolutional Filter to be learnable independently of the specific position it occupies on the space (this is typically called weight sharing and it makes CNN processing position independent; see the sketch below)
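As a concrete picture of weight sharing, here is a minimal sketch (my own illustration, not code from the paper) of a planar convolution written as an explicit loop: the same weight matrix w is reused at every spatial position, so the position where the filter is applied never enters the learned parameters.

```python
import numpy as np

def conv2d(feature_map, w):
    """Valid cross-correlation: the single weight matrix w is shared
    across all positions of the feature map."""
    kh, kw = w.shape
    H, W = feature_map.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # identical weights w at every (i, j): this is weight sharing
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * w)
    return out
```

In a learnable layer, w would be the only trainable tensor: sliding it over the plane adds no position-specific parameters.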
In terms of space generalization, let’s consider a Manifold M, a space which is locally homeomorphic to a Euclidean Space
The main challenge of working with manifolds is that they are not guaranteed to have global symmetries; however, they do admit local gauge transformations. The goal of the paper is then to design convolutional filters which do not aim at being equivariant to any global symmetry (as it is not possible to assume one exists in this context) but to local gauge transformations.
This is key as
- what we eventually want to obtain is a convolutional filter capable of weight sharing, which means learning weights independently of the specific position on the input feature map, and
- the way to achieve this is to make the filter equivariant to how it gets applied to any point of the input space
- in the presence of global symmetries, the tool to reach any point in space to apply the filter is the set of global symmetry transformations of the input space; for example, in the case of a plane, this is the shift operation. It is key to understand that this "sliding" operation must not have an impact on the learning (otherwise the result of the learning would depend on the specific operation details): this operation just enables weight sharing.

So if an operator (e.g. convolution) is by design equivariant to this kind of transformation, then it can be learned with weight sharing: during the learning process the filter gets applied to every point in the space via symmetry transformations, and this does not affect the learned weights (in fact, in a classic CNN on planes, the specific way the convolutional operator slides over the plane is not relevant).
The challenge approached in the paper is not to assume global symmetries, but only a local gauge, and to study the related transformations so as to design a convolutional operator that can be learned with weight sharing.
The first problem we face in the absence of global symmetries is how to reach any point in space (this is a required feature). A possible solution is parallel transport, but parallel transport is in fact path dependent, as the sketch below illustrates.
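Here is a small numpy sketch (my construction, not from the paper) of this path dependence on the unit sphere: a tangent vector transported from the north pole to the same point on the equator along two different geodesic routes arrives with two different orientations.

```python
import numpy as np

def transport(v, a, b):
    """Parallel-transport tangent vector v from point a to point b along
    the great-circle geodesic of the unit sphere (rotation about a x b,
    via Rodrigues' formula)."""
    axis = np.cross(a, b)
    axis = axis / np.linalg.norm(axis)
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return (v * np.cos(theta)
            + np.cross(axis, v) * np.sin(theta)
            + axis * np.dot(axis, v) * (1.0 - np.cos(theta)))

n = np.array([0.0, 0.0, 1.0])   # north pole
p = np.array([1.0, 0.0, 0.0])   # target point on the equator
q = np.array([0.0, 1.0, 0.0])   # intermediate point on the equator
v = np.array([1.0, 0.0, 0.0])   # tangent vector at the north pole

direct = transport(v, n, p)                    # n -> p
detour = transport(transport(v, n, q), q, p)   # n -> q -> p

print(direct)   # [ 0.  0. -1.]
print(detour)   # [ 0. -1.  0.]  -> a different frame: transport is path dependent
```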

Needless to say, the path is completely arbitrary, and the fact that it determines the local tangent frame, called gauge, means a naive filter would propagate this arbitrary gauge choice to its result, affecting the learning.
What we want is hence a filter which is invariant to this unavoidable, arbitrary local gauge choice stemming from parallel transport, so that the output feature map is not affected by the local gauge chosen in the input feature map.
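As a toy illustration of this requirement (mine, not from the paper): a feature stored as coefficients relative to an arbitrary local gauge changes when the gauge is rotated, so any readout that uses the raw coefficients is gauge dependent, while a gauge-invariant readout (here simply the norm) gives the same answer in every gauge.

```python
import numpy as np

rng = np.random.default_rng(1)
coeffs = rng.standard_normal(2)          # feature in one arbitrary gauge
theta = rng.uniform(0.0, 2.0 * np.pi)
g = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation gauge transformation
coeffs_other = g.T @ coeffs              # the same feature in another gauge

print(coeffs[0], coeffs_other[0])        # naive readout: gauge dependent
print(np.linalg.norm(coeffs),            # invariant readout: identical
      np.linalg.norm(coeffs_other))
```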

Details

Let’s then introduce the Gauge, which is a Position Dependent Invertible Local Linear Mapping between the Manifold Tangent Space and the Euclidean Space
As a result of this, a Gauge also defines a Local Reference Frame in the Tangent Space (as the image of the Euclidean Reference Frame)
A Gauge Transformation is a Position Dependent Invertible Change of the Local Frame
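To pin these definitions down, here is a tiny numpy sketch (the matrices are made up for illustration): the gauge fixes a local frame at a point, a gauge transformation rotates that frame, and the coefficients of a fixed tangent vector change accordingly while the vector itself does not.

```python
import numpy as np

# The columns of E are the local frame vectors at a point p, i.e. the
# image of the Euclidean reference frame under one arbitrary gauge choice.
E = np.array([[1.0, 0.5],
              [0.0, 1.0]])
coeffs = np.array([2.0, 1.0])
w = E @ coeffs                          # a fixed tangent vector at p

# A gauge transformation: an invertible, position-dependent change of
# the local frame, here a 90-degree rotation g.
g = np.array([[0.0, -1.0],
              [1.0,  0.0]])
E_new = E @ g                           # the transformed frame
coeffs_new = np.linalg.inv(g) @ coeffs  # coefficients transform inversely

# The geometric vector is unchanged; only its description changed.
assert np.allclose(E_new @ coeffs_new, w)
```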

The underlying idea is hence to build a filter equivariant to these gauge transformations, depending on the specific manifold it has to process