Randomly Wired Neural Networks

Connor Shorten
Towards Data Science
5 min read · Sep 4, 2019


This blog post will briefly explain what Neural Architecture Search (NAS) is and how it can help you achieve better modeling results on your dataset. It then argues that you can skip the advanced search algorithms and use something much simpler: randomly wired neural networks. This approach uses random graph generators from graph theory and network science, fixes the computation at every node to a 3x3 separable convolution, and focuses instead on the flow of data between nodes (conceptually similar to architectures such as ResNet and DenseNet).

Exploring Randomly Wired Neural Networks for Image Recognition by Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

Neural Architecture Search (NAS) describes the paradigm shift from feature engineering → neural networks → automated neural network design. Even armed with building blocks such as convolutions, pooling, and batch normalization, it is unclear how best to fit them together. You can imagine assembling these blocks into one general architecture powerful enough to extract useful features across applications ranging from image classification and object localization to semantic segmentation and image generation. In the current state of Deep Learning, however, you are probably better off finding a specialized architecture. Even within image classification, architectures further specialized to your particular dataset are likely to achieve the best results.

So how do we find Neural Architectures for our Datasets?

Generally, NAS algorithms and research papers focus on either Macro or Micro architectures, sometimes both (e.g. CoDeepNEAT). Micro architecture design describes techniques that design modular blocks of data flow and computation which fit into a Macro structure. Designing the Macro structure involves decisions such as how many times to repeat these modular building blocks and how many variations of the blocks should exist. Macro architecture design lies in the gray zone between hyperparameter optimization and NAS, with decisions such as when to apply spatial downsampling and how many feature maps to use from stage to stage, where a stage describes a set of blocks that maintain the same spatial resolution (and sometimes the same feature dimension as well).
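To make the Macro vs. Micro split concrete, here is a minimal PyTorch-style sketch. The `MicroCell` class, stage widths, and cell counts are all hypothetical placeholders of my own, not the design from any particular paper; the point is just that the Micro cell is the unit a NAS algorithm would search over, while the stage/downsampling skeleton is the hand-designed Macro structure.

```python
import torch
import torch.nn as nn

class MicroCell(nn.Module):
    """Hypothetical Micro block: the unit a NAS algorithm would search over."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)

def macro_skeleton(stage_channels=(64, 128, 256), cells_per_stage=3):
    """Hand-designed Macro structure: stages of repeated Micro cells with
    spatial downsampling (stride-2 conv) between stages."""
    layers, in_ch = [], 3
    for out_ch in stage_channels:
        # Macro decision: where to downsample and how wide each stage is.
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1))
        # Macro decision: how many times to repeat the searched Micro cell.
        layers.extend(MicroCell(out_ch) for _ in range(cells_per_stage))
        in_ch = out_ch
    return nn.Sequential(*layers)

net = macro_skeleton()
print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 256, 4, 4])
```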

Nearly all of the popular NAS papers manually design a Macro architecture. They then deploy Bayesian optimization, evolutionary algorithms, reinforcement learning, or differentiable architecture search to find the Micro / modular blocks to be integrated into that Macro framework. Enumerating all possible Micro structures is computationally intractable, so many priors are enforced on the search algorithms. These priors are generally encoded into the discrete search space available to the algorithm. For example, in the NASNet space each node has a fixed input degree of 2 and there are 5 such nodes in the Micro architecture.
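As a rough illustration of how such a prior constrains the search space, here is a hedged sketch that samples a 5-node cell where every node combines exactly two earlier outputs. The operation vocabulary and the dictionary encoding are illustrative choices of mine, not the actual NASNet implementation.

```python
import random

# Illustrative operation vocabulary; the real NASNet search space uses its own, larger set.
OPS = ["3x3_conv", "5x5_sep_conv", "3x3_max_pool", "identity"]

def sample_nasnet_like_cell(num_nodes=5):
    """Sample one Micro cell spec under a NASNet-style prior:
    each node picks exactly 2 earlier outputs (fixed input degree of 2)
    and an operation to apply to each of them."""
    cell = []
    for node in range(num_nodes):
        # Candidate inputs: the two cell inputs (indices 0 and 1)
        # plus every previously created node.
        candidates = list(range(node + 2))
        inputs = random.sample(candidates, 2)
        ops = [random.choice(OPS) for _ in inputs]
        cell.append({"inputs": inputs, "ops": ops})
    return cell

print(sample_nasnet_like_cell())
```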

These priors / biases in the search space encode a hypothesis that the best neural architectures come from cleverly combining different operations. Under this hypothesis, something like an input passed through a 3x3 convolution and later concatenated with the result of a zero-padded 2x2 max pool applied to the same input would be the most successful design. The hypothesis obviously envisions much more complicated combinations of operations, but I think this quick example communicates the idea.
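Here is that quick example as a tiny PyTorch sketch. The channel counts are arbitrary, and the crop after the zero-padded 2x2 max pool is just my way of keeping the spatial sizes aligned for concatenation.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32)  # toy input: a batch of 16-channel feature maps

conv_branch = F.conv2d(x, torch.randn(16, 16, 3, 3), padding=1)    # 3x3 conv, same spatial size
pool_branch = F.max_pool2d(x, kernel_size=2, stride=1, padding=1)  # zero-padded 2x2 max pool

# padding=1 on a 2x2 pool with stride 1 gives 33x33; crop back to 32x32 so the shapes match.
pool_branch = pool_branch[..., :32, :32]

out = torch.cat([conv_branch, pool_branch], dim=1)  # combine the two operations along channels
print(out.shape)  # torch.Size([1, 32, 32, 32])
```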

This prior in the NAS search space reflects a preference for the Inception network design over the ResNet or DenseNet designs. Inception / GoogLeNet passes the input block to separate processing branches such as 3x3 max pooling and 1x1, 3x3, and 5x5 convolutions. The outputs of the branches are then either element-wise summed or concatenated along the feature axis to form the output of the Micro cell.
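Below is a hedged sketch of such an Inception-style Micro cell. The branch widths are arbitrary and the module is simplified relative to the actual GoogLeNet inception block (which also uses 1x1 bottlenecks before the larger convolutions).

```python
import torch
import torch.nn as nn

class InceptionStyleCell(nn.Module):
    """Simplified Inception-style Micro cell: parallel branches applying
    different operations to the same input, concatenated along the feature axis."""
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        self.b1x1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3x3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5x5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.b1x1(x), self.b3x3(x), self.b5x5(x), self.pool(x)]
        return torch.cat(branches, dim=1)  # concat along the feature (channel) axis

cell = InceptionStyleCell(in_ch=64)
print(cell(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 128, 28, 28])
```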

ResNet and DenseNet, by contrast, focus on the Wiring, or data flow, between processing operations.

The ResNet uses a simple wiring pattern that achieved a breakthrough in image classification accuracy and allowed much deeper neural networks to be trained. The ResNet ‘skip connection’ takes the input from the previous layer (or Micro architecture block) (l-1) and sends it ahead to (l+1). The DenseNet uses a more intense wiring pattern, sending all previous outputs to every subsequent layer. For example, layer (l+4) would receive the outputs of layers l, l+1, l+2, and l+3.
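A hedged sketch of the two wiring patterns, ignoring the batch norm, bottleneck, and transition-layer details of the real architectures:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv3x3(in_ch, out_ch):
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

class ResNetStyleBlock(nn.Module):
    """Skip connection: the block's input is added to its output."""
    def __init__(self, ch):
        super().__init__()
        self.conv = conv3x3(ch, ch)

    def forward(self, x):
        return F.relu(x + self.conv(x))  # identity shortcut wiring

class DenseNetStyleStage(nn.Module):
    """Dense wiring: every layer sees the concatenation of all previous outputs."""
    def __init__(self, ch, growth=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            conv3x3(ch + i * growth, growth) for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = F.relu(layer(torch.cat(features, dim=1)))
            features.append(out)  # later layers receive the outputs of all earlier layers
        return torch.cat(features, dim=1)
```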

Randomly Wired Neural Networks

The argument in this post is that, in the context of Neural Architecture Search, Wiring > Connecting different Operations. Randomly wired neural networks use the same operation at every node: a 3x3 depthwise separable convolution. Rather than focusing on connecting operations such as convolutions with different filter sizes through clever pathways, these networks simply connect the same operation randomly throughout the Micro architectures. The Macro architecture of the randomly wired network is still designed by the authors of the paper, including decisions such as how many stages to spatially downsample the feature maps and how many nodes should be in each stage. These networks end up looking like the image below:
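In the paper, each node aggregates its incoming edges with a weighted sum whose weights are learnable and squashed through a sigmoid, then applies a ReLU → 3x3 separable convolution → BatchNorm transformation. Here is a minimal sketch of such a node; the class and parameter names are mine, and the separable convolution is written as a depthwise 3x3 followed by a pointwise 1x1.

```python
import torch
import torch.nn as nn

class SeparableConv3x3(nn.Module):
    """3x3 depthwise separable convolution: depthwise 3x3 + pointwise 1x1."""
    def __init__(self, ch):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch, bias=False)
        self.pointwise = nn.Conv2d(ch, ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class RandWireNode(nn.Module):
    """One node of a randomly wired graph: weighted-sum aggregation of its
    inputs, then the same ReLU -> separable conv -> BatchNorm transform."""
    def __init__(self, ch, num_inputs):
        super().__init__()
        self.input_weights = nn.Parameter(torch.zeros(num_inputs))  # sigmoid-gated edge weights
        self.transform = nn.Sequential(
            nn.ReLU(inplace=True),
            SeparableConv3x3(ch),
            nn.BatchNorm2d(ch),
        )

    def forward(self, inputs):  # inputs: list of tensors, one per incoming edge
        w = torch.sigmoid(self.input_weights)
        x = sum(wi * xi for wi, xi in zip(w, inputs))
        return self.transform(x)
```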

Exploring Randomly Wired Neural Networks for Image Recognition by Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

These networks are quickly generated using random graph algorithms that use heuristics to control the distribution of node degree (number of connections) throughout the network. Most interestingly, one of these algorithms, the Watts-Strogatz (WS) algorithm, is used in network science to model small-world networks. This describes the social phenomenon where any two people are, on average, only about six hops apart in a social network. The paper cites inspiration for this structure from neuroscience, highlighting that the roughly 300-neuron connectome of a nematode (worm) has a small-world structure as well.
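A hedged sketch of that graph-generation step using networkx: generate a WS graph, then orient every edge from the lower-indexed node to the higher-indexed node so the result is a DAG that can be traversed as a network. The parameters n, k, and p are illustrative; the paper sweeps several settings and also tries Erdős–Rényi and Barabási–Albert generators, and its handling of node ordering and input/output nodes is more involved than this.

```python
import networkx as nx

def random_dag_ws(n=32, k=4, p=0.75, seed=0):
    """Generate a Watts-Strogatz small-world graph and orient its edges
    by node index to obtain a DAG (a simplified version of the paper's recipe)."""
    g = nx.watts_strogatz_graph(n, k, p, seed=seed)
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes)
    dag.add_edges_from((min(u, v), max(u, v)) for u, v in g.edges)
    return dag

dag = random_dag_ws()
print(nx.is_directed_acyclic_graph(dag))  # True
print(list(dag.in_degree())[:5])          # first few (node, in_degree) pairs
```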

The paper compares the ImageNet performance of these networks against NAS approaches that require enormous amounts of computation; the randomly wired networks achieve comparable accuracy without any costly architecture search.

If you would like a full explanation of the Randomly Wired Neural Network paper and how the random graph is converted to a DAG / CNN, please check out the video below!

Thank you for reading this post on Randomly Wired Neural Networks! I hope that this convinced you that the wiring of neural networks is integral to the future of NAS. I additionally hope that you will adopt this algorithm to accelerate architecture search for your datasets and report the results!
