Lizi & Xiaocha | Reported by QbitAI (WeChat public account: QbitAI)
Machine learning depends on tuning? This idea is out of date.
The Google Brain team has released a new study:
A network found purely through neural architecture search can perform tasks directly, with no training and no weight tuning.
Such a network is called a WANN: a Weight Agnostic Neural Network.
On the MNIST digit classification task, it reached 92% accuracy without any training or weight adjustment, comparable to a trained linear classifier.
Besides supervised learning, WANNs can also handle many reinforcement learning tasks.
One of the team members, David Ha, posted the results on Twitter, where they drew more than 1,300 likes.
Let's look at the results first.
Google Brain used WANNs to handle three reinforcement learning tasks.
(All connections in each network share one and the same weight value.)
The first task: Cart-Pole Swing-Up.
This is a classic control task: a rail, a cart, and a pole attached to the cart.
The cart runs along the rail and must swing the pole up from its naturally hanging position, then hold it upright without letting it fall.
(This task is harder than plain Cart-Pole:
in Cart-Pole the pole starts upright, so the cart only needs to keep it balanced, not swing it up.)
The difficulty lies in the fact that it cannot be solved with a linear controller. The reward at each time step is based on the cart's distance to the end of the rail and the swing angle of the pole.
WANN's Best Network (Champion Network) looks like this:
It has performed well without training:
With the best-performing shared weight, the result satisfied the team: the pole reached equilibrium after only a few swings.
The second task: BipedalWalker-v2.
A two-legged "creature" must walk forward over randomly generated terrain, climbing over bumps and crossing pits. The reward depends on how far it travels from the start before falling, as well as the motor torque cost (to encourage efficient movement).
Each leg's movement is controlled by a hip joint and a knee joint. 24 inputs guide its movement, including lidar readings of the terrain ahead and proprioceptive measurements such as joint speeds.
Compared with the low-dimensional input of the first task, the possible network connections here are far more diverse:
WANN therefore has to choose how to route signals from the inputs to the outputs.
WANN also completed this high-dimensional task with quality.
The best architecture found by the search is much more complicated than that of the low-dimensional task above:
Running with a shared weight of -1.5, it looks like this:
The third task: CarRacing-v0.
This is a top-down, pixel-based racing game.
The car is controlled by three continuous commands: throttle, steering, and braking. The goal is to drive over as many track tiles as possible within the allotted time. The track is randomly generated.
The researchers handed the work of interpreting pixels over to a pre-trained variational autoencoder (VAE), which compresses the pixel representation into 16 latent dimensions.
These 16 dimensions serve as the network's input. Using learned features tests WANN's ability to learn abstract associations, rather than encoding explicit geometric relationships between the inputs.
This is WANN's best network racing untrained with a shared weight of -1.4:
Although it drives awkwardly, it rarely runs off the track.
Fine-tuning this best network's single shared weight (still with no training) makes the driving even smoother:
To sum up, the second and third tasks performed well in terms of simplicity and modularity: the bipedal controller used only 17 of the 25 possible inputs, ignoring many of the lidar sensors and the knee-joint speeds.
The WANN architecture not only completes the task without training a single weight, it does so with only 210 connections, an order of magnitude fewer than the 2,804 connections used by the current state-of-the-art model.
After the reinforcement learning tasks, the team took aim at MNIST, extending WANN to the supervised task of classification.
An ordinary network with randomly initialized parameters reaches only about 10% accuracy on MNIST.
The WANN architecture found by the new search method, run with random weights, already exceeds 80% accuracy;
and, as just mentioned, feeding it an ensemble of multiple weight values pushes the accuracy to 91.6%.
By comparison, a fine-tuned weight brings 91.9% accuracy, and trained weights bring 94.2%.
Compare that with a linear classifier with thousands of weights:
it is only about as good as an untrained, un-fine-tuned WANN fed a few random weights.
The paper stresses that MNIST handwritten digit classification is a high-dimensional classification task, and WANN does very well on it.
Moreover, no single weight value is better than the others; performance is quite balanced across them, so random weights are indeed feasible.
However, each network instance formed by a different weight value is good at recognizing different digits, so a WANN run with multiple weight values can serve as a self-contained ensemble.
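A minimal sketch of how such an ensemble could vote, assuming a forward function net_predict(x, w) that runs the fixed topology with shared weight w (the helper names and the toy network below are illustrative, not the paper's code):

```python
import numpy as np

def wann_ensemble_predict(net_predict, x, weights):
    # Each shared weight value turns the same topology into a different
    # classifier; averaging their outputs forms the ensemble vote.
    logits = np.stack([net_predict(x, w) for w in weights])
    return int(np.argmax(logits.mean(axis=0)))

# Toy stand-in for a WANN forward pass: the shared weight w scales a
# fixed projection, so different w values give different predictions.
rng = np.random.default_rng(0)
proj = rng.normal(size=(10, 784))

def toy_net(x, w):
    return np.tanh(w * proj @ x)

x = rng.normal(size=784)
pred = wann_ensemble_predict(toy_net, x, weights=[0.5, 1.0, 1.5, 2.0])
```

The same architecture is evaluated several times, once per weight value, and the averaged output is the ensemble's decision.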
How it works
How do WANNs reach such accuracy without training any weight parameters?
A neural network's result is determined not only by its weights and biases; the network topology and the choice of activation function also affect the final outcome.
The Google Brain researchers ask at the start of the paper: compared with its architecture, how important are a neural network's weight parameters? And to what extent can the architecture alone, without learning any weight parameters, solve a given task?
To answer this, they proposed a neural architecture search method that requires no weight training and looks for minimal network architectures capable of performing reinforcement learning tasks.
They also applied the method to supervised learning, using only random weights to reach accuracy on MNIST far above random guessing.
The paper draws inspiration from architecture search, Bayesian neural networks, algorithmic information theory, network pruning, and neuroscience.
To generate a WANN, the influence of the weights on the network must be minimized. Randomly sampling the weights would ensure that the final network is the product of architecture optimization alone, but random sampling in a high-dimensional weight space is too hard.
The researchers took a "simple and crude" approach: enforce weight sharing across all connections, reducing the number of weights to one. This efficient approximation drives the search toward better architectures.
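As a rough sketch (assumed helper names, not the paper's code), evaluating a candidate topology then amounts to sweeping the one shared weight over a small set of values and aggregating performance:

```python
# A sketch of single-shared-weight evaluation; `run_episode(net, w)` is a
# hypothetical helper that runs the topology `net` with every connection
# set to weight w and returns the episode's total reward.
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

def evaluate_topology(run_episode, net):
    returns = [run_episode(net, w) for w in SHARED_WEIGHTS]
    # Average over weight values, so a high score reflects the topology
    # itself rather than one lucky weight setting.
    return sum(returns) / len(returns)
```

Averaging over several weight values is what makes the score weight agnostic: a topology only ranks well if it works across the whole sweep.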
With weight initialization out of the way, the next question is how to search for weight agnostic neural networks. The search runs in four steps:
1. Create an initial population of minimal neural network topologies.
2. Evaluate each network over multiple rollouts, assigning a different shared weight value to each rollout.
3. Rank the networks by performance and complexity.
4. Create a new population by varying the highest-ranked topologies, selected probabilistically through tournament results.
The algorithm then repeats from step 2, iteratively producing weight agnostic topologies of increasing complexity.
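The loop above can be sketched as follows; the "topology", mutation, and fitness here are toy stand-ins (an integer and a simple score), since the real search evolves network graphs:

```python
import random

rng = random.Random(0)

# Toy stand-ins: a "topology" is just an integer, mutation nudges it,
# and fitness peaks at 5. The real algorithm evolves network graphs.
def minimal_topology():
    return 0

def mutate(net):
    return net + rng.choice([-1, 1])

def evaluate(net):
    return -abs(net - 5)

def wann_search(pop_size=16, generations=30):
    population = [minimal_topology() for _ in range(pop_size)]   # step 1
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)  # steps 2-3
        elite = ranked[: pop_size // 4]
        # Step 4: next generation = mutated copies of top topologies.
        population = [mutate(rng.choice(elite)) for _ in range(pop_size)]
    return max(population, key=evaluate)

best = wann_search()
```

Even with these placeholders, the structure is the same: evaluate, rank, keep the best, and vary them to form the next generation.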
The operations for searching network topologies are inspired by the neuroevolution algorithm NEAT. In NEAT, topology and weight values are optimized simultaneously; here the researchers ignore the weights and search only over topologies.
The figure above shows the specific operators used to explore the space of network topologies:
At the start, a network is the minimal topology on the far left, with only some of the inputs connected to the outputs.
Then a network is changed in one of three ways:
1. Insert node: split an existing connection by inserting a new node.
2. Add connection: connect two previously unconnected nodes with a new connection.
3. Change activation: reassign the activation function of a hidden node.
The far right of the figure shows the possible activation functions, such as linear, step, sine, cosine, ReLU, and so on, plotted over weights in the range [-2, 2].
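The activation palette can be written out as simple functions; the exact set below is assumed from the paper's figure and may differ in detail:

```python
import random

import numpy as np

# Candidate activations a "change activation" mutation can pick from
# (set assumed from the figure; the paper's list may differ slightly).
ACTIVATIONS = {
    "linear":  lambda x: x,
    "step":    lambda x: np.where(x > 0.0, 1.0, 0.0),
    "sin":     np.sin,
    "cos":     np.cos,
    "tanh":    np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "relu":    lambda x: np.maximum(x, 0.0),
    "abs":     np.abs,
}

def change_activation(current, rng=random.Random(0)):
    # Reassign a hidden node's activation to any *other* function.
    return rng.choice([name for name in ACTIVATIONS if name != current])
```

A richer palette than the usual ReLU-only choice matters here, because with a single shared weight the activation functions carry much of the network's expressive power.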
Weights remain important
Compared with traditional fixed-topology networks, a WANN can obtain better results using a single random shared weight.
Although WANN achieves strong results on many tasks, it is not completely independent of the weight value, and it sometimes fails when a single weight value is assigned at random.
WANNs work by encoding relationships between inputs and outputs. Although the magnitude of the weight matters little, its consistency, especially the consistency of its sign, is key.
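A toy illustration of this point (not from the paper): with one shared weight on a single input-output connection, scaling the weight's magnitude leaves the output's sign unchanged, while flipping the weight's sign flips the encoded relationship:

```python
import numpy as np

def tiny_wann(x, w):
    # One input feeding one tanh output node through a single shared weight.
    return np.tanh(w * x)

x = 0.7
# Different magnitudes, same sign: the qualitative behavior is preserved.
same_sign = [float(np.sign(tiny_wann(x, w))) for w in (0.5, 1.0, 2.0)]
# Flipping the sign of the weight flips the input-output relationship.
flipped = float(np.sign(tiny_wann(x, -1.0)))
print(same_sign, flipped)  # [1.0, 1.0, 1.0] -1.0
```

This is why a WANN that works at w = 1.5 tends to also work at w = 0.5, yet may fail when the weight's sign is flipped at random.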
Another advantage of sharing one random weight is that adjusting this single parameter is cheap enough to do without gradient-based methods.
The reinforcement learning results led the authors to consider the method's wider scope, so they also tested WANN on MNIST, the basic image classification task; performance was poor when the weight was close to zero.
Reddit users questioned WANN's results for exactly this case of random weights close to zero, where the network performs poorly; in the reinforcement learning experiments this shows up concretely as the cart running off the track.
The authors explain that as the weight tends to zero, the network's output also tends to zero, making it hard to achieve better performance in subsequent optimization.
Link to the original paper: