Can artificial intelligence train its own neural network? A new research paper shows what that future might look like.
To use artificial intelligence for a particular task, researchers select a network architecture and a learning method, and then train the neural network.
In recent years, different variants of automated architecture search have found their way into this process. Here, algorithms, deep learning systems, or graph networks search for suitable network architectures for a specific task.
In 2018, for example, researchers introduced a graph hypernetwork (GHN) that finds the best architecture for a task such as image analysis, starting from a set of candidate networks. Graph networks use graphs instead of sequentially arranged layers. The graphs consist of multiple nodes that are connected to each other.
In the hypernetwork from Mengye Ren's team, a node typically represents an entire layer of a neural network, and the connections represent how those layers are interconnected. The hypernetwork is trained over numerous runs, in which it keeps trying out new network architectures for the task. Their performance serves as training feedback for the hypernetwork.
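To make the graph view concrete, here is a minimal sketch of how an architecture could be stored as nodes (layers) and edges (connections). The class `ArchitectureGraph`, its methods, and the layer names are hypothetical and chosen only for illustration; they are not taken from the researchers' code.

```python
from dataclasses import dataclass, field


@dataclass
class ArchitectureGraph:
    """A network architecture as a directed graph (hypothetical helper)."""
    nodes: dict = field(default_factory=dict)  # node id -> layer type
    edges: list = field(default_factory=list)  # (source id, target id)

    def add_layer(self, node_id, layer_type, inputs=()):
        # Register the layer and connect it to the layers that feed it.
        self.nodes[node_id] = layer_type
        for src in inputs:
            self.edges.append((src, node_id))

    def predecessors(self, node_id):
        # All layers whose output flows into the given layer.
        return [src for src, dst in self.edges if dst == node_id]


# A small residual block: the "add" node receives the skip connection
# from "input" as well as the output of the second convolution.
graph = ArchitectureGraph()
graph.add_layer("input", "input")
graph.add_layer("conv1", "conv3x3", inputs=["input"])
graph.add_layer("conv2", "conv3x3", inputs=["conv1"])
graph.add_layer("add", "sum", inputs=["conv2", "input"])
graph.add_layer("out", "linear", inputs=["add"])
```

A purely sequential list of layers could not express the skip connection into `add`, which is why a graph is the more general description.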
Can hypernetworks predict parameters?
In a new paper, researchers from the University of Guelph, Vector Institute for AI, Canada CIFAR AI, McGill University, and Facebook AI Research are now building on Ren's work. The researchers are expanding the capabilities of the hypernetwork: Instead of exclusively predicting architectures, the so-called GHN-2 will also predict the parameters of neural networks.
Neural networks are usually initialized randomly at the beginning of their training, and the weights in the network are thus given random values. During AI training, these parameters are adjusted until the system performs its task satisfactorily. GHN-2 is designed to predict these parameters directly, thus eliminating or greatly shortening the learning process. For this purpose, GHN-2 was trained with a data set (DeepNets-1M) of one million different network architectures.
For training, GHN-2 initializes the parameters for some possible network architectures for image analysis. The image analysis systems are then tested with images.
But instead of using the test feedback to update the parameters of the image analysis system, the researchers directly update the parameters of the hypernetwork, which then reinitializes the parameters of the image analysis system. In this way, GHN-2 learns to initialize better and better as training progresses, and it does so for numerous architectural variants.
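The training idea described above can be illustrated with a toy sketch. It is not the GHN-2 implementation: a linear regression task stands in for image analysis, the "architectures" are simply different subsets of input features, and the hypernetwork is a single matrix `H` rather than a graph neural network. All names and numbers are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression stands in for the image analysis task.
X = rng.normal(size=(256, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

# Toy "architectures": each one uses a different subset of the 4 inputs.
# The binary mask doubles as the descriptor fed to the hypernetwork.
architectures = [
    np.array(mask, dtype=float)
    for mask in ([1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0])
]

# Hypernetwork: one linear map from descriptor to target-network weights.
H = np.zeros((4, 4))


def predict_weights(H, arch):
    # The target network's weights are predicted, never stored or trained.
    return (H @ arch) * arch


def loss_and_grad(H, arch):
    w = predict_weights(H, arch)
    err = X @ w - y
    loss = np.mean(err ** 2)
    grad_w = 2 * X.T @ err / len(y)         # gradient w.r.t. predicted weights
    grad_H = np.outer(grad_w * arch, arch)  # chain rule through w = (H a) * a
    return loss, grad_H


initial_loss = np.mean([loss_and_grad(H, a)[0] for a in architectures])

for step in range(500):
    arch = architectures[step % len(architectures)]
    _, grad_H = loss_and_grad(H, arch)
    H -= 0.05 * grad_H  # only the hypernetwork is updated

final_loss = np.mean([loss_and_grad(H, a)[0] for a in architectures])
print(f"mean loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The point of the sketch is the update rule: the task loss is computed with the predicted weights, but the gradient step changes only `H`, so one set of hypernetwork parameters learns to produce usable weights for every architecture in the list.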
GHN-2 shortens the training process
The researchers tested GHN-2 on CIFAR-10 and ImageNet image analysis with 500 network architectures, including some that were not part of the training dataset. For CIFAR-10, architectures that GHN-2 already knows achieved 66.9 percent accuracy after initialization and without further training.
Networks that were completely trained without GHN-2 achieved 69.2 percent accuracy after 2,500 iterations. For previously unknown architectures such as ResNet-50, GHN-2 achieved an average of just under 60 percent accuracy.
In the ImageNet benchmark, GHN-2 performed significantly worse: On average, the accuracy was 27.2 percent. In individual cases, it achieved almost 50 percent. However, even newly trained systems need about 5,000 iterations for an accuracy of 25.6 percent. Both systems can be raised to the usual accuracies beyond 90 percent with further training.
Even though the hypernetwork cannot replace classical training yet, the results show that the approach works and already saves time and energy: GHN-2 predicts matching parameters in less than a second, even on a CPU. Comparable results in ImageNet, for example, could take several hours on a GPU.
The researchers now want to expand the approach and train a hypernetwork with even more tasks, such as language processing and other architectures. In the long term, the project could make deep learning possible even for researchers without access to massive computing power, the team says.
Pre-trained GHN-2 models and the DeepNets-1M dataset are available on GitHub.