Hypernetworks: Artificial intelligence builds itself

Can artificial intelligence train its own neural network? A new research paper shows what that future might look like.

To use artificial intelligence for a particular task, researchers select a network architecture, choose a learning method, and train the neural network.

In recent years, different variants of automated architecture search have found their way into this process. Here, algorithms, deep learning systems, or graph networks search for suitable network architectures for a specific task.

In 2018, for example, researchers introduced a graph hypernetwork (GHN) that finds the best architecture for a task such as image analysis, starting from a set of candidate networks. Graph networks use graphs instead of sequentially arranged layers. The graphs consist of multiple nodes that are connected to each other.

In the hypernetwork from Mengye Ren's team, a node typically represents an entire layer of a neural network, and the connections represent how those layers are interconnected. The hypernetwork is trained over numerous runs, in which it keeps trying out new network architectures. Their performance serves as training feedback for the hypernetwork.
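To make the graph idea concrete, here is a minimal sketch of an architecture stored as a graph, with one node per layer and one edge per connection. The layer names, the skip connection, and the helper functions are illustrative assumptions and are not taken from the GHN paper. It uses `graphlib` from the Python standard library (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy architecture: each node is a layer, each edge says
# which layer feeds into which. The skip connection around conv2 is what
# makes this a graph rather than a plain sequence of layers.
layers = {
    "input": "image 32x32x3",
    "conv1": "3x3 convolution",
    "conv2": "3x3 convolution",
    "add": "element-wise sum",
    "fc": "fully connected classifier",
}
edges = [
    ("input", "conv1"),
    ("conv1", "conv2"),
    ("conv1", "add"),  # skip connection
    ("conv2", "add"),
    ("add", "fc"),
]


def predecessors(edges):
    """Map each node to the set of nodes that feed into it."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds


def execution_order(edges):
    """Return an order in which every layer comes after all of its inputs."""
    return list(TopologicalSorter(predecessors(edges)).static_order())


order = execution_order(edges)
for name in order:
    print(f"{name}: {layers[name]}")
```

A graph network can operate on exactly this kind of structure: it processes the nodes and passes information along the edges, so the same model can handle architectures with different numbers of layers and different wiring.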

Can hypernetworks predict parameters?

In a new paper, researchers from the University of Guelph, Vector Institute for AI, Canada CIFAR AI, McGill University, and Facebook AI Research are now building on Ren's work. The researchers are expanding the capabilities of the hypernetwork: Instead of exclusively predicting architectures, the so-called GHN-2 will also predict the parameters of neural networks.

Neural networks are usually initialized randomly at the beginning of their training, and the weights in the network are thus given random values. During AI training, these parameters are adjusted until the system performs its task satisfactorily. GHN-2 is designed to predict these parameters directly, thus eliminating or greatly shortening the learning process. For this purpose, GHN-2 was trained with a data set (DeepNets-1M) of one million different network architectures.
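For comparison, the conventional process described above can be sketched in a few lines: start from random values and adjust them step by step. This is a toy illustration with a two-parameter linear model, not an image classifier. The function name, the data, and the hyperparameters are assumptions chosen for readability.

```python
import random


def train_from_scratch(data, steps=1000, lr=0.1, seed=0):
    """Fit y = w * x + b by gradient descent, starting from random values."""
    rng = random.Random(seed)
    # Random initialization: the parameters start out with arbitrary values.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        x, y = rng.choice(data)  # pick one training example
        err = (w * x + b) - y    # prediction error on that example
        w -= lr * 2 * err * x    # gradient of err**2 with respect to w
        b -= lr * 2 * err        # gradient of err**2 with respect to b
    return w, b


# Hypothetical toy data generated from y = 3x + 1.
data = [(k / 10, 3 * (k / 10) + 1) for k in range(-10, 11)]
w, b = train_from_scratch(data)
print(round(w, 3), round(b, 3))
```

Even this tiny model needs many update steps before its parameters settle. A hypernetwork that outputs good values for `w` and `b` directly would skip that loop, which is the kind of shortcut GHN-2 aims for at a much larger scale.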

For training, GHN-2 initializes the parameters for some possible network architectures for image analysis. The image analysis systems are then tested with images.

But instead of using the test feedback to update the parameters of the image analysis system, the researchers directly update the parameters of the hypernetwork, which then reinitializes the parameters of the image analysis system. In this way, GHN-2 learns to initialize better and better as training progresses, and it does so for numerous architectural variants.
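The training scheme described here can be sketched with a deliberately tiny example. Everything in it is a simplifying assumption: the "hypernetwork" is a plain linear map instead of a graph neural network, the two "architectures" are scaled linear models instead of image classifiers, and all names are hypothetical. What it does share with the description above is the key step: the error of the target network updates the hypernetwork's parameters (`theta`), never the target network's weights directly.

```python
import random

# Hypothetical toy "architectures": a one-hot descriptor plus an output
# scale, so each architecture needs different weights to solve the task.
ARCHS = {
    "arch_a": {"desc": [1.0, 0.0], "scale": 1.0},
    "arch_b": {"desc": [0.0, 1.0], "scale": 0.5},
}
TRUE_W = [2.0, -3.0]  # the toy task: y = 2*x1 - 3*x2


def predict_weights(theta, desc):
    """Hypernetwork: maps an architecture descriptor to network weights."""
    return [sum(t * d for t, d in zip(row, desc)) for row in theta]


def forward(weights, scale, x):
    """Target network: a scaled linear model that uses the given weights."""
    return scale * sum(w * xi for w, xi in zip(weights, x))


def train_hypernetwork(steps=2000, lr=0.2, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # hypernetwork parameters
    names = sorted(ARCHS)
    for _ in range(steps):
        arch = ARCHS[rng.choice(names)]  # sample an architecture
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(w * xi for w, xi in zip(TRUE_W, x))
        w = predict_weights(theta, arch["desc"])  # predicted, not trained
        err = forward(w, arch["scale"], x) - y
        # Chain rule: the loss gradient with respect to each predicted
        # weight is passed on to the hypernetwork parameters behind it.
        for i in range(2):
            grad_w_i = 2.0 * err * arch["scale"] * x[i]
            for j in range(2):
                theta[i][j] -= lr * grad_w_i * arch["desc"][j]
    return theta


theta = train_hypernetwork()
for name, arch in ARCHS.items():
    weights = predict_weights(theta, arch["desc"])
    print(name, [round(v, 3) for v in weights])
```

After training, a single call to `predict_weights` yields usable weights for either architecture without any further gradient steps on the target network itself. In this toy setup the hypernetwork simply memorizes two architectures; the point of training on a large and varied set of architectures, as with DeepNets-1M, is to generalize to ones it has not seen.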

GHN-2 shortens the training process

The researchers tested GHN-2 on CIFAR-10 and ImageNet image analysis with 500 network architectures, including some that were not part of the training dataset. For CIFAR-10, architectures that GHN-2 already knows achieved 66.9 percent accuracy after initialization and without further training.

Networks trained conventionally, without GHN-2, achieved 69.2 percent accuracy after 2,500 iterations. For previously unknown architectures such as ResNet-50, GHN-2 achieved an average of just under 60 percent accuracy.

In the ImageNet benchmark, GHN-2 performed significantly worse: On average, the accuracy was 27.2 percent. In individual cases, it achieved almost 50 percent. However, even newly trained systems need about 5,000 iterations for an accuracy of 25.6 percent. Both systems can be raised to the usual accuracies beyond 90 percent with further training.

Even though the hypernetwork cannot replace classical training yet, the results show that the approach works and already saves time and energy: GHN-2 predicts matching parameters in less than a second, even on a CPU. Comparable results in ImageNet, for example, could take several hours on a GPU.

The researchers now want to expand the approach and train a hypernetwork with even more tasks, such as language processing and other architectures. In the long term, the project could make deep learning possible even for researchers without access to massive computing power, the team says.

Pre-trained GHN-2 models and the DeepNets-1M dataset are available on GitHub.
