Date: 2020-01-18

Generative Adversarial Networks (GANs) have been received with enthusiasm ever since Ian Goodfellow and his research team introduced them in 2014. Yann LeCun, Facebook’s director of AI Research, even described GANs as “the most interesting idea in the last 10 years in ML”. With all this excitement, it is easy to overlook the variety among GANs. There are a number of different types of generative adversarial networks, each of which works slightly differently and helps engineers achieve slightly different results.

To give you a deeper insight into GANs, this article looks at three different generative adversarial networks: SRGANs, CycleGANs and InfoGANs. We will examine how each of them works and how it can be used. This should give you a solid foundation for studying GANs in more detail and applying them to your own experiments and projects.

This article is an excerpt from the book Deep Learning with TensorFlow 2 and Keras, Second Edition by Antonio Gulli, Amita Kapoor and Sujit Pal.

SRGAN – Super Resolution GANs

Do you remember a thriller in which our hero asks the computer guy to enlarge the faded image of the crime scene? With the zoom, we can see the criminal’s face in detail, including the weapon used and everything engraved on it! Well, SRGAN can do similar magic.

Here, a GAN is trained so that it can produce a photo-realistic, high-resolution image from a low-resolution image. The SRGAN architecture consists of three neural networks: a very deep generator network, a discriminator network and a pre-trained VGG-19 network.

How do SRGANs work?

SRGANs use the perceptual loss function (developed by Johnson et al., Perceptual Losses for Real-Time Style Transfer and Super-Resolution). The perceptual loss is built from the difference in feature map activations, taken in the high layers of a VGG network, between the network output and the high-resolution reference. In addition to this, the authors added a content loss and an adversarial loss so that the generated images look more natural and the finer details more artistic. The perceptual loss is defined as the weighted sum of the content loss and the adversarial loss:

$$l^{SR} = l_X^{SR} + 10^{-3} \times l_{Gen}^{SR}$$

The first term on the right is the content loss, which is obtained from the feature maps generated by the pre-trained VGG-19 network. Mathematically, it is the Euclidean distance between the feature map of the reconstructed image (the one produced by the generator) and the feature map of the original high-resolution reference image.
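The weighted sum above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the feature maps are passed in as flat lists of numbers rather than being extracted by a real VGG network, the Euclidean distance is squared and averaged over the elements (the usual mean-squared-error form), and the function names `content_loss`, `adversarial_loss` and `perceptual_loss` are hypothetical, not taken from the book.

```python
import math


def content_loss(features_sr, features_hr):
    """Squared Euclidean distance between two flattened feature maps,
    averaged over the number of elements (assumed normalization)."""
    if len(features_sr) != len(features_hr):
        raise ValueError("feature maps must have the same number of elements")
    squared = sum((a - b) ** 2 for a, b in zip(features_sr, features_hr))
    return squared / len(features_sr)


def adversarial_loss(disc_probs_on_generated):
    """Generator-side adversarial term: sum of -log D(G(x)) over a batch.

    Each entry is the discriminator's probability that a generated
    image is real; the loss is small when the discriminator is fooled."""
    eps = 1e-12  # guards against log(0)
    return sum(-math.log(p + eps) for p in disc_probs_on_generated)


def perceptual_loss(features_sr, features_hr, disc_probs_on_generated,
                    adv_weight=1e-3):
    """Content loss plus the adversarial loss weighted by 10^-3."""
    return (content_loss(features_sr, features_hr)
            + adv_weight * adversarial_loss(disc_probs_on_generated))


if __name__ == "__main__":
    # Toy numbers standing in for VGG feature maps and discriminator outputs.
    sr_features = [0.5, 1.0, 2.0]
    hr_features = [0.5, 1.5, 1.0]
    print(perceptual_loss(sr_features, hr_features, [0.3, 0.6]))
```

Because of the small weight, the content term dominates the total; the adversarial term acts as a nudge toward images the discriminator accepts as real.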

The second term on the right is the adversarial loss. This is the standard generative loss term, which is intended to ensure that the images produced by the generator can fool the discriminator. In the following illustration from the original paper you can see that the image created by SRGAN comes much closer to the original high-resolution image:

CycleGAN

Another notable architecture is CycleGAN. Proposed in 2017, it can take on the task of image translation. After training, you can translate an image from one domain to another. For example, if you train it on horse and zebra datasets and give it a picture with horses in the foreground, the CycleGAN can convert the horses into zebras while keeping the same background.

How does CycleGAN work?

Have you ever imagined what a landscape would look like if Van Gogh or Manet had painted it? We have many photographs of scenery and many landscapes painted by Van Gogh or Manet, but we do not have a collection of input-output pairs. CycleGAN performs the image translation, i.e. it transfers an image given in one domain (e.g. scenery) to another domain (e.g. a Van Gogh painting of the same scene), in the absence of paired training examples. CycleGAN’s ability to perform image translation even without training pairs is what makes it unique.

In order to achieve image translation, the authors of CycleGAN used a very simple, yet effective process. They used two GANs, with the generator of each GAN performing image translation from one domain to another.

Let us assume that the input is X. The generator of the first GAN performs a mapping G: X → Y, so its output is Y = G(X). The generator of the second GAN performs the inverse mapping F: Y → X, which gives X = F(Y). Each discriminator is trained to distinguish between real images and synthesized images. The idea is shown as follows:

In order to train the combined GANs, the authors added a forward cycle consistency loss (left figure) and a backward cycle consistency loss (right figure) in addition to the conventional GAN loss. This ensures that if an image X is given as input, the image obtained after the two translations is approximately the same as the input, i.e. F(G(X)) ~ X (similarly, the backward cycle consistency loss ensures that G(F(Y)) ~ Y).
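The two cycle consistency terms can be illustrated with a toy sketch. The assumptions here are deliberate simplifications: images are flat lists of floats, the two "generators" are simple hand-written functions instead of trained networks, and the reconstruction error is measured with a mean absolute (L1) difference. The helper names are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Forward term compares F(G(x)) with x; backward term compares
    G(F(y)) with y. Each term is averaged over its batch."""
    forward = sum(l1_distance(F(G(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(l1_distance(G(F(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward


# Toy stand-ins for the two generators (not trained networks).
def G(image):            # maps domain X to domain Y
    return [2.0 * p for p in image]


def F(image):            # exact inverse of G, maps Y back to X
    return [0.5 * p for p in image]


def F_bad(image):        # a poor inverse that leaves the image unchanged
    return list(image)


if __name__ == "__main__":
    xs = [[1.0, 2.0]]
    ys = [[2.0, 4.0]]
    print(cycle_consistency_loss(G, F, xs, ys))      # perfect cycle
    print(cycle_consistency_loss(G, F_bad, xs, ys))  # penalized cycle
```

When F undoes G exactly, both terms vanish; when it does not, the loss grows with the reconstruction error, which is what pushes the two generators toward being mutual inverses during training.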

Here are some of CycleGAN’s successful image translations:

Below are some more examples showing the translation of seasons (summer → winter), photo → painting and vice versa, horses → zebra:

InfoGAN

The GAN architectures we’ve considered so far give us little or no control over the images that are generated. InfoGAN changes this: it provides control over various attributes of the generated images. InfoGAN uses concepts from information theory so that the noise term is transformed into latent codes, which give predictable and systematic control over the output.

How does InfoGAN work?

The generator in InfoGAN takes two inputs, the latent space Z and a latent code c, so the output of the generator is G(Z, c). The GAN is trained to maximize the mutual information between the latent code c and the generated image G(Z, c). The following figure shows the architecture of InfoGAN:

The concatenated vector (Z, c) is fed to the generator. Q(c|X) is also a neural network. Combined with the generator, it forms a mapping between the random noise Z and its latent code c_hat; its aim is to estimate c given X. This is achieved by adding a regularization term to the objective function of the conventional GAN:

$$\min_G \max_D V_I(D, G) = V_G(D, G) - \lambda I(c; G(Z, c))$$

The term V_G(D, G) is the loss function of the conventional GAN, and the second term is the regularization term, where λ is a constant whose value was set to 1 in the paper. I(c; G(Z, c)) is the mutual information between the latent code c and the image G(Z, c) generated by the generator.
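As a rough illustration of these pieces, the sketch below builds the concatenated generator input and evaluates the regularized objective for given numbers. Several things are assumptions rather than material from the article: the latent code is taken to be a single categorical variable with a uniform prior, the mutual information is replaced by the variational lower bound log Q(c|X) + H(c) used in the InfoGAN paper, the output of Q is passed in as a plain list of probabilities, and all function names are hypothetical.

```python
import math


def one_hot(index, num_categories):
    """Encode a categorical latent code c as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_categories)]


def generator_input(noise, code):
    """Concatenate the noise vector Z and the latent code c into (Z, c)."""
    return list(noise) + list(code)


def mutual_info_lower_bound(q_probs, code_index):
    """Single-sample estimate of the bound log Q(c|X) + H(c).

    q_probs is the distribution over codes that Q predicts for a
    generated image; code_index is the code that was actually fed in.
    H(c) = log(K) for a uniform prior over K categories."""
    entropy = math.log(len(q_probs))
    return math.log(q_probs[code_index] + 1e-12) + entropy


def infogan_value(v_gan, mutual_info, lam=1.0):
    """Regularized objective: V_G(D, G) - lambda * I(c; G(Z, c))."""
    return v_gan - lam * mutual_info


if __name__ == "__main__":
    z = [0.1, -0.4]                  # toy noise vector
    c = one_hot(1, 3)                # latent code: category 1 of 3
    print(generator_input(z, c))

    q_out = [0.1, 0.8, 0.1]          # toy prediction from Q
    mi = mutual_info_lower_bound(q_out, 1)
    print(infogan_value(v_gan=1.2, mutual_info=mi))
```

The better Q recovers the code that went into the generator, the larger the bound becomes and the lower the regularized value, so minimizing it encourages the generator to keep c recognizable in its output.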

InfoGAN’s results for the MNIST data set are listed below:

This concludes our brief look at three different types of generative adversarial networks. The book from which this article originates can be found in the Packt store, or you can read the first chapter free of charge on the Packt subscription platform.