
This post is a paper review for educational purposes.
One of the most widely used models in machine learning research is the generative adversarial network (GAN).
High-quality image generation with GANs is being used in a variety of applications. The figure below shows sample outputs of BigGAN [2], which succeeded in generating high-resolution images. Aren’t they realistic?!


Despite the marvelous generation performance of such models, many limitations still remain. For example, the mode collapse phenomenon in GAN training has not been fundamentally resolved.
Mode collapse refers to the phenomenon in which training gets stuck at a local minimum on the way to the optimal point, that is, the global minimum, so that the generator ends up covering only a few modes of the data distribution.
In this article, we will review the following paper, which analyzes GAN’s mode collapse through a global landscape analysis of the loss function.
Ruoyu Sun et al., Towards a Better Global Loss Landscape of GANs, NeurIPS 2020.
This paper argues that the structure of the loss function used by most GANs makes training prone to getting stuck in bad local minima. In addition, the authors assume that the population loss of a GAN, which is defined over probability distributions, can be cast as an empirical loss over a finite set of samples (this assumption is valid, and the analysis can be extended back to the distributional case).
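For reference, the standard JS-GAN objective and the empirical (finite-sample) version the paper analyzes can be sketched as follows; the notation is the usual GAN convention (x_i are real samples, y_j are generated samples treated as free variables) and may differ from the paper’s exact symbols.

Population loss:
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

Empirical loss over n real and n generated samples:
\min_{y_1,\dots,y_n} \max_D \; \frac{1}{n}\sum_{i=1}^{n} \log D(x_i) + \frac{1}{n}\sum_{j=1}^{n} \log\big(1 - D(y_j)\big)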
First of all, the authors consider a two-sample loss landscape analysis, and then extend it to function space and parameter space. Let’s look closely at the two-sample loss landscape of a GAN.
In general, a GAN is trained by alternating updates of the discriminator (D) and the generator (G). For example, as shown in Figure 3, suppose there are two modes of correct answers (real samples), x1 and x2. As the discriminator’s decision boundary is updated, it suggests a learning direction to the generator, but in this example that direction makes the two generated samples y1 and y2 converge only near x1. In other words, training has no choice but to fall into a local minimum!

Of course, not every case will be like this. But if such a phenomenon occurs frequently even in the simple two-sample case, the problem becomes more serious as the number of sample points increases in high-dimensional space.
The authors’ analysis shows that each generated sample is likely to fall into a local minimum because the GAN loss function has a separable structure: it is a sum of per-sample terms, so each generated point is updated independently of the others.
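Concretely, in the two-sample case the empirical JS-GAN loss (for a fixed discriminator D) splits into independent terms for y1 and y2, so the gradient with respect to each generated point only sees its own term; this is a paraphrase of the separability argument in standard notation:

L(y_1, y_2) = \frac{1}{2}\big[\log D(x_1) + \log D(x_2)\big] + \frac{1}{2}\big[\log(1 - D(y_1)) + \log(1 - D(y_2))\big],
\qquad \nabla_{y_j} L = \frac{1}{2}\,\nabla_{y_j}\log\big(1 - D(y_j)\big).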
As a solution, they propose that multiple decision boundaries have to be considered, as shown in Figure 4.

If the samples are “personalized”, that is, each generated sample is paired with a real sample, then the local-minimum problem is alleviated, as shown in Figure 4.

Let’s look at the loss formulas of the Jensen-Shannon (JS) GAN and the proposed relativistic pairing (Rp) GAN.
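In standard notation (which may differ from the paper’s exact symbols), with a discriminator score function f and the link h(t) = \log \sigma(t), the two objectives can be sketched as:

JS-GAN: \min_G \max_f \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[h(f(x))\big] + \mathbb{E}_{y \sim p_G}\big[h(-f(y))\big]

RpGAN: \min_G \max_f \; \mathbb{E}_{x \sim p_{\mathrm{data}},\, y \sim p_G}\big[h(f(x) - f(y))\big]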
That is, in JS-GAN the original sample x and the generated sample y are encoded separately, each passing through the function h on its own, as in Figure 5. On the other hand, RpGAN enables “personalization” of each sample because the two inputs x and y are encoded in pairs through the same function h.
For example, if h is defined via the softplus function, RpGAN can be designed as follows.
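Below is a minimal PyTorch-style sketch (not the authors’ code) of the two losses, using the identity log σ(t) = −softplus(−t) so that everything is written as a minimization of softplus terms; f_real and f_fake are stand-in names for the discriminator scores f(x) and f(y).

```python
import torch
import torch.nn.functional as F

def js_gan_losses(f_real, f_fake):
    """Standard (JS) GAN: each score is passed through softplus separately.

    f_real, f_fake: discriminator scores f(x), f(y), shape (batch,).
    Returns (discriminator loss, generator loss), both to be minimized.
    """
    # D maximizes E[log sigma(f(x))] + E[log(1 - sigma(f(y)))]
    # <=> minimizes E[softplus(-f(x))] + E[softplus(f(y))]
    d_loss = F.softplus(-f_real).mean() + F.softplus(f_fake).mean()
    # Non-saturating generator loss: minimize E[softplus(-f(y))]
    g_loss = F.softplus(-f_fake).mean()
    return d_loss, g_loss

def rp_gan_losses(f_real, f_fake):
    """Relativistic pairing (Rp) GAN: softplus is applied to paired differences,
    which couples each generated sample with a real sample instead of
    treating the samples separably."""
    # D maximizes E[log sigma(f(x) - f(y))] <=> minimizes E[softplus(f(y) - f(x))]
    d_loss = F.softplus(f_fake - f_real).mean()
    # G plays the symmetric role: minimize E[softplus(f(x) - f(y))]
    g_loss = F.softplus(f_real - f_fake).mean()
    return d_loss, g_loss

# Example usage with random scores (stand-ins for a real discriminator's outputs).
if __name__ == "__main__":
    f_real = torch.randn(8)
    f_fake = torch.randn(8)
    print(js_gan_losses(f_real, f_fake))
    print(rp_gan_losses(f_real, f_fake))
```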

The authors also show the global loss landscape by plotting the training progress of JS-GAN and RpGAN, as shown in the following figure.

Figure 6 shows the training progress from the initial state, where x and y are completely separated and the loss value is 0, to the optimal GAN loss value of -log 2 (about -0.693). In the case of JS-GAN, we can see training being trapped in a local minimum at the point s_1a, whereas the RpGAN landscape is flat in the region where JS-GAN has its local minimum.
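As a quick sanity check on the -log 2 value (in the notation used above, not the paper’s): when the generated samples match the real samples, f(x) and f(y) have the same distribution, so by Jensen’s inequality (log σ is concave)

\mathbb{E}_{x,y}\big[\log \sigma(f(x) - f(y))\big] \;\le\; \log \sigma\big(\mathbb{E}[f(x) - f(y)]\big) \;=\; \log \sigma(0) \;=\; -\log 2,

with equality for a constant f. Conversely, for any generated samples the discriminator can always achieve -log 2 by taking f ≡ 0, so -log 2 is indeed the global minimum of the generator’s objective.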
Finally, the paper provides additional analysis extending the few-sample case to function space and parameter space (we omit it here because it requires a more elaborate verification process).
The performance was verified on small benchmark datasets such as CIFAR-10 and STL-10. Nevertheless, the insights of this paper can be utilized in much of the GAN literature and in future applications.
References
[1] Sun, R., Fang, T., and Schwing, A. (2020). Towards a Better Global Loss Landscape of GANs. NeurIPS 2020 (paper link: https://proceedings.neurips.cc/paper/2020/file/738a6457be8432bab553e21b4235dd97-Paper.pdf)
[2] Brock, A., Donahue, J., and Simonyan, K. (2018). Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR 2019 (paper link: https://arxiv.org/pdf/1809.11096.pdf)