Image-to-Image Translation
https://arxiv.org/abs/1512.04150 https://arxiv.org/abs/2104.01538
Multimodal Unsupervised Image-to-Image Translation
Purpose
While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. (Abstract)
Method


Loss function:
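A sketch of MUNIT's full objective, reconstructed in the paper's notation (content/style encoders E_i^c, E_i^s, decoders G_i, discriminators D_i; the lambda terms are loss-weight hyperparameters). Training combines adversarial losses with image- and latent-reconstruction losses:

\min_{E_1,E_2,G_1,G_2}\;\max_{D_1,D_2}\;
  \mathcal{L}_{\text{GAN}}^{x_1} + \mathcal{L}_{\text{GAN}}^{x_2}
  + \lambda_x \big(\mathcal{L}_{\text{recon}}^{x_1} + \mathcal{L}_{\text{recon}}^{x_2}\big)
  + \lambda_c \big(\mathcal{L}_{\text{recon}}^{c_1} + \mathcal{L}_{\text{recon}}^{c_2}\big)
  + \lambda_s \big(\mathcal{L}_{\text{recon}}^{s_1} + \mathcal{L}_{\text{recon}}^{s_2}\big)

where, for example,

\mathcal{L}_{\text{recon}}^{x_1} = \mathbb{E}_{x_1}\big[\lVert G_1(E_1^c(x_1), E_1^s(x_1)) - x_1 \rVert_1\big]

\mathcal{L}_{\text{recon}}^{c_1} = \mathbb{E}_{c_1, s_2}\big[\lVert E_2^c(G_2(c_1, s_2)) - c_1 \rVert_1\big], \quad
\mathcal{L}_{\text{recon}}^{s_2} = \mathbb{E}_{c_1, s_2}\big[\lVert E_2^s(G_2(c_1, s_2)) - s_2 \rVert_1\big]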

Functional summary: Given an input image and a style variable sampled from a Gaussian distribution, the model produces a target-domain image whose content depends on the input image and whose style depends on the style variable. Different style samples therefore give different output images in the target domain. However, this method is still limited to translation between a single pair of domains; if multiple domains are desired, conditional generation would be needed (other work may address this problem).
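A minimal PyTorch-style sketch of this translation step. The module definitions here are illustrative placeholders, not MUNIT's actual networks (MUNIT injects the style code through AdaIN parameters in the decoder); the point is only that the content code comes from the input image while the style code is drawn from N(0, I), so re-sampling the style yields diverse translations.

import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    # Extracts a domain-invariant content code from an image (toy stand-in).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(),
            nn.Conv2d(64, 256, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # Combines a content code with a style vector to produce a target-domain image.
    # Here the style simply rescales the content features; MUNIT uses AdaIN instead.
    def __init__(self, style_dim=8):
        super().__init__()
        self.style_to_scale = nn.Linear(style_dim, 256)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh())

    def forward(self, content, style):
        scale = self.style_to_scale(style).unsqueeze(-1).unsqueeze(-1)
        return self.net(content * scale)

enc_c = ContentEncoder()
dec_2 = Decoder(style_dim=8)        # decoder of the target domain
x1 = torch.randn(1, 3, 256, 256)    # input image from the source domain

content = enc_c(x1)                 # content code taken from the input image
for _ in range(3):                  # different style samples -> different outputs
    s2 = torch.randn(1, 8)          # style code sampled from N(0, I)
    x12 = dec_2(content, s2)        # translated image: content from x1, style from s2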
Network Structure

Result

DRIT (Diverse Image-to-Image Translation via Disentangled Representations)
Method


Loss function



Result

UNIT: Unsupervised Image-to-Image Translation Networks [https://arxiv.org/pdf/1703.00848.pdf]
Method


Cross-Modal Recipe Embeddings by Disentangling Recipe Contents and Dish Styles
Method
