The University of Tokyo team has open sourced an image editing tool called "neural collage"

/ 2019-04-24 / News

Teaching a novice to paint? Font style transfer? Swapping a celebrity's face for a "fake face"? Undoubtedly, GANs have huge potential for generating convincing synthetic images.

Recently, teams from the University of Tokyo and Preferred Networks open-sourced an image editing tool called "neural collage" that lets users change the semantic information of an image at a specified position, achieving an image collage effect.

For example, it can change the face of a fierce husky into that of a cute Pomeranian.

It is worth mentioning that the whole process is very simple.

How to use and install?

First make sure a Python 3.6+ environment is installed, then install the required Python libraries: pip install -r requirements.txt

If you want to generate an image using a pre-trained model, the project authors provide a link to download the model. Note that the snapshot parameter must be set to the path of the downloaded pre-trained model file (.npz).
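Before passing a downloaded file as the snapshot, it can help to confirm that it really is a readable .npz archive. The following is a minimal sketch using only NumPy; the helper name list_snapshot_arrays, the file name and the array names are hypothetical stand-ins, and it does not show how the project's own scripts consume the snapshot.

```python
import os
import tempfile

import numpy as np


def list_snapshot_arrays(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Stand-in for a downloaded snapshot: a tiny .npz written to a temp directory.
# The array names are hypothetical, not the real model's parameter names.
tmp_dir = tempfile.mkdtemp()
snapshot_path = os.path.join(tmp_dir, "toy_snapshot.npz")
np.savez(snapshot_path, gen_gamma=np.ones((10, 8)), gen_beta=np.zeros((10, 8)))

shapes = list_snapshot_arrays(snapshot_path)
for name, shape in sorted(shapes.items()):
    print(name, shape)
```

If the file is truncated or is not an .npz archive, np.load raises an error here, before any model code runs.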

The authors say that the method builds on two new strategies for GAN models, sCBN and feature blending; its principle and implementation are detailed in the paper "Spatially Controllable Image Synthesis with Internal Representation Collaging".


The following is an interpretation of the paper:


The paper proposes an image editing strategy based on convolutional neural network (CNN) models. This novel method changes the semantic information of any region of an image by manipulating the feature representation of the image generated by the GAN model.

This strategy can be combined with any GAN model with conditional normalization layers for image editing of artificial and real images. It has the following two variants:

(1) sCBN (spatial conditional batch normalization), a conditional batch normalization method based on a user-specified spatial weight map;

(2) Feature blending, a method that directly modifies the intermediate feature maps.

In addition, the effectiveness of the proposed method is further verified through experiments that combine it with different GAN models on different datasets.


Deep generative models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), are promising unsupervised learning techniques with a strong ability to represent semantic information.

Among them, GANs have been particularly successful in image tasks such as image colorization, image inpainting, domain translation, style transfer, and object transfiguration.

With the successive development of methods for stabilizing GAN training, such models have also been widely used in image generation.

However, how to steer a GAN model according to the user's wishes and obtain the desired output is still a major open problem in the field.

Previous research, such as conditional GANs (CGAN), InfoGAN, and the style-based generator (StyleGAN), has explored how to make a generative adversarial network produce a desired image.

The recently proposed GAN Dissection study explores the relationship between a model's output and the semantic information in its intermediate features, and successfully generates realistic images by inferring these relationships.

Inspired by this, the paper proposes a novel image transformation method, namely the sCBN and feature blending strategies, which edits an image by manipulating the intermediate features of the generator network and allows the user to perform editing operations such as copying and pasting image semantic information.

sCBN is based on a user-specified spatial map of blending factors (label collaging), which allows the user to fuse the semantics of multiple labels.

In this way, not only can an image be generated from a label map, but the semantics of local image regions can also be changed.

As shown in Figure 1a below, this method can turn a husky's eyes into the eyes of a Pomeranian.

Feature blending can directly blend multiple images in the intermediate feature space, and can also blend complex features locally; in Figure 1b, feature blending changes an animal's posture into the posture defined by the model.

Figure 1 Examples of feature collages obtained with the sCBN method (a) and the feature blending method (b).

Overall, one big advantage of the method is that it only requires a trained GAN model with an AdaIN or CBN structure; no other model needs to be trained.

It can be applied to any image generated by the GAN model and supports a wide range of semantic image operations. In addition, by combining it with manifold projection, the method can edit the local semantic information of real images, and it demonstrates strong performance in a large number of experiments.



sCBN is a spatial extension of conditional batch normalization (CBN). CBN is a variant of batch normalization (BN) whose parameters encode class semantic information; sCBN lets these conditional batch normalization parameters vary across spatial positions, as shown in Figure 2.

Figure 2 Comparison of the structure of the CBN method and the sCBN method. On the left is the CBN method, which adds class features to the generated image layer by layer with spatially uniform strength.

On the right is the sCBN method, in which each layer blends class features into the generated image at the user-specified mixing intensity.

For an image sample of a single class, the CBN method normalizes the set of intermediate features and then modulates it with the scale and bias parameters specific to that class.

sCBN replaces the scale term in the CBN method with a weighted sum, whose weights form a non-negative tensor map of blending factors determined by the user.

In this way, the user can set the feature intensity of any class c in any region through the chosen weight coefficients, and thereby control the generated output.

In addition, by choosing the weights that control the feature intensity of different classes in different regions of the image, the user can assign classes to multiple disjoint parts of the image.
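The weighted-sum idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name scbn, the array shapes and the toy parameters are hypothetical, and applying the same spatial weighting to the bias term as to the scale term is an assumption of this sketch rather than something the article spells out.

```python
import numpy as np


def scbn(features, gammas, betas, weights, eps=1e-5):
    """Spatially weighted conditional batch normalization (illustrative sketch).

    features: (N, C, H, W) intermediate feature maps
    gammas, betas: (K, C) per-class scale and bias parameters
    weights: (K, H, W) non-negative blending maps, one per class
    """
    if np.any(weights < 0):
        raise ValueError("blending weights must be non-negative")
    # Batch-norm statistics: per channel, over the batch and spatial axes.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # Weighted sum over classes gives a per-pixel scale and bias of shape (C, H, W).
    gamma_map = np.einsum("khw,kc->chw", weights, gammas)
    beta_map = np.einsum("khw,kc->chw", weights, betas)
    return gamma_map[None] * normalized + beta_map[None]


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))
gammas = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])  # toy class parameters
betas = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])

# Label collage: class 0 on the left half of the map, class 1 on the right half.
weights = np.zeros((2, 8, 8))
weights[0, :, :4] = 1.0
weights[1, :, 4:] = 1.0
out = scbn(x, gammas, betas, weights)

# Ordinary CBN with class 1 everywhere is the special case of a constant map.
only_class_1 = np.zeros((2, 8, 8))
only_class_1[1] = 1.0
cbn_class_1 = scbn(x, gammas, betas, only_class_1)
```

With a weight map that is constant over the image, the sketch reduces to ordinary CBN, which is why the right half of out matches cbn_class_1 while the left half does not.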

Spatial Feature Blending

Spatial feature blending is a method that extracts the features of a specific region of an image and blends them with other features.

Similar to the weight coefficients in the sCBN method, the user controls the blending effect by choosing the feature blending parameter M.

In addition, through manifold projection, the method can also be used to edit real images. As shown in Figure 3, the feature blending process mixes the mouth features of G(z2) and G(z1); the user only needs to choose the mixing coefficient M for the mouth region to obtain this effect.
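A minimal sketch of blending two intermediate feature maps with a mask M follows. It assumes that M enters as a per-pixel convex combination of the two feature maps; the function name blend_features, the shapes and the "mouth" region coordinates are hypothetical.

```python
import numpy as np


def blend_features(base_feat, ref_feat, mask):
    """Blend two (C, H, W) feature maps with a mixing map M of shape (H, W).

    M = 0 keeps the base features, M = 1 takes the reference features.
    """
    if base_feat.shape != ref_feat.shape:
        raise ValueError("feature maps must have the same shape")
    mask = np.clip(mask, 0.0, 1.0)
    return (1.0 - mask)[None] * base_feat + mask[None] * ref_feat


# Toy features standing in for one generator layer of G(z1) and G(z2).
base = np.zeros((3, 8, 8))
ref = np.ones((3, 8, 8))

# Hypothetical "mouth" region in the lower middle of the feature map.
M = np.zeros((8, 8))
M[5:7, 2:6] = 1.0

blended = blend_features(base, ref, M)
```

Intermediate values of M between 0 and 1 would give a partial mix, which is how the user can soften the effect in a region.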

Figure 3 The spatial feature blending method: through an iterative process in the feature space of the generator network, images generated from different latent variables are blended into the target image.

Real Image Application

Manifold projection finds a latent variable z such that G(z) is approximately equal to the real image x, which makes it possible to edit the semantic information of the real image.
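The search for z can be sketched as a small optimization problem. The example below is a toy under loudly stated assumptions: the "generator" is a random linear map, the objective is a plain squared error, and the optimizer is fixed-step gradient descent. The article does not describe the paper's actual objective or optimizer, so none of these choices should be read as the authors' method.

```python
import numpy as np


def project_to_latent(grad_fn, x, z_init, lr, steps=500):
    """Gradient descent on z so that the generator output approaches x."""
    z = z_init.copy()
    for _ in range(steps):
        z -= lr * grad_fn(z, x)
    return z


rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4))  # weights of a toy linear "generator" (hypothetical)


def G(z):
    return A @ z


def grad(z, x):
    # Gradient of ||G(z) - x||^2 with respect to z.
    return 2.0 * A.T @ (G(z) - x)


x = G(rng.normal(size=4))  # a "real image" that the toy generator can reproduce
z0 = np.zeros(4)
lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # safe step size for this quadratic

z_hat = project_to_latent(grad, x, z0, lr)
initial_error = np.sum((G(z0) - x) ** 2)
final_error = np.sum((G(z_hat) - x) ** 2)
print(initial_error, final_error)
```

For a real GAN the generator is nonlinear, so the projection is only approximate and the result depends on the starting point.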

After obtaining the inverse of x (the latent variable z), the user can change part of the label information of x or blend the features of other images into x by applying the same procedures.

The real image editing process is shown in Figure 4 below. In the final step of the image transformation, Poisson blending is used as a post-processing step.

This is mainly because the GAN model does not have the ability to decouple image background information, and Poisson blending can remove some artifacts from the region of interest.
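To illustrate what Poisson blending does, here is a generic textbook version for grayscale images, solved with Jacobi iterations in NumPy. It is a sketch, not the project's implementation; the function names and the toy images are hypothetical. The edited region is pasted so that its gradients are kept while its boundary matches the surrounding image.

```python
import numpy as np


def neighbor_sum(img):
    """Sum of the four direct neighbors of every pixel (wraps at the border)."""
    return (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))


def poisson_blend(source, target, mask, iterations=500):
    """Paste `source` into `target` inside `mask`, matching source gradients.

    source, target: (H, W) float arrays; mask: (H, W) bool array that must
    not touch the image border (so the wrap-around in neighbor_sum is unused).
    """
    src = source.astype(float)
    result = target.astype(float).copy()
    result[mask] = src[mask]  # start from a naive paste
    laplacian_src = 4.0 * src - neighbor_sum(src)
    for _ in range(iterations):
        # Jacobi step of the discrete Poisson equation, applied inside the mask only.
        update = (neighbor_sum(result) + laplacian_src) / 4.0
        result[mask] = update[mask]
    return result


rng = np.random.default_rng(0)
target = rng.normal(size=(12, 12))
source = target + 3.0  # same content with a brightness offset
mask = np.zeros((12, 12), dtype=bool)
mask[4:8, 4:8] = True

naive = np.where(mask, source, target)  # visible seam around the pasted block
blended = poisson_blend(source, target, mask)
```

In this toy case the source differs from the target only by a constant offset, so the naive paste shows a visible block while the Poisson result recovers the target, because only gradients are transferred.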

Figure 4 The feature space collage algorithm applied to a real image: the user specifies the mixing map, selects the feature space collage method, and uses a mask for Poisson blending during post-processing.

Figure 5 below shows an example of image reconstruction for different categories of conditions.

Figure 5 Examples of image reconstruction with multiple class labels via manifold projection. The images in red frames are reconstructed using the original class label.

Figure 6 below shows examples of the two methods applied to real images. The left side shows the results of the sCBN method on real images; the right side shows the results of the feature blending method.

Figure 6 The sCBN and feature blending methods applied to real images.


Result Analysis

Here, the proposed method is combined with the DCGAN model, and its validity is verified on several different image datasets.

In addition, to verify the representational ability of manifold projection and the DCGAN model, a series of ablation experiments with non-spatial transformations is also performed.

Figure 7 below shows an example of a label collage using the sCBN method. As you can see, this method can adjust the global information of the image (such as face, shape) and local information (such as color, texture) without destroying the semantic consistency of the image.

Figure 7 Label collage results of the sCBN method, where the area surrounded by the red line is translated into the target label.

Figure 8 shows the results of collaging using feature blending. It can be seen that this method successfully modifies the semantics of image regions without degrading the quality of the original image.

This method is robust with respect to the semantic alignment of the transformed region.

Figure 8 Collage results of the feature blending method, in which the features in the red-framed region are blended into the base image.

Collage effect per layer

A series of ablation studies explores the effect of modifying each layer of the model. Figure 9 below shows the results of the sCBN method applied to (1) all layers, (2) the layer closest to the input, and (3) all layers except the first layer.

As can be seen, the closer the modified layer is to the z layer, the more obvious the effect of the method on global features; the closer it is to the x layer, the more significant the effect of the sCBN method on local features.

Figure 9 Collage effects on different layers. From top to bottom are the results of the sCBN method acting on different layers.

Similarly, Figure 10 below shows the results obtained when the feature blending method is applied with different blending weights to different layers (l = 1, 2, 3, 4).

It can be seen that when the method is applied to the first layer, global features are affected and local features are preserved. When the method is applied close to the x layer, the result is reversed.

Therefore, the user can choose the blending weight coefficients layer by layer to control local feature transitions and their intensity more finely, as needed.
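One way to picture per-layer control is to derive a separate mixing map for each generator layer from a single user-drawn mask. The sketch below assumes square feature maps whose side lengths divide the mask size, average pooling to reach each layer's resolution, and a scalar strength per layer; the function name layer_masks and all sizes are hypothetical, and the article does not say how the project builds its per-layer maps.

```python
import numpy as np


def layer_masks(mask, layer_sizes, layer_strengths):
    """Build one mixing map per layer from a single full-resolution mask.

    mask: (H, W) array with values in [0, 1]
    layer_sizes: side length of each layer's square feature map
    layer_strengths: scalar multiplier per layer (0 disables blending there)
    """
    h, w = mask.shape
    maps = []
    for size, strength in zip(layer_sizes, layer_strengths):
        if h % size or w % size:
            raise ValueError("mask size must be divisible by every layer size")
        # Average-pool the mask down to this layer's resolution.
        pooled = mask.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
        maps.append(strength * pooled)
    return maps


user_mask = np.zeros((32, 32))
user_mask[8:16, 8:24] = 1.0

# Blend only in the two layers nearest the output, where local features dominate.
maps = layer_masks(user_mask, layer_sizes=[4, 8, 16, 32],
                   layer_strengths=[0.0, 0.0, 1.0, 1.0])
for m in maps:
    print(m.shape, float(m.max()))
```

Swapping the strengths to favor the first layers would instead push the edit toward global features, in line with the observation above.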

Figure 10 The result of the feature blending method acting on different layers

Real image conversion

To quantitatively evaluate the method's performance on real image transformation through classification accuracy and human perception tests, the sCBN method is applied to images from the ImageNet dataset to carry out (1) cat→big cat, (2) cat→dog and (3) dog→

Subsequently, the proposed method is compared with UNIT and MUNIT as baselines. The results are shown in Figure 11.

It can be seen that, in terms of top-5 error rate, the method performs better than the other two baselines, which also verifies its effectiveness for real image transformation.
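For reference, a top-5 error rate can be computed as follows. This sketch assumes that a transformed image counts as correct when the target class appears among the five highest-scoring classes of some classifier; the article does not say which classifier is used, and the scores below are made-up toy values.

```python
import numpy as np


def top5_error(scores, target_labels):
    """Fraction of samples whose target label is not among the 5 top-scoring classes.

    scores: (N, num_classes) classifier scores
    target_labels: (N,) integer class indices
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = (top5 == target_labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()


# Toy scores: for every sample, classes 5 to 9 receive the highest scores.
scores = np.tile(np.arange(10.0), (3, 1))
labels = np.array([9, 5, 0])  # the third target class falls outside the top 5

error = top5_error(scores, labels)
print(error)
```

A lower value means that more transformed images are recognized as the intended target class.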

Figure 11 top-5 classification error rate results


The paper proposes a novel and effective image editing strategy that, through the sCBN and feature blending methods, manipulates the feature representation of an image in order to modify its semantic information and edit the image.

The conditional normalization method can handle not only class conditions but also other kinds of information, so it could be applied to a wider range of non-image datasets in future research.

However, the study also reveals some shortcomings: the expressive power of the generator network is limited, especially when it is combined with manifold projection to transform real images, and the related problems remain worth exploring in future research.
