This supplemental material specifies the detailed configurations of the experiments mentioned in the paper and presents additional results. Running-time statistics for uncontrolled texture synthesis are listed at the end.
Before describing the settings, we first explain the parameter notations involved in our method.
sourceImageShortSize: Size of the shorter side of the source image. For an input source image, we resize it so that its shorter side has sourceImageShortSize pixels. In particular, sourceImageShortSize=* denotes no resizing of the source image.
targetImageSize: Size of the generated texture, in the form [targetHeight, targetWidth], indicating the height and width of the generated texture.
multiScales: Scale factors used in the multi-resolution synthesis, formatted as [scale1, scale2, scale3, ..., scaleN]. In our method, we first down-sample the source and target images according to scale1. The down-sampled target image is optimized at this scale until convergence, and is then up-sampled as the initialization for the next scale, scale2. This process is repeated until the final level, scaleN. As a proven strategy, multi-resolution synthesis leads to results with better structures and finer details.
iterationPerScales: Maximum number of optimization iterations for each scale in the multi-resolution synthesis.
vggLayers: The selected feature layers from VGG19.
patchSize: Patch size used for patch sampling on the feature maps.
stride: Stride used for patch sampling on the feature maps.
lambdaOcc: Weight of the occurrence penalty in the Guided Correspondence Distance.
lambdaGC[type]: Weight of the distance term from guidance channels in the Guided Correspondence Distance.
h: Bandwidth parameter in the contextual similarity.
flip_augmentation: Whether to use flipped copies to augment the source image.
rotate_augmentation: Whether to use rotated copies to augment the source image.
For uncontrolled texture synthesis, we use the following configuration:
sourceImageShortSize=256, targetImageSize=[512,512], multiScales=[0.25, 0.5, 0.75, 1], iterationPerScales=500, vggLayers=['r11', 'r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, h=0.5.
As mentioned in the paper, we collected 50 texture images from [Zhou et al. 2018] for our experiments, and conducted both qualitative and quantitative comparisons against 5 existing approaches.
Here we show all the results of 50 examples produced by the 6 methods: Link for More Results 🖼️.
For annotation control, we use the following configuration:
sourceImageShortSize=256, targetImageSize=[512,512], multiScales=[1], iterationPerScales=2000, vggLayers=['r11', 'r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, lambdaGC[anno]=10, h=0.5.
Note that we did not use multi-scale synthesis for this task, to ensure the texture follows the fine details specified in the target annotation.
Here are more results and comparisons on annotation control: Link for More Results 🖼️.
For progression control, we use the following configuration:
sourceImageShortSize=*, targetImageSize=[512,512], flip_augmentation=True, multiScales=[0.25, 0.5, 0.75, 1], iterationPerScales=500, vggLayers=['r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, lambdaGC[prog]=10, h=0.5.
For a fair comparison, we did not resize the example textures, matching the settings of [Zhou et al. 2017]; the target image size is still set to [512, 512]. Since the target progression map may differ considerably from the source progression, we flip the source image vertically and horizontally to enhance patch diversity, obtaining four augmented copies of the source image. All these copies are then fed to VGG to extract features. Note that patch features corresponding to the same sample location on the original source texture (the coordinates in a flipped copy are flipped accordingly) are aggregated as an augmentation set for that source sample.
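The flip augmentation and its coordinate bookkeeping can be sketched as follows; here single pixels stand in for VGG patch features, and all names are illustrative. The key invariant is that the mapped coordinate in each flipped copy points back at the same physical sample of the original texture.

```python
import numpy as np

def flip_copies(img):
    # Four copies: original, horizontal flip, vertical flip, both flips.
    return [img, img[:, ::-1], img[::-1, :], img[::-1, ::-1]]

def augmented_coords(y, x, h, w):
    # Coordinates of the same physical sample in each copy
    # (order must match flip_copies above).
    return [(y, x), (y, w - 1 - x), (h - 1 - y, x), (h - 1 - y, w - 1 - x)]

def augmentation_set(img, y, x):
    # Gather the value (stand-in for a patch feature) at the matching
    # location of every flipped copy, forming the augmentation set.
    h, w = img.shape[:2]
    return [c[cy, cx] for c, (cy, cx) in
            zip(flip_copies(img), augmented_coords(y, x, h, w))]
```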
Here are more results and comparisons on progression control: Link for More Results 🖼️.
For orientation control, we use the following configuration:
sourceImageShortSize=*, targetImageSize=[512,512], rotate_augmentation=True, multiScales=[0.25, 0.5, 0.75, 1], iterationPerScales=500, vggLayers=['r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, lambdaGC[orient]=5, h=0.5.
To make the source patches account for the possible orientation changes required by the target orientation field, we augment the source image with 8 rotated copies, each rotated by a successive 45-degree increment. Again, patch features corresponding to the same sample location on the original source texture are aggregated as an augmentation set for that source sample.
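A minimal sketch of generating the 8 rotated copies; the nearest-neighbor rotation below is illustrative only (a real pipeline would use an interpolating rotation), and the function names are not from our released code.

```python
import numpy as np

def rotate_image(img, deg):
    # Nearest-neighbor rotation about the image center (illustrative only).
    h, w = img.shape[:2]
    t = np.deg2rad(deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: sample the source at the back-rotated coordinate.
    sy = np.cos(t) * (ys - cy) + np.sin(t) * (xs - cx) + cy
    sx = -np.sin(t) * (ys - cy) + np.cos(t) * (xs - cx) + cx
    sy = np.clip(np.round(sy).astype(int), 0, h - 1)
    sx = np.clip(np.round(sx).astype(int), 0, w - 1)
    return img[sy, sx]

def rotation_augmentation(img, n_copies=8):
    # Eight copies rotated by successive 45-degree increments (0, 45, ..., 315);
    # patch features at matching locations across the copies are later
    # aggregated into the augmentation set of each source sample.
    return [rotate_image(img, i * 360.0 / n_copies) for i in range(n_copies)]
```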
Here are more results and comparisons on orientation control: Link for More Results 🖼️.
For two controls, we use the following configuration:
sourceImageShortSize=*, targetImageSize=[512,512], rotate_augmentation=True, multiScales=[0.25, 0.5, 0.75, 1], iterationPerScales=500, vggLayers=['r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, lambdaGC[prog]=10, lambdaGC[orient]=1, h=0.5.
The settings are almost the same as for orientation control, where each source sample has an augmentation set of 8 rotated copies.
Here are more results and comparisons on two controls: Link for More Results 🖼️.
The Guided Correspondence loss can also be used to train feedforward networks. Here we describe how we train generative models for uncontrolled/controlled texture synthesis.
We use the PyTorch implementation of TextureNets as the generator, modifying only the loss function from the Gram loss to our Guided Correspondence loss. All other settings remain unchanged. The parameters of the Guided Correspondence loss are as follows:
sourceImageShortSize=256, targetImageSize=256, vggLayers=['r21', 'r31', 'r41'], patchSize=3, stride=2, lambdaOcc=0.05, h=0.5.
Here are more results on real-time control: Link for More Results 🖼️.
This experiment tests controlled synthesis based on conditional GANs. Specifically, we use the Official Code of SPADE as our generative model, whose network directly takes progression maps or orientation fields as the condition.
As described in the paper, we train SPADE in two stages. In the first stage, reconstruction training (the first 20k training iterations), all the original losses of SPADE are kept unchanged. In the second stage, random-synthesis training (the next 20k iterations), we first generate many random target progression or orientation maps using the code below.
```python
# Requires: import random, torch, torch.nn.functional as F, plus
# progression_refinement and generate_perlin_noise_2d from our codebase.
def data_augmentation(self, input_tensor, target_h, target_w):
    # Pick one of three augmentation strategies with probabilities 10%/70%/20%.
    choice = torch.multinomial(torch.tensor([0.1, 0.7, 0.2]), 1, replacement=True).item()
    crop_size = self.opt.crop_size
    if choice == 0:
        # 1: use source guidance perturbed by Perlin noise as the target (10%)
        output_tensor_list = []
        for index_channel in range(input_tensor.shape[1]):  # perturb the guidance per channel
            curr_in_tensor = input_tensor[:, index_channel:index_channel + 1]
            curr_out_tensor = progression_refinement(curr_in_tensor, curr_in_tensor)
            output_tensor_list.append(curr_out_tensor)
        output_tensor = torch.cat(output_tensor_list, 1)
    elif choice == 1:
        # 2: use randomly resized source blocks as the target (70%)
        input_h, input_w = input_tensor.shape[-2:]
        res = random.randint(1, crop_size // 2)  # randint needs integer bounds
        rand_y = random.randint(0, input_h - res)
        rand_x = random.randint(0, input_w - res)
        curr_out_tensor = input_tensor[:, :, rand_y:rand_y + res, rand_x:rand_x + res]
        output_tensor = F.interpolate(curr_out_tensor, size=[target_h, target_w], mode='bilinear')
    elif choice == 2:
        # 3: use pure Perlin noise as the target (20%)
        output_tensor_list = []
        res = random.choice([2, 4, 8])  # Perlin grid resolution
        for index_channel in range(input_tensor.shape[1]):
            perlin_numpy = generate_perlin_noise_2d([crop_size, crop_size], (res, res))
            curr_out_tensor = torch.from_numpy(perlin_numpy)[None, None, :, :].float()
            curr_out_tensor = F.interpolate(curr_out_tensor, size=[target_h, target_w], mode='bilinear')
            curr_in_tensor = input_tensor[:, index_channel:index_channel + 1]
            # Post-process: match the noise to the source guidance statistics.
            curr_out_tensor = progression_refinement(curr_out_tensor, curr_in_tensor)
            output_tensor_list.append(curr_out_tensor)
        output_tensor = torch.cat(output_tensor_list, 1)
    return output_tensor
```
For this random-synthesis training, we add the Guided Correspondence loss to complement the conditional GAN loss, balancing the two with lambda_gcd=5, lambda_gan=1. The high weight of the Guided Correspondence loss lets it serve as a strong regularizer for stabilizing the training process.
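In other words, the second-stage generator objective amounts to the weighted sum below (function name and loss handles are hypothetical; the actual terms come from SPADE and our Guided Correspondence implementation):

```python
def generator_loss(gcd_loss, gan_loss, lambda_gcd=5.0, lambda_gan=1.0):
    # The Guided Correspondence term dominates (5:1), acting as a strong
    # regularizer that stabilizes random-synthesis training.
    return lambda_gcd * gcd_loss + lambda_gan * gan_loss
```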
The configuration of the Guided Correspondence loss is as follows:
vggLayers=['r21', 'r31', 'r41'], patchSize=3, stride=2, lambdaOcc=0.05, h=0.5.
In progression control, we set lambdaGC[prog]=5; in orientation control, we set lambdaGC[orient]=1.
Here are more results on real-time controlled synthesis: Link for Progression Control 🖼️ and Link for Orientation Control 🖼️.
For texture transfer, we use the following configuration:
sourceImageShortSize=256, targetImageShortSize=512, multiScales=[1], iterationPerScales=2000, vggLayers=['r11', 'r21', 'r31', 'r41'], patchSize=3, stride=2, lambdaOcc=0.05, lambdaContent=10, h=0.2.
In this task, the Guided Correspondence loss is used as a general textural loss. Texture transfer is performed by incorporating the content loss proposed by [Gatys et al. 2015a]. We initialize the target image with the content image. The weight of the content loss should be relatively large to preserve the structure of the content image.
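A minimal sketch of this setup, with all function names hypothetical; `texture_loss_fn` stands in for the Guided Correspondence loss, `content_loss_fn` for the content loss of [Gatys et al. 2015a], and `update_fn` for one gradient step of the optimizer.

```python
import numpy as np

def texture_transfer(content_image, texture_loss_fn, content_loss_fn,
                     update_fn, steps, lambda_content=10.0):
    # Initialize the target with the content image, not with noise.
    target = np.array(content_image, dtype=np.float64, copy=True)
    for _ in range(steps):
        # The large content weight preserves the content image's structure.
        loss = (texture_loss_fn(target)
                + lambda_content * content_loss_fn(target, content_image))
        target = update_fn(target, loss)  # gradient step placeholder
    return target
```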
For image inpainting, we use the following configuration:
sourceImageShortSize=*, targetImageSize=*, multiScales=[0.25, 0.5, 0.75, 1], iterationPerScales=500, vggLayers=['r21', 'r31', 'r41'], patchSize=7, stride=3, lambdaOcc=0.05, h=0.5.
Inpainting fills the holes in an image using source patches drawn only from the remaining (known) region of the same image. Note that the occurrence penalty is switched off in this experiment to obtain more continuous colors across the hole borders.
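The restriction of source sampling to the known region can be sketched as follows (names are illustrative): a patch position is valid only if the whole patch lies outside the hole.

```python
import numpy as np

def valid_patch_positions(known_mask, patch_size, stride):
    # known_mask is True outside the hole. A source patch is usable only
    # if every pixel inside it lies in the known region.
    h, w = known_mask.shape
    positions = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            if known_mask[y:y + patch_size, x:x + patch_size].all():
                positions.append((y, x))
    return positions
```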
The average running time of different methods for uncontrolled texture synthesis is reported here. We randomly pick 10 images from our dataset, use the default settings for all methods, and record their running time (seconds):
| METHOD | Self-tuning | CNNMRF | Sliced Wasserstein | TexExp | SinGAN | Ours |
|---|---|---|---|---|---|---|
| AVERAGE TIME (s) | 226.18 | 210.18 | 484.60 | (5160.71) | (6180.24) | 313.28 |
where (·) denotes the total time for training the network.