Neural Style Transfer
Style transfer takes two images — a content image and a style image — and generates a new image that keeps the content of the first while adopting the style of the second. In the original tutorial's figure, the turtle image on the far right has taken on the style of the wave painting in the middle.
Mathematical Background
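A brief sketch of the losses from Gatys et al. [1], in the paper's notation. For a chosen layer l, let F^l be the feature maps of the generated image, P^l those of the content image, and A^l the Gram matrix of the style image's features:

    \mathcal{L}_{\mathrm{content}} = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2

    G^l_{ij} = \sum_k F^l_{ik} F^l_{jk} \qquad \text{(Gram matrix: inner products between feature maps)}

    \mathcal{L}_{\mathrm{style}} = \sum_l \frac{w_l}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2

    \mathcal{L}(x) = \alpha \, \mathcal{L}_{\mathrm{content}} + \beta \, \mathcal{L}_{\mathrm{style}}

Here N_l is the number of feature maps at layer l and M_l their spatial size; alpha and beta correspond to content_weight and style_weight in the code below. Note that the implementation below folds the normalization into the Gram matrix itself, dividing by N_l * M_l rather than the paper's 4 N_l^2 M_l^2 factor.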
The Main Program
Loading the Images
The loaded images can be of any size; they are eventually scaled and cropped to the same dimensions. This is because the network is designed for a specific input size, so the content image and the style image must match in size.
# imports and setup assumed by the rest of this walkthrough
# (they follow the cited tutorial's preamble)
import copy
import time

from PIL import Image
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import torchvision.transforms as transforms

use_cuda = torch.cuda.is_available()
dtype = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor

# desired size of the output image
imsize = 512 if use_cuda else 128  # use a small size if no GPU

loader = transforms.Compose([
    transforms.Scale(imsize),  # scale the imported image
    transforms.ToTensor()])    # transform it into a torch tensor

loader_new = transforms.Compose([  # loader_new crops images of arbitrary size to the same size
    transforms.Scale(imsize),
    transforms.RandomCrop(imsize),
    transforms.ToTensor()])

def image_loader(image_name):
    image = Image.open(image_name)
    image = Variable(loader(image))
    # fake batch dimension required to fit the network's input dimensions
    image = image.unsqueeze(0)
    return image

style_img = image_loader("images/picasso.jpg").type(dtype)
content_img = image_loader("images/dancing.jpg").type(dtype)

assert style_img.size() == content_img.size(), \
    "we need to import style and content images of the same size"
PIL images have pixel values in the range 0-255, which become 0-1 when converted to torch tensors. Note: networks pretrained in PyTorch expect tensors in the 0-1 range. If you feed a 0-255 image into a PyTorch-pretrained network, the result is meaningless; Caffe-pretrained networks, by contrast, expect the 0-255 range.
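Later steps display image tensors via an imshow helper that is not shown in this excerpt; a minimal sketch following the cited tutorial (unloader and plt.ion() come from that tutorial's setup):

import matplotlib.pyplot as plt

unloader = transforms.ToPILImage()  # reconvert a tensor into a PIL image

plt.ion()  # interactive mode, so figures update during optimization

def imshow(tensor, title=None):
    image = tensor.clone().cpu()  # clone the tensor so we don't change it
    image = image.view(3, imsize, imsize)  # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated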
Content Loss
class ContentLoss(nn.Module):

    def __init__(self, target, weight):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used to dynamically
        # compute the gradient: this is a stated value, not a variable.
        # Otherwise the forward method of the criterion will throw an error.
        # We only need the *value* of target here; it is treated as a
        # constant, so detach() ensures that backward() has no effect on the
        # computation graph that produced target.
        self.target = target.detach() * weight
        self.weight = weight
        self.criterion = nn.MSELoss()

    def forward(self, input):
        self.loss = self.criterion(input * self.weight, self.target)
        self.output = input
        return self.output

    def backward(self, retain_graph=True):
        self.loss.backward(retain_graph=retain_graph)
        return self.loss
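As a quick smoke test (hypothetical values, reusing the imports above): the content loss of a feature map against itself should be zero.

feat = Variable(torch.rand(1, 64, 32, 32))  # hypothetical feature map
cl = ContentLoss(feat, weight=1)
cl(feat)                 # forward pass records the loss
print(cl.loss.data[0])   # 0.0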
The GramMatrix Function
GramMatrix is a simplified form of the feature-correlation matrix, written this way for faster and more convenient computation.
class GramMatrix(nn.Module):

    def forward(self, input):
        a, b, c, d = input.size()  # a = batch size (=1)
        # b = number of feature maps
        # (c, d) = dimensions of a feature map (N = c*d)

        features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

        G = torch.mm(features, features.t())  # compute the gram product

        # we 'normalize' the values of the gram matrix
        # by dividing by the number of elements in each feature map
        return G.div(a * b * c * d)
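A quick shape check (hypothetical values): a batch of one 8-channel 4x4 feature map yields an 8x8 normalized Gram matrix.

feat = Variable(torch.rand(1, 8, 4, 4))  # hypothetical feature map
G = GramMatrix()(feat)
print(G.size())  # torch.Size([8, 8])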
Defining the Style Loss

The larger the feature map dimension N, the bigger the values of the Gram matrix. Therefore, if we don't normalize by N, the loss computed at the first layers (before the pooling layers) would dominate the gradient descent. We don't want that, since the most interesting style features live in the deepest layers!

The style loss module is almost identical to the content loss module, but it applies GramMatrix to its input:

class StyleLoss(nn.Module):

    def __init__(self, target, weight):
        super(StyleLoss, self).__init__()
        self.target = target.detach() * weight
        self.weight = weight
        self.gram = GramMatrix()
        self.criterion = nn.MSELoss()

    def forward(self, input):
        self.output = input.clone()
        self.G = self.gram(input)
        self.G.mul_(self.weight)
        self.loss = self.criterion(self.G, self.target)
        return self.output

    def backward(self, retain_graph=True):
        self.loss.backward(retain_graph=retain_graph)
        return self.loss
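Analogously to the content-loss smoke test (hypothetical values): the style loss of a feature map against its own Gram matrix should be zero.

feat = Variable(torch.rand(1, 64, 32, 32))      # hypothetical feature map
sl = StyleLoss(GramMatrix()(feat), weight=1000)
sl(feat)                 # forward pass records the loss
print(sl.loss.data[0])   # 0.0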
Building the Network
A Sequential module contains an ordered list of child modules. For instance, vgg19.features contains a sequence (Conv2d, ReLU, MaxPool2d, Conv2d, ReLU, ...) aligned in the right order of depth. As said in the Content Loss section, we want to add our style and content loss modules as additive, 'transparent' layers in our network, at the desired depths. For that, we construct a new Sequential module mirroring vgg19's structure, into which we add modules from vgg19 and our loss modules in the right order. The loss layers contribute nothing to the forward computation itself; what we need are the loss values they record as the image passes through.

# desired depth layers to compute style/content losses:
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, style_img, content_img,
                               style_weight=1000, content_weight=1,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    # just in order to have an iterable access to a list of content/style losses
    content_losses = []
    style_losses = []

    model = nn.Sequential()  # the new Sequential module network
    gram = GramMatrix()  # we need a gram module in order to compute style targets

    # move these modules to the GPU if possible:
    if use_cuda:
        model = model.cuda()
        gram = gram.cuda()

    i = 1
    for layer in list(cnn):
        if isinstance(layer, nn.Conv2d):
            name = "conv_" + str(i)
            model.add_module(name, layer)

            if name in content_layers:
                # add content loss:
                target = model(content_img).clone()
                content_loss = ContentLoss(target, content_weight)
                model.add_module("content_loss_" + str(i), content_loss)
                content_losses.append(content_loss)

            if name in style_layers:
                # add style loss:
                target_feature = model(style_img).clone()
                target_feature_gram = gram(target_feature)
                style_loss = StyleLoss(target_feature_gram, style_weight)
                model.add_module("style_loss_" + str(i), style_loss)
                style_losses.append(style_loss)

        if isinstance(layer, nn.ReLU):
            name = "relu_" + str(i)
            model.add_module(name, layer)

            if name in content_layers:
                # add content loss:
                target = model(content_img).clone()
                content_loss = ContentLoss(target, content_weight)
                model.add_module("content_loss_" + str(i), content_loss)
                content_losses.append(content_loss)

            if name in style_layers:
                # add style loss:
                target_feature = model(style_img).clone()
                target_feature_gram = gram(target_feature)
                style_loss = StyleLoss(target_feature_gram, style_weight)
                model.add_module("style_loss_" + str(i), style_loss)
                style_losses.append(style_loss)

            i += 1

        if isinstance(layer, nn.MaxPool2d):
            name = "pool_" + str(i)
            model.add_module(name, layer)

    return model, style_losses, content_losses
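The cnn argument above is the feature extractor of a pretrained VGG19, loaded as in the cited tutorial:

import torchvision.models as models

cnn = models.vgg19(pretrained=True).features

# move it to the GPU if possible:
if use_cuda:
    cnn = cnn.cuda()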
The Input Image
For convenience, the input image is a copy of the content image; you could instead start from a white-noise image.

input_img = content_img.clone()
# if you want to use a white noise image instead, uncomment the line below:
# input_img = Variable(torch.randn(content_img.data.size())).type(dtype)

# add the original input image to the figure:
plt.figure()
imshow(input_img.data, title='Input Image')

Defining the Optimizer

We use the L-BFGS algorithm to run the gradient descent. Unlike training a network, here we want to train the input image so as to minimize the content/style losses. We simply create a PyTorch L-BFGS optimizer and hand it the input image as the variable to optimize. However, optim.LBFGS expects as its first argument a list of Variables that require gradient updates, and our input image is a Variable but is not a leaf of the computation tree that requires gradients. One way to make the optimizer aware that the image needs gradients is to construct an nn.Parameter from its data and pass that to the optimizer's constructor:

def get_input_param_optimizer(input_img):
    # this line shows that the input is a parameter that requires a gradient
    input_param = nn.Parameter(input_img.data)
    optimizer = optim.LBFGS([input_param])
    return input_param, optimizer
Defining the Run Function
Last step: the gradient-descent loop. At each step we must feed the network the updated input in order to compute the new losses, run the backward method of each loss module to dynamically compute its gradient, and perform one step of gradient descent. The optimizer requires as argument a 'closure': a function that re-evaluates the model and returns the loss.

However, there is a small catch. The optimized image may take values between -infinity and +infinity instead of staying between 0 and 1; in other words, the image might be well optimized yet contain absurd values. We must therefore optimize under constraints to keep valid values in the input image. There is a simple solution: at each step, clamp the image back into the 0-1 interval.

def run_style_transfer(cnn, content_img, style_img, input_img, num_steps=300,
                       style_weight=1000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(
        cnn, style_img, content_img, style_weight, content_weight)
    input_param, optimizer = get_input_param_optimizer(input_img)

    print('Optimizing..')
    run = [0]
    since = time.time()
    while run[0] <= num_steps:

        def closure():
            # correct the values of the updated input image
            input_param.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_param)  # losses are computed only inside the loss layers
            style_score = 0
            content_score = 0

            for sl in style_losses:
                # accumulate the style gradients w.r.t. the image
                style_score += sl.backward()
            for cl in content_losses:
                # accumulate the content gradients w.r.t. the image
                content_score += cl.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.data[0], content_score.data[0]))
                print()

            return style_score + content_score

        optimizer.step(closure)

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))

    # a last correction...
    input_param.data.clamp_(0, 1)

    return input_param.data
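Finally, as in the cited tutorial, the whole pipeline is launched with a single call and the result is displayed:

output = run_style_transfer(cnn, content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()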
References:
1. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. A Neural Algorithm of Artistic Style. https://arxiv.org/abs/1508.06576
2. PyTorch neural style tutorial: http://pytorch.org/tutorials/advanced/neural_style_tutorial.html