Loading pretrained weights and running predictions with PyTorch's C++ frontend (libtorch)

Oldpan · December 6, 2018 · 89 comments · 199,509 views · 30 likes

This post targets Ubuntu. For Windows, see the post "Using PyTorch's C++ frontend (libtorch) on Windows".

Preface

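It has been more than two months since the PyTorch 1.0 Preview was released. The most eye-catching feature of PyTorch 1.0 is its strong support for production: it introduces a C++ ecosystem (which FB had already tried out in Detectron), including a C++ frontend and C++ model compilation tools.

For us this means that deploying a deep learning application only requires training with PyTorch in Python, exporting the trained model with torch.jit, and then loading it with the C++ side of PyTorch to run predictions. The C++ side of PyTorch can of course also be used for training.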

The C++ version of PyTorch we use is really a set of prebuilt shared libraries and header files, and the official site offers ready-made download packages.

From here on we call it libtorch. There is a short official tutorial for it: https://pytorch.org/tutorials/advanced/cpp_export.html

The tutorial covers the basic usage of the library.

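The demo for this post uses libtorch + OpenCV 4.0.0 on the GPU to recognize simple hand gestures. It is written in C++ and predicts about 10% faster than the Python version.

[Figure: screenshot of the gesture-recognition demo]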

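Enough preamble. Let's look at how to use it.

Getting started

The PyTorch 1.0 preview has been out for two months, so why am I only trying it today? The reason is simple: I was worried about the instability of its interfaces, so I waited a little longer. Even after waiting, material is still scarce. There are pitifully few official tutorials, and the only references are a handful of blog posts and issues on GitHub.

The good news is that, compared with before, trying libtorch now works almost without problems and every aspect has been polished. If you are interested in libtorch, this post is for you.

One more piece of news: the stable release of PyTorch 1.0 is due this Friday, which is tomorrow.

With that, libtorch's interfaces are basically stable, so let's hurry up and try it.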

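Getting libtorch

There are two ways to get libtorch:

1. Download the latest prebuilt package from the official site: https://pytorch.org/cppdocs/installing.html
2. Build it from source yourself

I recommend the second. For compatibility, the official prebuilt package uses the old C++ ABI (see https://github.com/pytorch/pytorch/issues/13541 and https://discuss.pytorch.org/t/issues-linking-with-libtorch-c-11-abi/29510). If you use gcc 5 or later and link libtorch together with other prebuilt libraries (also built with gcc 5 or later), conflicts are quite likely. To avoid environment problems, I suggest building from source. You can of course also try the official build.

One more point: if you use the official libtorch package on its own, without linking other libraries such as OpenCV, building against it causes no problems at all, and you can simply download the official package for a quick try.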
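If PyTorch is also installed as a Python package, one way to see which ABI that particular build uses is to ask it directly. This is only a sketch: the helper name `torch_uses_cxx11_abi` is made up here, and the answer describes the installed Python package, which is not necessarily the same build as a separately downloaded libtorch archive.

```python
def torch_uses_cxx11_abi():
    """Return True/False for the installed PyTorch build, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    # True means the build used -D_GLIBCXX_USE_CXX11_ABI=1 (the newer std::string layout).
    return bool(torch.compiled_with_cxx11_abi())


if __name__ == "__main__":
    print("C++11 ABI:", torch_uses_cxx11_abi())
```

A result of False suggests that other C++ libraries linked into the same program should also be built with the old ABI.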

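Building from source

For the prerequisites, see the official instructions at https://github.com/pytorch/pytorch and this site's "PyTorch 0.4.1 / CUDA 9.1 / Linux source installation guide".

With all dependencies installed, download the official source, enter the PyTorch source directory, and run: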

git submodule update --init --recursive  # update the third-party submodules so the build succeeds
mkdir build
cd build
python ../tools/build_libtorch.py

One issue says you must add -D_GLIBCXX_USE_CXX11_ABI=1 around line 127 of tools/build_pytorch_libs.sh in the source tree before building:

THIRD_PARTY_DIR="$BASE_DIR/third_party"

C_FLAGS=""  # add -D_GLIBCXX_USE_CXX11_ABI=1 here
# Workaround OpenMPI build failure
# ImportError: /build/pytorch-0.2.0/.pybuild/pythonX.Y_3.6/build/torch/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3MPI8Datatype4FreeEv
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686926
C_FLAGS="${C_FLAGS} -DOMPI_SKIP_MPICXX=1"
LDFLAGS=""

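This is not actually necessary; we can just build directly.

This step is similar to building PyTorch itself from source. I won't go into the details (CUDA and cuDNN versions) here; see the related pages on this site or follow a tutorial online:

Related:
Installing and switching between multiple versions of the CUDA and cuDNN toolkits

If the build finishes without errors, the output ends like this: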

-- Install configuration: "Release"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libmkldnn.so.0.14.0" to "$ORIGIN:/home/prototype/anaconda3/envs/fastai/lib"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libc10.so" to "$ORIGIN"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libc10_cuda.so" to "$ORIGIN"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libcaffe2.so" to "$ORIGIN:/usr/lib/openmpi/lib:/usr/local/cuda/lib64:/home/prototype/anaconda3/envs/fastai/lib"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libcaffe2_gpu.so" to "$ORIGIN:/usr/local/cuda/lib64:/home/prototype/anaconda3/envs/fastai/lib:/usr/lib/openmpi/lib"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libtorch.so.1" to "$ORIGIN:/usr/local/cuda/lib64:/home/prototype/anaconda3/envs/fastai/lib"
-- Set runtime path of "/home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libcaffe2_module_test_dynamic.so" to "$ORIGIN:/home/prototype/anaconda3/envs/fastai/lib"

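The built libtorch is in path/to/pytorch/torch/lib/. Note, however, that the path we actually give cmake for locating the library is path/to/pytorch/torch/share/cmake.

~~Later, when running cmake, we need to add -DCMAKE_PREFIX_PATH=/path/to/pytorch/torch/lib/tmp_install to bring in the libtorch path.~~

Note: as of PyTorch 1.0.1 (tested to also apply to 1.0 through 1.3), the default location of the built libtorch files has changed, so we should instead add -DCMAKE_PREFIX_PATH=path/to/pytorch/torch/share/cmake

If you don't know what CMake is, see: "Telling apart gcc, clang, make, and cmake"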
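To make the flag concrete, here is a small sketch that assembles the configure command as an argument list. The helper name `cmake_configure_command` is hypothetical, and the default source directory `..` assumes you run cmake from a `build` subdirectory of your project.

```python
import shlex


def cmake_configure_command(prefix_path, source_dir=".."):
    """Build the argument list for configuring a project against libtorch."""
    return ["cmake", f"-DCMAKE_PREFIX_PATH={prefix_path}", source_dir]


cmd = cmake_configure_command("path/to/pytorch/torch/share/cmake")
print(shlex.join(cmd))
# To actually run it from an empty build directory:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list (instead of one shell string) avoids quoting problems when the libtorch path contains spaces.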

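A quick test that libtorch works

Here is a simple test that checks whether the exported model gives the same result in Python and in C++. The model takes a tensor of shape (n,3,224,224) and outputs a tensor of shape (n,3), predicting three classes. First we export the model on the Python side: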

import torch
from Models.MobileNetv2 import mobilenetv2

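# mobilenetv2 is the author's own model; the pretrained=True argument is an assumption
model = mobilenetv2(pretrained=True)
model = model.cuda().eval()  # keep the model on the same device as the example input
example = torch.rand(1, 3, 224, 224).cuda() # note: I export the CUDA version of the model, because it was trained on the GPU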

traced_script_module = torch.jit.trace(model, example)
output = traced_script_module(torch.ones(1,3,224,224).cuda())
traced_script_module.save('mobilenetv2-trace.pt')
print(output)

This prints:

tensor([[ -1.2374, -96.6268,  19.2590]], device='cuda:0',
       grad_fn=<AddBackward0>)

The exported 'mobilenetv2-trace.pt' is available at https://pan.baidu.com/s/1neHRHypYq9vbGDlY1WwfJw (extraction code: sym8)
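If you don't have the custom MobileNetv2 model, a stock torchvision model is enough to exercise the same export path. The sketch below assumes `torch` and `torchvision` are installed; the choice of `resnet18` with random weights, the function name `export_traced`, and the output file name are all illustrative and not from the original setup. It exports on the CPU; add `.cuda()` to both the model and the example input to follow the GPU flow used in this post.

```python
def export_traced(path="resnet18-trace.pt"):
    """Trace a torchvision model with an example input and save it for libtorch."""
    import torch
    import torchvision

    # eval() matters: it fixes batch-norm and dropout behavior before tracing.
    model = torchvision.models.resnet18().eval()
    example = torch.rand(1, 3, 224, 224)

    with torch.no_grad():
        traced = torch.jit.trace(model, example)

    traced.save(path)
    return path


if __name__ == "__main__":
    print("saved", export_traced())
```

Note that resnet18 has 1000 output classes, so the C++ program will print a slice of those scores instead of the three gesture classes.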

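Next, download the official libtorch or build your own, and note where it lives: path/to/libtorch (just an example; the actual path differs for everyone). Then write the CMakeLists file. find_package uses the path we provide to locate libtorch's TorchConfig.cmake and thereby adds the whole libtorch library to our project: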

cmake_minimum_required(VERSION 3.0.0 FATAL_ERROR)
project(simnet)

find_package(Torch REQUIRED)

message(STATUS "Pytorch status:")
message(STATUS "    libraries: ${TORCH_LIBRARIES}")

add_executable(simnet test.cpp)
target_link_libraries(simnet ${TORCH_LIBRARIES})
set_property(TARGET simnet PROPERTY CXX_STANDARD 11)

Then write the C++ side: load the model, create a tensor, feed it to the model, and print the result:

#include "torch/script.h"
#include "torch/torch.h"

#include <iostream>
#include <memory>

using namespace std;

int main(int argc, const char* argv[])
{
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    // Load the serialized model.
    // On libtorch 1.1 and earlier, torch::jit::load returns a pointer:
    //   std::shared_ptr<torch::jit::script::Module> module = torch::jit::load(argv[1]);
    // and the calls below use module-> instead of module.

    // On libtorch 1.2 and later, it returns the module by value:
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }

    module.to(at::kCUDA);
    std::cout << "ok\n";

    // Build an input of shape (1,3,224,224) and move it to CUDA.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 3, 224, 224}).to(at::kCUDA));

    // Execute the model and turn its output into a tensor.
    at::Tensor output = module.forward(inputs).toTensor();

    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
}

Compile this code and load the model exported earlier. It prints:

ok
 -1.2374 -96.6271  19.2592
[ Variable[CUDAFloatType]{1,3} ]

Compared with the earlier tensor([[ -1.2374, -96.6268, 19.2590]], device='cuda:0',grad_fn=<AddBackward0>), the values differ slightly from about the third decimal place on, but overall the difference is small.
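Rather than comparing by eye, the two printed vectors can be checked against a tolerance. This is a minimal sketch using the numbers printed above; the helper name `outputs_match` and the tolerance of 1e-3 are choices made here, not something the original test used.

```python
import math

python_out = [-1.2374, -96.6268, 19.2590]   # printed by the Python export script
cpp_out = [-1.2374, -96.6271, 19.2592]      # printed by the C++ program


def outputs_match(a, b, abs_tol=1e-3):
    """True when both vectors have equal length and every pair differs by at most abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )


max_diff = max(abs(x - y) for x, y in zip(python_out, cpp_out))
print(outputs_match(python_out, cpp_out), max_diff)
```

The largest gap is about 0.0003, so the outputs agree at a tolerance of 1e-3 but not at 1e-5.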

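Note that both runs were on the GPU. In my tests, models trained on the CPU and on the GPU behave differently here: if you export a GPU-trained model after moving it to the CPU with model.cpu() and then load it on the CPU, the results are not correct. Make sure the export device and the load device match.

If the libtorch version does not match the version that exported the model (this usually happens when the libtorch we built and the PyTorch that exported the model are different versions), the following error appears (it may go away once the API stabilizes):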

(simnet:7105): GStreamer-CRITICAL **: gst_element_get_state: assertion 'GST_IS_ELEMENT (element)' failed
terminate called after throwing an instance of 'c10::Error'
  what():  memcmp("PYTORCH1", buf, kMagicValueLength) != 0 ASSERT FAILED at /home/prototype/Downloads/pytorch/caffe2/serialize/inline_container.cc:75, please report a bug to PyTorch. File is an unsupported archive format from the preview release. (PyTorchStreamReader at /home/prototype/Downloads/pytorch/caffe2/serialize/inline_container.cc:75)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6c (0x7f92b7e7cf1c in /home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libc10.so)
frame #1: torch::jit::PyTorchStreamReader::PyTorchStreamReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::istream*) + 0x6fc (0x7f92ca49a88c in /home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libcaffe2.so)
frame #2: torch::jit::load(std::istream&) + 0x2c5 (0x7f92cd9619f5 in /home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libtorch.so.1)
frame #3: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x55 (0x7f92cd961c15 in /home/prototype/Downloads/pytorch/torch/lib/tmp_install/lib/libtorch.so.1)
frame #4: /home/prototype/CLionProjects/simnet/cmake-build-release/simnet() [0x404f60]
frame #5: __libc_start_main + 0xf0 (0x7f92b4701830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: /home/prototype/CLionProjects/simnet/cmake-build-release/simnet() [0x407739]

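Reading images with OpenCV and passing them to libtorch for prediction

So far we have only run a first test of libtorch. In practice we need an image library to read images or video, convert them to tensors, and feed them to the model, which means building libtorch together with other libraries.

Here we build OpenCV and libtorch together: OpenCV opens the camera, each frame is converted to a tensor for real-time prediction, and the current hand gesture is classified.

Building OpenCV

I again recommend building your own OpenCV in the current environment (with fixed cmake, make, and gcc versions). If you have already built it, skip this step.

For how to build OpenCV, see: "A complete guide to installing OpenCV from source on Ubuntu"

Building together with OpenCV

As long as OpenCV is present in your environment, CMake's find_package command can find it too, so we change the CMakeLists file to: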

cmake_minimum_required(VERSION 3.12 FATAL_ERROR)
project(simnet)

find_package(Torch REQUIRED)        # locate libtorch
find_package(OpenCV REQUIRED)       # locate OpenCV

if(NOT Torch_FOUND)
    message(FATAL_ERROR "Pytorch Not Found!")
endif(NOT Torch_FOUND)

message(STATUS "Pytorch status:")
message(STATUS "    libraries: ${TORCH_LIBRARIES}")

message(STATUS "OpenCV library status:")
message(STATUS "    version: ${OpenCV_VERSION}")
message(STATUS "    libraries: ${OpenCV_LIBS}")
message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")

add_executable(simnet test.cpp)
target_link_libraries(simnet ${TORCH_LIBRARIES} ${OpenCV_LIBS}) 
set_property(TARGET simnet PROPERTY CXX_STANDARD 11)

If everything is found, the CMake configure step prints the following:

-- Caffe2: CUDA detected: 9.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 9.2
-- Found cuDNN: v7.4.1  (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 6.1;6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
-- Pytorch status:
--     libraries: torch;caffe2_library;caffe2_gpu_library;/usr/lib/x86_64-linux-gnu/libcuda.so;/usr/local/cuda/lib64/libnvrtc.so;/usr/local/cuda/lib64/libnvToolsExt.so;/usr/local/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
-- OpenCV library status:
--     version: 4.0.0
--     libraries: opencv_calib3d;opencv_core;opencv_dnn;opencv_features2d;opencv_flann;opencv_gapi;opencv_highgui;opencv_imgcodecs;opencv_imgproc;opencv_ml;opencv_objdetect;opencv_photo;opencv_stitching;opencv_video;opencv_videoio
--     include path: /usr/local/include/opencv4
-- Configuring done
-- Generating done
-- Build files have been written to: /home/prototype/CLionProjects/simnet/cmake-build-release

And our C++ code is:

#include <opencv2/opencv.hpp>
#include "torch/script.h"
#include "torch/torch.h"

#include <iostream>
#include <memory>

using namespace std;

// resize while keeping the aspect ratio (letterbox onto a gray 224x224 canvas)
cv::Mat resize_with_ratio(cv::Mat& img)   
{
    cv::Mat temImage;
    int w = img.cols;
    int h = img.rows;

    float t = 1.;
    float len = t * std::max(w, h);
    int dst_w = 224, dst_h = 224;
    cv::Mat image = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(128,128,128));
    cv::Mat imageROI;
    if(len==w)
    {
        float ratio = (float)h/(float)w;
        cv::resize(img,temImage,cv::Size(224,224*ratio),0,0,cv::INTER_LINEAR);
        imageROI = image(cv::Rect(0, ((dst_h-224*ratio)/2), temImage.cols, temImage.rows));
        temImage.copyTo(imageROI);
    }
    else
    {
        float ratio = (float)w/(float)h;
        cv::resize(img,temImage,cv::Size(224*ratio,224),0,0,cv::INTER_LINEAR);
        imageROI = image(cv::Rect(((dst_w-224*ratio)/2), 0, temImage.cols, temImage.rows));
        temImage.copyTo(imageROI);
    }

    return image;
}


int main(int argc, const char* argv[])
{
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    cv::VideoCapture stream(0);
    cv::namedWindow("Gesture Detect", cv::WINDOW_AUTOSIZE);

    // libtorch 1.2 and later return the module by value. On 1.1 and earlier use
    // std::shared_ptr<torch::jit::script::Module> and module-> instead.
    torch::jit::script::Module module = torch::jit::load(argv[1]);
    module.to(at::kCUDA);

    cv::Mat frame;
    cv::Mat image;
    cv::Mat input;

    while(1)
    {
        stream>>frame;
        if (frame.empty())    // stop when the camera delivers no frame
            break;
        image = resize_with_ratio(frame);

        cv::imshow("resized image",image);    // show the letterboxed model input
        cv::cvtColor(image, input, cv::COLOR_BGR2RGB);

        // Convert the image to a tensor, then feed it to the model for prediction.
        torch::Tensor tensor_image = torch::from_blob(input.data, {1,input.rows, input.cols,3}, torch::kByte);
        tensor_image = tensor_image.permute({0,3,1,2});
        tensor_image = tensor_image.toType(torch::kFloat);
        tensor_image = tensor_image.div(255);
        tensor_image = tensor_image.to(torch::kCUDA);
        torch::Tensor result = module.forward({tensor_image}).toTensor();

        // The argmax index is an integer (int64) tensor, so read it as an integer.
        auto max_result = result.max(1, true);
        auto max_index = std::get<1>(max_result).item<int64_t>();
        if(max_index == 0)
            cv::putText(frame, "paper", {40, 50}, cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(0, 255, 0), 2);
        else if(max_index == 1)
            cv::putText(frame, "scissors", {40, 50}, cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(0, 255, 0), 2);
        else
            cv::putText(frame, "stone", {40, 50}, cv::FONT_HERSHEY_PLAIN, 2.0, cv::Scalar(0, 255, 0), 2);

        cv::imshow("Gesture Detect",frame);    // show the camera frame with the predicted label
        cv::waitKey(30);
    }

    return 0;
}

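Then, when running cmake, add -DCMAKE_PREFIX_PATH=/path/to/pytorch/torch/lib/tmp_install to bring in the libtorch path (on newer versions use path/to/pytorch/torch/share/cmake, as noted earlier).

Now the program runs.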
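The geometry inside resize_with_ratio is easy to check by hand. The sketch below redoes only the arithmetic in Python (no image data): the longer side is scaled to 224 and the shorter side is centered on the gray canvas. The function name `letterbox_geometry` is made up for this illustration, and Python's double-precision division can differ from the C++ float arithmetic by a pixel in edge cases.

```python
def letterbox_geometry(w, h, dst=224):
    """Return (new_w, new_h, x_off, y_off) for fitting a w x h image into a dst x dst canvas."""
    if w >= h:
        # Width is the longer side: it fills the canvas, height shrinks by h/w.
        ratio = h / w
        new_w, new_h = dst, int(dst * ratio)
        x_off, y_off = 0, int((dst - dst * ratio) / 2)
    else:
        # Height is the longer side: it fills the canvas, width shrinks by w/h.
        ratio = w / h
        new_w, new_h = int(dst * ratio), dst
        x_off, y_off = int((dst - dst * ratio) / 2), 0
    return new_w, new_h, x_off, y_off


print(letterbox_geometry(640, 480))  # a typical webcam frame
```

For a 640x480 frame this gives a 224x168 image placed 28 pixels from the top, leaving gray bars of 28 pixels above and below.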

Because of space, this post does not explain the libtorch C++ API in detail; that will come in a later post.
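As a reading aid for the tensor steps in the camera loop, here is what permute({0,3,1,2}), div(255), and the final label choice do, written with plain Python lists instead of tensors. Both function names are invented for this sketch, the batch dimension is left out, and nested lists are used only for clarity; they are far too slow for real frames.

```python
def hwc_to_chw_scaled(image):
    """rows x cols x channels lists of 0..255 ints to channels x rows x cols floats in [0, 1]."""
    rows, cols, chans = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[r][c][ch] / 255 for c in range(cols)] for r in range(rows)]
        for ch in range(chans)
    ]


def argmax_label(scores, labels=("paper", "scissors", "stone")):
    """Pick the label whose score is largest, like the if/else chain in the C++ loop."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


tiny = [[[255, 0, 51], [0, 255, 102]]]       # 1 row, 2 columns, RGB
print(hwc_to_chw_scaled(tiny))
print(argmax_label([-1.2374, -96.6268, 19.2590]))
```

With the scores printed by the earlier test, the largest value is at index 2, which the C++ code labels "stone".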

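Problems encountered

The build above may fail with the following error, or with a long list of functions that are clearly defined but reported as undefined: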

error: undefined reference to `cv::imread(std::string const&, int)'

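If your OpenCV works when built and used on its own but fails when built together with libtorch, the two libraries are in conflict. The likely cause is that OpenCV was built with a different C++ ABI than libtorch, so it is best to build OpenCV and libtorch in the same environment.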
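One rough way to tell which ABI a symbol belongs to is to look at its demangled name: with the new ABI, std::string shows up as std::__cxx11::basic_string. The sketch below applies that heuristic to the linker error above and to a frame from the earlier stack trace; the helper name `uses_cxx11_abi` is made up, and the check only says something about symbols that involve std::string or std::list.

```python
def uses_cxx11_abi(demangled_symbol):
    """True if a demangled symbol mentions the new-ABI std::__cxx11 namespace."""
    return "std::__cxx11::" in demangled_symbol


# The symbol the linker could not find (from the error above).
missing = "cv::imread(std::string const&, int)"
# A symbol from the self-built libtorch (from the earlier stack trace).
built = ("torch::jit::load(std::__cxx11::basic_string<char, "
         "std::char_traits<char>, std::allocator<char> > const&)")

print(uses_cxx11_abi(missing), uses_cxx11_abi(built))
```

Here the missing cv::imread symbol is an old-ABI name, which fits a program compiled with old-ABI flags trying to link against an OpenCV that only exports the new-ABI version.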

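There are of course many other odd causes. PyTorch's C++ documentation is currently neither detailed nor plentiful, but you can search for related problems or ask questions on the PyTorch forum and in the GitHub project.

PyTorch's C++ side is close to mature. Prediction in C++ is slightly faster than in Python and also removes the burden of installing the PyTorch package. Once the C++ API stabilizes, we can export a trained model with torch.jit and, on the deployment device, run GPU prediction with just one library, which should greatly improve productivity.

References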

https://blog.csdn.net/u012816621/article/details/51732932
https://michhar.github.io/how-i-built-pytorch-gpu/
https://github.com/tobiascz/MNIST_Pytorch_python_and_capi/blob/master/example-app.cpp
https://github.com/pytorch/pytorch/issues/14620
https://github.com/pytorch/pytorch/issues/14330
https://github.com/pytorch/pytorch/issues/12506
https://github.com/pytorch/pytorch/issues/13245#issuecomment-435165566
https://github.com/pytorch/pytorch/issues/13898#issuecomment-438657077

This article is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.
When reposting, be sure to credit the source: https://oldpan.me/archives/pytorch-c-libtorch-inference

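ekin said:

November 6, 2019 at 9:02 PM

Have you tried deploying PyTorch on mobile? Is there any material I could refer to?
1. Oldpan said:

  November 8, 2019 at 9:55 AM

  Recent PyTorch versions support mobile, but the more mature pipeline is PyTorch -> ONNX -> MNN/NCNN/TensorRT.
  1. ekin said:

    November 8, 2019 at 9:59 AM

    Right, I know PyTorch supports mobile now, but I couldn't find material on how to actually use it. PyTorch -> ONNX -> NCNN/MNN is what everyone uses for mobile deployment; I just wanted to try deploying PyTorch itself on mobile and see what pitfalls come up.
    1. Oldpan said:

      November 8, 2019 at 11:14 AM

      There is very little material on that so far, so you will have to find the pitfalls yourself. I mostly use MNN/NCNN/TVM and am not very optimistic about PyTorch's native libtorch, although I did use it for a while.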
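laoma said:

September 29, 2019 at 5:37 PM

Hello, I got it working; it was a libtorch version problem. But now there is a new problem: classification accuracy is clearly much lower. The same test data does much worse than when tested in Python. I'm using the GPU. Could the model conversion be wrong, or am I calling it incorrectly?
1. Oldpan said:

  September 29, 2019 at 8:24 PM

  Is it that obvious? First check the input data: whether it is normalized, whether the value range is right, whether the post-processing is right, and so on. Only then look at the model conversion.
2. Aorcel said:

  November 19, 2019 at 7:32 PM

  How did you solve it? I ran into this too.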
laoma said:

September 27, 2019 at 12:08 PM

Hello, during make I got this error: conversion from ‘torch::jit::script::Module’ to non-scalar type ‘std::shared_ptr’ requested
Have you seen this, and how can it be solved?
1. 而我想 said:

  October 12, 2019 at 5:00 PM

  Hello, how did you solve this problem?
syy said:

August 20, 2019 at 1:52 PM

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument

RuntimeError: Only tensors and (possibly nested) tuples of tensors are supported as inputs or outputs of traced functions (toIValue at /pytorch/torch/csrc/jit/pybind_utils.h:91)
frame #0: std::function::operator()() const + 0x11 (0x7fdc18e73fe1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
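ekin said:

August 8, 2019 at 7:46 PM

Can libtorch do backpropagation? Are there any examples?
1. Oldpan said:

  August 11, 2019 at 10:13 AM

  Yes. The official site has a small example, but I don't recommend writing training code in C++; it is not much faster than Python.
  1. ekin said:

    August 12, 2019 at 3:55 PM

    My model was trained on the GPU, converted to a .pt model on the CPU, and then deployed on the CPU. Compilation is fine, but running the model fails with free(): invalid pointer. Have you run into this?
    1. ekin said:

      August 29, 2019 at 4:38 PM

      Switching compilers fixes this. gcc 4.8 and earlier have the problem; building with a compiler >= 4.9 solves it.
  2. ekin said:

    August 15, 2019 at 2:34 PM

    How can I verify that the converted .pt model is correct? Is there a good method? I load and deploy the .pt model in C++ and the results are wrong. I'm running stargan, GitHub path: https://github.com/eriklindernoren/PyTorch-GAN/tree/master/implementations/stargan
    1. lxy said:

      August 22, 2019 at 11:56 AM

      Hello, I'm validating a ".pt" model in C++ and the results also differ from Python. Did you fix your bug?
      1. ekin said:

        August 22, 2019 at 1:00 PM

        Solved. It was a model.eval() problem when converting the model.
  3. ekin said:

    August 29, 2019 at 2:16 PM

    The code instantiates module = torch::jit::load("xxx.pt") and then calls module.forward(). After getting the result, does anything need to be released, or does PyTorch destruct it internally?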
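Vlues77 said:

July 11, 2019 at 12:24 PM

Hello, when building from source, do I have to build both pytorch and libtorch, or can libtorch be built on its own? I tried building only libtorch, and running python ../tools/build_libtorch.py failed with " Missing type check for 'at::Generator* generator' "
1. Oldpan said:

  July 12, 2019 at 5:48 PM

  libtorch can be built on its own, but there are many environment-related pitfalls. It is best to look for solutions in the issues on the official GitHub.
  1. Vlues77 said:

    July 12, 2019 at 6:39 PM

    I've built it now, thanks. There really were a lot of pitfalls.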
vincentqin said:

June 26, 2019 at 11:50 AM

Hello, how efficient is libtorch?
1. Oldpan said:

  July 1, 2019 at 9:41 AM

  A little faster than Python, but with more pitfalls; using it in production is a bit of a struggle.
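torch_slave said:

May 17, 2019 at 4:56 PM

Sorry, replying to the earlier comment gave an ERR_CONTENT_DECODING_FAILED error, so I'm starting a new comment. When I export the model with traced_script_module = torch.jit.trace(model, example, check_trace=False) entirely on gpu0, and then in C++ move the model and the image to predict to gpu1 with to(device) and run prediction, it fails. It looks like the intermediate results are still placed on gpu0 while the model weights have been moved to gpu1, so they no longer match.
1. Oldpan said:

  May 17, 2019 at 10:18 PM

  I haven't run into this. In my tests you can export the model on the CPU and then load it on the GPU. Yours may be a bug.
  1. torch_slave said:

    May 18, 2019 at 8:39 AM

    Have you seen to(at::kCPU) or to(at::kCUDA) taking too long in C++? module->forward takes only tens of milliseconds, but moving the input image to the GPU with to takes hundreds of milliseconds, and moving the predicted output (an image the same size as the input) to the CPU takes even longer (~1.5s). I can't figure out why.
    1. Oldpan said:

      May 22, 2019 at 10:52 PM

      Sorry, I've been busy the last couple of days. I'll test it tomorrow and see what the cause is.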
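torch_slave said:

May 16, 2019 at 12:05 PM

Thanks for sharing. When exporting a model by tracing I found a problem: the exported model seems to tie the computation of intermediate variables to the gpu_id used at tracing time. Specifying the GPU in C++ with module->to(device_1); has no effect (the input tensor has also been put on device1 with tensor_image = tensor_image.set_requires_grad(false).to(device_1);, and device1 is defined as torch::Device device_1(device_type, 1);). The intermediate results end up on the GPU used during tracing, which causes the error Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution). Is this a libtorch bug, or is my code wrong somewhere?
1. Oldpan said:

  May 17, 2019 at 2:39 PM

  I didn't quite follow. Do you mean the device was GPU0 when exporting the model weights, and the error appears when libtorch loads it onto GPU1?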
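Zhotaru said:

April 29, 2019 at 2:40 PM

Hello, I designed and trained a model in PyTorch and want to deploy it in C++. Strangely, the forward pass takes longer in C++ than in Python, and I don't know why. I hope you can help.
1. Oldpan said:

  April 30, 2019 at 5:41 PM

  That shouldn't happen. Did you build libtorch yourself or get it from the official site? Is the runtime environment the same, and are the third-party libraries all in use?
  1. Zhotaru said:

    April 30, 2019 at 5:44 PM

    libtorch is the official package, not self-built. On the official GitHub I see many people with the same problem who can't solve it either. It seems that in C++ I'll have to find another way.
    1. Oldpan said:

      April 30, 2019 at 5:48 PM

      It may be a bug or something else; you'll need to look into it yourself. For running in C++ you could try TVM, which may speed up your network.
2. 阿国 said:

  March 3, 2020 at 10:26 AM

  Hello, I also ran into overly long forward-pass times when deploying in C++. How did you solve it? Could you send me the link to that GitHub issue? Thanks.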
yaodao said:

March 27, 2019 at 4:24 PM

Hello, I'm calling libtorch from Visual Studio, but it reports a getenv error. Is there any solution? Thank you! There are methods online for getenv, but none of them seem to work.

Error C4996 'getenv': This function or variable may be unsafe. Consider using _dupenv_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.libtorch\include\aten\context.h 164
1. Oldpan said:

  March 27, 2019 at 9:31 PM

  Sorry, I haven't run into this. Are you using cmake?
2. asd said:

  April 16, 2019 at 3:33 PM

  Just define a macro and it's solved: _CRT_SECURE_NO_WARNINGS
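月影 said:

March 21, 2019 at 4:06 PM

Hello, in PyTorch there is the operation out = out .data.cpu().numpy().transpose(1, 2, 0). In libtorch, after auto out = out_tensor.data();, out turns out to be a pointer, and I don't know how to do the cpu().numpy().transpose(1, 2, 0) step. Please advise, many thanks!
1. Oldpan said:

  March 21, 2019 at 11:29 PM

  That's not easy, because data() returns a pointer of type T, static_cast from the address of the original data, which is hard to transpose. It's best to transpose before calling data(); there are functions for that.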
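inhane said:

March 12, 2019 at 8:08 PM

Hello, do you know how to convert an at::Tensor to const float*? Something like caffe's ->cpu_data()
Blob* net_output_blob = net_->blob_by_name("net_output").get();
const float* net_output_data_begin = net_output_blob->cpu_data();
1. Oldpan said:

  March 13, 2019 at 9:30 PM

  I saw you opened an issue; the answer is .data(). I had other things going on these two days and didn't see it. If you still have questions, we can discuss them.
  1. inhane said:

    March 13, 2019 at 9:41 PM

    OK.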
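Sierkinhane said:

March 9, 2019 at 3:50 PM

What causes a cuda unknown error? My environment can use GPU acceleration, and I don't know why this error appears.
1. Oldpan said:

  March 10, 2019 at 10:08 AM

  There is too little information to tell. Make sure CUDA is present and that libtorch was built in that CUDA environment.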
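@青hdhdk said:

February 22, 2019 at 4:30 PM

I read your blog today and learned a lot; thanks for sharing. A question: my PyTorch network has multi-task branch outputs, that is, the output is several result matrices. How do I get the individual matrices from module->forward(inputs)?
1. Oldpan said:

  February 22, 2019 at 10:39 PM

  I haven't used multi-output models, so I don't know the details and can't say for sure. But multiple inputs are represented as a tuple in Python and as a std::vector in C++, and outputs should work the same way.
  1. 青hdhdk said:

    March 6, 2019 at 5:10 PM

    Thanks for the answer. My problem is solved, but I found something awkward: forward prediction in C++ takes twice as long as in Python...
    1. Oldpan said:

      March 6, 2019 at 7:10 PM

      That shouldn't be. Is your model fairly complex?
      1. 青hdhdk said:

        March 6, 2019 at 9:43 PM

        It's a multi-task output network, not too complex. With the same image input on a titan x, the forward pass averages 7ms in Python and 70ms with libtorch C++. For an embedded port I'm afraid I'll have to find another way.
    2. Oldpan said:

      March 6, 2019 at 10:19 PM

      For the embedded port, will it run with a GPU or on the CPU? You could consider TVM; I've been studying it recently too.
      1. 青hdhdk said:

        March 7, 2019 at 11:36 AM

        With a GPU. Do you have any good introductory material on TVM? Also, have you measured how long it takes to read values from a tensor? I get a probability-matrix Tensor, cls_softmax, from libtorch, and reading values from it takes a long time. For example, cls_softmax is a constant pointer of type at::Tensor*. Reading a value with auto prob = (*cls_softmax)[0][1][h][w].item(); takes about 0.006ms on my CPU. In a loop this adds up. Previously, with array matrices, I read values through element address pointers, which took almost no time. So I don't know whether I'm using libtorch wrong or whether libtorch's C++ runtime efficiency still needs optimizing.
    3. Oldpan said:

      March 7, 2019 at 4:01 PM

      I haven't measured the time. I usually read values through the T * data() const template interface. Reading values in C++ shouldn't be slower than in Python; I don't know your situation well. TVM supports GPUs too, but I don't know whether your embedded model is on its list.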
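老梅西 said:

January 28, 2019 at 12:05 PM

Hello, I ran into the "File is an unsupported archive format" problem you mentioned. How do I solve it?
1. Oldpan said:

  January 29, 2019 at 11:13 AM

  As I said in the post, using the same version of PyTorch and libtorch solves it.
  1. 老梅西 said:

    January 31, 2019 at 7:35 AM

    Thanks. My PyTorch is 1.0.0. How do I know which libtorch version I need?
    1. Oldpan said:

      January 31, 2019 at 10:08 AM

      Either download the stable release of both from the official site, or build both PyTorch and libtorch from the same source tree.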
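winafox said:

January 7, 2019 at 3:25 PM

I ran into the cv::imread undefined reference problem and just couldn't solve it. In the end I rebuilt OpenCV with D_GLIBCXX_USE_CXX11_ABI=0, which fixed it. What a hassle. Still, thank you very much.
1. Oldpan said:

  January 8, 2019 at 3:22 PM

  You're welcome; feel free to discuss more.
  1. winafox said:

    January 8, 2019 at 5:26 PM

    By the way, you use a single image as input here. I want to put many images into forward at once and get the result for each image. How should I do that?
    Right now I use torch::cat((img,img1,img2,img3),0), but after the computation I don't know how to get the results for img1, img2, img3. Do you know how?
    1. Oldpan said:

      January 8, 2019 at 8:14 PM

      Just index the final tensor, for example tensor[0].
      1. winafox said:

        January 9, 2019 at 12:09 PM

        I tried; it doesn't work and crashes outright. The error is index 1 out of range of tensor of size [1,5] at dimension 0. I have 5 classes.
        So it looks like I put a batch of images into forward but only one image's result came out. Strange.
        My input is torch::cat((img1,img2),0); I don't know whether there's anything wrong with that.
    2. Oldpan said:

      January 9, 2019 at 2:30 PM

      Please confirm:
      (1) The tensor used to trace your model has the same dimensions as your actual input.
      (2) Don't use torch.cat; use std::vector<torch::jit::IValue> inputs; and push_back the tensors you want to input.
      1. winafox said:

        January 9, 2019 at 2:55 PM

        Hmm, I tried inputs.push_back, and forward reports Expected at most 1 argument(s) for operator 'forward', but received 2 argument(s). So it looks like forward doesn't accept a batch? Surely not.
    3. Oldpan said:

      January 9, 2019 at 3:06 PM

      Like this: at::Tensor input1 = torch::ones({1,3,128,128}); at::Tensor input2 = torch::ones({1,3,128,128}); at::Tensor inputs = torch::cat({input1,input2},0); at::Tensor output = module->forward({inputs}).toTensor(); Note that the input your model.pt was exported with on the Python side should also have shape (2,3,128,128).
      1. winafox said:

        January 9, 2019 at 3:56 PM

        Following your advice, on the Python side I set the input dimensions to [2,3,224,224] and exported successfully. On the C++ side I torch::cat two [1,3,224,224] tensors and run forward, but the output tensor's size is still [1,5], with no extra dimension. So there's still only one value. I'm about to cry.
        BTW, I noticed something interesting: forwarding the tensor from cat(img1,img2) gives the result for img2 and not img1, so is something wrong with cat?
    4. Oldpan said:

      January 9, 2019 at 4:02 PM

      Strange, it works on my side.
      1. winafox said:

        January 9, 2019 at 5:06 PM

        I found the cause: I used the wrong brackets in cat. The correct form is cat({img1,img2},0), but I wrote cat((img1,img2),0). Those "()" cost me a lot of time. Thank you so much.
        By the way, on the Python side the input doesn't need to be multi-dimensional; [1,3,224,224] is enough.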
作者你好啊 said:

December 25, 2018 at 3:11 PM

Hello, I regenerated model.pt and main() loaded it successfully. After a successful build, running ./example-app model.pt prints
ok
But after adding the following two statements to main()
// Create a vector of inputs.
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));

// Execute the model and turn its output into a tensor.
at::Tensor output = module->forward(inputs).toTensor();

std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
it compiles, but running ./example-app model.pt fails with
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc
Aborted (core dumped)
Have you run into this, and how did you solve it? Thanks.
1. Oldpan said:

  December 25, 2018 at 7:10 PM

  I haven't. It may be related to your build environment; try a different one.
  1. keepstudy said:

    November 22, 2019 at 7:19 PM

    terminate called after throwing an instance of 'c10::Error'
    what(): Must not create a new variable from a variable, use its .data() (make_variable at /home/tb/Downloads/libtorch/include/torch/csrc/autograd/variable.h:577)
    frame #0: std::function::operator()() const + 0x11 (0x7fe7c3e8d441 in /home/tb/Downloads/libtorch/lib/libc10.so)
    frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fe7c3e8cd7a in /home/tb/Downloads/libtorch/lib/libc10.so)
    frame #2: torch::autograd::make_variable(at::Tensor, bool, bool) + 0xa8 (0x456969 in ./predict-app)
    frame #3: main + 0x72e (0x4537c8 in ./predict-app)
    frame #4: __libc_start_main + 0xf0 (0x7fe75cbef830 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #5: _start + 0x29 (0x452289 in ./predict-app)

    Aborted
    Hello, as mentioned above, how can this problem be solved? I've tried many approaches without success. Thanks.
    1. Oldpan said:

      November 25, 2019 at 5:09 PM

      The wrong data type is being returned; for a variable you should call .data().
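作者你好 said:

December 24, 2018 at 2:29 PM

I get an error in the section "A quick test that libtorch works":

from Models.MobileNetv2 import mobilenetv2
ModuleNotFoundError: No module named 'Models'

Does this mobilenetv2 third-party library need to be installed? I can't find it in PyCharm's package list. How do I install it, or is it not needed? Looking forward to your reply.
1. Oldpan said:

  December 24, 2018 at 2:59 PM

  This mobilenetv2 is a model I wrote myself. You don't need to use the same one; just use an existing model from torchvision for testing.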