
I finally managed to draw the structure of a TensorRT engine!


It looks roughly like this (a crop of the input portion of the final graph); take a close look:

[Figure: the input portion of the rendered engine graph]

You can see that many layers have been fused, for example conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11. Others were left alone, such as MaxPool_12. Also, some readers may not have seen the QuantizeLinear quantization op before; for now, just treat it as another layer.

You can also see that this model's input is Float while the output is Int8. The model was produced by quantizing a PyTorch model with NVIDIA's official pytorch-quantization toolkit, exporting it to ONNX, and then converting that with TensorRT 8 into an engine; the engine runs at INT8 precision.
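
For context, that export path looks roughly like the sketch below. This is a minimal outline under my own assumptions (the torchvision model, input size, and opset are illustrative, and the calibration step is elided), not the exact script behind this engine:

import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn layers with quantized equivalents before the model is built
quant_modules.initialize()
model = torchvision.models.resnet34(pretrained=False).eval()

# ... run calibration here to set the quantizer amax (scale) values (omitted) ...

# Export the fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear pairs
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(model, dummy, "debug.onnx", opset_version=13)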

PS: I'll walk through the details of TensorRT quantization in upcoming articles, so no rush.

TensorRT's optimizations

As everyone knows, TensorRT applies many optimizations to a model, such as vertical layer fusion (Conv + BN + ReLU), horizontal layer fusion, eliminating concat by writing directly into the destination buffer, and so on:

[Figure: examples of TensorRT layer fusion]
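
To get a feel for what vertical fusion buys, here is the textbook Conv+BN folding written out in PyTorch. This is my own illustrative sketch of the math, not TensorRT's internal code (TensorRT additionally folds the ReLU into the conv kernel's epilogue):

import torch

def fold_bn_into_conv(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    # w' = w * gamma / sqrt(var + eps);  b' = (b - mean) * gamma / sqrt(var + eps) + beta
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    conv.weight.data *= scale.reshape(-1, 1, 1, 1)
    if conv.bias is None:
        conv.bias = torch.nn.Parameter(torch.zeros_like(bn.running_mean))
    conv.bias.data = (conv.bias.data - bn.running_mean) * scale + bn.bias.data
    return conv  # one conv now computes what conv -> bn used to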

For more details, you can revisit my earlier article 《内卷成啥了还不知道TensorRT?超详细入门指北,来看看吧!》.

In short, a model that has been through TensorRT's optimizer is basically unrecognizable. TensorRT can fuse a great many layer patterns: feed your model in and you'll find lots of layers come out merged. The point, of course, is to optimize memory access and cut the cost of shuttling data between layers.

That said, this isn't always trouble-free; it occasionally produces odd bugs that we need to watch out for.


Once the layers have been fused, we generally can't inspect the model with Netron anymore. TensorRT is closed source, after all, and the engines it generates are complex enough that guessing gets you nowhere. TensorRT is aware of this downside, though, and provides a logging interface: to see what the fused model looks like, just enable verbose mode when building the engine.
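
If you build engines through the TensorRT Python API rather than trtexec, the equivalent is to construct the logger at VERBOSE severity. A minimal sketch against the TensorRT 8 Python API (the file names are my placeholders):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)  # VERBOSE prints fusion decisions and the final layer list

builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("debug.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # the Q/DQ nodes carry the scales, so no calibrator is needed
serialized = builder.build_serialized_network(network, config)
with open("debug_int8.trt", "wb") as f:
    f.write(serialized)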

Viewing the engine structure with verbose logging

It's straightforward. Taking TensorRT's official trtexec tool as an example, we just add the --verbose flag when building:

./trtexec --explicitBatch --onnx=debug.onnx --saveEngine=debug.trt --verbose

During conversion we then get a large amount of information, such as the build options, where we can see that this model is built at FP32+INT8 precision:

[08/25/2021-17:30:04] [I] === Build Options ===
[08/25/2021-17:30:04] [I] Max batch: explicit
[08/25/2021-17:30:04] [I] Workspace: 4096 MiB
[08/25/2021-17:30:04] [I] minTiming: 1
[08/25/2021-17:30:04] [I] avgTiming: 8
[08/25/2021-17:30:04] [I] Precision: FP32+INT8
[08/25/2021-17:30:04] [I] Calibration: Dynamic
[08/25/2021-17:30:04] [I] Refit: Disabled
[08/25/2021-17:30:04] [I] Sparsity: Disabled
[08/25/2021-17:30:04] [I] Safe mode: Disabled
[08/25/2021-17:30:04] [I] Restricted mode: Disabled
[08/25/2021-17:30:04] [I] Save engine: debug_int8.trt

After a long and rather arcane series of optimization steps, we finally get to see the engine's structure:

[V] [TRT] Engine Layer Information:
Layer(Scale): QuantizeLinear_2_quantize_scale_node, Tactic: 0, input[Float(1,3,-17,-18)] -> 255[Int8(1,3,-17,-18)]
Layer(CaskConvolution): conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11, Tactic: 4438325421691896755, 255[Int8(1,3,-17,-18)] -> 267[Int8(1,64,-40,-44)]
Layer(CudaPooling): MaxPool_12, Tactic: -3, 267[Int8(1,64,-40,-44)] -> Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to MaxPool_12, Tactic: 0, Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)] -> 270[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv1.weight + QuantizeLinear_20_quantize_scale_node + Conv_22 + Relu_24, Tactic: 4871133328510103657, 270[Int8(1,64,-21,-24)] -> 284[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv2.weight + QuantizeLinear_32_quantize_scale_node + Conv_34 + Add_42 + Relu_43, Tactic: 4871133328510103657, 284[Int8(1,64,-21,-24)], 270[Int8(1,64,-21,-24)] -> 305[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv1.weight + QuantizeLinear_51_quantize_scale_node + Conv_53 + Relu_55, Tactic: 4871133328510103657, 305[Int8(1,64,-21,-24)] -> 319[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv2.weight + QuantizeLinear_63_quantize_scale_node + Conv_65 + Add_73 + Relu_74, Tactic: 4871133328510103657, 319[Int8(1,64,-21,-24)], 305[Int8(1,64,-21,-24)] -> 340[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv1.weight + QuantizeLinear_82_quantize_scale_node + Conv_84 + Relu_86, Tactic: 4871133328510103657, 340[Int8(1,64,-21,-24)] -> 354[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv2.weight + QuantizeLinear_94_quantize_scale_node + Conv_96 + Add_104 + Relu_105, Tactic: 4871133328510103657, 354[Int8(1,64,-21,-24)], 340[Int8(1,64,-21,-24)] -> 375[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer2.0.conv1.weight + QuantizeLinear_113_quantize_scale_node + Conv_115 + Relu_117, Tactic: -1841683966837205309, 375[Int8(1,64,-21,-24)] -> 389[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.downsample.0.weight + QuantizeLinear_136_quantize_scale_node + Conv_138, Tactic: -1494157908358500249, 375[Int8(1,64,-21,-24)] -> 415[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.conv2.weight + QuantizeLinear_125_quantize_scale_node + Conv_127 + Add_146 + Relu_147, Tactic: -1841683966837205309, 389[Int8(1,128,-52,-37)], 415[Int8(1,128,-52,-37)] -> 423[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv1.weight + QuantizeLinear_155_quantize_scale_node + Conv_157 + Relu_159, Tactic: -1841683966837205309, 423[Int8(1,128,-52,-37)] -> 437[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv2.weight + QuantizeLinear_167_quantize_scale_node + Conv_169 + Add_177 + Relu_178, Tactic: -1841683966837205309, 437[Int8(1,128,-52,-37)], 423[Int8(1,128,-52,-37)] -> 458[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv1.weight + QuantizeLinear_186_quantize_scale_node + Conv_188 + Relu_190, Tactic: -1841683966837205309, 458[Int8(1,128,-52,-37)] -> 472[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv2.weight + QuantizeLinear_198_quantize_scale_node + Conv_200 + Add_208 + Relu_209, Tactic: -1841683966837205309, 472[Int8(1,128,-52,-37)], 458[Int8(1,128,-52,-37)] -> 493[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv1.weight + QuantizeLinear_217_quantize_scale_node + Conv_219 + Relu_221, Tactic: -1841683966837205309, 493[Int8(1,128,-52,-37)] -> 507[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv2.weight + QuantizeLinear_229_quantize_scale_node + Conv_231 + Add_239 + Relu_240, Tactic: -1841683966837205309, 507[Int8(1,128,-52,-37)], 493[Int8(1,128,-52,-37)] -> 528[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer3.0.conv1.weight + QuantizeLinear_248_quantize_scale_node + Conv_250 + Relu_252, Tactic: -8431788508843860955, 528[Int8(1,128,-52,-37)] -> 542[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.downsample.0.weight + QuantizeLinear_271_quantize_scale_node + Conv_273, Tactic: -5697614955743334137, 528[Int8(1,128,-52,-37)] -> 568[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.conv2.weight + QuantizeLinear_260_quantize_scale_node + Conv_262 + Add_281 + Relu_282, Tactic: -496455309852654971, 542[Int8(1,256,-59,-62)], 568[Int8(1,256,-59,-62)] -> 576[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv1.weight + QuantizeLinear_290_quantize_scale_node + Conv_292 + Relu_294, Tactic: -8431788508843860955, 576[Int8(1,256,-59,-62)] -> 590[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv2.weight + QuantizeLinear_302_quantize_scale_node + Conv_304 + Add_312 + Relu_313, Tactic: -496455309852654971, 590[Int8(1,256,-59,-62)], 576[Int8(1,256,-59,-62)] -> 611[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv1.weight + QuantizeLinear_321_quantize_scale_node + Conv_323 + Relu_325, Tactic: -8431788508843860955, 611[Int8(1,256,-59,-62)] -> 625[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv2.weight + QuantizeLinear_333_quantize_scale_node + Conv_335 + Add_343 + Relu_344, Tactic: -496455309852654971, 625[Int8(1,256,-59,-62)], 611[Int8(1,256,-59,-62)] -> 646[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv1.weight + QuantizeLinear_352_quantize_scale_node + Conv_354 + Relu_356, Tactic: -8431788508843860955, 646[Int8(1,256,-59,-62)] -> 660[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv2.weight + QuantizeLinear_364_quantize_scale_node + Conv_366 + Add_374 + Relu_375, Tactic: -496455309852654971, 660[Int8(1,256,-59,-62)], 646[Int8(1,256,-59,-62)] -> 681[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv1.weight + QuantizeLinear_383_quantize_scale_node + Conv_385 + Relu_387, Tactic: -8431788508843860955, 681[Int8(1,256,-59,-62)] -> 695[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv2.weight + QuantizeLinear_395_quantize_scale_node + Conv_397 + Add_405 + Relu_406, Tactic: -496455309852654971, 695[Int8(1,256,-59,-62)], 681[Int8(1,256,-59,-62)] -> 716[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv1.weight + QuantizeLinear_414_quantize_scale_node + Conv_416 + Relu_418, Tactic: -8431788508843860955, 716[Int8(1,256,-59,-62)] -> 730[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv2.weight + QuantizeLinear_426_quantize_scale_node + Conv_428 + Add_436 + Relu_437, Tactic: -496455309852654971, 730[Int8(1,256,-59,-62)], 716[Int8(1,256,-59,-62)] -> 751[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer4.0.conv1.weight + QuantizeLinear_445_quantize_scale_node + Conv_447 + Relu_449, Tactic: -6371781333659293809, 751[Int8(1,256,-59,-62)] -> 765[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.downsample.0.weight + QuantizeLinear_468_quantize_scale_node + Conv_470, Tactic: -1494157908358500249, 751[Int8(1,256,-59,-62)] -> 791[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.conv2.weight + QuantizeLinear_457_quantize_scale_node + Conv_459 + Add_478 + Relu_479, Tactic: -2328318099174473157, 765[Int8(1,512,-71,-72)], 791[Int8(1,512,-71,-72)] -> 799[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv1.weight + QuantizeLinear_487_quantize_scale_node + Conv_489 + Relu_491, Tactic: -2328318099174473157, 799[Int8(1,512,-71,-72)] -> 813[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv2.weight + QuantizeLinear_499_quantize_scale_node + Conv_501 + Add_509 + Relu_510, Tactic: -2328318099174473157, 813[Int8(1,512,-71,-72)], 799[Int8(1,512,-71,-72)] -> 834[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv1.weight + QuantizeLinear_518_quantize_scale_node + Conv_520 + Relu_522, Tactic: -2328318099174473157, 834[Int8(1,512,-71,-72)] -> 848[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv2.weight + QuantizeLinear_530_quantize_scale_node + Conv_532 + Add_540 + Relu_541, Tactic: -2328318099174473157, 848[Int8(1,512,-71,-72)], 834[Int8(1,512,-71,-72)] -> 869[Int8(1,512,-71,-72)]
Layer(CaskDeconvolution): deconv_layers.0.weight + QuantizeLinear_549_quantize_scale_node + ConvTranspose_551, Tactic: -3784829056659735491, 869[Int8(1,512,-71,-72)] -> 881[Int8(1,512,-46,-47)]
Layer(CaskConvolution): deconv_layers.1.weight + QuantizeLinear_559_quantize_scale_node + Conv_561 + Relu_563, Tactic: -496455309852654971, 881[Int8(1,512,-46,-47)] -> 895[Int8(1,256,-46,-47)]
Layer(CaskDeconvolution): deconv_layers.4.weight + QuantizeLinear_571_quantize_scale_node + ConvTranspose_573, Tactic: -3784829056659735491, 895[Int8(1,256,-46,-47)] -> 907[Int8(1,256,-68,-55)]
Layer(CaskConvolution): deconv_layers.5.weight + QuantizeLinear_581_quantize_scale_node + Conv_583 + Relu_585, Tactic: -8431788508843860955, 907[Int8(1,256,-68,-55)] -> 921[Int8(1,256,-68,-55)]
Layer(CaskDeconvolution): deconv_layers.8.weight + QuantizeLinear_593_quantize_scale_node + ConvTranspose_595, Tactic: -2621193268472024213, 921[Int8(1,256,-68,-55)] -> 933[Int8(1,256,-29,-32)]
Layer(CaskConvolution): deconv_layers.9.weight + QuantizeLinear_603_quantize_scale_node + Conv_605 + Relu_607, Tactic: -8431788508843860955, 933[Int8(1,256,-29,-32)] -> 947[Int8(1,256,-29,-32)]
Layer(CaskConvolution): hm.0.weight + QuantizeLinear_615_quantize_scale_node + Conv_617 + Relu_618, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 960[Int8(1,64,-29,-32)]
Layer(CaskConvolution): wh.0.weight + QuantizeLinear_636_quantize_scale_node + Conv_638 + Relu_639, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 985[Int8(1,64,-29,-32)]
Layer(CaskConvolution): reg.0.weight + QuantizeLinear_657_quantize_scale_node + Conv_659 + Relu_660, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 1010[Int8(1,64,-29,-32)]
Layer(CaskConvolution): hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628, Tactic: -7185527339793611699, 960[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628, Tactic: 0, Reformatted Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628[Float(1,2,-29,-32)] -> hm[Float(1,2,-29,-32)]
Layer(CaskConvolution): wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649, Tactic: -7185527339793611699, 985[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649, Tactic: 0, Reformatted Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649[Float(1,2,-29,-32)] -> wh[Float(1,2,-29,-32)]
Layer(CaskConvolution): reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670, Tactic: -7185527339793611699, 1010[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670, Tactic: 0, Reformatted Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670[Float(1,2,-29,-32)] -> reg[Float(1,2,-29,-32)]

Can you guess what this model's backbone is?

Without a diagram, you really can't tell.

Because I've been combing through PyTorch's pull requests rather diligently these past few days, I stumbled upon a gem: engine_layer_visualize.py. The commit is in this PR: https://github.com/pytorch/pytorch/pull/66431

It's a tool for inspecting engines that jerryzh168 open-sourced from Facebook's internal tooling; it uses pydot and graphviz to draw the network structure diagram. A quick search shows that Keras also used these libraries for its model plotting.

Drawing the TensorRT engine graph with pydot and graphviz

Usage is simple. First install the dependencies:

pip install pydot
conda install python-graphviz

PS: Don't ask why it's pip install for pydot but conda install for graphviz; that's the only combination that doesn't error out on my machine. Otherwise I get [Errno 2] "dot" not found in path. (pydot is only a Python interface: it shells out to the dot executable, which the conda graphviz package bundles.)
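
Since that error comes from pydot failing to locate the dot executable, here is a quick sanity check (my own snippet, not from the tool's docs); if graphviz isn't reachable it fails with the same sort of "dot" not found error:

import pydot

# Rendering even an empty graph forces pydot to invoke the graphviz dot binary
pydot.Dot(graph_type="digraph").create(format="png")
print("graphviz's dot binary is reachable")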

Then use the following code:

# (c) Facebook, Inc. and its affiliates. Confidential and proprietary.

import argparse
import re
from typing import NamedTuple, List, Optional

import pydot


"""
log_file is generated by tensorrt verbose logger during building engine.
profile_file is generated by tensorrt profiler.

Currently we support processing multiple logs in one log_file, which
would generate multiple dot graphs. However, multiple engine profiles are not
supported.

Usage:
    python torch/fx/experimental/fx2trt/tools/engine_layer_visualize.py --log_file aaa --profile_file bbb

Usage(Facebook):
    buck run //caffe2/torch/fx/experimental/fx2trt/tools:engine_layer_visualize -- --log_file aaa --profile_file bbb
"""


parser = argparse.ArgumentParser()
parser.add_argument(
    "--log_file",
    type=str,
    default="",
    help="TensorRT VERBOSE logging when building engines.",
)
parser.add_argument(
    "--profile_file",
    type=str,
    default="",
    help="TensorRT execution context profiler output.",
)
args = parser.parse_args()

...

The complete code is here: https://github.com/pytorch/pytorch/pull/66431/files, so I won't paste it all.

Note that we need to pass log_file, i.e. the verbose build log we just enabled, and profile_file, the profiling output from TensorRT. The easiest way to get the latter is through trtexec:

./trtexec --loadEngine=debug_int8.trt --dumpProfile --shapes=input:1x3x512x512 --exportProfile=debug_profile

This produces profile information like the following, listing each fused layer's average runtime, total runtime, and share of the total time:

[
  { "count" : 961 }
, { "name" : "QuantizeLinear_2_quantize_scale_node", "timeMs" : 19.9954, "averageMs" : 0.0208069, "percentage" : 0.801597 }
, { "name" : "conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11", "timeMs" : 86.6105, "averageMs" : 0.0901253, "percentage" : 3.47213 }
, { "name" : "MaxPool_12", "timeMs" : 28.0466, "averageMs" : 0.0291848, "percentage" : 1.12436 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to MaxPool_12", "timeMs" : 12.9771, "averageMs" : 0.0135037, "percentage" : 0.520239 }
, { "name" : "layer1.0.conv1.weight + QuantizeLinear_20_quantize_scale_node + Conv_22 + Relu_24", "timeMs" : 28.8356, "averageMs" : 0.0300059, "percentage" : 1.15599 }
, { "name" : "layer1.0.conv2.weight + QuantizeLinear_32_quantize_scale_node + Conv_34 + Add_42 + Relu_43", "timeMs" : 31.3897, "averageMs" : 0.0326635, "percentage" : 1.25838 }
, { "name" : "layer1.1.conv1.weight + QuantizeLinear_51_quantize_scale_node + Conv_53 + Relu_55", "timeMs" : 28.788, "averageMs" : 0.0299563, "percentage" : 1.15408 }
, { "name" : "layer1.1.conv2.weight + QuantizeLinear_63_quantize_scale_node + Conv_65 + Add_73 + Relu_74", "timeMs" : 31.1857, "averageMs" : 0.0324513, "percentage" : 1.25021 }
, { "name" : "layer1.2.conv1.weight + QuantizeLinear_82_quantize_scale_node + Conv_84 + Relu_86", "timeMs" : 28.7898, "averageMs" : 0.0299581, "percentage" : 1.15415 }
, { "name" : "layer1.2.conv2.weight + QuantizeLinear_94_quantize_scale_node + Conv_96 + Add_104 + Relu_105", "timeMs" : 31.1666, "averageMs" : 0.0324314, "percentage" : 1.24944 }
, { "name" : "layer2.0.conv1.weight + QuantizeLinear_113_quantize_scale_node + Conv_115 + Relu_117", "timeMs" : 20.9996, "averageMs" : 0.0218519, "percentage" : 0.841856 }
, { "name" : "layer2.0.downsample.0.weight + QuantizeLinear_136_quantize_scale_node + Conv_138", "timeMs" : 10.1555, "averageMs" : 0.0105677, "percentage" : 0.407126 }
, { "name" : "layer2.0.conv2.weight + QuantizeLinear_125_quantize_scale_node + Conv_127 + Add_146 + Relu_147", "timeMs" : 31.8969, "averageMs" : 0.0331914, "percentage" : 1.27872 }
, { "name" : "layer2.1.conv1.weight + QuantizeLinear_155_quantize_scale_node + Conv_157 + Relu_159", "timeMs" : 30.5402, "averageMs" : 0.0317796, "percentage" : 1.22433 }
, { "name" : "layer2.1.conv2.weight + QuantizeLinear_167_quantize_scale_node + Conv_169 + Add_177 + Relu_178", "timeMs" : 32.0256, "averageMs" : 0.0333253, "percentage" : 1.28388 }
, { "name" : "layer2.2.conv1.weight + QuantizeLinear_186_quantize_scale_node + Conv_188 + Relu_190", "timeMs" : 30.5798, "averageMs" : 0.0318208, "percentage" : 1.22591 }
, { "name" : "layer2.2.conv2.weight + QuantizeLinear_198_quantize_scale_node + Conv_200 + Add_208 + Relu_209", "timeMs" : 31.813, "averageMs" : 0.0331041, "percentage" : 1.27536 }
, { "name" : "layer2.3.conv1.weight + QuantizeLinear_217_quantize_scale_node + Conv_219 + Relu_221", "timeMs" : 30.6143, "averageMs" : 0.0318568, "percentage" : 1.2273 }
, { "name" : "layer2.3.conv2.weight + QuantizeLinear_229_quantize_scale_node + Conv_231 + Add_239 + Relu_240", "timeMs" : 32.123, "averageMs" : 0.0334266, "percentage" : 1.28778 }
, { "name" : "layer3.0.conv1.weight + QuantizeLinear_248_quantize_scale_node + Conv_250 + Relu_252", "timeMs" : 21.1744, "averageMs" : 0.0220337, "percentage" : 0.848863 }
, { "name" : "layer3.0.downsample.0.weight + QuantizeLinear_271_quantize_scale_node + Conv_273", "timeMs" : 12.0922, "averageMs" : 0.0125829, "percentage" : 0.484765 }
, { "name" : "layer3.0.conv2.weight + QuantizeLinear_260_quantize_scale_node + Conv_262 + Add_281 + Relu_282", "timeMs" : 34.8428, "averageMs" : 0.0362568, "percentage" : 1.39682 }
, { "name" : "layer3.1.conv1.weight + QuantizeLinear_290_quantize_scale_node + Conv_292 + Relu_294", "timeMs" : 31.9807, "averageMs" : 0.0332785, "percentage" : 1.28207 }
, { "name" : "layer3.1.conv2.weight + QuantizeLinear_302_quantize_scale_node + Conv_304 + Add_312 + Relu_313", "timeMs" : 34.4399, "averageMs" : 0.0358375, "percentage" : 1.38066 }
, { "name" : "layer3.2.conv1.weight + QuantizeLinear_321_quantize_scale_node + Conv_323 + Relu_325", "timeMs" : 31.7602, "averageMs" : 0.0330491, "percentage" : 1.27324 }
, { "name" : "layer3.2.conv2.weight + QuantizeLinear_333_quantize_scale_node + Conv_335 + Add_343 + Relu_344", "timeMs" : 35.1158, "averageMs" : 0.0365409, "percentage" : 1.40776 }
, { "name" : "layer3.3.conv1.weight + QuantizeLinear_352_quantize_scale_node + Conv_354 + Relu_356", "timeMs" : 32.027, "averageMs" : 0.0333267, "percentage" : 1.28393 }
, { "name" : "layer3.3.conv2.weight + QuantizeLinear_364_quantize_scale_node + Conv_366 + Add_374 + Relu_375", "timeMs" : 34.6465, "averageMs" : 0.0360526, "percentage" : 1.38895 }
, { "name" : "layer3.4.conv1.weight + QuantizeLinear_383_quantize_scale_node + Conv_385 + Relu_387", "timeMs" : 31.7624, "averageMs" : 0.0330514, "percentage" : 1.27332 }
, { "name" : "layer3.4.conv2.weight + QuantizeLinear_395_quantize_scale_node + Conv_397 + Add_405 + Relu_406", "timeMs" : 34.3392, "averageMs" : 0.0357328, "percentage" : 1.37663 }
, { "name" : "layer3.5.conv1.weight + QuantizeLinear_414_quantize_scale_node + Conv_416 + Relu_418", "timeMs" : 31.728, "averageMs" : 0.0330156, "percentage" : 1.27195 }
, { "name" : "layer3.5.conv2.weight + QuantizeLinear_426_quantize_scale_node + Conv_428 + Add_436 + Relu_437", "timeMs" : 34.2101, "averageMs" : 0.0355985, "percentage" : 1.37145 }
, { "name" : "layer4.0.conv1.weight + QuantizeLinear_445_quantize_scale_node + Conv_447 + Relu_449", "timeMs" : 25.4399, "averageMs" : 0.0264723, "percentage" : 1.01986 }
, { "name" : "layer4.0.downsample.0.weight + QuantizeLinear_468_quantize_scale_node + Conv_470", "timeMs" : 8.88198, "averageMs" : 0.00924243, "percentage" : 0.35607 }
, { "name" : "layer4.0.conv2.weight + QuantizeLinear_457_quantize_scale_node + Conv_459 + Add_478 + Relu_479", "timeMs" : 44.1804, "averageMs" : 0.0459734, "percentage" : 1.77115 }
, { "name" : "layer4.1.conv1.weight + QuantizeLinear_487_quantize_scale_node + Conv_489 + Relu_491", "timeMs" : 44.3623, "averageMs" : 0.0461627, "percentage" : 1.77844 }
, { "name" : "layer4.1.conv2.weight + QuantizeLinear_499_quantize_scale_node + Conv_501 + Add_509 + Relu_510", "timeMs" : 44.3341, "averageMs" : 0.0461333, "percentage" : 1.77731 }
, { "name" : "layer4.2.conv1.weight + QuantizeLinear_518_quantize_scale_node + Conv_520 + Relu_522", "timeMs" : 42.4246, "averageMs" : 0.0441463, "percentage" : 1.70076 }
, { "name" : "layer4.2.conv2.weight + QuantizeLinear_530_quantize_scale_node + Conv_532 + Add_540 + Relu_541", "timeMs" : 43.7076, "averageMs" : 0.0454813, "percentage" : 1.75219 }
, { "name" : "deconv_layers.0.weight + QuantizeLinear_549_quantize_scale_node + ConvTranspose_551", "timeMs" : 77.9405, "averageMs" : 0.0811035, "percentage" : 3.12456 }
, { "name" : "deconv_layers.1.weight + QuantizeLinear_559_quantize_scale_node + Conv_561 + Relu_563", "timeMs" : 60.049, "averageMs" : 0.0624859, "percentage" : 2.40731 }
, { "name" : "deconv_layers.4.weight + QuantizeLinear_571_quantize_scale_node + ConvTranspose_573", "timeMs" : 107.53, "averageMs" : 0.111894, "percentage" : 4.31079 }
, { "name" : "deconv_layers.5.weight + QuantizeLinear_581_quantize_scale_node + Conv_583 + Relu_585", "timeMs" : 80.9985, "averageMs" : 0.0842856, "percentage" : 3.24715 }
, { "name" : "deconv_layers.8.weight + QuantizeLinear_593_quantize_scale_node + ConvTranspose_595", "timeMs" : 381.204, "averageMs" : 0.396674, "percentage" : 15.2821 }
, { "name" : "deconv_layers.9.weight + QuantizeLinear_603_quantize_scale_node + Conv_605 + Relu_607", "timeMs" : 221.925, "averageMs" : 0.230931, "percentage" : 8.89675 }
, { "name" : "hm.0.weight + QuantizeLinear_615_quantize_scale_node + Conv_617 + Relu_618", "timeMs" : 84.4777, "averageMs" : 0.087906, "percentage" : 3.38663 }
, { "name" : "wh.0.weight + QuantizeLinear_636_quantize_scale_node + Conv_638 + Relu_639", "timeMs" : 85.658, "averageMs" : 0.0891342, "percentage" : 3.43395 }
, { "name" : "reg.0.weight + QuantizeLinear_657_quantize_scale_node + Conv_659 + Relu_660", "timeMs" : 85.4159, "averageMs" : 0.0888823, "percentage" : 3.42424 }
, { "name" : "hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628", "timeMs" : 19.5074, "averageMs" : 0.0202991, "percentage" : 0.782035 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628", "timeMs" : 6.52869, "averageMs" : 0.00679364, "percentage" : 0.261729 }
, { "name" : "wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649", "timeMs" : 18.7298, "averageMs" : 0.0194899, "percentage" : 0.750862 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649", "timeMs" : 6.69421, "averageMs" : 0.00696588, "percentage" : 0.268364 }
, { "name" : "reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670", "timeMs" : 18.7625, "averageMs" : 0.0195239, "percentage" : 0.752172 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670", "timeMs" : 7.04306, "averageMs" : 0.00732889, "percentage" : 0.28235 }
]
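
With both files in hand, run the script; verbose.log is my placeholder name for wherever you redirected trtexec's verbose build output:

python engine_layer_visualize.py --log_file verbose.log --profile_file debug_profile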

This generates EngineLayers_0.dot.

This .dot file contains the computation graph: the nodes, the edges, and so on. It's plain Graphviz text, so you can also open it in any editor.

Finally, render the image with the following code and you're done!

import pydot

# a .dot file may contain several graphs; here there is just one
graphs = pydot.graph_from_dot_file("EngineLayers_0.dot")
graph = graphs[0]
graph.write_png("trt_engine.png")  # render to PNG via the dot binary
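
By the way, for very large engines a PNG can get unwieldy; pydot also exposes write_svg and write_pdf with the same call pattern, and a vector format scales much better when panning around a big graph.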

A quick comparison

A quick side-by-side of the original model and the built engine:

  • Input portion:

[Figures: the input portion of the original ONNX model vs. the TensorRT engine]

  • Output portion:

[Figures: the output portion of the original ONNX model vs. the TensorRT engine]

As for the details of TensorRT model quantization, I'll devote separate articles to them later, so I won't go deeper here.

Closing remarks

If the graph you draw comes out looking like this:

[Figure: a garbled, unreadable rendering of the graph]

Congratulations! Your computer is a one-in-ten-thousand prodigy! The fix is simple: just use a different computer (runs away)!

This article is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 International license.
When republishing, please credit the source: https://oldpan.me/archives/tensorrt-engine-network


