S100 Torch Operator BPU Constraint List

Note

The following alias substitution is assumed by default below:

import horizon_plugin_pytorch as horizon

In the table below:

lhs: left-hand side, the left operand in an operation.

rhs: right-hand side, the right operand in an operation.
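For illustration, a minimal sketch of how the "Eager Mode Operator" column is used in an eager-mode model is shown below; the no-argument constructors and the FloatFunctional.add method name follow the torch.nn.quantized.FloatFunctional convention and are assumptions, not taken from this page:

import torch
import horizon_plugin_pytorch as horizon  # alias used throughout this page

class AddThenAcos(torch.nn.Module):
    # Hypothetical module: torch.add is expressed through FloatFunctional and
    # torch.acos through horizon.nn.Acos, as listed in the rows below.
    def __init__(self):
        super().__init__()
        self.add_op = horizon.nn.quantized.FloatFunctional()
        self.acos = horizon.nn.Acos()

    def forward(self, lhs, rhs):
        out = self.add_op.add(lhs, rhs)  # lhs/rhs as defined above
        return self.acos(out)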

Each entry below lists the Torch Operator, the corresponding Eager Mode Operator (if any), and the Torch constraint.
torch.abs
torch.Tensor.abs
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.acos
horizon.nn.Acos
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.acosh
horizon.nn.Acosh
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.add
torch.Tensor.add
torch.nn.quantized.FloatFunctional OR
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, int32, if type is int32, this hbir.add op must be fusible to a Conv op
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
torch.all
torch.Tensor.all
input:
Type: bool8
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input
torch.any
torch.Tensor.any
input:
Type: bool8
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input
torch.argmax
torch.Tensor.argmax
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]; in particular, ReduceArgMax/ReduceArgMin's reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number.
torch.argmin
torch.Tensor.argmin
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]; in particular, ReduceArgMax/ReduceArgMin's reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number.
torch.asin
horizon.nn.Asin
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.asinh
horizon.nn.Asinh
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.atan
horizon.nn.Atan
if int8:
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.atanh
horizon.nn.Atanh
if int8:
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.ceil
horizon.nn.Ceil
if int8:
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
if int16:
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.clamp
torch.clip
torch.Tensor.clamp
torch.Tensor.clip
 if isinstance(args, scalar):
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.cat
torch.concat
torch.concatenate
torch.nn.quantized.FloatFunctional OR
horizon.nn.quantized.FloatFunctional
input:
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072 
size < 2G
output:
Same as input
torch.cos
horizon.nn.Cos
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.cosh
horizon.nn.Cosh
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.cumsum
torch.Tensor.cumsum
horizon.nn.CumSum
input:
Type: int8, int16; the input must be fully quantized
Shape: [*, dim[axis], *]
Dim: * ∈ [1, 65536]; dim[axis] ∈ [1, 8192]
output:
Type: int8, int16, int32
Shape/Dim: same with input
torch.div
horizon.nn.Div
if rounding_mode is None:
inputs:
Type: int8, int16
output:
Type: int8, int16, int32
Shape: [*]
torch.eq
torch.Tensor.eq
 lhs:
Type: int8, int16, bool8
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.gather
torch.Tensor.gather
 input:
Type: int8, int16, int32, float16, float32
Shape: [*]
input will be transposed to [N, W, C]. W is inputShape[dim], N is the product of inputShape[:dim], C is the product of inputShape[dim+1:].
N, C ∈ [1, 1048576]. N × C should not be larger than 1048576
W ∈ [1, 4096]. If input type is int8, int16, W ∈ [1, 32768].
indices:
Type: int8, int16, int32, int64
Shape: [*]; indices values should not be larger than 32768
indices will be transposed to [N, D, C]. D is indicesShape[dim], N is the product of indicesShape[:dim], C is the product of indicesShape[dim+1:].
N, C ∈ [1, 1048576], D ∈ [1, 737280(720*1024)].
indicesShape[i] <= inputShape[i] for all dimensions i != dim.
output:
Same as indices
torch.gt
torch.greater
torch.Tensor.gt
torch.Tensor.greater
 lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.ge
torch.greater_equal
torch.Tensor.ge
torch.Tensor.greater_equal
 lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.lt
torch.less
torch.Tensor.lt
torch.Tensor.less
 lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.le
torch.less_equal
torch.Tensor.le
torch.Tensor.less_equal
 lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.erf
horizon.nn.Erf
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.exp
horizon.nn.Exp
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.Tensor.expand
input:
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072 
size < 2G
output:
Same as input
torch.flatten
torch.Tensor.flatten
input:
No limits
output:
Same as input
torch.flip
torch.Tensor.flip
input:
Type: int8, int16, int32
output:
Same as input
torch.floor
horizon.nn.Floor
if int8:
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
if int16:
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.index_select
torch.Tensor.index_select
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
input will be transposed to [N, W, C]. W is inputShape[dim], N is the product of inputShape[:dim], C is the product of inputShape[dim+1:].
N, C ∈ [1, 1048576], W ∈ [1, 4096]. If input type is int8, int16, W ∈ [1, 32768].
index:
Type: int8, int16, int32, int64
Shape: [*]; index values should not be larger than 32768. The product of all index dims should be in range [1, 737280(720*1024)], because all dims
will be reduced to the W dim of indices and output. If W of fout is larger than 737280, this op will be split into too many sub-ops.
output:
Same as input
torch.log
horizon.nn.HardLog
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.logical_and
lhs:
Type: int8, int16, bool8
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.logical_not
input:
Type: int8, int16, bool8
Shape: [*]
output:
Type: bool8
torch.logical_or
lhs:
Type: int8, int16, bool8
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.Tensor.masked_fill
condition:
Type: bool8
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
output:
Same as lhs
torch.matmul
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*,M,C]
Dim: * ∈ [1, 4096], M,C ∈ [1, 8192]
rhs:
Type: int8, int16
Shape: [*,C,N]
Dim: * ∈ [1, 4096]; C ∈ [1, 8192], N ∈  [1, 1048576]
output:
Type: int8, int16, int32
Shape: [*,M,N]
Other constraints: Same as lhs and rhs (see the dimension-check sketch after the table)
torch.max
torch.Tensor.max
 if dim is None:
input:
Type: bool8, int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input
if dim is not None:
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 32767]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number
torch.maximum
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
torch.mean
horizon.nn.quantized.FloatFunctional
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]; in particular, ReduceArgMax/ReduceArgMin's reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number.
torch.min
torch.Tensor.min
 if dim is None:
input:
Type: bool8, int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input
if dim is not None:
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 32767]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number
torch.minimum
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
torch.mul
torch.Tensor.mul
torch.nn.quantized.FloatFunctional or
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: int8, int16, int32
Shape: [*]
torch.neg
torch.negative
torch.Tensor.neg
torch.Tensor.negative
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.ne
torch.not_equal
torch.Tensor.ne
torch.Tensor.not_equal
 lhs:
Type: int8, int16, bool8
Shape: [*]
rhs:
Same as lhs
output:
Type: bool8
torch.permute
torch.Tensor.permute
 input:
No limits
output:
Same as input
torch.pow
horizon.nn.Pow
if exponent == 2:
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Type: int8, int16, int32
Shape: [*]
if exponent != 2:
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.reciprocal
horizon.nn.Reciprocal
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.Tensor.repeat
input:
No limits
output:
Same as input
torch.reshape
torch.Tensor.reshape
torch.Tensor.view
input:
No limits
output:
Same as input
torch.roll
torch.Tensor.roll
input:
No limits
output:
Same as input
torch.rsqrt
torch.Tensor.rsqrt
horizon.nn.Rsqrt
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.sin
horizon.nn.Sin
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.sinh
horizon.nn.Sinh
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.slice_scatter
horizon.nn.SliceScatter
input:
Dim: all dims < 2097152 
output:
Same as input
torch.split
input:
Dim: all dims < 2097152 
output:
Same as input
torch.sqrt
horizon.nn.Sqrt
inputs:
Type: int8, int16
output:
Type: int8, int16
Shape: [*]
torch.squeeze
torch.Tensor.squeeze
 input:
No limits
output:
Same as input
torch.stack
horizon.nn.quantized.FloatFunctional
input:
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072 
size < 2G
output:
Same as input
torch.sub
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
torch.sum
horizon.nn.quantized.FloatFunctional
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]; in particular, ReduceArgMax/ReduceArgMin's reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 65535]
output:
Same as input; ReduceArgMax/ReduceArgMin's output can be of type int32 or int64, as long as the size of the reduced axis can be represented using an int16 number.
torch.tan
horizon.nn.Tan
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.tile
torch.Tensor.tile
input:
No limits
output:
Same as input
torch.Tensor.to
torch.Tensor.float
input:
Type: int8, int16, bool8
Shape: [*]
output:
Same as input
torch.transpose
torch.Tensor.transpose
 input:
No limits
output:
Same as input
torch.tril
condition:
Type: bool8
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
output:
Same as lhs
torch.triu
condition:
Type: bool8
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
output:
Same as lhs
torch.unsqueeze
torch.Tensor.unsqueeze
 input:
No limits
output:
Same as input
torch.where
horizon.nn.Where
if input_num == 2:
condition:
Type: bool8
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
output:
Same as lhs
torch.zeros_like
torch.ones_like
No limits
torch.linalg.norm
horizon.nn.LinalgNorm
if ord in (2, None) and isinstance(dim, int):
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
Shape: [*]
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.adaptive_avg_pool1d
torch.nn.AdaptiveAvgPool1d
torch.nn.AdaptiveAvgPool1d
input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
Same as input
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
torch.nn.functional.adaptive_avg_pool2d
torch.nn.AdaptiveAvgPool2d
torch.nn.AdaptiveAvgPool2d
input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
Same as input
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
torch.nn.functional.avg_pool2d
torch.nn.AvgPool2d
 input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
Same as input
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
torch.nn.functional.dropout
torch.nn.Dropout
torch.nn.functional.dropout2d
torch.nn.Dropout2d
torch.nn.Dropout
N/A, collapsed in graph optimization phase
torch.nn.functional.elu
torch.nn.ELU
torch.nn.ELU
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.Embedding
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
if gather1d, W = inputShape[batchDim], W ∈ [1, 4096]. If input type is int8, int16, W ∈ [1, 32768]
if gather2d, H = inputShape[batchDim], W = inputShape[batchDim+1], H, W ∈ [1, 4096]. If input type is int8, H, W ∈ [1, 32768]. H and W cannot both be greater than 4096 at the same time.
B is product of inputShape[0: batchDim], B ∈ [1, 1048576].
C is product of inputShape[batchDim+D:], C ∈ [1, 1048576].
indices:
Type: int8, int16, int32, int64
Shape: [*, D]; indices values should not be larger than 32768. D ∈ [1, 2].
output:
Shape: [*]
Same as input
batchDim:
The number of batch dimensions. The gather of indexing starts from dimension of input[batchDim:]
torch.nn.functional.gelu
torch.nn.GELU
torch.nn.GELU
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.glu
torch.nn.GLU
torch.nn.GLU
inputs:
Type: int8, int16
output:
Type: int8, int16, int32
Shape: [*]
torch.nn.functional.grid_sample
input:
Type: int8
Shape: [*,H,W,C]
Dim: H ∈ [1, 32768], W ∈ [1, 32768], other dims ∈ [1, 65536].
NOTE: H and W cannot both be greater than 4096 at the same time.
grid:
Type: int16
Shape: [*,H,W,2]
output:
Same as input except Dim constraints
mode:
Only support bilinear and nearest
padding_mode:
Only support zeros and border
torch.nn.functional.hardsigmoid
torch.nn.HardSigmoid
torch.nn.HardSigmoid
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.interpolate
torch.nn.Upsample
torch.nn.UpsamplingNearest2d
torch.nn.UpsamplingBilinear2d
 input:
Type: int8
Shape: [*,H,W,C]
The integer part of step ∈ [-256, 255]; otherwise the op falls back to the CPU backend
output:
Same as input
mode:
support nearest and bilinear
torch.nn.functional.leaky_relu
torch.nn.LeakyReLU
torch.nn.LeakyReLU
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.log_softmax
torch.nn.LogSoftmax
torch.nn.LogSoftmax
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16
Shape: [*]
torch.nn.functional.mish
torch.nn.Mish
torch.nn.Mish
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.normalize
horizon.nn.Normalize
if ord in (2, None) and isinstance(dim, int):
lhs:
Type: int8, int16
Shape: [*]
rhs:
Type: int8, int16
Shape: [*]
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.pad
torch.nn.ConstantPad1d
torch.nn.ConstantPad2d
torch.nn.ConstantPad3d
torch.nn.ReplicationPad1d
torch.nn.ReplicationPad2d
torch.nn.ReplicationPad3d
torch.nn.ZeroPad2d
 input:
Type: int64, uint64 and f64 are not supported when expansionMode is 'constant'; otherwise no constraints
Dim: all dims < 737280 when expansionMode is not 'constant'; otherwise no constraints
output:
Same as input
begin/end:
Value should be in range [1, 1024]
torch.nn.functional.pixel_shuffle
torch.nn.PixelShuffle
 input:
dim ∈ [3, 7]
torch.nn.functional.pixel_unshuffle
torch.nn.PixelUnshuffle
 input:
No limits
output:
No limits
torch.nn.PReLU
if isinstance(weight, scalar):
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
if not isinstance(weight, scalar):
lhs:
Type: int8, int16
Shape: [*]
rhs:
Same as lhs
output:
Same as lhs
torch.nn.functional.relu
torch.nn.ReLU
torch.nn.ReLU
input:
Type: int8, int16, int32
Shape: [*]
output:
Same as input
torch.nn.ReLU6 (fused)
torch.nn.ReLU6
input:
Type: int8, int16
Shape: [*]
output:
Same as input
torch.nn.functional.silu
torch.nn.SiLU
torch.nn.SiLU
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.softmax
torch.nn.Softmax
torch.nn.Softmax
input:
Type: int8, int16
Dim: reduce axis dim size ∈ [1, 65535]
output:
Type: int8, int16
torch.nn.functional.softplus
torch.nn.Softplus
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.BatchNorm2d
torch.nn.BatchNorm3d
 input:
Type: int8, int16
Shape: [*,H,W,C]
mean:
Type: f32
Shape: [C]
var:
Type: f32
Shape: [C]
weight:
Type: f32
Shape: [C]
bias:
Type: f32
Shape: [C]
output:
Same as input
torch.nn.Conv1d
torch.nn.Conv2d
input:
--conv 1d--
Type: int8, int16
Shape: [*,L,C]
Dim: * ∈ [1, 4096]; L,C ∈ [1, 65536]
--conv 2d--
Type: int8, int16
Shape: [*,H,W,C]
Dim: * ∈ [1, 4096]; H,W,C ∈ [1, 65536]
weight:
--conv 1d--
Type: int8, int16
Shape: [N,KL,C]
Dim: C ∈ [1, 8192]; KL ∈ [1, 31]; N ∈ [1, 65536] if fout is the last layer of conv else [1, 8192]
Size: KL × C ∈ [1, 65536]
--conv 2d--
Type: int8, int16
Shape: [N,KH,KW,C]
Dim: C ∈ [1, 8192]; KH,KW ∈ [1, 31]; N ∈ [1, 65536] if fout is the last layer of conv else [1, 8192]
Size: KH × KW × C ∈ [1, 65536]
bias:
Type: f32
output:
--conv 1d--
Type: int8, int16, int32
Shape: [*,L,C]
Dim: * ∈ [1, 4096]; L,C ∈ [1, 65536]
--conv 2d--
Type: int8, int16, int32
Shape: [*,H,W,C]
Dim: * ∈ [1, 4096]; H,W,C ∈ [1, 65536]
stride:
--conv 1d--
Shape: [SL]
Dim: SL ∈ [1, 256]; SL ∈ {1} if dilation > 1
--conv 2d--
Shape: [SH,SW]
Dim: SH,SW ∈ [1, 256]; SH,SW ∈ {1} if dilation > 1
pad:
--conv 1d--
Shape: [P_left,P_right]
Dim: P_left,P_right ∈ [-L/2, 256]
--conv 2d--
Shape: [P_top,P_left,P_bottom,P_right]
Dim: P_top,P_bottom ∈ [-H/2, 256], P_left,P_right ∈ [-W/2, 256]
groupNum:
fin.c is divisible by group number
dilation:
--conv 1d--
Shape: [DL]
Dim: DL ∈ [1, 18]
--conv 2d--
Shape: [DH,DW]
Dim: DH,DW ∈ [1, 18]
others:
--conv 1d--
Stride only supports odd numbers and 2 when the conv is an int16 depthwise conv
If groupNum > 1, for each group, fin.c' ∈ [1, 65535], KL × fin.c' ∈ [1, 65535]
--conv 2d--
Stride only supports odd numbers and 2 when the conv is an int16 depthwise conv
If groupNum > 1, for each group, fin.c' ∈ [1, 65535], KH × KW × fin.c' ∈ [1, 65535]
fin.c' = fin.c × min(lcm(fout.c × (lcm(fin.c, 4) / fin.c), 8) / fout.c, groupNum)
torch.nn.ConvTranspose1d
torch.nn.ConvTranspose2d
input:
Type: int8, int16; input and weight cannot both be int16
1d_Shape: [*,W,C]
1d_Dim: * ∈ [1, 128]; W ∈ [1, 65536]; C ∈ [1, 2048]
2d_Shape: [*,H,W,C]
2d_Dim: * ∈ [1, 128]; H,W ∈ [1, 65536]; C ∈ [1, 2048]
weight:
Type: int8, int16; input and weight cannot both be int16
1d_Shape: [N,KW,C]
1d_Dim: N,C ∈ [1, 2048]; KW ∈ [1, 14]
1d_Size: KW × C ∈ [1, 65536]
2d_Shape: [N,KH,KW,C]
2d_Dim: N,C ∈ [1, 2048]; KH,KW ∈ [1, 14]; KH,KW cannot both be 1
2d_Size: KH × KW × C ∈ [1, 65536]
bias:
Type: f32
output:
Same as input, the type additionally supports int32
stride:
1d_Shape: [SW]
1d_Dim: SW ∈ [1, 14];
2d_Shape: [SH,SW]
2d_Dim: SH,SW ∈ [1, 14];
pad:
1d_Shape: [P_left,P_bottom]
1d_Dim: P_left,P_bottom ∈ [0, 256]
2d_Shape: [P_top,P_left,P_bottom,P_right]
2d_Dim: P_top,P_left,P_bottom,P_right ∈ [0, 256]
dilation:
1d_Shape: [DW]
1d_Dim: DW ∈ {1}
2d_Shape: [DH,DW]
2d_Dim: DH,DW ∈ {1}
torch.nn.GRU
dropout must be 0.0
input:
Type: int8, int16
Dim: C_in ∈ [1, 65535], Seq length < 1024, other dims < 2097152
output:
Dim: all dims < 131072
size < 2G
torch.nn.LSTM
input:
Type: int8, int16
Dim: C_in ∈ [1, 65535], Seq length < 1024, other dims < 2097152
output:
Dim: all dims < 131072
size < 2G
torch.nn.Identity
N/A, collapsed in graph optimization phase
torch.nn.LayerNorm
torch.nn.GroupNorm
torch.nn.InstanceNorm1d
torch.nn.InstanceNorm2d
torch.nn.InstanceNorm3d
input:
Type: int8, int16
Dim: normalized dim size ∈ [1, 65535]
output:
Type: int8, int16
horizon.nn.LayerNorm
input:
Type: int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]; in particular, ReduceArgMax/ReduceArgMin's reduce axis dim size ∈ [1, 32767]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16, int32, if type is int32, this hbir.add op must be fusible to a Conv op
Shape: [*]
torch.nn.Linear
input:
Type: int8, int16
Shape: [*,C_in]
Dim: *, C_in ∈ [1, 65536]
weight:
Type: int8, int16
Shape: [C_out, C_in]
Dim: C_out ∈ [1, 1048576]; C_in ∈ [1, 8192]
bias:
Type: f32
output:
Type: int8, int16, int32
Other constraints: Same as input
torch.nn.functional.max_pool1d
torch.nn.MaxPool1d
torch.nn.functional.adaptive_max_pool1d
torch.nn.AdaptiveMaxPool1d
input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
Same as input
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
torch.nn.functional.max_pool2d
torch.nn.MaxPool2d
torch.nn.functional.adaptive_max_pool2d
torch.nn.AdaptiveMaxPool2d
 input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
Same as input
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
torch.nn.MultiheadAttention
src_len, tgt_len, head_dim ∈ [1, 8192]
embed_dim, kdim, vdim ∈ [1, 65536]
input:
Type: int8, int16
output:
Type: int8, int16
torch.nn.functional.selu
torch.nn.SELU
torch.nn.SELU
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.functional.sigmoid
torch.sigmoid
torch.Tensor.sigmoid
torch.nn.Sigmoid
torch.nn.Sigmoid
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.tanh
torch.nn.Tanh
torch.nn.Tanh
inputs:
Type: int8, int16
outputs:
If input is int8, output is int8
If input is int16, output is int8/int16
torch.nn.TransformerDecoderLayer
xxx_is_causal is not supported
Please refer to inner operator for full constraints
torch.nn.TransformerEncoderLayer
is_causal is not supported
Please refer to inner operator for full constraints
horizon.nn.AnchorGenerator
No limits
horizon.nn.BaseGridGenerator
No limits
horizon.nn.functional.filter
input:
Type: int8, int16
Shape: [*, H, W, C]
BPU filter batch dim must be 1 when the rank is 4; input feature size should be less than SRAM/5; H, W must be in range (0, 32768)
threshold:
for int8 input, the value should be in range [-128, 127]; for int16 input, in range [-32768, 32767]
others:
All ops between the filterData and the last layer of the model should be cpu ops
horizon.nn.GridSample
input:
Type: nearest mode supports int8, int16, int32, float16, float32; pad must be 0 when bit width > 8. The other modes only support int8
Shape: [*,H,W,C]
Dim: H ∈ [1, 32768], W ∈ [1, 32768], other dims ∈ [1, 65536].
NOTE: H and W cannot both be greater than 4096 at the same time.
grid:
Type: int16. In nearest mode with non-int8 input, grid should be int16 without quant info (absolute coordinates)
Shape: [*,H,W,2]
output:
Same as input except Dim constraints
torchvision.ops.DeformConv2d
input:
Type: int8
Shape: [*,H,W,C]
Dim: H,W ∈ [1, 1024]; H × W ≤ 720 × 1024; other dims ∈ [1, 65536]
offset:
Type: int16
Shape: [*,OH,OW,2 × offsetGroupNum × KH × KW]
Size: 2 × offsetGroupNum × KH × KW ∈ [2, 256], OH × KH × OW × KW ≤ 720 × 1024
mask:
Type: int8
Shape: [*,OH,OW,offsetGroupNum × KH × KW]
Size: offsetGroupNum × KH × KW ∈ [1, 128]
weight:
Type: int8
Shape: [N,KH,KW,C]
Dim: C ∈ [1, 8192]; KH,KW ∈ [1, 8]; N ∈ [1, 4096]
Size: KH × KW × C ∈ [1, 65536]
bias:
Type: f32
output:
Type: int8, int16, int32
Other constraints: Same as fin
stride:
Shape: [SH,SW]
Dim: SH,SW ∈ [1]
pad:
Shape: [P_top,P_left,P_bottom,P_right]
Dim: P_top,P_bottom ∈ [-H/2, 256], P_left,P_right ∈ [-W/2, 256]
groupNum:
fin.c is divisible by group number
offsetGroupNum:
fin.c is divisible by offset group number
Size: offsetGroupNum ∈ [1, 2]
dilation:
Shape: [DH,DW]
Dim: DH,DW ∈ [1]
others:
For each group, fin.c ∈ [1, 8192], KH × KW × fin.c ∈ [1, 65535], fin.c = C when group = 1
torch.Tensor.__getitem__
if used as a slice:
input:
Dim: all dims < 2097152 
output:
No limits
if used as an index:
input:
No limits
output:
Same as input
torch.Tensor.clone
torch.Tensor.contiguous
torch.Tensor.detach
 N/A, collapsed in graph optimization phase
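The dimension bounds listed above can be checked before export; for example, a purely illustrative helper for the torch.matmul row (the function name and structure are assumptions, only the numeric bounds come from the table):

def check_matmul_constraint(lhs_shape, rhs_shape):
    # Illustrative only: lhs is [*, M, C], rhs is [*, C, N]; bounds are copied
    # from the torch.matmul row of this table.
    *lhs_batch, m, c = lhs_shape
    *rhs_batch, c2, n = rhs_shape
    if c != c2:
        return False  # inner dimensions must match
    batch_ok = all(1 <= d <= 4096 for d in (*lhs_batch, *rhs_batch))
    # M, C in [1, 8192]; N in [1, 1048576]
    return batch_ok and 1 <= m <= 8192 and 1 <= c <= 8192 and 1 <= n <= 1048576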