Performance Comparison of Cross-modal Retrieval

Catalogue

Performance of Commonly-used Datasets

Performance of Flickr8K

(* indicates ensemble models; ^ indicates results of questionable authenticity)
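The R@1, R@5 and R@10 columns in the tables below report Recall@K. As a reading aid, here is a minimal sketch of how that metric is usually computed, assuming the common definition: the percentage of queries whose top-K ranked results contain at least one correct match. The function name `recall_at_k`, its arguments, and the toy rankings are illustrative assumptions, not code from any of the listed papers, whose evaluation protocols may differ in detail.

```python
from typing import Dict, Sequence, Set


def recall_at_k(
    ranked_ids: Sequence[Sequence[int]],
    relevant_ids: Sequence[Set[int]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Return R@K (in percent) for each K in `ks`.

    ranked_ids[q]   : gallery item ids for query q, best match first.
    relevant_ids[q] : set of gallery ids that count as correct for query q.
    A query is a hit at K if any correct id appears in its top-K results.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("one ranking and one relevant set per query are required")
    n_queries = len(ranked_ids)
    if n_queries == 0:
        raise ValueError("at least one query is required")

    scores: Dict[int, float] = {}
    for k in ks:
        hits = sum(
            1
            for ranking, relevant in zip(ranked_ids, relevant_ids)
            if any(item in relevant for item in ranking[:k])
        )
        scores[k] = 100.0 * hits / n_queries
    return scores


# Toy example (made-up data): 4 text queries over a gallery of 6 images.
rankings = [
    [3, 0, 5, 1, 2, 4],  # correct image 3 is ranked first
    [2, 1, 0, 3, 4, 5],  # correct image 1 is ranked second
    [0, 1, 2, 3, 4, 5],  # correct image 5 is ranked sixth
    [4, 2, 0, 1, 3, 5],  # correct image 4 is ranked first
]
relevant = [{3}, {1}, {5}, {4}]
print(recall_at_k(rankings, relevant, ks=(1, 2, 6)))
```

With the toy data above, the call prints `{1: 50.0, 2: 75.0, 6: 100.0}`: two of four queries are correct at rank 1, three within the top 2, and all four within the top 6.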

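| Method_name | Concise_note | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DeViSE | RCNN | 4.8 | 16.5 | 27.3 | 5.9 | 20.1 | 29.6 |
| SDT-RNN | AlexNet | 4.5 | 18.0 | 28.6 | 6.1 | 18.5 | 29.0 |
| SDT-RNN | RCNN | 6.0 | 22.7 | 34.0 | 6.6 | 21.6 | 31.7 |
| DeFrag | AlexNet | 5.9 | 19.2 | 27.3 | 5.2 | 17.6 | 26.5 |
| DeFrag | RCNN | 12.6 | 32.9 | 44.0 | 9.7 | 29.6 | 42.5 |
| m-RNN | AlexNet | 14.5 | 37.2 | 48.5 | 11.5 | 31.0 | 42.4 |
| DVSA | DepTree | 14.8 | 37.9 | 50.0 | 11.6 | 31.4 | 43.8 |
| DVSA | RCNN | 16.5 | 40.6 | 54.2 | 11.8 | 32.1 | 44.7 |
| UVSE | AlexNet | 13.5 | 36.2 | 45.7 | 10.4 | 31.0 | 43.7 |
| UVSE | VggNet | 18.0 | 40.9 | 55.0 | 12.5 | 37.0 | 51.5 |
| NIC | GoogleNet | 20 | -- | 61 | 19 | -- | 64 |
| m-CNN* | OverFeat | 14.9 | 35.9 | 49.0 | 11.8 | 34.5 | 48.0 |
| m-CNN* | VggNet | 24.8 | 53.7 | 67.1 | 20.3 | 47.6 | 61.7 |
| HM-LSTM | RCNN | 27.7 | -- | 68.6 | 24.4 | -- | 68.1 |
| SPE | VggNet | 30.1 | 60.4 | 73.7 | 23.0 | 51.3 | 64.8 |
| FV | GMM+HGLMM | 31.0 | 59.3 | 73.7 | 21.2 | 50.0 | 64.8 |
| MFM | VggNet | 35.6 | 67.0 | 78.6 | 28.4 | 58.5 | 72.3 |
| NAA | ResNet | 37.2 | 68.1 | 79.1 | 27.7 | 59.6 | 71.8 |
| 2WayNet | VggNet | 43.4 | 63.2 | -- | 29.3 | 49.7 | -- |
| SCAN* | BUTD | 52.2 | 81.0 | 89.2 | 38.3 | 67.8 | 78.9 |
| IMRAM | BUTD, Image | 48.5 | 78.1 | 85.3 | 32.0 | 61.4 | 73.9 |
| IMRAM | BUTD, Text | 52.1 | 81.5 | 90.1 | 40.2 | 69.0 | 79.2 |
| IMRAM | BUTD, Full | 54.7 | 84.2 | 91.0 | 41.0 | 69.2 | 79.9 |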

Performance of Flickr30K

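| Method_name | Concise_note | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DeViSE | RCNN | 4.5 | 18.1 | 29.2 | 6.7 | 21.9 | 32.7 |
| SDT-RNN | RCNN | 9.6 | 29.8 | 41.1 | 8.9 | 29.8 | 41.1 |
| DeFrag | RCNN | 14.2 | 37.7 | 51.3 | 10.2 | 30.8 | 44.2 |
| DeFrag | ftRCNN | 16.4 | 40.2 | 54.7 | 10.3 | 31.4 | 44.5 |
| DCCA | AlexNet | 16.7 | 39.3 | 52.9 | 12.6 | 31.0 | 43.0 |
| NIC | GoogleNet | 17 | -- | 56 | 17 | -- | 57 |
| DVSA | DepTree | 20.0 | 46.6 | 59.4 | 15.0 | 36.5 | 48.2 |
| DVSA | RCNN | 22.2 | 48.2 | 61.4 | 15.2 | 37.7 | 50.5 |
| UVSE | AlexNet | 14.8 | 39.2 | 50.9 | 11.8 | 34.0 | 46.3 |
| UVSE | VggNet | 23.0 | 50.7 | 62.9 | 16.8 | 42.0 | 56.5 |
| LRCN | VggNet | 23.6 | 46.6 | 58.3 | 17.5 | 40.3 | 50.8 |
| m-CNN* | OverFeat | 20.1 | 44.2 | 56.3 | 15.9 | 40.3 | 51.9 |
| m-CNN* | VggNet | 33.6 | 64.1 | 74.9 | 26.2 | 56.3 | 69.6 |
| m-RNN | AlexNet | 18.4 | 40.2 | 50.9 | 12.6 | 31.2 | 41.5 |
| m-RNN | VggNet | 35.4 | 63.8 | 73.7 | 22.8 | 50.7 | 63.1 |
| FV | GMM+HGLMM | 35.0 | 62.0 | 73.8 | 25.0 | 52.7 | 66.0 |
| HM-LSTM | RCNN | 38.1 | -- | 76.5 | 27.7 | -- | 68.8 |
| SPE | VggNet | 40.3 | 68.9 | 79.9 | 29.7 | 60.1 | 72.1 |
| sm-LSTM | VggNet | 42.4 | 67.5 | 79.9 | 28.2 | 57.0 | 68.4 |
| sm-LSTM* | VggNet | 42.5 | 71.9 | 81.5 | 30.2 | 60.4 | 72.3 |
| CSE | ResNet | 44.6 | 74.3 | 83.8 | 36.9 | 69.1 | 79.6 |
| MDM | VggNet | 44.9 | 75.4 | 84.4 | 34.4 | 67.0 | 77.7 |
| RRF-Net | ResNet | 47.6 | 77.4 | 87.1 | 35.4 | 68.3 | 79.9 |
| CMPL | MobileNet | 40.3 | 66.9 | 76.7 | 30.4 | 58.2 | 68.5 |
| CMPL | ResNet | 49.6 | 76.8 | 86.1 | 37.3 | 65.7 | 75.5 |
| 2WayNet | VggNet | 49.8 | 67.5 | -- | 36.0 | 55.6 | -- |
| MFM | VggNet | 50.2 | 78.1 | 86.7 | 38.2 | 70.1 | 80.2 |
| VSE++ | VggNet | 41.3 | 69.1 | 77.9 | 31.4 | 60.0 | 71.2 |
| VSE++ | ResNet | 52.9 | 80.5 | 87.2 | 39.6 | 70.1 | 79.5 |
| TIMAM | ResNet, Bert | 53.1 | 78.8 | 87.6 | 42.6 | 71.6 | 81.9 |
| TERN | BUTD, Bert | 53.2 | 79.4 | 86.0 | 41.1 | 71.9 | 81.2 |
| DAN | VggNet | 41.4 | 73.5 | 82.5 | 31.8 | 61.7 | 72.5 |
| DAN | ResNet | 55.0 | 81.8 | 89.0 | 39.4 | 69.2 | 79.1 |
| NAA | ResNet | 55.1 | 80.3 | 89.6 | 39.4 | 68.8 | 79.9 |
| SCO | VggNet | 44.2 | 74.1 | 83.6 | 32.8 | 64.3 | 74.9 |
| SCO | ResNet | 55.5 | 82.0 | 89.3 | 41.1 | 70.5 | 80.1 |
| Dual-Path | VggNet | 47.6 | 77.3 | 87.1 | 35.3 | 66.6 | 78.2 |
| Dual-Path | ResNet | 55.6 | 81.9 | 89.5 | 39.1 | 69.2 | 80.9 |
| CVSE++ | ResNet | 56.6 | 82.5 | 90.2 | 42.4 | 71.6 | 80.8 |
| GXN | ResNet | 56.8 | -- | 89.6 | 41.5 | -- | 80.1 |
| SMAN | ResNet, Random | 56.9 | 84.8 | 91.9 | 43.2 | 73.3 | 83.5 |
| SMAN | ResNet, Glove | 57.3 | 85.3 | 92.2 | 43.4 | 73.7 | 83.4 |
| Align2Ground | BUTD | -- | -- | -- | 49.7 | 74.8 | 83.3 |
| A3VSE | BUTD | 65.0 | 89.2 | 94.5 | 49.5 | 79.5 | 86.6 |
| MTFN | BUTD | 63.1 | 85.8 | 92.4 | 46.3 | 75.3 | 83.6 |
| MTFN | BUTD, RR_no_STT | 65.3 | 88.3 | 93.3 | 46.7 | 75.9 | 83.8 |
| MTFN | BUTD, RR_STT | 65.3 | 88.3 | 93.3 | 52.0 | 80.1 | 86.1 |
| R-SCAN | BUTD, VrR-VG | 66.3 | 90.6 | 96.0 | 51.4 | 77.8 | 84.9 |
| SAVE | ResNet | 67.2 | 88.3 | 94.2 | 49.8 | 78.7 | 86.2 |
| SCAN | BUTD, t2i_AVE | 61.8 | 87.5 | 93.7 | 45.8 | 74.4 | 83.0 |
| SCAN | BUTD, i2t_AVE | 67.9 | 89.0 | 94.4 | 43.9 | 74.2 | 82.8 |
| SCAN* | BUTD, AVE+LSE | 67.4 | 90.3 | 95.8 | 48.6 | 77.7 | 85.2 |
| BFAN | BUTD, prob | 65.5 | 89.4 | -- | 47.9 | 77.6 | -- |
| BFAN | BUTD, equal | 64.5 | 89.7 | -- | 48.8 | 77.3 | -- |
| BFAN* | BUTD | 68.1 | 91.4 | -- | 50.8 | 78.4 | -- |
| CAMP | BUTD | 68.1 | 89.7 | 95.2 | 51.5 | 77.1 | 85.3 |
| RDAN | BUTD | 68.1 | 91.0 | 95.9 | 54.1 | 80.9 | 87.2 |
| GSLS | ResNet, BUTD | 68.2 | 89.1 | 94.5 | 43.4 | 73.5 | 82.5 |
| Personality | ResNeXt, Transformer | 68.4 | 90.6 | 95.3 | -- | -- | -- |
| CASC | ResNet | 68.5 | 90.6 | 95.9 | 50.2 | 78.3 | 86.3 |
| GVSE* | BUTD | 68.5 | 90.9 | 95.5 | 50.6 | 79.8 | 87.6 |
| HAL | SCAN_i2t | 68.6 | 89.9 | 94.7 | 46.0 | 74.0 | 82.3 |
| OAN | BUTD | 68.6 | 93.0 | 96.0 | 53.3 | 80.1 | 87.1 |
| SAEM | BUTD, Bert | 69.1 | 91.0 | 95.1 | 52.4 | 81.1 | 88.1 |
| MPL | SCAN_i2t | 69.4 | 89.9 | 95.4 | 47.5 | 75.5 | 83.1 |
| LIWE | BUTD, CLMR | 64.0 | 88.3 | 93.3 | 46.8 | 76.4 | 84.5 |
| LIWE | BUTD, -Glove | 66.4 | 88.9 | 94.1 | 47.5 | 76.2 | 84.9 |
| LIWE | BUTD, +Glove | 69.6 | 90.3 | 95.6 | 51.2 | 80.4 | 87.2 |
| PFAN | BUTD, t2i | 66.0 | 89.6 | 94.3 | 49.6 | 77.0 | 84.2 |
| PFAN | BUTD, i2t | 67.6 | 90.0 | 93.8 | 45.7 | 74.7 | 83.6 |
| PFAN* | BUTD | 70.0 | 91.8 | 95.0 | 50.4 | 78.7 | 86.1 |
| PFAN++* | BUTD | 70.1 | 91.8 | 96.1 | 52.7 | 79.9 | 87.0 |
| CAAN | BUTD | 70.1 | 91.6 | 97.2 | 52.8 | 79.0 | 87.9 |
| DP-RNN | BUTD | 70.2 | 91.6 | 95.8 | 55.5 | 81.3 | 88.2 |
| TERAN | BUTD, Bert | 70.8 | 90.9 | 95.5 | 56.5 | 81.2 | 88.2 |
| HOAD | BUTD | 70.8 | 92.7 | 96.0 | 59.5 | 85.6 | 91.0 |
| HOAD | BUTD, +Dist | 70.8 | 92.7 | 96.0 | 60.9 | 86.1 | 91.0 |
| GOT | SCAN_i2t | 70.9 | 92.8 | 95.5 | 50.7 | 78.7 | 86.2 |
| VSRN | BUTD | 70.4 | 89.2 | 93.7 | 53.0 | 77.9 | 85.7 |
| VSRN* | BUTD | 71.3 | 90.6 | 96.0 | 54.7 | 81.8 | 88.2 |
| SCG | VggNet, Prod | 57.2 | 85.1 | 92.1 | 40.1 | 69.5 | 79.5 |
| SCG | VggNet, Gated | 71.8 | 90.8 | 94.8 | 49.3 | 76.4 | 85.6 |
| SGM | BUTD | 71.8 | 91.7 | 95.5 | 53.5 | 79.6 | 86.5 |
| ADDR* | BUTD, BFAN | 71.3 | 91.5 | 96.4 | 54.0 | 80.0 | 87.6 |
| ADDR* | BUTD, SCAN | 72.1 | 93.1 | 96.1 | 53.5 | 80.4 | 87.4 |
| ADDR* | BUTD, VSRN | 73.0 | 92.5 | 96.6 | 55.6 | 82.0 | 88.9 |
| AOQ* | BUTD, SCAN | 70.3 | 92.0 | 95.5 | 50.0 | 79.2 | 86.2 |
| AOQ* | BUTD, VSRN | 72.8 | 91.8 | 95.8 | 55.3 | 82.2 | 88.4 |
| AOQ* | BUTD, BFAN | 73.2 | 94.5 | 97.0 | 54.0 | 80.3 | 87.7 |
| CVSE^ | BUTD | 73.5 | 92.1 | 95.8 | 52.9 | 80.4 | 87.8 |
| IMRAM | BUTD, Image | 67.0 | 90.5 | 95.6 | 51.2 | 78.2 | 85.5 |
| IMRAM | BUTD, Text | 68.8 | 91.6 | 96.0 | 53.0 | 79.0 | 87.1 |
| IMRAM | BUTD, Full | 74.1 | 93.0 | 96.6 | 53.9 | 79.4 | 87.2 |
| MMCA | BUTD, Bert | 74.2 | 92.8 | 96.4 | 54.8 | 81.4 | 87.8 |
| SAN^ | VggNet | 67.0 | 88.0 | 94.6 | 51.4 | 77.2 | 85.2 |
| SAN^ | ResNet | 75.5 | 92.6 | 96.2 | 60.1 | 84.7 | 90.6 |
| GSMN | BUTD, sparse | 71.4 | 92.0 | 96.1 | 53.9 | 79.7 | 87.1 |
| GSMN | BUTD, dense | 72.6 | 93.5 | 96.8 | 53.7 | 80.0 | 87.0 |
| GSMN* | BUTD | 76.4 | 94.3 | 97.3 | 57.4 | 82.3 | 89.0 |
| ADAPT | BUTD, i2t | 70.2 | 90.8 | 95.8 | 55.5 | 82.7 | 89.8 |
| ADAPT | BUTD, t2i | 73.6 | 93.7 | 96.7 | 57.0 | 83.6 | 90.3 |
| ADAPT* | BUTD | 76.6 | 95.4 | 97.6 | 60.7 | 86.6 | 92.0 |
| SGRAF | BUTD, SAF | 73.7 | 93.3 | 96.3 | 56.1 | 81.5 | 88.0 |
| SGRAF | BUTD, SGR | 75.2 | 93.3 | 96.6 | 56.2 | 81.0 | 86.5 |
| SGRAF* | BUTD | 77.8 | 94.1 | 97.4 | 58.5 | 83.0 | 88.8 |
| DSRAN | BUTD, GRU | 72.6 | 93.6 | 96.3 | 56.3 | 84.0 | 89.8 |
| DSRAN | BUTD, Bert | 75.3 | 94.4 | 97.6 | 57.3 | 84.8 | 90.9 |
| DSRAN* | BUTD, GRU | 74.9 | 94.5 | 97.0 | 58.6 | 85.8 | 91.3 |
| DSRAN* | BUTD, Bert | 77.8 | 95.1 | 97.6 | 59.2 | 86.0 | 91.9 |
| ACMM | BUTD | 80.0 | 95.5 | 98.2 | 50.2 | 76.8 | 84.7 |
| ACMM* | BUTD | 85.2 | 96.7 | 98.4 | 53.8 | 79.8 | 86.8 |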

Performance of MSCOCO1K

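| Method_name | Concise_note | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STV | combine-skip | 33.8 | 67.7 | 82.1 | 25.9 | 60.0 | 74.6 |
| DVSA | RCNN | 38.4 | 69.9 | 80.5 | 27.4 | 60.2 | 74.8 |
| FV | GMM+HGLMM | 39.4 | 67.9 | 80.9 | 25.1 | 59.8 | 76.6 |
| m-RNN | VggNet | 41.0 | 73.0 | 83.5 | 29.0 | 42.2 | 77.0 |
| m-CNN* | VggNet | 42.8 | 73.1 | 84.1 | 32.6 | 68.6 | 82.8 |
| UVSE | VggNet | 43.4 | 75.7 | 85.8 | 31.0 | 66.7 | 79.9 |
| HM-LSTM | RCNN | 43.9 | -- | 87.8 | 36.1 | -- | 86.7 |
| Order-emb | VggNet | 46.7 | -- | 88.9 | 37.9 | -- | 85.9 |
| SPE | VggNet | 50.1 | 79.7 | 89.2 | 39.6 | 75.2 | 86.9 |
| SEAM | VggNet | 50.7 | 81.4 | 90.9 | 40.3 | 75.7 | 87.4 |
| sm-LSTM | VggNet | 52.4 | 81.7 | 90.8 | 38.6 | 73.4 | 84.6 |
| sm-LSTM* | VggNet | 53.2 | 83.1 | 91.5 | 40.7 | 75.8 | 87.4 |
| CMPL | MobileNet | 52.9 | 83.8 | 92.1 | 41.3 | 74.6 | 85.9 |
| MDM | VggNet | 54.7 | 84.1 | 91.9 | 44.6 | 79.6 | 90.5 |
| 2WayNet | VggNet | 55.8 | 75.2 | -- | 39.7 | 63.3 | -- |
| CMPM | ResNet | 56.1 | 86.3 | 92.9 | 44.6 | 78.8 | 89.0 |
| CSE | ResNet | 56.3 | 84.4 | 92.2 | 45.7 | 81.2 | 90.6 |
| RRF-Net | ResNet | 56.4 | 85.3 | 91.5 | 43.9 | 78.1 | 88.6 |
| MFM | VggNet | 58.9 | 86.3 | 92.4 | 47.7 | 81.0 | 90.9 |
| CHAIN-VSE | VggNet | 51.6 | 82.0 | 91.3 | 38.6 | 75.1 | 87.2 |
| CHAIN-VSE | ResNet | 59.4 | 88.0 | 94.2 | 43.5 | 79.8 | 90.2 |
| NAA | ResNet | 61.3 | 87.9 | 95.4 | 47.0 | 80.8 | 90.1 |
| TERN | BUTD, Bert | 63.7 | 90.5 | 96.2 | 51.9 | 85.6 | 93.6 |
| VSE++ | VggNet | 57.2 | 86.0 | 93.3 | 45.9 | 79.4 | 89.1 |
| VSE++ | ResNet | 64.6 | 90.0 | 95.7 | 52.0 | 84.3 | 92.0 |
| Dual-Path | VggNet | 59.4 | 86.2 | 92.9 | 41.6 | 76.3 | 87.5 |
| Dual-Path | ResNet | 65.6 | 89.8 | 95.5 | 47.1 | 79.9 | 90.0 |
| Personality | ResNeXt, Transformer | 67.3 | 91.7 | 96.5 | -- | -- | -- |
| Align2Ground | BUTD | -- | -- | -- | 56.6 | 84.9 | 92.8 |
| SMAN | ResNet, Random | 67.9 | 90.6 | 96.2 | 58.8 | 87.0 | 93.7 |
| SMAN | ResNet, Glove | 68.4 | 91.3 | 96.6 | 58.5 | 87.4 | 93.5 |
| GXN | ResNet | 68.5 | -- | 97.9 | 56.6 | -- | 94.5 |
| GSLS | ResNet, BUTD | 68.9 | 94.1 | 98.0 | 58.6 | 88.2 | 94.9 |
| CVSE++ | ResNet | 69.1 | 92.2 | 96.1 | 55.6 | 86.7 | 93.8 |
| PVSE | ResNet | 69.2 | 91.6 | 96.6 | 55.2 | 86.5 | 93.7 |
| DSVE-Loc | ResNet | 69.8 | 91.9 | 96.6 | 55.9 | 86.9 | 94.0 |
| SCO | VggNet | 66.6 | 91.8 | 96.6 | 55.5 | 86.6 | 93.8 |
| SCO | ResNet | 69.9 | 92.9 | 97.5 | 56.7 | 87.5 | 94.8 |
| R-SCAN | BUTD, VrR-VG | 70.3 | 94.5 | 98.1 | 57.6 | 87.3 | 93.7 |
| SAVE | ResNet | 70.8 | 93.2 | 97.6 | 56.9 | 87.6 | 94.4 |
| MPL | SCAN_i2t | 71.1 | 93.7 | 98.2 | 56.8 | 86.7 | 93.0 |
| SAEM | BUTD, Bert | 71.2 | 94.1 | 97.7 | 57.8 | 88.6 | 94.9 |
| SoDeep | DSVE-Loc | 71.5 | 92.8 | 97.1 | 56.2 | 87.0 | 94.3 |
| OAN | BUTD | 71.7 | 96.4 | 99.3 | 60.2 | 88.6 | 94.5 |
| GVSE* | BUTD | 72.2 | 94.1 | 98.1 | 60.5 | 89.4 | 95.8 |
| CAMP | BUTD | 72.3 | 94.8 | 98.3 | 58.5 | 87.9 | 95.0 |
| CASC | ResNet | 72.3 | 96.0 | 99.0 | 58.9 | 89.8 | 96.0 |
| SCAN | BUTD, t2i_AVE | 70.9 | 94.5 | 97.8 | 56.4 | 87.0 | 93.9 |
| SCAN | BUTD, i2t_AVE | 69.2 | 93.2 | 97.5 | 54.4 | 86.0 | 93.6 |
| SCAN* | BUTD, LSE+AVE | 72.7 | 94.8 | 98.4 | 58.8 | 88.4 | 94.8 |
| LIWE | BUTD, -Glove | 69.6 | 93.9 | 98.0 | 55.5 | 87.3 | 94.2 |
| LIWE | BUTD, CLMR | 71.8 | 93.1 | 97.6 | 56.2 | 87.5 | 94.2 |
| LIWE | BUTD, +Glove | 73.2 | 95.5 | 98.2 | 57.9 | 88.3 | 94.5 |
| SGM | BUTD | 73.4 | 93.8 | 97.8 | 57.5 | 87.3 | 94.3 |
| ParNet | BUTD, NP | 72.8 | 94.9 | 97.9 | 57.9 | 87.4 | 94.0 |
| ParNet | BUTD, P | 73.5 | 94.5 | 98.3 | 58.3 | 88.2 | 94.1 |
| MTFN | BUTD | 71.9 | 94.2 | 97.9 | 57.3 | 88.6 | 95.0 |
| MTFN | BUTD, RR_no_STT | 74.3 | 94.9 | 97.9 | 57.5 | 88.8 | 95.0 |
| MTFN | BUTD, RR_STT | 74.3 | 94.9 | 97.9 | 60.1 | 89.1 | 95.0 |
| RDAN | BUTD | 74.6 | 96.2 | 98.7 | 61.6 | 89.2 | 94.7 |
| CVSE^ | BUTD | 74.8 | 95.1 | 98.3 | 59.9 | 89.4 | 95.2 |
| MMCA | BUTD, Bert | 74.8 | 95.6 | 97.7 | 61.6 | 89.8 | 95.2 |
| BFAN | BUTD, prob | 73.0 | 94.8 | -- | 58.0 | 87.6 | -- |
| BFAN | BUTD, equal | 73.7 | 94.9 | -- | 58.3 | 87.5 | -- |
| BFAN* | BUTD | 74.9 | 95.2 | -- | 59.4 | 88.4 | -- |
| DP-RNN | BUTD | 75.3 | 95.8 | 98.6 | 62.5 | 89.7 | 95.1 |
| CAAN | BUTD | 75.5 | 95.4 | 98.5 | 61.3 | 89.7 | 95.2 |
| VSRN | BUTD | 74.0 | 94.3 | 97.8 | 60.8 | 88.4 | 94.1 |
| VSRN* | BUTD | 76.2 | 94.8 | 98.2 | 62.8 | 89.7 | 95.1 |
| ADAPT | BUTD, i2t | 74.5 | 94.2 | 97.9 | 62.0 | 90.4 | 95.5 |
| ADAPT | BUTD, t2i | 75.3 | 95.1 | 98.4 | 63.3 | 90.0 | 95.5 |
| ADAPT* | BUTD | 76.5 | 95.6 | 98.9 | 62.2 | 90.5 | 96.0 |
| PFAN | BUTD, t2i | 75.8 | 95.9 | 99.0 | 61.0 | 89.1 | 95.1 |
| PFAN | BUTD, i2t | 70.7 | 94.1 | 97.8 | 53.0 | 84.5 | 92.6 |
| PFAN* | BUTD | 76.5 | 96.3 | 99.0 | 61.6 | 89.6 | 95.2 |
| SCG | VggNet, Prod | 73.4 | 94.8 | 97.6 | 56.3 | 85.6 | 93.5 |
| SCG | VggNet, Gated | 76.6 | 96.3 | 99.2 | 61.4 | 88.9 | 95.1 |
| IMRAM | BUTD, Image | 76.1 | 95.3 | 98.2 | 61.0 | 88.6 | 94.5 |
| IMRAM | BUTD, Text | 74.0 | 95.6 | 98.4 | 60.6 | 88.9 | 94.6 |
| IMRAM | BUTD, Full | 76.7 | 95.6 | 98.5 | 61.7 | 89.1 | 95.0 |
| PFAN++* | BUTD | 77.1 | 96.5 | 98.3 | 62.5 | 89.9 | 95.4 |
| ADDR* | BUTD, SCAN | 76.1 | 95.5 | 98.4 | 61.2 | 88.9 | 94.8 |
| ADDR* | BUTD, BFAN | 76.4 | 95.8 | 98.3 | 62.3 | 89.4 | 96.2 |
| ADDR* | BUTD, VSRN | 77.4 | 96.1 | 98.9 | 63.5 | 90.7 | 96.7 |
| AOQ* | BUTD, SCAN | 74.1 | 95.2 | 98.5 | 59.8 | 88.6 | 95.0 |
| AOQ* | BUTD, BFAN | 77.3 | 96.0 | 98.5 | 61.2 | 89.2 | 95.0 |
| AOQ* | BUTD, VSRN | 77.5 | 95.5 | 98.6 | 63.5 | 90.5 | 95.8 |
| TERAN | BUTD, Bert | 77.7 | 95.9 | 98.6 | 65.0 | 91.2 | 96.4 |
| HOAD^ | BUTD | 77.0 | 96.1 | 98.7 | 65.1 | 93.1 | 97.9 |
| HOAD^ | BUTD, +Dist | 77.8 | 96.1 | 98.7 | 66.2 | 93.0 | 97.9 |
| TOD-Net | VSE++ | 68.6 | 92.0 | 96.9 | 54.5 | 85.3 | 92.4 |
| TOD-Net | Bert | 75.8 | 95.3 | 98.4 | 61.8 | 89.6 | 95.0 |
| TOD-Net* | Bert | 78.1 | 96.0 | 98.6 | 63.6 | 90.6 | 95.8 |
| HAL | SCAN_i2t | 78.3 | 96.3 | 98.5 | 60.1 | 86.7 | 92.8 |
| DSRAN | BUTD, GRU | 76.3 | 94.9 | 98.4 | 62.4 | 89.7 | 95.2 |
| DSRAN | BUTD, Bert | 77.1 | 95.3 | 98.1 | 62.9 | 89.9 | 95.3 |
| DSRAN* | BUTD, GRU | 78.0 | 95.6 | 98.5 | 64.2 | 90.4 | 95.8 |
| DSRAN* | BUTD, Bert | 78.3 | 95.7 | 98.4 | 64.5 | 90.8 | 95.8 |
| GSMN | BUTD, sparse | 76.1 | 95.6 | 98.3 | 60.4 | 88.7 | 95.0 |
| GSMN | BUTD, dense | 74.7 | 95.3 | 98.2 | 60.3 | 88.5 | 94.6 |
| GSMN* | BUTD | 78.4 | 96.4 | 98.6 | 63.3 | 90.1 | 95.7 |
| SGRAF | BUTD, SAF | 76.1 | 95.4 | 98.3 | 61.8 | 89.4 | 95.3 |
| SGRAF | BUTD, SGR | 78.0 | 95.8 | 98.2 | 61.4 | 89.3 | 95.4 |
| SGRAF* | BUTD | 79.6 | 96.2 | 98.5 | 63.2 | 90.7 | 96.1 |
| ACMM | BUTD | 81.9 | 98.0 | 99.3 | 58.2 | 87.3 | 93.9 |
| ACMM* | BUTD | 84.1 | 97.8 | 99.4 | 60.7 | 88.7 | 94.9 |
| SAN^ | VggNet | 74.9 | 94.9 | 98.2 | 60.8 | 90.3 | 95.7 |
| SAN^ | ResNet | 85.4 | 97.5 | 99.0 | 69.1 | 93.4 | 97.2 |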

Performance of MSCOCO5K

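| Method_name | Concise_note | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DVSA | RCNN | 16.5 | 39.2 | 52.0 | 10.7 | 29.6 | 42.2 |
| FV | GMM+HGLMM | 17.3 | 39.0 | 50.2 | 10.8 | 28.3 | 40.1 |
| Order-emb | VggNet | 23.3 | -- | 65.0 | 18.0 | -- | 57.6 |
| CSE | ResNet | 27.9 | 57.1 | 70.4 | 22.2 | 50.2 | 64.4 |
| CMPL | MobileNet | 24.6 | 52.3 | 66.4 | 19.1 | 44.6 | 58.4 |
| CMPM | ResNet | 31.1 | 60.7 | 73.9 | 22.9 | 50.2 | 63.8 |
| TERN | BUTD, Bert | 38.4 | 69.5 | 81.3 | 28.7 | 59.7 | 72.7 |
| Dual-Path | VggNet | 35.5 | 63.2 | 75.6 | 21.0 | 47.5 | 60.9 |
| Dual-Path | ResNet | 41.2 | 70.5 | 81.1 | 25.3 | 53.4 | 66.4 |
| VSE++ | VggNet | 32.9 | 61.7 | 74.7 | 24.1 | 52.8 | 66.2 |
| VSE++ | ResNet | 41.3 | 71.1 | 81.2 | 30.3 | 59.4 | 72.4 |
| GXN | ResNet | 42.0 | -- | 84.7 | 31.7 | -- | 74.6 |
| SCO | VggNet | 40.2 | 70.1 | 81.3 | 31.3 | 61.5 | 73.9 |
| SCO | ResNet | 42.8 | 72.3 | 83.0 | 33.1 | 62.9 | 75.5 |
| CVSE++ | ResNet | 43.2 | 73.5 | 84.1 | 32.4 | 62.2 | 74.6 |
| PVSE | ResNet | 45.2 | 74.3 | 84.5 | 32.4 | 63.0 | 75.0 |
| R-SCAN | BUTD, VrR-VG | 45.4 | 77.9 | 87.9 | 36.2 | 65.5 | 76.7 |
| SAVE | ResNet | 46.7 | 76.3 | 86.1 | 34.0 | 64.8 | 77.0 |
| MPL | SCAN_i2t | 46.9 | 77.7 | 87.6 | 34.4 | 64.2 | 75.9 |
| GVSE* | BUTD | 47.2 | 76.6 | 88.4 | 31.2 | 61.2 | 70.5 |
| CASC | ResNet | 47.2 | 78.3 | 87.4 | 34.7 | 64.8 | 76.8 |
| OAN | BUTD | 47.8 | 81.2 | 90.4 | 37.0 | 66.6 | 78.0 |
| MTFN | BUTD | 44.7 | 76.4 | 87.3 | 33.1 | 64.7 | 76.1 |
| MTFN | BUTD, RR | 48.3 | 77.6 | 87.3 | 35.9 | 66.1 | 76.1 |
| A3VSE | BUTD | 49.3 | 81.1 | 90.2 | 39.0 | 68.0 | 80.1 |
| GVSE* | BUTD | 49.9 | 77.4 | 87.6 | 38.4 | 68.5 | 79.7 |
| SGM | BUTD | 50.0 | 79.3 | 87.9 | 35.3 | 64.9 | 76.5 |
| CAMP | BUTD | 50.1 | 82.1 | 89.7 | 39.0 | 68.9 | 80.2 |
| SCAN | BUTD, i2t_LSE | 46.4 | 77.4 | 87.2 | 34.4 | 63.7 | 75.7 |
| SCAN* | BUTD, AVE+LSE | 50.4 | 82.2 | 90.0 | 38.6 | 69.3 | 80.4 |
| GOT | SCAN_i2t | 50.5 | 80.2 | 89.8 | 38.1 | 66.8 | 78.5 |
| PFAN* | BUTD | 50.8 | 83.9 | 89.1 | 39.5 | 69.5 | 80.8 |
| PFAN++* | BUTD | 51.2 | 84.3 | 89.2 | 41.4 | 70.9 | 79.0 |
| HOAD | BUTD | 51.2 | 81.7 | 89.1 | 39.4 | 72.5 | 84.1 |
| HOAD | BUTD, +Dist | 51.4 | 81.8 | 89.1 | 40.5 | 73.5 | 84.1 |
| CAAN | BUTD | 52.5 | 83.3 | 90.9 | 41.2 | 70.3 | 82.9 |
| VSRN* | BUTD | 53.0 | 81.1 | 89.4 | 40.5 | 70.6 | 81.1 |
| IMRAM | BUTD, Image | 53.2 | 82.5 | 90.4 | 38.9 | 68.5 | 79.2 |
| IMRAM | BUTD, Text | 52.0 | 81.8 | 90.1 | 38.6 | 68.1 | 79.1 |
| IMRAM | BUTD, Full | 53.7 | 83.2 | 91.0 | 39.7 | 69.1 | 79.8 |
| MMCA | BUTD, Bert | 54.0 | 82.5 | 90.7 | 38.7 | 69.7 | 80.8 |
| DSRAN | BUTD, GRU | 51.9 | 81.6 | 89.8 | 39.5 | 70.6 | 81.0 |
| DSRAN | BUTD, Bert | 53.7 | 82.1 | 89.9 | 40.3 | 70.9 | 81.3 |
| DSRAN* | BUTD, GRU | 54.4 | 83.5 | 91.3 | 41.5 | 71.9 | 82.1 |
| DSRAN* | BUTD, Bert | 55.3 | 83.5 | 90.9 | 41.7 | 72.7 | 82.8 |
| TERAN | BUTD, Bert | 55.6 | 83.9 | 91.6 | 42.6 | 72.5 | 82.9 |
| SCG | VggNet, Prod | 49.9 | 78.9 | 88.1 | 33.2 | 62.4 | 74.7 |
| SCG | VggNet, Gated | 56.6 | 84.5 | 92.0 | 39.2 | 68.0 | 81.3 |
| AOQ* | BUTD, SCAN | 51.2 | 82.5 | 90.1 | 39.4 | 69.7 | 80.4 |
| AOQ* | BUTD, VSRN | 55.1 | 83.3 | 90.8 | 41.1 | 71.5 | 82.0 |
| AOQ* | BUTD, BFAN | 57.3 | 84.5 | 91.7 | 40.1 | 69.2 | 80.1 |
| ADDR* | BUTD, BFAN | 54.3 | 84.0 | 91.5 | 40.1 | 69.2 | 80.6 |
| ADDR* | BUTD, VSRN | 56.6 | 85.3 | 90.4 | 42.5 | 71.9 | 82.0 |
| ADDR* | BUTD, SCAN | 57.3 | 86.0 | 92.7 | 41.8 | 72.0 | 81.3 |
| SGRAF | BUTD, SAF | 53.3 | 82.3 | 90.1 | 39.8 | 69.0 | 80.2 |
| SGRAF | BUTD, SGR | 56.9 | 83.2 | 90.5 | 40.2 | 69.0 | 79.8 |
| SGRAF* | BUTD | 57.8 | 84.9 | 91.6 | 41.9 | 70.7 | 81.3 |
| SAN^ | ResNet | 65.4 | 89.4 | 94.8 | 46.2 | 77.4 | 86.6 |
| ACMM | BUTD | 63.5 | 88.0 | 93.6 | 36.7 | 65.1 | 76.7 |
| ACMM* | BUTD | 66.9 | 89.6 | 94.9 | 39.5 | 69.6 | 81.1 |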

Performance of Identity-aware Datasets

Performance of CUHK-PEDES

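| Method_name | Concise_note | Text-to-Image R@1 | Text-to-Image R@5 | Text-to-Image R@10 |
| --- | --- | --- | --- | --- |
| LSTM-Q+I | VggNet | 17.19 | -- | 57.82 |
| GNA-RNN | VggNet | 19.05 | -- | 53.64 |
| IATV | VggNet | 25.94 | -- | 60.48 |
| PWM-ATH | VggNet | 27.14 | 49.45 | 61.02 |
| GLA | ResNet | 43.58 | 66.93 | 76.26 |
| Dual-Path | VggNet | 32.15 | 54.42 | 64.30 |
| Dual-Path | ResNet | 44.40 | 66.26 | 75.07 |
| CMPM | MobileNet | 44.02 | -- | 77.00 |
| CMPL | MobileNet | 49.37 | -- | 79.27 |
| PMA | VggNet | 47.02 | 68.54 | 78.06 |
| PMA | ResNet | 53.81 | 73.54 | 81.23 |
| TIMAM | ResNet, Bert | 54.51 | 77.56 | 84.78 |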

Performance of CUB-Flowers

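| Method_name | Concise_note | CUB Image-to-Text R@1 | CUB Text-to-Image AP@50 | Flowers Image-to-Text R@1 | Flowers Text-to-Image AP@50 |
| --- | --- | --- | --- | --- | --- |
| FV | GMM+HGLMM | 36.5 | 35.6 | 54.8 | 52.8 |
| Word2Vec | | 38.6 | 33.5 | 54.2 | 52.1 |
| Word-NN | CNN | 51.0 | 43.3 | 60.7 | 56.3 |
| Word-NN | CNN-RNN | 56.8 | 48.7 | 65.6 | 59.6 |
| IATV | Triplet | 52.5 | 52.4 | 64.3 | 64.9 |
| IATV | VggNet | 61.5 | 57.6 | 68.4 | 70.1 |
| CMPM | MobileNet | 62.1 | 64.6 | 66.1 | 67.7 |
| CMPL | MobileNet | 64.3 | 67.9 | 68.9 | 69.7 |
| TIMAM | ResNet, Bert | 67.7 | 70.3 | 70.6 | 73.7 |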