NMT models speedup abnormally related to batch size #106

@dearchill

Description

Hi, thanks for the great work. I just tested fairseq-generate on my test set (ZH-EN translation) with both FastSeq and Fairseq, and the speedup is quite abnormal compared with the example link.
My test set has 1,526 sentences of 5–150 Chinese characters each, and my experiments ran on an NVIDIA Tesla T4. The translation model is the base transformer arch in fairseq, with 30 encoder layers.
I tested with the following commands:

for fairseq:
fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size 128

for fastseq:
fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 --postprocess-workers 5

I didn't use --no-repeat-ngram-size with fastseq; the beam size is the default 5 and lenpen is 1.
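
For completeness, here is a minimal sketch of the sweep over batch sizes, assuming the same ../data-bin directory and model_avg.pt checkpoint as above (the log file names are just illustrative):

```bash
#!/usr/bin/env bash
# Run both generators across the batch sizes from the table below
# and pull out the throughput summary line that each tool prints.
set -e

for bs in 128 10 5 1; do
  fairseq-generate ../data-bin --path model_avg.pt \
    --remove-bpe --batch-size "$bs" \
    > "fairseq_bs${bs}.log" 2>&1

  fastseq-generate-for-fairseq ../data-bin --path model_avg.pt \
    --remove-bpe --batch-size "$bs" --postprocess-workers 5 \
    > "fastseq_bs${bs}.log" 2>&1

  # Both tools log a summary like "... (NN.NN sentences/s, NN.NN tokens/s)".
  grep -h "sentences/s" "fairseq_bs${bs}.log" "fastseq_bs${bs}.log"
done
```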
My test results (throughput in sentences/s) are as follows:

| Batch size      | not assigned | 128   | 10    | 5     | 1    |
|-----------------|--------------|-------|-------|-------|------|
| fairseq-0.10.2  | 65.79        | 63.18 | 19.06 | 11.79 | 3.06 |
| above + fastseq | 75.55        | 74.28 | 17.38 | 11.47 | 2.92 |

I found that when the batch size is large (128 and above), fastseq gives an obvious speedup (though not the 2x or more shown in the example), but when the batch size is small (which I tested because my deployment needs to serve small batches), fastseq shows no speedup at all and is even slightly slower. This seems quite abnormal to me, so I'm asking for your help. Looking forward to your reply.
