Hello,
In the original implementation of this model, the authors employed a one-hot audio vector of dimension 1024. Unfortunately, the paper does not say much about this one-hot vector or explain its purpose in the model. Given that its dimension is 1024 (= 2^10), and that the authors use 10-bit audio samples, I assume this vector is related to the prediction of each bit in each audio sample. But that's just a guess.
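To make sure I'm framing the question correctly: my understanding is that an embedding layer applied to a quantized sample index is mathematically a row lookup, equivalent to multiplying the 1024-dim one-hot vector by a learned weight matrix. A minimal sketch of that equivalence (the embedding size of 64 is my own hypothetical choice, not from the paper):

```python
import numpy as np

NUM_LEVELS = 1024  # 2**10 levels, assuming 10-bit quantized audio samples

def one_hot(sample: int) -> np.ndarray:
    """Encode a quantized sample index as a 1024-dim one-hot vector."""
    v = np.zeros(NUM_LEVELS, dtype=np.float32)
    v[sample] = 1.0
    return v

EMBED_DIM = 64  # hypothetical embedding width
rng = np.random.default_rng(0)
W = rng.standard_normal((NUM_LEVELS, EMBED_DIM)).astype(np.float32)

sample = 513  # an arbitrary quantized sample value
# Embedding lookup (W[sample]) equals the one-hot vector times the matrix.
assert np.allclose(one_hot(sample) @ W, W[sample])
```

If that equivalence holds, I'd expect the replacement to change mostly memory/compute rather than what the model can express, which is part of what I'm asking about below.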
So, I have two (actually three) questions:
- What is the purpose of the one-hot audio vector in the original implementation?
- Why did you replace the one-hot vector with an embedding layer? What changed in the model behavior with this replacement?
Thank you very much!