Why the embedding layer instead of the one-hot audio vector? #20

@ivancarapinha

Hello,

In the original implementation of this model, the authors used a one-hot audio vector of dimension 1024. Unfortunately, the paper gives little detail about this vector and does not explain its purpose in the model. Since its dimension is 1024 = 2^10 and the authors use 10-bit audio samples, I assume the vector is related to the prediction of each bit in each audio sample, but that is just a guess.

So, I have two (actually three) questions:

  1. What is the purpose of the one-hot audio vector in the original implementation?
  2. Why did you replace the one-hot vector with an embedding layer? What changed in the model behavior with this replacement?
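
To make question 2 concrete, here is a minimal sketch of the two input representations being compared. It rests on an assumption the paper does not confirm: that the one-hot vector has one position per possible 10-bit sample value. The names `NUM_CLASSES`, `EMBED_DIM`, `one_hot`, `linear`, and `embed` are hypothetical and do not come from either implementation.

```python
import random

NUM_CLASSES = 2 ** 10  # 10-bit samples give 1024 possible values
EMBED_DIM = 4          # hypothetical size, kept tiny for illustration


def one_hot(index, size=NUM_CLASSES):
    """Return a list of zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def linear(vec, weights):
    """Multiply a row vector by a (len(vec) x EMBED_DIM) weight matrix."""
    return [sum(v * row[j] for v, row in zip(vec, weights))
            for j in range(len(weights[0]))]


def embed(index, weights):
    """Embedding lookup: return row `index` of the same weight matrix."""
    return list(weights[index])


rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
           for _ in range(NUM_CLASSES)]

sample = 517  # hypothetical quantized sample value in [0, 1023]
via_one_hot = linear(one_hot(sample), weights)
via_embedding = embed(sample, weights)
```

Under that assumption, feeding the one-hot vector through a bias-free linear layer selects one row of the weight matrix, which is the same result an embedding lookup returns without the 1024-wide multiplication. Whether this is the reason for the replacement in this repository is exactly what I am asking.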

Thank you very much.
