No gradients for those params during predicting aligned position #2

@PES2g

Description

In attention.py, while predicting the aligned position for local attention:

```python
with vs.variable_scope("WindowPrediction", initializer=initializer):
    ht = cells.linear([decoder_hidden_state], attention_vec_size, True)

    # get the parameters (vp)
    vp = vs.get_variable("AttnVp_%d" % 0, [attention_vec_size], initializer=initializer)

    # tanh(Wp*ht)
    tanh = math_ops.tanh(ht)
    # S * sigmoid(vp * tanh(Wp*ht))  - this is going to return a number
    # for each sentence in the batch - i.e., a tensor of shape batch x 1
    S = attn_length
    pt = math_ops.reduce_sum((vp * tanh), [2, 3])
    pt = math_ops.sigmoid(pt) * S

    # now we get only the integer part of the values
    pt = tf.floor(pt)

    _ = tf.histogram_summary('local_window_predictions', pt)

    # we now create a tensor containing the indices representing each position
    # of the sentence - i.e., if the sentence contains 5 tokens and batch_size is 3,
    # the resulting tensor will be:
    # [[0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]]
    #
    indices = []
    for pos in xrange(attn_length):
        indices.append(pos)
    indices = indices * batch_size
    idx = tf.convert_to_tensor(tf.to_float(indices), dtype=dtype)
    idx = tf.reshape(idx, [-1, attn_length])

    # here we calculate the boundaries of the attention window based on the positions
    low = pt - window_size + 1  # we add one because the floor op already generates the first position
    high = pt + window_size

    # here we check our positions against the boundaries
    mlow = tf.to_float(idx < low)
    mhigh = tf.to_float(idx > high)

    # now we combine both into a pre-mask that has 0s and 1s switched
    # i.e., at this point, True == 0 and False == 1
    m = mlow + mhigh  # batch_size

    # here we switch the 0s to 1s and the 1s to 0s
    # we correct the values so True == 1 and False == 0
    mask = tf.to_float(tf.equal(m, 0.0))

    # here we switch off all the values that fall outside the window
    # first we switch off those in the truncated normal
    alpha = s * mask
    masked_soft = nn_ops.softmax(alpha)
```
When I try to run this code, the parameters in the 'WindowPrediction' scope don't receive gradients. According to the TensorFlow source code, comparison operations have no registered gradient (and `tf.floor` has a zero gradient), so nothing flows back to `pt`. Maybe we need to find another way to generate the mask from the predicted position.
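One possible direction, sketched below in plain NumPy (the function name `gaussian_window` and the shapes are my own illustration, not this repo's API): replace the hard comparison-based 0/1 mask with the soft Gaussian weighting used in Luong et al.'s local-p attention, `exp(-(s - pt)^2 / (2*sigma^2))` with `sigma = window_size / 2`, and drop the `tf.floor` so `pt` stays continuous. Since this is built only from subtraction, squaring, and `exp`, the same expression in TensorFlow would be differentiable with respect to `pt`, unlike `idx < low` / `idx > high`.

```python
import numpy as np

def gaussian_window(scores, pt, window_size):
    """Weight each source position by a Gaussian centred at the
    predicted position pt (shape [batch]), instead of a hard 0/1 mask.

    scores: [batch, attn_length] attention scores
    returns: [batch, attn_length] softly windowed scores
    """
    batch_size, attn_length = scores.shape
    sigma = window_size / 2.0
    positions = np.arange(attn_length, dtype=np.float64)      # [attn_length]
    # broadcast [batch, 1] against [attn_length] -> [batch, attn_length];
    # every op here (subtract, square, exp) has a well-defined gradient
    weights = np.exp(-((positions - pt[:, None]) ** 2) / (2.0 * sigma ** 2))
    return scores * weights

scores = np.ones((2, 5))
pt = np.array([2.0, 0.0])          # predicted centres, left continuous (no floor)
out = gaussian_window(scores, pt, window_size=2)
```

With uniform scores, each row of `out` peaks at its predicted centre (weight 1 at `pt`) and decays smoothly outward, so positions outside the window are attenuated rather than zeroed by a non-differentiable comparison.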
