No gradients for those params during predicting aligned position #2

@PES2g

Description

In attention.py, while predicting the aligned position for local attention:

```python
with vs.variable_scope("WindowPrediction", initializer=initializer):
    ht = cells.linear([decoder_hidden_state], attention_vec_size, True)

    # get the parameters (vp)
    vp = vs.get_variable("AttnVp_%d" % 0, [attention_vec_size], initializer=initializer)

    # tanh(Wp*ht)
    tanh = math_ops.tanh(ht)
    # S * sigmoid(vp * tanh(Wp*ht))  - this is going to return a number
    # for each sentence in the batch - i.e., a tensor of shape batch x 1
    S = attn_length
    pt = math_ops.reduce_sum((vp * tanh), [2, 3])
    pt = math_ops.sigmoid(pt) * S

    # now we get only the integer part of the values
    pt = tf.floor(pt)

    _ = tf.histogram_summary('local_window_predictions', pt)

    # we now create a tensor containing the indices representing each position
    # of the sentence - i.e., if the sentence contains 5 tokens and batch_size is 3,
    # the resulting tensor will be:
    # [[0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]]
    #
    indices = []
    for pos in xrange(attn_length):
        indices.append(pos)
    indices = indices * batch_size
    idx = tf.convert_to_tensor(tf.to_float(indices), dtype=dtype)
    idx = tf.reshape(idx, [-1, attn_length])

    # here we calculate the boundaries of the attention window based on the positions
    low = pt - window_size + 1  # we add one because the floor op already generates the first position
    high = pt + window_size

    # here we check our positions against the boundaries
    mlow = tf.to_float(idx < low)
    mhigh = tf.to_float(idx > high)

    # now we combine both into a pre-mask that has 0s and 1s switched
    # i.e., at this point, True == 0 and False == 1
    m = mlow + mhigh  # batch_size

    # here we switch the 0s to 1s and the 1s to 0s
    # we correct the values so True == 1 and False == 0
    mask = tf.to_float(tf.equal(m, 0.0))

    # here we switch off all the values that fall outside the window
    # first we switch off those in the truncated normal
    alpha = s * mask
    masked_soft = nn_ops.softmax(alpha)
```
When I try to run this code, the parameters in the 'WindowPrediction' scope don't receive gradients. According to the TensorFlow source code, comparison operations have no registered gradient (and `tf.floor` has a zero gradient), so nothing flows back to `pt`. Maybe we need to find another way to generate the mask from the predicted position.
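One possible direction, sketched below in plain NumPy (the function name `gaussian_window` and the shapes are my own illustration, not this repo's API): replace the hard comparison-based 0/1 mask with the soft Gaussian weighting used in Luong et al.'s local-p attention, `exp(-(s - pt)^2 / (2*sigma^2))` with `sigma = window_size / 2`, and drop the `tf.floor` so `pt` stays continuous. Since this is built only from subtraction, squaring, and `exp`, the same expression in TensorFlow would be differentiable with respect to `pt`, unlike `idx < low` / `idx > high`.

```python
import numpy as np

def gaussian_window(scores, pt, window_size):
    """Weight each source position by a Gaussian centred at the
    predicted position pt (shape [batch]), instead of a hard 0/1 mask.

    scores: [batch, attn_length] attention scores
    returns: [batch, attn_length] softly windowed scores
    """
    batch_size, attn_length = scores.shape
    sigma = window_size / 2.0
    positions = np.arange(attn_length, dtype=np.float64)      # [attn_length]
    # broadcast [batch, 1] against [attn_length] -> [batch, attn_length];
    # every op here (subtract, square, exp) has a well-defined gradient
    weights = np.exp(-((positions - pt[:, None]) ** 2) / (2.0 * sigma ** 2))
    return scores * weights

scores = np.ones((2, 5))
pt = np.array([2.0, 0.0])          # predicted centres, left continuous (no floor)
out = gaussian_window(scores, pt, window_size=2)
```

With uniform scores, each row of `out` peaks at its predicted centre (weight 1 at `pt`) and decays smoothly outward, so positions outside the window are attenuated rather than zeroed by a non-differentiable comparison.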
