In `attention.py`, the aligned position for local attention is predicted and turned into a window mask as follows:
```python
with vs.variable_scope("WindowPrediction", initializer=initializer):
    ht = cells.linear([decoder_hidden_state], attention_vec_size, True)
    # get the parameters (vp)
    vp = vs.get_variable("AttnVp_%d" % 0, [attention_vec_size], initializer=initializer)
    # tanh(Wp*ht)
    tanh = math_ops.tanh(ht)
    # S * sigmoid(vp * tanh(Wp*ht)) - this is going to return a number
    # for each sentence in the batch - i.e., a tensor of shape batch x 1
    S = attn_length
    pt = math_ops.reduce_sum((vp * tanh), [2, 3])
    pt = math_ops.sigmoid(pt) * S
    # now we get only the integer part of the values
    pt = tf.floor(pt)
    _ = tf.histogram_summary('local_window_predictions', pt)
    # we now create a tensor containing the indices representing each position
    # of the sentence - i.e., if the sentence contains 5 tokens and batch_size is 3,
    # the resulting tensor will be:
    # [[0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]
    #  [0, 1, 2, 3, 4]]
    indices = []
    for pos in xrange(attn_length):
        indices.append(pos)
    indices = indices * batch_size
    idx = tf.convert_to_tensor(tf.to_float(indices), dtype=dtype)
    idx = tf.reshape(idx, [-1, attn_length])
    # here we calculate the boundaries of the attention window based on the positions
    low = pt - window_size + 1  # we add one because the floor op already generates the first position
    high = pt + window_size
    # here we check our positions against the boundaries
    mlow = tf.to_float(idx < low)
    mhigh = tf.to_float(idx > high)
    # now we combine both into a pre-mask that has 0s and 1s switched
    # i.e., at this point, True == 0 and False == 1
    m = mlow + mhigh  # batch_size
    # here we switch the 0s to 1s and the 1s to 0s
    # we correct the values so True == 1 and False == 0
    mask = tf.to_float(tf.equal(m, 0.0))
    # here we switch off all the values that fall outside the window
    # first we switch off those in the truncated normal
    # (`s` is presumably the attention scores computed earlier in attention.py; not shown here)
    alpha = s * mask
    masked_soft = nn_ops.softmax(alpha)
```
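For a single sentence, the masking steps above reduce to the following plain-Python sketch. `local_window_mask` is a hypothetical helper and is not part of `attention.py`. The sketch shows that the window covers positions `pt - window_size + 1` through `pt + window_size`. That is `2 * window_size` positions, placed slightly asymmetrically around the floored `pt`.

```python
import math

def local_window_mask(pt, window_size, attn_length):
    """Plain-Python mirror of the masking logic for one sentence (hypothetical helper).

    pt is the already-floored predicted position. Returns 1.0 for positions
    j with low <= j <= high and 0.0 elsewhere.
    """
    low = pt - window_size + 1
    high = pt + window_size
    mask = []
    for j in range(attn_length):
        m = float(j < low) + float(j > high)  # 1.0 means "outside the window"
        mask.append(float(m == 0.0))          # flip so that inside == 1.0
    return mask

# 10 source positions, raw prediction 4.7 floored to 4, window_size 2.
pt = math.floor(4.7)
print(local_window_mask(pt, 2, 10))
# [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```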
When I run this code, the parameters in the `WindowPrediction` scope get no gradients. According to the TensorFlow source code, comparison operations have no gradients, so nothing flows back from the mask to `pt`. Maybe we need to find another way to generate the mask from the predicted position.
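The sketch below checks this diagnosis without TensorFlow. It uses plain Python and central finite differences (standing in for autodiff) to compare two ways of weighting positions around a predicted `pt`:

- `hard_window_weights` mirrors the floor-and-compare mask from the snippet.
- `gaussian_window_weights` is one possible differentiable alternative. It applies the Gaussian factor exp(-(j - pt)^2 / (2σ^2)), with σ = D/2, from the local-p attention of Luong et al. (2015) to the unfloored `pt`.

All helper names are hypothetical and are not part of `attention.py`.

```python
import math

def hard_window_weights(pt, window_size, attn_length):
    """Hard 0/1 window as in the issue's code: floor pt, then compare (hypothetical helper)."""
    p = math.floor(pt)
    low, high = p - window_size + 1, p + window_size
    return [1.0 if low <= j <= high else 0.0 for j in range(attn_length)]

def gaussian_window_weights(pt, window_size, attn_length):
    """Luong-style local-p factor exp(-(j - pt)^2 / (2 sigma^2)) with sigma = D / 2 (hypothetical helper)."""
    sigma = window_size / 2.0
    return [math.exp(-((j - pt) ** 2) / (2.0 * sigma ** 2)) for j in range(attn_length)]

def finite_diff(fn, pt, eps=1e-4, **kw):
    """Central finite-difference derivative of every weight with respect to pt."""
    up = fn(pt + eps, **kw)
    down = fn(pt - eps, **kw)
    return [(u - d) / (2 * eps) for u, d in zip(up, down)]

hard_grad = finite_diff(hard_window_weights, 4.3, window_size=2, attn_length=10)
soft_grad = finite_diff(gaussian_window_weights, 4.3, window_size=2, attn_length=10)
print(all(g == 0.0 for g in hard_grad))        # True: nothing flows back to pt
print(any(abs(g) > 1e-3 for g in soft_grad))   # True: pt receives a gradient
```

The hard mask is piecewise constant in `pt`. Its derivative is therefore zero almost everywhere and undefined at the window edges, which matches the missing gradients for `WindowPrediction`. As far as I can tell, `tf.floor` also passes no gradient, so `pt` would need to stay real-valued as well.

Multiplying the scores by a Gaussian factor like this one would let `pt`, and through it `Wp` and `vp`, receive a gradient. This can be combined with truncating to the window. It is a suggestion under the assumptions above, not the repository's fix.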