
Self-Attention Mechanism is Incredibly Inaccurate #8

@DragonflyRobotics

Description

As of now, the Self-Attention Mechanism uses a simple cosine-similarity/dot-product scoring algorithm with a fixed selectivity threshold. This approach has proven very inaccurate, and its output cannot be reverse-searched because the incorrect selections invalidate the integrity of the database.
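
For reference, below is a minimal sketch of how a similarity-plus-threshold selector of this kind works. The function names, the single context vector, and the threshold value are illustrative assumptions rather than the exact MAGIST implementation; the unbounded scores in the output below suggest the real mechanism uses a raw (unnormalized) score rather than a pure cosine in [-1, 1].

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by
    # the product of their magnitudes.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_tokens(token_embeddings, context_vector, threshold=0.5):
    # Label each token "Good" or "Not" by thresholding its similarity
    # to a single context vector. A fixed threshold like this is what
    # makes the selection brittle.
    results = []
    for token, vec in token_embeddings:
        score = cosine_similarity(vec, context_vector)
        label = "Good" if score >= threshold else "Not"
        results.append([score, token, label])
    return results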

To Reproduce
To reproduce the behavior:

from MAGIST.NLP.SelfAttention import TextPreprocessing

t = TextPreprocessing("config.json")

# Calling the instance invokes __call__ on the preprocessor.
out = t("Hello, my name is John. I am a dummy script.")

for i in out:
    print(i)

Output

[5.967583262815608, 'hello', 'Good']
[3.7432159225461947, 'my', 'Not']
[2.520566459677965, 'name', 'Not']           ---> Incorrect; This should be "Good"
[5.6983463875519735, 'is', 'Not']
[4.848795399908668, 'john', 'Not']
[6.083478457022617, 'i', 'Good']
[9.443521265161667, 'am', 'Good']
[8.284217064260607, 'a', 'Good']             ---> Incorrect; This should be "Not"
[8.485852410408823, 'dummy', 'Good']
[2.466104715281189, 'script', 'Not']         ---> Incorrect; This should be "Good"

Expected behavior
Every token should be classified correctly: in the output above, "name" and "script" should be labeled "Good" and "a" should be labeled "Not".

Additional context
This was expected, since the current algorithm is very primitive. Perhaps a better positional embedding or an end-to-end LSTM-Dense neural network would improve its performance.
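
As a rough illustration of the LSTM-Dense direction, a per-token binary classifier could replace the fixed threshold. This is only a sketch assuming TensorFlow/Keras; the vocabulary size, sequence length, and layer widths are placeholder values, not values used by MAGIST.

from tensorflow.keras import layers, models

VOCAB_SIZE = 10000  # placeholder vocabulary size
EMBED_DIM = 64      # placeholder embedding width
SEQ_LEN = 32        # placeholder padded sequence length

def build_token_selector():
    # Scores every token position as keep ("Good") vs. drop ("Not"),
    # letting the network learn context instead of relying on a
    # fixed similarity threshold.
    inputs = layers.Input(shape=(SEQ_LEN,))
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.TimeDistributed(layers.Dense(32, activation="relu"))(x)
    outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

Per-token training labels would have to come from annotated sentences like the example above.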

Metadata

Labels

bug (Something isn't working), enhancement (New feature or request)
