Generative ML Transformer Attention

Let’s apply the attention mechanism to translate the German sentence “ich liebe dich”.

Suppose we have the following numerical representations for the words, e.g. after a word2vec embedding.

German Sentence (numerical representation):
“ich” -> [0.2, 0.4, 0.6]
“liebe” -> [0.8, 0.3, 0.1]
“dich” -> [0.7, 0.9, 0.5]

English Translation (numerical representation):
“I” -> [0.1, 0.2, 0.3]
“love” -> [0.4, 0.5, 0.6]
“you” -> [0.7, 0.8, 0.9]

Now, let’s focus on translating the word “I” in the English sentence.

  1. Attention Scores Calculation:
    The attention mechanism computes the similarity scores between the word “I” and each word in the source (German) sentence.

For example, the attention scores might be calculated as follows:

  • Attention Score between “I” and “ich”: 0.35
  • Attention Score between “I” and “liebe”: 0.65
  • Attention Score between “I” and “dich”: 0.25
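
The exact scores above are illustrative. One common way to obtain such scores is a dot product between the embedding of the current target word and each source embedding. A minimal NumPy sketch using the toy embeddings above (note that dot products give different numbers than the hand-picked scores in this example):

```python
import numpy as np

# Toy source (German) embeddings from the example above
source = np.array([
    [0.2, 0.4, 0.6],  # "ich"
    [0.8, 0.3, 0.1],  # "liebe"
    [0.7, 0.9, 0.5],  # "dich"
])

# Toy embedding for the target word "I"
query = np.array([0.1, 0.2, 0.3])

# Dot-product similarity between "I" and every source word
scores = source @ query
print(scores)  # [0.28 0.17 0.4] -- the 0.35 / 0.65 / 0.25 above are illustrative
```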
  2. Attention Weights:
    The attention scores are normalized so that they sum to 1, giving the attention weights (here each score is simply divided by the sum of the scores; real attention layers use a softmax).

After normalization:

  • Attention Weight for “ich”: 0.28
  • Attention Weight for “liebe”: 0.52
  • Attention Weight for “dich”: 0.20
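
A small sketch of the normalization step, assuming the illustrative scores above. Dividing by the sum reproduces the weights used in this walkthrough, while a softmax is what attention layers actually compute:

```python
import numpy as np

scores = np.array([0.35, 0.65, 0.25])  # illustrative scores for "ich", "liebe", "dich"

# Simple proportional normalization (used in this worked example)
weights = scores / scores.sum()
print(weights)  # [0.28 0.52 0.2]

# Softmax normalization, as used in real attention layers
softmax_weights = np.exp(scores) / np.exp(scores).sum()
print(softmax_weights)  # roughly [0.31 0.41 0.28]
```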
  3. Context Vector:
    The context vector is calculated by taking a weighted sum of the source sentence embeddings (German sentence embeddings) using the attention weights.

Context Vector for “I” = (0.28 * [0.2, 0.4, 0.6]) + (0.52 * [0.8, 0.3, 0.1]) + (0.20 * [0.7, 0.9, 0.5]) = [0.612, 0.448, 0.320]

The context vector [0.612, 0.448, 0.320] is then used along with the decoder’s state to generate the word “I” in the English translation.
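
The same weighted sum, written out in NumPy with the numbers from this example:

```python
import numpy as np

source = np.array([
    [0.2, 0.4, 0.6],  # "ich"
    [0.8, 0.3, 0.1],  # "liebe"
    [0.7, 0.9, 0.5],  # "dich"
])
weights = np.array([0.28, 0.52, 0.20])

# Weighted sum of the source embeddings = context vector for "I"
context = weights @ source
print(context)  # [0.612 0.448 0.32]
```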

This process is repeated for each word in the target sentence to generate the complete translation “I love you.” The attention mechanism allows the model to focus on relevant parts of the source sentence (German) while generating each word in the target sentence (English).
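
Putting the three steps together, here is an end-to-end toy sketch that computes a context vector for every target word. It uses dot-product scores and a softmax (so the numbers differ from the hand-picked ones above) and leaves out the actual decoder network, which would normally produce the queries and the output words:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy word2vec-style embeddings from the example
source_words = ["ich", "liebe", "dich"]
source = np.array([[0.2, 0.4, 0.6],
                   [0.8, 0.3, 0.1],
                   [0.7, 0.9, 0.5]])

target_words = ["I", "love", "you"]
target = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6],
                   [0.7, 0.8, 0.9]])

# For each target word: score every source word, normalize, take the weighted sum
for word, query in zip(target_words, target):
    scores = source @ query       # dot-product similarity with each German word
    weights = softmax(scores)     # attention weights, summing to 1
    context = weights @ source    # context vector fed to the decoder
    print(word, np.round(weights, 2), np.round(context, 3))
```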
