From decision-making methods to transformer networks

Multi-attribute decision-making (MADM) methods are essential tools for evaluating, ranking, and selecting options based on various criteria. Surprisingly, many of us use these techniques intuitively, without even realizing the math behind them. Imagine you have three alternatives and three criteria to consider. The simplest and most popular approach is to rate each alternative on every criterion.

|               | Criterion 1 | Criterion 2 | Criterion 3 |
|---------------|-------------|-------------|-------------|
| Alternative 1 | Fast        | A           | 1           |
| Alternative 2 | Slow        | B           | 3           |
| Alternative 3 | Fast        | A           | 1           |

These alternatives are tricky to compare since they mix nominal (A, B) and ordinal (Fast, Slow) variables. Ordinal variables follow a specific order, while nominal ones do not. Enter encoding procedures: they transform these variables into numerical values, making comparison a breeze.

|               | Criterion 1 | Criterion 2 | Criterion 3 |
|---------------|-------------|-------------|-------------|
| Alternative 1 | 2           | 1           | 1           |
| Alternative 2 | 1           | 2           | 3           |
| Alternative 3 | 2           | 1           | 1           |

Next, we need to decide whether we are dealing with cost or benefit attributes. For benefit attributes, more is better. Think of it this way: "Fast" is more desirable, while "A" is less favored than "B." Beyond simple nominal and ordinal variables, there are more complex ones, such as cyclical features (e.g., months of the year, hours of the day), which can be encoded with sine and cosine transformations to capture their repeating patterns.
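
As a minimal sketch of that cyclical encoding (the function name and example values are my own illustration, not from the original post):

```python
import numpy as np

def encode_cyclical(values, period):
    """Map a cyclical feature (e.g. hour of day) onto the unit circle.

    Values that are close in the cycle (e.g. hour 23 and hour 0) end up
    close in (sin, cos) space, unlike in a plain integer encoding.
    """
    angle = 2 * np.pi * np.asarray(values) / period
    return np.sin(angle), np.cos(angle)

# Hours of the day: 23 and 0 are neighbors on the circle.
sin_h, cos_h = encode_cyclical([0, 6, 12, 23], period=24)
```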

To compare values effectively, normalization is key. Without it, criteria with larger ranges can distort the results. A popular method is the linear scale transformation, which for a benefit attribute divides each value by the column maximum (and for a cost attribute divides the column minimum by each value):

$$r_{ij} = \frac{x_{ij}}{\max_i x_{ij}} \ \text{(benefit)}, \qquad r_{ij} = \frac{\min_i x_{ij}}{x_{ij}} \ \text{(cost)}$$

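A quick numerical sketch of that transformation applied to the encoded table above (the NumPy usage is my own illustration):

```python
import numpy as np

# Encoded decision matrix: rows = alternatives, columns = criteria.
X = np.array([[2, 1, 1],
              [1, 2, 3],
              [2, 1, 1]], dtype=float)

# Linear scale transformation for benefit attributes:
# divide every column by its maximum, so all values land in (0, 1].
R = X / X.max(axis=0)
print(R)
# [[1.    0.5   0.333]
#  [0.5   1.    1.   ]
#  [1.    0.5   0.333]]
```
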
Next, a weight is introduced for each criterion.

|               | Criterion 1 | Weight 1 | Criterion 2 | Weight 2 | Criterion 3 | Weight 3 |
|---------------|-------------|----------|-------------|----------|-------------|----------|
| Alternative 1 | 2           | 1        | 1           | 2        | 1           | 3        |
| Alternative 2 | 1           | 1        | 2           | 2        | 3           | 3        |
| Alternative 3 | 2           | 1        | 1           | 2        | 1           | 3        |

To avoid imbalance, the criteria weights are normalized into a probability distribution: each weight lies between 0 and 1, and together they sum to 1.

|               | Criterion 1 | Weight 1 | Criterion 2 | Weight 2 | Criterion 3 | Weight 3 |
|---------------|-------------|----------|-------------|----------|-------------|----------|
| Alternative 1 | 2           | 0.1      | 1           | 0.3      | 1           | 0.6      |
| Alternative 2 | 1           | 0.1      | 2           | 0.3      | 3           | 0.6      |
| Alternative 3 | 2           | 0.1      | 1           | 0.3      | 1           | 0.6      |

Summing the weighted values for each alternative yields a score used for selection. For example, for Alternative 1:

$$2 \cdot 0.1 + 1 \cdot 0.3 + 1 \cdot 0.6 = 1.1$$

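A compact sketch of this weighted-sum step (known as simple additive weighting) for all three alternatives; the variable names are my own:

```python
import numpy as np

# Encoded decision matrix and normalized criterion weights from the tables above.
X = np.array([[2, 1, 1],
              [1, 2, 3],
              [2, 1, 1]], dtype=float)
w = np.array([0.1, 0.3, 0.6])    # already normalized: sums to 1

scores = X @ w                   # weighted sum per alternative
best = scores.argmax()
print(scores)                    # [1.1 2.5 1.1]
print(f"Select Alternative {best + 1}")
```
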
In transformer networks, this resembles the attention mechanism, where hidden states are multiplied by attention weights and summed. In additive (Bahdanau) attention, the underlying score is computed by a small feed-forward network,

$$\text{score}(s_{t-1}, h_j) = v_a^\top \tanh(W_a s_{t-1} + U_a h_j),$$

while the multiplicative (dot-product) attention mechanism introduced by Luong et al. calculates:

$$\text{score}(h_t, \bar{h}_s) = h_t^\top \bar{h}_s$$
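
A minimal sketch of the two scoring functions side by side (the dimensions and parameter names are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                        # hidden size (illustrative)
h_t = rng.normal(size=d)     # decoder state at the current step
h_s = rng.normal(size=d)     # one encoder hidden state

# Multiplicative (Luong) attention: a plain dot product.
score_dot = h_t @ h_s

# Additive (Bahdanau) attention: a small feed-forward scorer.
W_a = rng.normal(size=(d, d))
U_a = rng.normal(size=(d, d))
v_a = rng.normal(size=d)
score_add = v_a @ np.tanh(W_a @ h_t + U_a @ h_s)
```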

Many other MADM approaches are feasible. For example, consider next-word prediction in language translation, where the criteria are the hidden states of the words seen so far and the alternatives are the words in the vocabulary.

| Word no. 1 of input sequence  | Attention score | × Softmax |
|-------------------------------|-----------------|-----------|
| Word 1 of target vocabulary   | 2               | 1         |
| Word 2 of target vocabulary   | 1               | 1         |
| Word 3 of target vocabulary   | 2               | 1         |

The result is the so-called context vector. Unlike earlier RNN sequence-to-sequence approaches, the attention mechanism considers all input words seen so far, represented by their hidden states.
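
Putting the pieces together, here is a hedged sketch of how attention weights turn hidden states into a context vector; all tensors and sizes are invented for illustration. Note how the softmax produces weights that sum to 1, just like the normalized MADM criteria weights above:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 3, 4                     # three seen words, hidden size 4
H = rng.normal(size=(seq_len, d))     # encoder hidden states (the "criteria")
h_t = rng.normal(size=d)              # current decoder state

scores = H @ h_t                      # dot-product score per input word
weights = np.exp(scores) / np.exp(scores).sum()   # softmax: sums to 1
context = weights @ H                 # weighted sum of hidden states
print(context.shape)                  # (4,)
```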
