ChatGPT(-3) behind the scenes

One of the most promising successors of Google search is IMHO OpenAI's ChatGPT-3. Technically it is a language model (external link) with 175 billion parameters (source here). Wow. The parameters of a neural network are the weights of its connections. For example, a linear regression has one parameter (excluding "b"):

y = m * x + b

The variable m is also called slope, weight or gradient = Δy / Δx.
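To make that concrete, here is a minimal sketch (numpy, with made-up toy data) that fits the slope m and intercept b for a handful of points:

import numpy as np

# Toy data that roughly follows y = 2x + 1 (values are made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 9.1])

# Fit a straight line y = m*x + b; np.polyfit returns [m, b]
m, b = np.polyfit(x, y, 1)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# The slope is the "weight": delta-y over delta-x
print("gradient estimate:", (y[-1] - y[0]) / (x[-1] - x[0]))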

For the linear components of a neural network, these are the connections between the neurons, i.e. the set of all combined linear components. For example, c = 3a + 2b is

  • 3 times “a” and
  • 2 times “b”.
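Such a linear component is just a dot product of a weight vector and an input vector. A minimal numpy sketch (the input values a = 1, b = 4 are made up):

import numpy as np

# Weights of the linear component c = 3a + 2b
weights = np.array([3.0, 2.0])

# One input pair (a, b)
inputs = np.array([1.0, 4.0])   # a = 1, b = 4

# The neuron's linear part is the weighted sum (dot product)
c = np.dot(weights, inputs)
print(c)  # 3*1 + 2*4 = 11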

GPT has a core concept supporting its performance: the (T)ransformer (external link). The referenced Wikipedia article already points out what transformers are (for). Transformers replace the RNN (or even LSTM) sequence-to-sequence (aka sentence-to-sentence) model and allow parallelized encoding/decoding of time-ordered input vectors, e.g. sentences of words.

Furthermore, transformers can be seen as a nice way of building "ML components" as known in software development; we can have multiple of them running in parallel, each with a different "transformation" purpose.

How do transformers work? Generally it is about data encoding; computers do not know words, images, etc. They only know 0s and 1s in long strings we call vectors. For language processing we

1) have to transform sentences to numeric vectors.

Example: Also compare the following blog post here.

  • Sentence 1: I eat fish
  • Sentence 2: I eat beef

The derived vector space uses one-hot encoding (1 if the word is in the sentence, 0 if it is not). The meta vector space is (I, eat, fish, beef).

  • Sentence 1: (1,1,1,0)
  • Sentence 2: (1,1,0,1)

This is also called an embedding / embedded space.
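A minimal Python sketch of this one-hot encoding (the vocabulary and sentences are just the toy example from above):

# Build the (I, eat, fish, beef) vector space from the two sentences
vocabulary = ["I", "eat", "fish", "beef"]

def encode(sentence):
    words = sentence.split()
    # 1 if the vocabulary word occurs in the sentence, 0 otherwise
    return [1 if word in words else 0 for word in vocabulary]

print(encode("I eat fish"))  # [1, 1, 1, 0]
print(encode("I eat beef"))  # [1, 1, 0, 1]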

2) have to give so-called specific "attention" to it. Sounds sophisticated, but it is just a mathematical trick to give more weight to specific vector positions:

  • Sentence 1: (1,1,1,0) x (0.2, 1, 1, 1)
  • Sentence 2: (1,1,0,1) x (0.2, 1, 1, 1)

… if it is less important who eats the fish or beef.
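The same weighting as a small numpy sketch (the attention vector (0.2, 1, 1, 1) is the made-up example from above, not a learned one):

import numpy as np

attention = np.array([0.2, 1.0, 1.0, 1.0])  # down-weight the "I" position

sentence_1 = np.array([1, 1, 1, 0])  # "I eat fish"
sentence_2 = np.array([1, 1, 0, 1])  # "I eat beef"

# Element-wise weighting: each position is multiplied by its attention weight
print(sentence_1 * attention)  # [0.2, 1.0, 1.0, 0.0]
print(sentence_2 * attention)  # [0.2, 1.0, 0.0, 1.0]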

For translating, e.g., English to French, we have to do these 2 steps for both sequences:

  1. one sequence for English
  2. another sequence for French

The feed-forward neural network in between just learns how to convert one sequence to another: English to French, or question to answer (the ChatGPT case).
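As a drastically simplified sketch of that idea (a single linear layer trained with plain gradient descent on two made-up sentence vectors; nothing like the real GPT architecture):

import numpy as np

rng = np.random.default_rng(0)

# Toy "English" and "French" sentence vectors (made-up one-hot style encodings)
english = np.array([[1, 1, 1, 0],
                    [1, 1, 0, 1]], dtype=float)
french  = np.array([[1, 0, 1, 1],
                    [0, 1, 1, 1]], dtype=float)

# One linear layer W, trained with gradient descent to map english -> french
W = rng.normal(scale=0.1, size=(4, 4))
for _ in range(500):
    prediction = english @ W
    error = prediction - french
    gradient = english.T @ error / len(english)
    W -= 0.5 * gradient

print(np.round(english @ W, 2))  # approximates the "French" vectors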
