Compact summary from Generative AI 2 Agent Based Automation

Sequence-to-sequence processing is a machine-learning approach in which one chain of information (e.g., words) is translated into another, as in language translation, chatbots, and image generation.

Suppose you give the model the sentence, “When was Christopher Columbus born?” The model takes such an input and replies with “1451”. During training, it simply learned which output sequence best matches a given input sequence.

These architectures are typically referred to as encoder-decoder networks. An input sequence is transformed into an abstract intermediate representation (the latent space, encoding) and then translated back into a target form (decoding).

Prominent examples include:  

  • Autoencoders (AE): These are primarily used as filters or for compression, among other applications.  
  • Variational Autoencoders (VAE): These already belong to Generative AI, since they allow variation in output by modeling distribution functions.  
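The encode-compress-decode idea can be sketched in a few lines. The averaging “weights” below are hand-picked for illustration; a real autoencoder learns them from data:

```python
# Toy autoencoder: a 4-value input is squeezed through a 2-value
# latent code (encoding) and expanded back (decoding).
# The hand-set averaging makes the reconstruction lossy --
# the hallmark of a bottleneck.

def encode(x):
    # latent code: one value per pair of inputs
    return [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]

def decode(z):
    # each latent value is expanded back to its pair
    return [z[0], z[0], z[1], z[1]]

x = [1.0, 3.0, 5.0, 5.0]
z = encode(x)          # [2.0, 5.0]
x_hat = decode(z)      # [2.0, 2.0, 5.0, 5.0] -- close, but not exact
error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
print(z, x_hat, error)
```

The within-pair variation (1.0 vs. 3.0) cannot survive the 2-value code, which is exactly the compression effect autoencoders exploit.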

A recurring challenge with some of these networks is handling different sequence lengths: squeezing a long input into a fixed-size representation creates an “information bottleneck”, which can cause the loss of context information, e.g., that “Dirk” is the “name” in the sentence “My name is Dirk”.

From this came the development of Transformer models, in which (self-)attention mechanisms allow a “full context” linkage: every token is related to all others.

This can be visualized as an evaluation matrix: in the sentence “My name is Dirk,” the model knows, thanks to the matrix, that there is a much stronger semantic connection between “name” and “Dirk” than between “name” and “is”.
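One row of such an evaluation matrix can be approximated with a toy calculation. The 2-d token embeddings below are invented for illustration; real models learn high-dimensional ones:

```python
import math

# Toy self-attention scores for "My name is Dirk".
# Hand-picked 2-d embeddings (purely illustrative): "name" and "Dirk"
# point in similar directions, "is" does not.
emb = {
    "My":   [0.1, 0.2],
    "name": [1.0, 0.9],
    "is":   [0.1, -0.3],
    "Dirk": [0.9, 1.0],
}

def attention_row(query, tokens):
    """Softmax over dot products: how strongly 'query' attends to each token."""
    scores = [sum(q * k for q, k in zip(emb[query], emb[t])) for t in tokens]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {t: e / total for t, e in zip(tokens, exps)}

row = attention_row("name", ["My", "name", "is", "Dirk"])
# "name" attends far more strongly to "Dirk" than to "is"
print(row)
```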

When solving simple tasks such as “1000 + 451,” the AI sounds clever. But when it comes to more complex formulas, such as “1000 + 451 / 2”, purely language-based models often come up short because they haven’t memorized every conceivable number combination.

The answer is an “agentic” workflow with

  • tools and
  • memory.

A calculator tool then performs the exact calculation – a mechanism known as function calling, triggered by the intent of the user’s request. Via tools and memory, the agent becomes the “brain”: it understands when a tool is required (sensing), invokes it with the extracted input (actuator), and returns the response to the user.
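A minimal sketch of this sensing/actuating loop, assuming a naive keyword-based intent check (a real agent would let the model decide via function calling):

```python
# Minimal agentic loop with a calculator tool. Intent detection is
# deliberately simplistic: any arithmetic operator routes to the tool.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression):
    """Safely evaluate an arithmetic expression (the 'tool')."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

def agent(request):
    # sensing: does the request look like a math intent?
    if any(op in request for op in "+-*/"):
        result = calculator(request)       # actuator: invoke the tool
        return f"The result is {result}"   # respond to the user
    return "I can only calculate right now."

print(agent("1000 + 451 / 2"))  # the tool computes 1225.5 exactly
```

Unlike a language model, the tool respects operator precedence by construction, which is why it gets “1000 + 451 / 2” right every time.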

A contemporary AI architecture approach is State Space Models (SSMs). They essentially describe tasks with a smaller number of variables, shrinking the solution space and making performance easier to measure. They can be thought of as a special working memory that serves either as a sub-module for logic tasks or as the foundation of entire topologies, such as Mamba. The key difference from the classical models is that the intermediate state is never modeled as a static matrix but as a mathematical function. This echoes two more familiar architectures:

  • RNNs: Like Recurrent Neural Networks, SSMs build on past information.  
  • CNNs: The state update can even be mathematically modeled as a convolution; basically a compression to the “essentials”.  

Whether this state is treated in a continuous (all the time) or a discrete (at specified points in time) framework depends on the implementation. Hardware-aware optimization is another critical factor for SSMs: performance bottlenecks (as in NUMA architectures) are eliminated by choosing fast SRAM for compute-intensive steps (e.g., convolutions) and DRAM for large-scale data.
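A discrete state-space update can be sketched with a single scalar state; the constants a, b, c are illustrative stand-ins for learned parameters:

```python
# Discrete state-space recurrence with a scalar state:
#   h[t] = a * h[t-1] + b * u[t]     (state update)
#   y[t] = c * h[t]                  (output readout)
# In models like Mamba these parameters are learned (and input-dependent).
a, b, c = 0.5, 1.0, 2.0

def ssm(inputs):
    h = 0.0
    outputs = []
    for u in inputs:
        h = a * h + b * u      # compress the entire history into one state
        outputs.append(c * h)  # read the state out
    return outputs

# Unrolling the recurrence is equivalent to convolving the input with
# the kernel [c*b, c*a*b, c*a**2*b, ...] -- the CNN view mentioned above.
print(ssm([1.0, 0.0, 0.0]))  # the input's influence decays: [2.0, 1.0, 0.5]
```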

The harder the task, the more “thinking” (planning) is needed. A prominent concept here is Chain of Thought (CoT): the AI begins by jotting down its intermediate steps, much like a student taking notes on a math problem.
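In practice, CoT is often just a matter of prompt wording; the phrasing below is one illustrative variant, not a canonical template:

```python
# A Chain-of-Thought prompt asks the model to write down intermediate
# steps before the final answer. Any phrasing that elicits step-by-step
# reasoning works; this one is made up for illustration.
question = "1000 + 451 / 2"

cot_prompt = (
    "Solve the problem step by step, noting each intermediate result, "
    "then state the final answer.\n"
    f"Problem: {question}\n"
    "Steps:"
)
print(cot_prompt)
```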

Such AI autonomy can be classified into three levels:  

  • Low: The AI simply answers (a single shot).  
  • Medium: The AI follows a fixed sequence (like a plan), for instance searching for information in a database (vector search) or invoking a calculator.  
  • High: The AI works out plans by itself. For instance, it understands that it first needs to do some computations and then format the text especially well to fully complete the task.  

At a possible fourth level of autonomy, “memory” comes into play. As with humans, we distinguish between two functions:

  • Session-based memory gives conversations a personal touch. It remembers who you are and how you prefer to be addressed, but this information is lost when the session ends.
  • The vector database, or long-term memory, is on the other hand a vast archive that provides knowledge permanently.

The interaction between the archive and the answer is called RAG (Retrieval-Augmented Generation).

The process is simple:  

  • Search: The AI retrieves exactly those chunks of information from the archive that answer the particular question.  
  • Answer: The AI may use only these findings as sources for a well-founded response.  
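The two steps can be sketched without a real vector database by scoring chunks on word overlap; the archive content is made up for illustration:

```python
# Minimal RAG sketch: retrieval by word overlap instead of real vector
# embeddings, so everything stays self-contained.

archive = [
    "Christopher Columbus was born in 1451 in Genoa.",
    "The Transformer architecture was introduced in 2017.",
    "Mamba is a state space model architecture.",
]

def words(text):
    # crude normalization: lowercase and strip common punctuation
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(question, k=1):
    """Search: rank chunks by how many words they share with the question."""
    q = words(question)
    ranked = sorted(archive, key=lambda chunk: len(q & words(chunk)),
                    reverse=True)
    return ranked[:k]

def answer(question):
    """Answer: respond only on the basis of the retrieved chunk."""
    best = retrieve(question)[0]
    return f"Based on the archive: {best}"

print(answer("When was Christopher Columbus born?"))
```

A production system would replace the word-overlap score with cosine similarity over embeddings, but the search-then-ground structure stays the same.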

The personalization and intelligence attributed to an AI result from the interplay of various data sources and behavioral rules. These can be decomposed into the following levels:  

  • Contextual knowledge: Linking to specialized databases (e.g., for mathematics and foreign languages) to provide expertise.  
  • System prompt: Definition of behavioral pattern and guidelines (e.g., tone of voice, politeness, safety standards).  
  • User profile: Stores user preferences (e.g., culinary preferences) so responses are personalized.  
  • Conversation history: Adding past conversations to create a logical discussion.  
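A sketch of how these four levels could be assembled into a single model request; the field names and contents are illustrative placeholders:

```python
# Assemble the four personalization levels into one prompt.
# The bracketed section labels are an assumption for readability,
# not a standard format.
def build_prompt(system, profile, history, context, question):
    return "\n".join([
        f"[System] {system}",                         # behavioral rules
        f"[Profile] {profile}",                       # user preferences
        f"[Context] {context}",                       # specialist knowledge
        *[f"[History] {turn}" for turn in history],   # past conversations
        f"[User] {question}",
    ])

prompt = build_prompt(
    system="Be polite and concise.",
    profile="User prefers vegetarian recipes.",
    history=["User asked for a pasta dish yesterday."],
    context="Vegetarian cooking database excerpt ...",
    question="What should I cook tonight?",
)
print(prompt)
```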

The standardization of these interfaces between models, storage, and tools is increasingly being solved via the Model Context Protocol (MCP).

This protocol consists of three core roles:  

  • Host: The environment the program runs in (e.g., an app or IDE plugin), which decides what resources are needed.  
  • Client: The gateway on the host that connects to the server and forwards requests to it.  
  • Server: Actual provider of services or data (e.g., a calculator service, a database connection).  
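The role split can be mimicked schematically. The real MCP speaks JSON-RPC over stdio or HTTP; the classes and the hypothetical “add” tool below only illustrate who talks to whom:

```python
# Schematic sketch of the three MCP roles; not the actual wire protocol.

class Server:
    """Actual provider of a service or data (here: a tiny calculator)."""
    def handle(self, request):
        if request["tool"] == "add":
            return {"result": request["a"] + request["b"]}
        return {"error": "unknown tool"}

class Client:
    """Gateway on the host that forwards requests to a server."""
    def __init__(self, server):
        self.server = server
    def call(self, request):
        return self.server.handle(request)

class Host:
    """Environment the program runs in; decides which resource it needs."""
    def __init__(self, client):
        self.client = client
    def answer(self, a, b):
        reply = self.client.call({"tool": "add", "a": a, "b": b})
        return f"{a} + {b} = {reply['result']}"

host = Host(Client(Server()))
print(host.answer(1000, 451))  # "1000 + 451 = 1451"
```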

Once all of the aforementioned building blocks are present, intelligent strategies emerge that the AI uses to solve problems – e.g., what I have expected for years from my smart speakers at home or my vehicle’s voice command interface.

You might describe these as the AI’s “thought patterns”, such as:  

  • ReAct: The AI thinks, acts (e.g., invokes a tool), evaluates the result, and keeps thinking. It’s an ongoing interplay between reasoning and action.  
  • Self-Refine: The AI is not satisfied with its first draft. It critically rereads the answer it came up with and revises it until it is as good as it can make it.  
  • Constitutional AI: The AI is given ethical guidelines. Instead of humans having to go through each and every reply in a manual review, the “digital constitution” (the rules) becomes an internal guiding light so that offenses or errors do not occur in the first place.  
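A ReAct-style loop can be sketched with scripted “thoughts”; a real agent would generate each thought with the language model at every turn:

```python
# ReAct sketch: Thought -> Action (tool call) -> Observation -> Thought.
# The thoughts here are hard-coded; only the tool call is real.
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tool_calculate(expr):
    """Tiny calculator tool for 'a op b' expressions."""
    a, op, b = expr.split()
    return OPS[op](float(a), float(b))

def react(question):
    trace = [f"Thought: '{question}' needs the calculator tool."]
    observation = tool_calculate(question)        # Action: invoke the tool
    trace.append(f"Observation: {observation}")   # Observe the result
    trace.append("Thought: the observation answers the question.")
    return trace, f"The answer is {observation:g}"

trace, final = react("1000 + 451")
print("\n".join(trace))
print(final)  # "The answer is 1451"
```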

What we see is a change from one-off prompts to a full workflow-engineering ecosystem. Just as intent analysis is itself a particular form of pattern recognition, workflows create reusable blueprints for AI-driven processes. This shift gives low-code/no-code environments even greater flexibility and power than ever before.

