AI Pattern Catalogue

Data Representation Design Patterns

Hashed Feature

Maps high-cardinality categorical data to a fixed-size vector using a hash function to reduce dimensionality and control memory use.

Examples:

  • Hash millions of unique user IDs into a 1024-dimension vector for ad click prediction.
  • Hash website domain names to 2048-dimension features for phishing detection.
  • Hash product_SKU values for inventory prediction without storing huge vocabularies.

Code Example:

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html
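A minimal, dependency-free sketch of the hashing trick. It assumes MD5 as a stable hash, because Python’s built-in hash() is randomized per process for strings and would assign different buckets on each run. The function names are illustrative, not part of any library; the scikit-learn FeatureHasher linked above is the library route.

```python
import hashlib


def hashed_bucket(value: str, num_buckets: int = 1024) -> int:
    """Map a categorical string to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def one_hot_hashed(value: str, num_buckets: int = 1024) -> list[int]:
    """Fixed-length one-hot vector, no matter how many distinct values exist."""
    vec = [0] * num_buckets
    vec[hashed_bucket(value, num_buckets)] = 1
    return vec


# IDs never seen during training still map to a valid bucket at serving time.
user_ids = ["user_000123", "user_987654", "brand_new_user"]
buckets = [hashed_bucket(u) for u in user_ids]
```

Distinct values can collide in the same bucket; more buckets means fewer collisions but a wider input.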

Embedding

Maps categorical data to a dense continuous space where semantically similar items are close together.

Examples:

  • Learn embeddings for words in a corpus so similar words like ‘dog’ and ‘puppy’ are close together.
  • Product embeddings learned from co-purchases on an e-commerce site.
  • User embeddings from social graph connections for friend recommendations.

Code Example: 

https://keras.io/api/layers/embeddings/embedding

Feature Cross

Creates synthetic features representing interactions between two or more original features.

Examples:

  • Cross city and day_of_week for ride demand prediction.
  • Cross browser and device_type for click-through rate models.
  • Cross soil_type and rainfall_level for crop yield estimation.

Code Example: 

https://www.tensorflow.org/tutorials/structured_data/feature_columns
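A minimal sketch of a feature cross over two small, known vocabularies, using hypothetical city and day_of_week values. Each (city, day) pair becomes its own one-hot column, so even a linear model can learn a separate weight for, say, Saturdays in NYC. For large vocabularies the crossed pair is usually hashed into buckets instead.

```python
from itertools import product

# Hypothetical vocabularies for the ride-demand example.
CITIES = ["NYC", "SF", "CHI"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Every (city, day) pair gets its own index: 3 * 7 = 21 crossed categories.
CROSS_INDEX = {pair: i for i, pair in enumerate(product(CITIES, DAYS))}


def cross_city_day(city: str, day: str) -> list[int]:
    """One-hot encode the city x day_of_week cross."""
    vec = [0] * len(CROSS_INDEX)
    vec[CROSS_INDEX[(city, day)]] = 1
    return vec


row = cross_city_day("SF", "Mon")
```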

Multimodal Input

Integrates data from different modalities (e.g., image + text + numeric) in a single model, usually via parallel modality-specific branches whose outputs are fused (e.g., concatenated) before the prediction head.

Examples:

  • Use property photos, text descriptions, and price history to predict house prices.
  • Combine MRI scans (image) and medical notes (text) for diagnosis.
  • Process video frames + audio + title text for content moderation.

Code Example: 

https://pytorch.org/tutorials/intermediate/multimodal_tutorial.html

Data Augmentation

Increases the effective size and diversity of a dataset by applying label-preserving transformations to existing examples, improving model robustness.

Examples:

  • Rotate, flip, and brighten cat images to train a cat classifier.
  • Add background noise to speech recordings for ASR training.
  • Back-translate text sentences to create paraphrase variants.

Code Example: 

https://albumentations.ai/docs

Synthetic Data Generation

Creates artificial data that mimics real data properties, useful where data is rare, expensive, or privacy-sensitive.

Examples:

  • Generate synthetic credit card fraud examples using GANs.
  • Simulate self-driving car scenarios with synthetic video.
  • Create synthetic patient records that match statistical properties of real data.

Code Example: 

https://sdv.dev/SDV/user_guides/single_table/ctgan.html

Graph Representation

Encodes relational data (nodes + edges) for models like GNNs, capturing interactions.

Examples:

  • Model social network connections for friend suggestions.
  • Represent transactions as graph edges for fraud detection.
  • Encode molecules as atom-bond graphs for drug discovery.

Code Example:

https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html

Time Series Representation

Represents sequential data chronologically, often using sliding windows, sequence models, or temporal features.

Examples:

  • Use a 72-hour window of electricity consumption for load forecasting.
  • Sequence recent transactions to predict fraud.
  • Analyze machine sensor data over time for predictive maintenance.

Code Example: 

https://www.tensorflow.org/tutorials/structured_data/time_series
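A minimal sketch of turning a series into supervised (window, target) pairs, matching the 72-hour load-forecasting example above. The hourly readings are synthetic placeholders, and the function name is illustrative.

```python
def make_windows(series, window=72, horizon=1):
    """Turn a 1-D series into (input window, target) training pairs.

    Each input holds `window` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Synthetic placeholder: one week of hourly load with a daily cycle.
hourly_load = [float(h % 24) for h in range(24 * 7)]
pairs = make_windows(hourly_load, window=72, horizon=1)
```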

Geospatial Representation

Encodes location data into features that models can learn from, e.g., using geohashes or spatial grids.

Examples:

  • Divide a city into geohashes for ride-hailing demand prediction.
  • Encode proximity to historical wildfire zones for insurance risk models.
  • Use S2 cells to model property price variation.

Code Example: 

https://pypi.org/project/Geohash
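A hand-rolled sketch of the standard geohash encoding, for illustration only (the Geohash package linked above provides a maintained implementation). Each bit halves either the longitude or the latitude range, alternating, and every five bits become one base32 character, so nearby points tend to share a prefix and the prefix length sets the grid cell size.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, even = [], 0, 0, True  # even-numbered bits refine longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A 6-character cell, a common granularity for city-level demand features.
cell = geohash_encode(40.7580, -73.9855, precision=6)
```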

Discretization (Binning)

Converts continuous numerical variables into discrete categories (bins), which can make models more robust to outliers and let linear models capture non-linear relationships.

Examples:

  • Bin age into ranges (e.g., 0-17, 18-35) for churn models.
  • Bin income into quartiles for credit scoring.
  • Bin time on site into short, medium, and long engagement.

Code Example: 

https://pandas.pydata.org/docs/reference/api/pandas.cut.html
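A minimal standard-library sketch with hypothetical age bin edges; pandas.cut (linked above) does the same for whole columns. Which bin a boundary value such as 18 falls into is a choice worth making explicitly.

```python
from bisect import bisect_right

# Hypothetical bins: [0, 18), [18, 36), [36, 55), [55, inf)
AGE_EDGES = [18, 36, 55]
AGE_LABELS = ["0-17", "18-35", "36-54", "55+"]


def bin_age(age: float) -> str:
    # bisect_right counts how many edges are <= age, which is the bin index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


ages = [12, 18, 35, 36, 70]
binned = [bin_age(a) for a in ages]
```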

Feature Scaling

Normalizes or standardizes numerical features to improve model convergence and balance feature contributions.

Examples:

  • Scale house_price and bedroom_count so neither dominates.
  • Standardize features before PCA so that features with large numeric ranges don’t dominate the principal components.
  • Scale distance features for KNN.

Code Example: 

https://scikit-learn.org/stable/modules/preprocessing.html
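A minimal sketch of standardization and min-max scaling in plain Python, with hypothetical values. The key design point is that the statistics are fit on training data only and reused unchanged at serving time; scikit-learn’s StandardScaler and MinMaxScaler package the same fit/transform split.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Learn mean and standard deviation from training data only."""
    mu, sigma = fmean(values), pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant features


def standardize(values, mu, sigma):
    """Z-score: zero mean and unit variance on the training data."""
    return [(v - mu) / sigma for v in values]


def min_max(values, lo, hi):
    """Rescale to [0, 1] using a training-set min and max."""
    span = (hi - lo) or 1.0  # guard against constant features
    return [(v - lo) / span for v in values]


house_prices = [250_000, 400_000, 1_200_000]
bedrooms = [2, 3, 5]
mu, sigma = fit_standardizer(house_prices)
scaled_prices = standardize(house_prices, mu, sigma)
scaled_bedrooms = min_max(bedrooms, min(bedrooms), max(bedrooms))
```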

Tokenization

Splits text into smaller units (tokens) like words, subwords, or characters for NLP.

Examples:

  • Split sentences into words for text classification.
  • Use BPE to tokenize rare words as subword units.
  • Character tokenization for misspelling-robust models.

Code Example:

https://huggingface.co/docs/tokenizers/index

Out-of-Vocabulary Handling

Defines how the model deals with tokens not seen during training, often using an <UNK> token.

Examples:

  • Map unseen words to <UNK> in sentiment analysis.
  • Use subword units to break OOV words into known pieces.
  • Apply character-level CNNs to build embeddings for OOV tokens.

Code Example: 

https://www.tensorflow.org/text/guide/subwords_tokenizer
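A minimal sketch of the <UNK> approach from the first example: build a vocabulary from training text with a hypothetical minimum frequency, and map everything else to a reserved ID at inference. Subword tokenizers reduce how often this fallback is needed.

```python
from collections import Counter


def build_vocab(train_tokens, min_count=2):
    """Keep tokens seen at least min_count times; reserve IDs 0 and 1."""
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for token, count in Counter(train_tokens).most_common():
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map each token to its ID, falling back to <UNK> for unseen tokens."""
    unk_id = vocab["<UNK>"]
    return [vocab.get(token, unk_id) for token in tokens]


train = "the movie was great the acting was great".split()
vocab = build_vocab(train)
ids = encode("the plot was dreadful".split(), vocab)
```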

Positional Encoding

Adds position information to input embeddings so sequence models (e.g., Transformers) can capture order.

Examples:

  • Inject token positions in Transformers for translation.
  • Learn positional embeddings for BERT models.
  • Use fixed sinusoidal encodings, which need no learned parameters and can be computed for any sequence length.

Code Example: 

https://pytorch.org/tutorials/beginner/transformer_tutorial.html
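A plain-Python sketch of the sinusoidal encoding from the original Transformer paper (“Attention Is All You Need”); frameworks compute it with tensor operations, but the formula is the same. The position vectors are added to token embeddings of the same width.

```python
import math


def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal position encodings:

    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positions(embeddings, pe):
    """Add position vectors element-wise to token embeddings of the same width."""
    return [[e + p for e, p in zip(tok, pos)] for tok, pos in zip(embeddings, pe)]
```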

Problem Framing & Learning Objectives

Reframing

Reframing alters how a problem is posed, transforming it into a formulation that is easier or more stable for a machine learning model to solve.

Examples:

  • Predict stock price direction (up/down/flat) rather than precise price (classification instead of regression).
  • Reframe face detection as semantic segmentation (pixel-wise classification).
  • Predict age group instead of exact age for customer segmentation.

Code Example:

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

Multilabel

Allows instances to belong to multiple classes simultaneously, typically by framing the task as one binary classification per label (e.g., a sigmoid output per label instead of a softmax).

Examples:

  • Tag news articles with topics like ‘Politics’, ‘Economy’, ‘Tech’.
  • Tag images with multiple objects (e.g., ‘dog’, ‘grass’, ‘sky’).
  • Assign products to multiple categories like ‘Electronics’, ‘Gaming’, ‘Portable’.

Code Example:

https://scikit-learn.org/stable/auto_examples/multioutput/plot_classifier_chain_yeast.html

One-vs-Rest Classification

Breaks multi-class classification into a series of binary classifiers, each one distinguishing one class from all others.

Examples:

  • Classify flowers among Iris species using OvR.
  • Classify handwritten digits (0-9) with 10 binary classifiers.
  • Identify cancer subtypes via OvR binary models.

Code Example: 

https://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest

Ordinal Classification

Handles categories with a natural order but unknown distance between classes.

Examples:

  • Predict star ratings (1-5) for product reviews.
  • Predict patient condition (‘stable’, ‘concerning’, ‘critical’).
  • Customer satisfaction survey (‘dissatisfied’ to ‘very satisfied’).

Code Example:

https://github.com/fabianp/ordinal_regression/blob/master/ordinal_regression.py
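One common way to respect the order is a cumulative decomposition (Frank and Hall, 2001): a K-class ordinal target becomes K-1 binary targets “is y greater than j?”, each learnable by any binary classifier. This sketches only the encoding and decoding, not the method in the linked repository; decoding by counting passed thresholds assumes the estimated probabilities decrease as j grows.

```python
def ordinal_targets(rating: int, num_classes: int = 5) -> list[int]:
    """Encode rating k in 1..K as K-1 binary targets [k > 1, k > 2, ..., k > K-1]."""
    return [1 if rating > threshold else 0 for threshold in range(1, num_classes)]


def decode_ordinal(probs_greater: list[float]) -> int:
    """Turn estimates of P(y > j) back into a class: 1 + thresholds passed."""
    return 1 + sum(p > 0.5 for p in probs_greater)
```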

Ensemble

Combines predictions from multiple models to produce a more accurate or robust final prediction.

Examples:

  • Average outputs of logistic regression, random forest, and neural network for credit scoring.
  • Blend multiple CNN architectures for image classification.
  • Stack models for Kaggle competitions.

Code Example: 

https://scikit-learn.org/stable/modules/ensemble.html

Cascade

Uses multiple models in sequence: a fast, simple model resolves or filters out the easy cases, and more complex models handle the rest.

Examples:

  • Use motion detection → object detection → face recognition for surveillance.
  • Spam detection: keyword check → statistical model → deep learning model.
  • Screen patients: basic test → imaging → biopsy.

Code Example: 

https://docs.opencv.org/4.x/d7/d8b/tutorial_py_face_detection.html

Neutral Class

Adds a fallback class (‘unknown’, ‘requires review’) to handle uncertain predictions.

Examples:

  • A medical model outputs ‘needs human review’ for ambiguous cases.
  • A chatbot detects queries it can’t handle and responds accordingly.
  • A moderation system defers to a human on borderline content.

Code Example:

https://towardsdatascience.com/reject-option-in-classification-models-8970f54807f3
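A minimal sketch of the post-hoc variant of this pattern, often called a reject option: instead of training an explicit extra class, predictions whose top probability falls below a threshold are routed to a neutral label. The 0.8 threshold and the label name are hypothetical and would normally be tuned on validation data against the cost of human review.

```python
def predict_with_abstain(class_probs: dict, threshold: float = 0.8) -> str:
    """Return the most likely class, or a neutral label when confidence is too low."""
    label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else "needs_human_review"


decisions = [
    predict_with_abstain({"benign": 0.95, "malignant": 0.05}),
    predict_with_abstain({"benign": 0.55, "malignant": 0.45}),
]
```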

Rebalancing

Addresses class imbalance with resampling or cost-sensitive learning.

Examples:

  • Oversample rare fraud cases using SMOTE.
  • Penalize false negatives heavily in rare disease detection.
  • Randomly undersample majority class in churn prediction.

Code Example:

https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
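Two dependency-free sketches of the ideas above: inverse-frequency class weights for cost-sensitive learning (the same formula scikit-learn documents for class_weight="balanced"), and naive random oversampling. SMOTE differs in that it synthesizes new minority points by interpolating between neighbors rather than duplicating rows.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)  # fixed seed keeps the resampling reproducible
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels
```

Resample only the training split; oversampling before the train/test split leaks duplicated rows into evaluation.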

Anomaly Detection

Detects rare, unusual patterns that differ significantly from the majority of the data.

Examples:

  • Identify cyberattacks via network traffic anomalies.
  • Flag unusual transaction patterns for fraud.
  • Detect machine failure from sensor deviations.

Code Example: 

https://scikit-learn.org/stable/modules/outlier_detection.html

Ranking Problem

Focuses on ordering a set of items by relevance rather than predicting a label or value.

Examples:

  • Rank search results by relevance for a query.
  • Rank products for recommendation based on click likelihood.
  • Rank job candidates by predicted suitability.

Code Example:

https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html

Core Model Components & Training Techniques

Residual Connection (ResNet)

Adds shortcut connections that skip one or more layers, allowing gradients to flow more easily and enabling very deep networks.

Examples:

  • ResNet-50 uses skip connections to train a 50-layer deep CNN for image classification.
  • U-Net employs skip connections between encoder and decoder paths to preserve spatial details.
  • Transformers wrap a residual connection around each attention and feed-forward sublayer.

Code Example: 

https://keras.io/api/applications/resnet

Dropout

Randomly drops units (and their connections) during training to prevent overfitting and promote redundancy.

Examples:

  • Apply dropout with a rate of 0.5 to dense layers in a text classifier to improve generalization.
  • Use dropout on CNN layers for image recognition tasks.
  • Apply dropout in RNNs to reduce overfitting in language modeling.

Code Example: 

https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html

Batch Normalization

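Normalizes each layer’s inputs using mini-batch statistics, then applies a learned scale and shift, to stabilize and speed up training; it was originally motivated as a way to reduce internal covariate shift.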

Examples:

  • Apply batch norm after convolution layers in CNNs.
  • Use batch norm in GANs to stabilize training.
  • Apply batch norm to inputs of dense layers in tabular models.

Code Example: 

https://keras.io/api/layers/normalization_layers/batch_normalization

Learning Rate Scheduling

Dynamically adjusts learning rates during training to balance fast convergence and fine-tuning.

Examples:

  • Step decay multiplies the learning rate by 0.1 every 10 epochs.
  • Cosine annealing gradually decreases the learning rate following a cosine curve.
  • A warmup schedule starts with a small learning rate and increases it gradually over the first training steps.

Code Example:

https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
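The three schedules from the examples above, written as plain functions of the epoch or step so their shapes are easy to inspect; the default values are hypothetical. In PyTorch, torch.optim.lr_scheduler.StepLR and CosineAnnealingLR cover the first two, and warmup is often built with LambdaLR.

```python
import math


def step_decay(base_lr, epoch, drop=0.1, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return base_lr * drop ** (epoch // every)


def cosine_annealing(base_lr, step, total_steps, min_lr=0.0):
    """Follow half a cosine wave from base_lr down to min_lr over total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def linear_warmup(base_lr, step, warmup_steps):
    """Ramp linearly up to base_lr over the first warmup_steps steps, then hold."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```

In practice, warmup is often followed by cosine decay for the remaining steps.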

Early Stopping

Stops training when validation performance stops improving to prevent overfitting.

Examples:

  • Stop training an image classifier after 5 epochs without improvement in validation loss.
  • Monitor validation AUC for an imbalanced classification task and stop when it plateaus.
  • Use early stopping on validation MAE in regression tasks.

Code Example: 

https://keras.io/api/callbacks/early_stopping
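A minimal, framework-independent sketch of the patience logic behind early stopping. The Keras callback linked above exposes the same ideas through its patience, min_delta, and mode arguments, plus restore_best_weights to roll back to the best epoch.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without improvement in a metric."""

    def __init__(self, patience=5, min_delta=0.0, mode="min"):
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode  # "min" for losses/MAE, "max" for AUC/accuracy
        self.best = None
        self.bad_epochs = 0

    def _improved(self, value):
        if self.best is None:
            return True
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def step(self, value) -> bool:
        """Record this epoch's validation metric; return True to stop training."""
        if self._improved(value):
            self.best, self.bad_epochs = value, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```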

Gradient Clipping

Caps gradients during backpropagation to prevent exploding gradients and stabilize training.

Examples:

  • Clip gradients in LSTMs for long sequence modeling.
  • Apply gradient clipping in policy gradient reinforcement learning.
  • Clip gradients in deep CNNs without batch normalization.

Code Example:

https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html
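A plain-Python sketch of clipping by global norm, the same idea as torch.nn.utils.clip_grad_norm_: all gradients are scaled by one common factor, so their direction is preserved while the overall step size is capped. The nested-list gradient layout is an assumption for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their combined L2 norm is at most max_norm.

    `grads` is a list of per-parameter gradient lists (an assumed layout).
    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    # The small epsilon avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in layer] for layer in grads]
    return clipped, total_norm


clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```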

Weight Initialization

Initializes weights using strategies (e.g., Xavier, He) that ensure gradients propagate well at the start of training.

Examples:

  • Use He initialization for ReLU-activated networks.
  • Apply Xavier initialization for tanh-activated layers.
  • Initialize embedding layers from a normal distribution.

Code Example: 

https://pytorch.org/docs/stable/nn.init.html

Label Smoothing

Softens hard class labels to regularize the model and prevent overconfidence.

Examples:

  • Smooth labels for image classification to improve calibration.
  • Apply label smoothing in machine translation models for better generalization.
  • Use label smoothing for speech recognition tasks.

Code Example:

https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy
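A minimal sketch of standard label smoothing with a hypothetical epsilon of 0.1: the one-hot target is mixed with a uniform distribution over the K classes, so the loss stops rewarding a correct-class probability pushed all the way to 1. The Keras CategoricalCrossentropy loss linked above exposes this through its label_smoothing argument.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot target with the uniform distribution over K classes."""
    k = len(one_hot)
    return [y * (1 - epsilon) + epsilon / k for y in one_hot]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


hard = [0, 1, 0, 0]
soft = smooth_labels(hard)  # roughly [0.025, 0.925, 0.025, 0.025]
overconfident = [0.001, 0.997, 0.001, 0.001]
```

With smoothed targets, the overconfident prediction above incurs a noticeably higher loss than with hard targets, which is the regularizing effect.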

Siamese Network

Uses two or more identical subnetworks with shared weights to embed inputs and learn a similarity function between them.

Examples:

  • Signature verification by comparing embeddings of two signatures.
  • Face verification by learning similarity between two face images.
  • Plagiarism detection via document embedding comparison.

Code Example: 

https://keras.io/examples/vision/siamese_network

Two-Tower Model

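Processes queries and candidates (e.g., users and items) with separate subnetworks that map both into a shared embedding space, so candidate embeddings can be precomputed and matched to queries efficiently (e.g., by dot product).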

Examples:

  • Job recommendation: user resume tower + job description tower.
  • YouTube: user history tower + video feature tower.
  • Dating app: profile tower + potential match tower.

Code Example:

https://www.tensorflow.org/recommenders/examples/two_tower_retrieval

Advanced Architectures

Meta-Learning

Meta-learning, or ‘learning to learn’, trains models across many tasks so they can quickly adapt to new tasks with minimal data.

Examples:

  • Train on multiple small image classification tasks to enable fast adaptation to a new class.
  • Use MAML to initialize weights for rapid fine-tuning on unseen tasks.
  • Meta-learn optimization strategies for few-shot learning.

Code Example: 

https://learn-to-learn.readthedocs.io

Noisy Student Training

A semi-supervised method where a student model is trained on pseudo-labeled data generated by a teacher, with noise added to improve generalization.

Examples:

  • Train a teacher on labeled data, label unlabeled data, and train a noisy student on both.
  • Apply heavy data augmentation to student inputs to improve robustness.
  • Iteratively replace teacher with student for self-training.

Code Example: 

https://github.com/google-research/noisystudent

Domain Adaptation

Transfers a model trained on one domain to work well on a different but related domain where labeled data is scarce.

Examples:

  • Adapt sentiment analysis from movie reviews to product reviews.
  • Transfer a spam filter trained on old emails to modern phishing emails.
  • Adapt medical imaging models between devices or institutions.

Code Example: 

https://github.com/jindongwang/transferlearning

Diffusion Models

A class of generative models that learn to reverse a gradual noising process to produce new data samples from noise.

Examples:

  • Generate realistic images from random noise (e.g., Stable Diffusion).
  • Inpaint missing image regions by denoising masked areas.
  • Synthesize audio by iteratively denoising random noise into a waveform.

Code Example: 

https://huggingface.co/docs/diffusers/index

Variational Autoencoder (VAE)

A generative model that learns probabilistic latent representations and reconstructs data from sampled points in the latent space.

Examples:

  • Generate novel handwritten digits by sampling latent space.
  • Interpolate between musical styles by navigating latent codes.
  • Detect anomalies via poor reconstruction of outliers.

Code Example: 

https://keras.io/examples/generative/vae

State Space Models (SSM)

Models sequences through a latent state updated by learned, discretized continuous-time dynamics, scaling linearly with sequence length while still allowing parallel training, as in architectures like S4 and Mamba.

Examples:

  • Process entire books as a single sequence for QA tasks.
  • Model very long DNA sequences for genomics.
  • Generate high-resolution audio by modeling long waveforms.

Code Example: 

https://github.com/state-spaces/mamba

Deployment & MLOps Patterns

Stateless Serving Function

Processes each request independently without retaining memory of past requests, enabling horizontal scalability.

Examples:

  • Language translation API that handles each translation request separately.
  • Image classification service that processes independent image uploads.
  • Fraud detection API that scores each transaction independently, keeping no per-client state between requests.

Code Example: 

https://cloud.google.com/functions/docs/concepts/overview

Stateful Serving Function

Maintains context between requests, useful for conversational AI or multi-step workflows.

Examples:

  • Chatbot that tracks conversation history to provide relevant answers.
  • Flight booking assistant that remembers selected dates and destinations.
  • Shopping assistant that tracks cart contents across interactions.

Code Example: 

https://rasa.com/docs/rasa

Batch Serving

Processes large numbers of inputs in one go rather than one at a time, improving efficiency for offline or scheduled tasks.

Examples:

  • Nightly fraud detection on all transactions of the day.
  • Daily recommendation generation for all users.
  • Batch scoring of credit applications.

Code Example: 

https://spark.apache.org/docs/latest/ml-guide.html

Continuous Model Evaluation

Monitors model performance metrics over time to detect data drift or model degradation.

Examples:

  • Track ETA model accuracy using actual trip completion times.
  • Monitor click-through rates of recommendation models.
  • Evaluate AUC of fraud detection model in production.

Code Example:

https://mlflow.org

Two-Phase Predictions

Combines a fast initial prediction with a slower, more accurate follow-up prediction.

Examples:

  • Fast heuristic search results followed by personalized re-ranking.
  • Quick rough weather forecast followed by precise simulation.
  • Fast sentiment label followed by detailed explanation generation.

Code Example:

https://aws.amazon.com/blogs/machine-learning/how-to-combine-fast-and-accurate-models/

Keyed Predictions

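Attaches a key (e.g., a transaction ID, a feature version, or a timestamp) to each request and passes it through to the output, so predictions can be matched to their inputs and to a consistent set of features despite asynchronous or batched processing.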

Examples:

  • Fetch transaction snapshot by ID for fraud scoring.
  • Use profile version key for recommendations.
  • Query consistent feature view with timestamp key.

Code Example: 

https://feast.dev
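A minimal sketch of the pass-through key, with a hypothetical score function standing in for a real model. Each request carries a client-supplied key that is returned untouched next to the prediction, so results that finish out of order (here, read from a thread pool with as_completed) can still be joined back to their inputs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def score(features: dict) -> float:
    """Hypothetical stand-in for a fraud model."""
    return min(1.0, features["amount"] / 10_000)


def keyed_predict(request: dict) -> dict:
    """Return the client-supplied key unchanged alongside the prediction."""
    return {"key": request["key"], "score": score(request["features"])}


requests = [
    {"key": "txn-001", "features": {"amount": 120.0}},
    {"key": "txn-002", "features": {"amount": 9_500.0}},
    {"key": "txn-003", "features": {"amount": 42.0}},
]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(keyed_predict, r) for r in requests]
    # as_completed yields results in finish order, not submission order.
    results = [f.result() for f in as_completed(futures)]

scores_by_key = {r["key"]: r["score"] for r in results}
```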

Prediction Caching

Stores recent prediction results to serve repeated requests faster and reduce compute cost.

Examples:

  • Cache article summaries for popular front-page news.
  • Store translation results for common phrases.
  • Cache recommendations for active users.

Code Example: 

https://redis.io/docs/interact/data-types/cache

Circuit Breaker

Monitors for failures and prevents cascading outages by stopping requests to failing services temporarily.

Examples:

  • Trip the breaker after repeated timeouts from a model API.
  • Block model calls during upstream data outage.
  • Fail fast when external feature service is down.

Code Example: 

https://martinfowler.com/bliki/CircuitBreaker.html
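A minimal, single-threaded sketch of the closed, open, and half-open states; the thresholds and the CircuitOpenError name are hypothetical, and the injectable clock exists to make the timeout testable. A caller would typically catch CircuitOpenError and switch to a fallback, as in the Graceful Degradation pattern below.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; once `reset_timeout`
    seconds have passed, it goes half-open and lets a trial call through."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open; failing fast")
        # Either closed, or half-open because the timeout has elapsed.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures, self.opened_at = 0, None  # any success closes the circuit
        return result
```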

Graceful Degradation / Fallback Model

Switches to a simpler model, static rules, or cached predictions when the primary model is unavailable.

Examples:

  • Use static rules for chatbot when LLM API is down.
  • Show cached predictions if live model is offline.

Code Example:

https://cloud.google.com/architecture/graceful-degradation-cloud-applications

Rate Limiting

Controls the frequency of API calls to prevent abuse and ensure fair usage.

Examples:

  • Limit free-tier users to 10 API calls per minute.
  • Apply stricter limits to expensive model endpoints.
  • Provide tiered rate limits for different customer plans.

Code Example: 

https://konghq.com/blog/api-rate-limiting
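A minimal sketch of a token bucket, one common rate-limiting algorithm, configured for the hypothetical free tier in the first example (10 calls per minute). The bucket allows short bursts up to its capacity and then refills at a steady rate; the injectable clock exists for testing. A production limiter would usually keep one bucket per API key in shared storage.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Free tier from the example above: 10 calls per minute.
free_tier = TokenBucket(capacity=10, rate=10 / 60)
```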

Shadow Deployment

Runs a new model alongside the current production model on live data without impacting users.

Examples:

  • Evaluate a new fraud model silently on production transactions.
  • Test new recommendation algorithm against live traffic.
  • Run upgraded moderation model without affecting users.

Code Example:

https://mlflow.org/docs/latest/model-registry.html#shadow-deployment

Canary Deployment

Gradually rolls out a new model to a small subset of users to monitor performance before full deployment.

Examples:

  • Serve new search model to 1% of users initially.
  • Test new ad targeting model in a small region.
  • Deploy upgraded language model to a limited audience first.

Code Example:

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#canary-deployments

A/B Testing

Splits traffic randomly between two or more model versions to compare performance metrics.

Examples:

  • Test two recommendation models for CTR differences.
  • Compare old and new ranking models for engagement.
  • Evaluate new chatbot model vs. existing one.

Code Example:

https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators
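A minimal sketch of an online A/B test, assuming click-through rate as the metric. Users are assigned to a variant deterministically by hashing their ID together with a hypothetical experiment name, and the CTR difference is checked with a standard two-sided two-proportion z-test. The click counts are illustrative.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "rec_model_test") -> str:
    """Stable 50/50 split: the same user always sees the same variant."""
    h = int(hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest(), 16)
    return "B" if h % 100 < 50 else "A"


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; the p-value covers both tails.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical results: 5.0% CTR for A vs 6.0% for B over 10,000 users each.
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
```

Fix the sample size (or use a sequential testing method) before looking at results; stopping at the first significant reading inflates false positives.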

Multi-Armed Bandit

Dynamically allocates more traffic to better-performing models during testing, balancing exploration and exploitation.

Examples:

  • Allocate traffic between different ad creatives to maximize click-through rate.
  • Serve most effective news headline model to more users.
  • Route more traffic to best-performing recommendation engine.

Code Example: 

https://github.com/facebook/ax
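A minimal sketch of Beta-Bernoulli Thompson sampling, one standard bandit algorithm. Each arm keeps a Beta posterior over its click rate; every request samples a plausible rate per arm and goes to the highest draw, so traffic shifts toward the better variant while weaker arms still get occasional exploration. The simulated click rates are hypothetical.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of variants."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior, i.e. uniform over each arm's click rate.
        self.successes = {arm: 1 for arm in arms}
        self.failures = {arm: 1 for arm in arms}

    def choose(self):
        """Sample a plausible click rate per arm and pick the highest draw."""
        draws = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, arm, clicked: bool):
        if clicked:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


# Simulation with hypothetical true click rates per ad creative.
true_ctr = {"creative_a": 0.05, "creative_b": 0.10, "creative_c": 0.02}
bandit = ThompsonSampler(true_ctr, seed=7)
env = random.Random(42)
traffic = {arm: 0 for arm in true_ctr}
for _ in range(10_000):
    arm = bandit.choose()
    traffic[arm] += 1
    bandit.update(arm, env.random() < true_ctr[arm])
```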

Model Router

Routes requests to the most appropriate model based on request characteristics.

Examples:

  • Route customer support queries to specialized intent models.
  • Send different languages to appropriate translation models.
  • Dispatch image types to the right vision model.

Code Example: 

https://rayserve.readthedocs.io/en/latest/key-concepts.html#routers

Sidecar for Pre/Post-processing

Separates pre-processing, feature fetching, or post-processing into a co-located container or service.

Examples:

  • Sidecar fetches real-time features before passing to model.
  • Sidecar logs predictions and formats output.
  • Handle authentication in a sidecar before model API.

Code Example: 

https://istio.io/latest/docs/concepts/what-is-istio/#sidecar-proxy

Hybrid Human-AI Systems

Human-in-the-Loop

Incorporates human feedback or intervention during model training or inference to handle edge cases, verify outputs, or improve performance.

Examples:

  • Route low-confidence predictions of a document parser to a human verifier.
  • Use human reviewers for ambiguous content flagged by moderation AI.
  • In self-driving, log human interventions for model retraining.

Code Example: 

https://labelbox.com/blog/what-is-human-in-the-loop

AI-Assisted Annotation

Uses AI to pre-label data which humans then verify or correct, speeding up the labeling process.

Examples:

  • Pre-draw bounding boxes for object detection tasks.
  • Highlight entities in text for human review.
  • Auto-segment medical images for radiologist correction.

Code Example: 

https://roboflow.com/annotate

AI Sourcing

AI breaks down tasks and routes them to appropriate human annotators or systems for efficient processing.

Examples:

  • AI divides report writing into sections and assigns them to different workers.
  • Moderation AI routes flagged content to specialists (e.g., graphic content, hate speech).
  • Break up large translation tasks into smaller ones distributed to crowd workers.

Code Example: 

https://www.mturk.com

Generative Feedback Loop

Captures user interactions and failures in production to create new data for retraining and model improvement.

Examples:

  • Collect user corrections of chatbot answers for fine-tuning.
  • Use rejected code suggestions in coding assistant as hard examples.
  • Log prompts that produced poor images to retrain image generation models.

Code Example: 

https://openai.com/research/learning-from-human-feedback

Active Learning

Selectively queries the most informative samples for human labeling, focusing effort where it will most improve the model.

Examples:

  • Select low-confidence samples from classifier for annotation.
  • Use uncertainty sampling in image labeling.
  • Query diverse data points where model ensembles disagree.

Code Example: 

https://modAL.readthedocs.io
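Minimal sketches of two query strategies from the examples above, using hypothetical model outputs: least-confidence sampling picks the unlabeled samples whose top-class probability is lowest, and committee disagreement scores samples by how many ensemble members vote against the majority. modAL provides ready-made query strategies of this kind.

```python
from collections import Counter


def least_confident(probabilities, batch_size=2):
    """Indices of the samples whose highest class probability is lowest."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


def committee_disagreement(votes_per_model):
    """Per sample, the fraction of committee members that disagree with the majority."""
    scores = []
    for votes in zip(*votes_per_model):
        majority_count = Counter(votes).most_common(1)[0][1]
        scores.append(1 - majority_count / len(votes))
    return scores


pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]
to_label = least_confident(pool_probs)
```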

Red Teaming

Proactively tests models with adversarial or creative inputs to find flaws, biases, and unsafe behaviors before deployment.

Examples:

  • Try to elicit harmful outputs from chatbots before launch.
  • Test vision models with adversarially perturbed images.
  • Find prompts that bypass safety filters in text-to-image models.

Code Example: 

https://github.com/EleutherAI/redteam

Responsible AI & Governance

Heuristic Benchmark

Establishes a simple, non-ML baseline to compare complex models against, ensuring new solutions add value.

Examples:

  • Always recommend the 10 most-purchased items as a baseline for a recommendation model.
  • Flag all transactions over a certain amount as potential fraud for a fraud detection benchmark.
  • Predict churn if a customer hasn’t purchased in 90 days.

Code Example: 

https://scikit-learn.org/stable/modules/model_evaluation.html

Explainable Predictions

Uses tools like SHAP or LIME to provide human-readable reasons for a model’s predictions.

Examples:

  • Identify top features contributing to a denied loan application.
  • Show important factors in a disease risk prediction for doctors.
  • Highlight factors in churn predictions for customer success teams.

Code Example: 

https://shap.readthedocs.io

Counterfactual Explanations

Shows the minimal change to an input that would change a model’s output, aiding transparency and recourse.

Examples:

  • The loan would be approved if annual income were $5,000 higher.
  • The job application would pass screening if the resume listed ‘Python’.
  • The marketing model would predict a response if the discount were 20% instead of 10%.

Code Example: 

https://github.com/amir-jafari/Counterfactual-Explanations

Fairness Lens

Audits and mitigates bias by evaluating model performance across different demographic groups.

Examples:

  • Check equal opportunity in loan approvals across racial groups.
  • Audit recommendation fairness for gender balance.
  • Evaluate error rates of medical models across skin tones.

Code Example: 

https://aif360.readthedocs.io

Data Slicing Evaluation

Evaluates model metrics on subgroups to uncover hidden weaknesses or bias.

Examples:

  • Check cancer detection accuracy across skin tones.
  • Test self-driving model error rate at night vs day.
  • Evaluate click-through rate by country.

Code Example: 

https://tensorflow.org/tfx/guide/evaluator

Model as a Judge

Uses a strong LLM to evaluate or compare outputs from other models based on defined criteria.

Examples:

  • LLM compares chatbot replies for helpfulness.
  • Judge model scores summaries for factual accuracy.
  • Rank generated images for creativity.

Code Example: 

https://openai.com/research/gpt-4

Constitutional AI

Trains a model to follow a written set of principles (a ‘constitution’) by having it critique and revise its own outputs, and by using AI-generated feedback in place of human preference labels.

Examples:

  • LLM critiques its output against a rule like ‘do not give medical advice’.
  • Have the model rewrite its own harmful replies, then fine-tune on the revisions.
  • Enforce professional tone in chatbot replies.

Code Example: 

https://arxiv.org/abs/2212.08073

Model Card

Creates a standard, transparent document detailing a model’s intended use, performance, and limitations.

Examples:

  • Card notes facial recognition model works best on North American data.
  • LLM card details training data sources and biases.
  • Classifier card lists accuracy on different dialects.

Code Example: 

https://modelcards.readthedocs.io

Datasheets for Datasets

Documents a dataset’s origin, composition, and potential issues, ensuring transparency.

Examples:

  • Datasheet for medical dataset describes patient demographics and approvals.
  • Document dataset bias in celebrity images.
  • Detail content moderation policy for forum dataset.

Code Example: 

https://pair-code.github.io/datasheets-for-datasets

Model Assertions

Defines testable properties (e.g., monotonicity, invariance) for model validation before deployment.

Examples:

  • Assert credit score doesn’t drop if income increases.
  • Check translation round-trip similarity.
  • Ensure small image rotation doesn’t change classification.

Code Example: 

https://github.com/evidentlyai/evidently

Reproducibility & Data Management

Transform Pattern

Encapsulates feature processing logic into portable components to ensure consistent transformations across training and serving.

Examples:

  • Normalize age feature with a single reusable function.
  • Use same tokenizer at training and inference.
  • Encapsulate date difference calculations.

Code Example: 

https://www.tensorflow.org/tfx/guide/transform

Repeatable Sampling

Ensures data splits and randomness are reproducible using fixed seeds or controlled sampling logic.

Examples:

  • Train/test split with a fixed random seed.
  • Initialize model weights with fixed seed.
  • Deterministic bagging in random forests.

Code Example:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
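Two minimal sketches of repeatable splits. The first shuffles with a private, seeded random generator, which is what passing random_state to train_test_split achieves. The second assigns each record by hashing a stable key (MD5 is an assumed choice), so the assignment survives reruns, new machines, and newly appended data. Function names and split percentages are illustrative.

```python
import hashlib
import random


def seeded_split(rows, test_frac=0.2, seed=42):
    """Shuffle with a seeded RNG so the split is identical on every run."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def hash_split(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Assign a record to train/valid/test from a stable hash of its key.

    The same key lands in the same split on every run and machine, and
    appending new rows never moves existing ones between splits.
    """
    bucket = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"
```

Hashing on an entity key such as a customer ID also keeps all of that customer’s rows in the same split, which avoids leakage between training and evaluation.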

Bridged Schema

Provides compatibility between evolving data schemas and model expectations using a transformation layer.

Examples:

  • Convert deprecated features on the fly for older clients.
  • Map renamed features to expected model inputs.
  • Fill missing new features with defaults.

Code Example:

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language

Windowed Inference

Feeds the model with sliding windows of time series or sequence data for predictions.

Examples:

  • Pass the last 60 minutes of sensor data to predict failure.
  • Feed the last 30 days of transactions to detect fraud.
  • Slide a window over long text so it fits within an LLM’s context length.

Code Example:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html

Workflow Pipeline

Automates and orchestrates the full ML lifecycle as a pipeline (e.g., training, validation, deployment).

Examples:

  • Kubeflow pipeline retrains model on new data.
  • Airflow job validates and deploys model daily.
  • CI/CD triggers model build on code commit.

Code Example: 

https://kubeflow.org/docs/components/pipelines

Feature Store

Central repository for managing, sharing, and serving features consistently across training and inference.

Examples:

  • Store embeddings for reuse across models.
  • Serve real-time features for fraud scoring.
  • Manage historical aggregates for training.

Code Example: 

https://feast.dev

Model Registry

Central system to version, manage, and govern models and their metadata.

Examples:

  • Track metrics and lineage of deployed models.
  • Enforce approval workflow before production.
  • Compare production model to candidate models.

Code Example: 

https://mlflow.org/docs/latest/model-registry.html

Prediction Log Analysis

Logs predictions and inputs for auditing, retraining, and debugging models.

Examples:

  • Log low-confidence predictions for review.
  • Track user feedback on predictions for retraining.
  • Analyze inputs that caused failures.

Code Example: 

https://cloud.google.com/logging/docs

Data Lineage

Tracks the origin and transformations of data to support debugging and compliance.

Examples:

  • Trace prediction back to raw data source.
  • Audit dataset for GDPR compliance.
  • Map feature transformations for reproducibility.

Code Example: 

https://www.databricks.com/glossary/data-lineage

Containerization

Packages models and dependencies in isolated, reproducible environments using containers.

Examples:

  • Deploy TensorFlow model in a Docker image.
  • Run model serving on Kubernetes cluster.
  • Share containerized dev environments.

Experiment Tracking

Logs hyperparameters, code versions, data, and results for reproducible experimentation.

Examples:

  • Track accuracy for different learning rates.
  • Log Git commit for model training runs.
  • Compare architectures by validation loss.

Code Example: 

https://mlflow.org

LLM & Agent Patterns

Chain of Thought Prompting

Encourages models to reason through problems step-by-step before providing an answer to improve accuracy on complex tasks.

Examples:

  • Break down math word problems into intermediate steps.
  • Write out logical reasoning for a trivia question.
  • Generate stepwise solutions for code debugging.

Code Example: 

https://arxiv.org/abs/2201.11903

Function Calling (Tool Use)

Allows LLMs to invoke external tools or APIs by generating structured calls during inference.

Examples:

  • Generate API call to fetch weather data.
  • Call calculator API for arithmetic.
  • Query a database to answer user question.

Code Example: 

https://platform.openai.com/docs/guides/gpt/function-calling
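Here is a framework-agnostic sketch of the dispatch step. It assumes the model has emitted a JSON object of the form `{"name": ..., "arguments": {...}}`. This format, the `TOOLS` registry, and both tools are illustrative assumptions, not any provider's exact API. Real function-calling APIs also require describing each tool's parameters to the model.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API request


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

In a full loop, the tool's return value is sent back to the model as a new message so it can compose the final answer.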

ReAct (Reason and Act)

Interleaves reasoning traces and tool use in a loop to solve dynamic, multi-step tasks.

Examples:

  • Use search tool in reasoning chain for trivia.
  • Interleave calculations and logic for math.
  • Plan and act in interactive environments.

Code Example: 

https://arxiv.org/abs/2210.03629

Self-Correction / Reflexion

The model critiques and revises its own outputs using internal or external feedback.

Examples:

  • Critique code for bugs and regenerate.
  • Identify missing points in summaries and fix.
  • Rephrase confusing chatbot responses.

Code Example: 

https://arxiv.org/abs/2303.11366

Graph of Thoughts (GoT)

Structures reasoning as a graph, allowing merging, cycling, and iterative refinement of ideas.

Examples:

  • Merge partial proofs in math problem solving.
  • Combine plotlines in creative writing.
  • Refine designs in architecture planning.

Code Example: 

https://arxiv.org/abs/2308.09687

Self-Consistency

Runs multiple reasoning paths and selects the most common outcome for robustness.

Examples:

  • Solve math problem using multiple chains.
  • Generate several summaries and pick consensus.
  • Answer quiz questions with majority vote.

Code Example: 

https://arxiv.org/abs/2203.11171
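Once several reasoning chains have been sampled, typically at a non-zero temperature, aggregation reduces to a majority vote over their final answers. Here is a minimal sketch, assuming the final answers have already been extracted as strings. The normalization rule is an illustrative choice.

```python
from collections import Counter


def self_consistent_answer(sampled_answers: list[str]) -> tuple[str, float]:
    """Majority-vote over final answers from independently sampled reasoning chains."""
    # Light normalization so "42" and " 42 " count as the same answer.
    normalized = [a.strip().lower() for a in sampled_answers]
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)  # agreement rate as a rough confidence signal


# Final answers from five hypothetical chain-of-thought samples.
print(self_consistent_answer(["42", "42 ", "41", "42", "40"]))  # ('42', 0.6)
```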

Tree of Thoughts (ToT)

Explores multiple reasoning paths in parallel and prunes unpromising branches.

Examples:

  • Explore hypotheses for logic puzzles.
  • Generate different story continuations.
  • Test multiple proof strategies.

Code Example: 

https://arxiv.org/abs/2305.10601

System / Meta Prompt

Sets persistent instructions that guide the LLM’s behavior throughout a session.

Examples:

  • Define chatbot persona as a helpful pirate.
  • Set tone to professional for code assistant.
  • Enforce safety guidelines in responses.

Code Example: 

https://platform.openai.com/docs/guides/gpt

Chain-of-Verification (CoVe)

The model drafts an answer, generates and independently answers verification questions about that draft, and then revises it to improve factuality.

Examples:

  • Verify facts in a generated biography.
  • Check accuracy of summary key points.
  • Validate scientific claims.

Code Example: 

https://arxiv.org/abs/2309.11495

Prompt Chaining

Feeds the output of one prompt into the next as input, building complex workflows from simpler steps.

Examples:

  • Generate blog title, then outline, then post.
  • Extract entities, then summarize.
  • Create code, then generate tests.

Code Example: 

https://www.promptingguide.ai/techniques/prompt-chaining

Memory-Augmented Networks

Provides LLMs with external memory to store and retrieve information across sessions.

Examples:

  • Remember user preferences between chats.
  • Store facts for cumulative learning.
  • Log conversation history for context.

Code Example: 

https://github.com/langchain-ai/langchain

Structured Output / Output Parsing

Constrains output to machine-readable formats (e.g., JSON) for reliable downstream use.

Examples:

  • Output calendar event as JSON.
  • Return API response as structured dict.
  • Generate database insert statements.

Code Example: 

https://guides.langchain.com/docs/use_cases/structured_output
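A common complement to prompting for JSON is to validate the reply before using it. The sketch below assumes a hypothetical `EVENT_SCHEMA` and uses only the standard library. Libraries such as Pydantic or JSON Schema validators serve the same purpose in practice.

```python
import json

# Hypothetical schema for a calendar event: field name -> expected Python type.
EVENT_SCHEMA = {"title": str, "start": str, "duration_minutes": int}


def parse_event(raw: str) -> dict:
    """Parse model output as JSON and check required fields and their types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in EVENT_SCHEMA.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or not {expected_type.__name__}")
    return data


event = parse_event('{"title": "Standup", "start": "2024-05-01T09:00", "duration_minutes": 15}')
print(event["duration_minutes"])  # 15
```

`json.JSONDecodeError` is a subclass of `ValueError`. A caller can therefore catch `ValueError` once and re-prompt the model on any failure.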

Tool-Augmented RAG

A RAG agent uses tools to generate or augment retrieval queries dynamically.

Examples:

  • Call web search before retrieval.
  • Run calculator before knowledge base lookup.
  • Query external API to enhance context.

Code Example: 

https://docs.langchain.com/docs/use_cases/tool_augmented_rag

Retrieval-Augmented Generation (RAG)

Enhances generation by grounding outputs in retrieved external knowledge.

Examples:

  • Answer question with supporting documents.
  • Summarize internal wiki pages.
  • Generate code using retrieved docs.

Code Example: 

https://huggingface.co/blog/rag
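Here is a toy end-to-end sketch of the retrieve-then-generate flow. Word-overlap scoring stands in for embedding similarity search, and the prompt template is an assumption. The resulting string would be passed to whatever LLM client is in use.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Ground the generation step by placing retrieved passages in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "The VPN requires two-factor authentication.",
    "Expense reports are due on the 5th of each month.",
    "Office hours are 9 to 5.",
]
print(build_prompt("When are expense reports due?", docs, k=1))
```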

LLM Orchestration Framework

Provides abstractions to build LLM apps with prompt chaining, retrieval, and tool use.

Examples:

  • LlamaIndex combines structured data + LLM.
  • Orchestrate multi-agent dialogue.

Code Example: 

https://python.langchain.com

Prompt Hub / Registry

Central store to manage, version, and share prompts across teams.

Examples:

  • Version control marketing copy prompts.
  • Catalog chatbot personas.
  • Benchmark and share best prompts.

Code Example: 

https://www.promptlayer.com


Efficiency & Optimization

Model Quantization

Reduces model size and speeds up inference by lowering numerical precision of weights (e.g., from 32-bit floats to 8-bit integers).

Examples:

  • Quantize CNN for mobile deployment.
  • Apply 8-bit quantization to BERT.
  • Speed up inference on edge devices.

Code Example: 

https://pytorch.org/docs/stable/quantization.html
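Below is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It maps each weight to an integer in [-127, 127] using a single scale factor. Production toolkits such as PyTorch's add zero-points, per-channel scales, and calibration, all omitted here.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]


q, scale = quantize_int8([0.5, -1.27, 0.0, 1.0])
print(q)  # [50, -127, 0, 100]
```

Each reconstructed weight is within half a quantization step (`scale / 2`) of the original. That rounding error is the accuracy cost traded for 4x smaller storage than float32.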

Model Pruning

Removes redundant weights or neurons to reduce model complexity without significant loss in accuracy.

Examples:

  • Prune unimportant connections in ResNet.
  • Apply structured pruning to remove channels.
  • Reduce size of LSTM for deployment.

Code Example: 

https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
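Unstructured magnitude pruning, the simplest variant, zeroes the weights with the smallest absolute values. Here is a sketch over a flat list of weights. Structured pruning would instead remove whole channels or neurons.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune weights with the smallest absolute value.
    prune_idx = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune])
    return [0.0 if i in prune_idx else w for i, w in enumerate(weights)]


print(magnitude_prune([0.9, -0.05, 0.3, 0.01, -0.7, 0.2], sparsity=0.5))
# [0.9, 0.0, 0.3, 0.0, -0.7, 0.0]
```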

Knowledge Distillation

Trains a smaller student model to mimic a larger teacher model, retaining much of the teacher's accuracy while reducing size and latency.

Examples:

  • Distill BERT into DistilBERT.
  • Train small CNN from ensemble of large models.
  • Compress speech model for mobile apps.

Code Example: 

https://huggingface.co/transformers/model_doc/distilbert.html
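Here is a sketch of the classic soft-target distillation loss (Hinton et al., 2015) for a single example, in plain Python. A temperature-softened KL term pulls the student toward the teacher's distribution, blended with ordinary cross-entropy on the hard label. The defaults `temperature=2.0` and `alpha=0.5` are illustrative assumptions, not recommended values.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    hard_ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps soft-target gradients on a comparable scale as T changes.
    return alpha * temperature**2 * kl + (1 - alpha) * hard_ce


print(distillation_loss([2.0, 1.0, 0.1], [3.0, 1.0, 0.2], label=0))
```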

Sparse Models

Designs or trains models to have many zero weights, improving efficiency and interpretability.

Examples:

  • Train with L1 regularization for sparsity.
  • Sparse transformer for long sequences.
  • Sparse CNN for hardware acceleration.

Code Example: 

https://arxiv.org/abs/1904.10509

Low-Rank Approximation

Factorizes large weight matrices into smaller components to reduce parameters and compute.

Examples:

  • Apply SVD to compress linear layers.
  • Use low-rank adapters in LLMs.
  • Factorize embedding matrix.

Code Example: 

https://arxiv.org/abs/2106.09685
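The parameter savings follow directly from the matrix shapes. Factorizing a $d \times k$ weight matrix into two rank-$r$ factors gives the following (the dimensions in the last line are illustrative):

```latex
\begin{aligned}
W &\approx BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\#\text{params}(W) &= dk \quad\longrightarrow\quad \#\text{params}(B) + \#\text{params}(A) = r(d + k) \\
d = k = 4096,\; r = 8:&\quad 16{,}777{,}216 \;\longrightarrow\; 65{,}536 \quad (256\times\ \text{fewer})
\end{aligned}
```

Low-rank adapters (LoRA, the paper linked above) apply the same factorization to the weight update. The pretrained $W$ stays frozen and only $B$ and $A$ are trained, so the adapted layer computes $(W + BA)x$, up to a scaling factor.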

Weight Sharing

Shares weights across layers or units to reduce model size and enforce consistency.

Examples:

  • Tie embedding and output weights in LLM.
  • Share filters in convolutional layers.
  • Use shared layers in multi-task learning.

Code Example: 

https://arxiv.org/abs/1608.05859

Early Exit Networks

Allows inference to terminate early at intermediate layers when confident, saving compute on easy inputs.

Examples:

  • Early exit in CNNs for image classification.
  • Transformer with exit heads for fast inference.
  • Dynamic depth models for speech recognition.

Code Example: 

https://arxiv.org/abs/1709.01686
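Here is a sketch of the inference-time control flow. It assumes each stage returns its features plus class probabilities from an attached exit head. The stage functions and the `threshold=0.9` default are hypothetical. In a real network each stage would be a block of layers and each head a small trained classifier.

```python
from typing import Callable, Sequence

# A stage maps features to (new_features, class_probabilities from its exit head).
Stage = Callable[[list[float]], tuple[list[float], list[float]]]


def early_exit_predict(
    x: list[float], stages: Sequence[Stage], threshold: float = 0.9
) -> tuple[int, int]:
    """Return (predicted_class, exit_stage_index), stopping at the first confident head."""
    features = x
    for i, stage in enumerate(stages):
        features, probs = stage(features)
        confidence = max(probs)
        # Exit if this head is confident enough, or if this is the final stage.
        if confidence >= threshold or i == len(stages) - 1:
            return probs.index(confidence), i
    raise ValueError("stages must be non-empty")


def shallow_stage(f: list[float]) -> tuple[list[float], list[float]]:
    return f, [0.95, 0.05]  # hypothetical: exit head is already confident


def deep_stage(f: list[float]) -> tuple[list[float], list[float]]:
    return f, [0.2, 0.8]


print(early_exit_predict([1.0], [shallow_stage, deep_stage]))  # (0, 0)
```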

Streaming & Architecture Patterns

Lambda Architecture for ML

Combines batch and streaming layers to enable ML systems to process both historical data and real-time updates for robust and low-latency predictions.

Examples:

  • Use batch layer to compute user embeddings nightly and speed layer to update session-based features.
  • Batch compute fraud aggregates, stream recent transactions for instant scoring.
  • Precompute recommendations offline and personalize with real-time click data.

Code Example: 

https://lambda-architecture.net

Kappa Architecture for ML

Processes all data through a single streaming pipeline, simplifying architecture by eliminating the need for separate batch and streaming code paths.

Examples:

  • Handle transactions in one continuous Kafka pipeline for fraud detection.
  • Serve recommendations with a single Spark Streaming pipeline.

Code Example: 

https://dataengineering.wiki/architecture/kappa-architecture

Vector Database & Retrieval Patterns

Vector Database

Uses specialized databases to efficiently store, index, and retrieve high-dimensional vectors for similarity search in applications like RAG, image search, and recommendations.

Examples:

  • Store embeddings of documents and retrieve relevant ones for RAG.
  • Build image search using stored visual embeddings.
  • Retrieve song embeddings for audio recommendation.

Code Example: 

https://docs.pinecone.io/docs/overview

Approximate Nearest Neighbor (ANN)

Speeds up similarity search by returning approximately nearest vectors (using hashing, graph, or quantization-based indexes) instead of performing an exhaustive exact search, trading a small loss in recall for large efficiency gains.

Examples:

  • Use FAISS for fast text embedding search in LLM apps.
  • Deploy ScaNN for vector search at Google scale.

Code Example: 

https://github.com/facebookresearch/faiss

Agent Coordination & Safety

Multi-Agent Collaboration

Designs AI systems where multiple specialized agents work together, passing information or tasks between them to solve complex problems more effectively.

Examples:

  • Use one agent for data retrieval and another for summarization.
  • Divide code generation and code review tasks between separate agents.
  • Combine planner agent and executor agent in task automation.

Code Example:

https://github.com/langchain-ai/langchain/blob/master/docs/docs/modules/agents/multi_agent.md

Voting Ensemble of Agents

Aggregates outputs from multiple independent agents and uses a majority vote or other consensus mechanism to determine the final output, improving robustness and reliability.

Examples:

  • Vote between agents generating summaries for consistency.
  • Ensemble chatbot responses for better safety.
  • Aggregate answers from different math solvers.

Code Example: 

https://arxiv.org/abs/2402.05120

Safety Filter in Agents

Incorporates a dedicated filtering agent or module that reviews other agents’ outputs to block harmful, biased, or incorrect results before they reach the user.

Examples:

  • Screen chatbot responses for unsafe content.
  • Filter generated code to prevent injection flaws.
  • Review summaries to ensure factual accuracy.

Code Example: 

https://platform.openai.com/docs/guides/safety-best-practices

Optimization & Search

Hyperparameter Optimization

Systematically searches for the best hyperparameters (like learning rate, batch size) to improve model performance while balancing training cost.

Examples:

  • Tune learning rate and dropout for CNN on CIFAR-10.
  • Grid search for SVM kernel parameters.
  • Random search for XGBoost depth and learning rate.

Code Example: 

https://scikit-learn.org/stable/modules/grid_search.html
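Random search is one simple instance of this pattern. The sketch below samples the learning rate log-uniformly and keeps the best trial. `toy_objective` is a hypothetical stand-in for training a model and returning its validation loss. scikit-learn's `GridSearchCV` and `RandomizedSearchCV`, documented at the link above, wrap the same idea for estimators.

```python
import math
import random
from typing import Callable, Optional


def random_search(
    objective: Callable[[dict], float], n_trials: int = 20, seed: int = 0
) -> tuple[Optional[dict], float]:
    """Sample hyperparameters at random and keep the configuration with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(p: dict) -> float:
    """Hypothetical stand-in for 'train, then return validation loss' (best near lr=10**-2.5)."""
    return (math.log10(p["learning_rate"]) + 2.5) ** 2 + (p["dropout"] - 0.2) ** 2


best, loss = random_search(toy_objective, n_trials=50)
print(best, loss)
```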

Neural Architecture Search (NAS)

Automatically searches for high-performing neural network architectures for a given task, reducing the need for manual design.

Examples:

  • AutoML finds CNN architecture for image classification.
  • NAS generates RNN cell structure for language modeling.
  • Search for efficient transformer variants for NLP.

Code Example: 

https://www.tensorflow.org/model_optimization/guide/auto_clustering

Bayesian Optimization

Uses probabilistic models to efficiently explore the hyperparameter space, balancing exploration and exploitation to find optimal configurations.

Examples:

  • Optimize learning rate for deep model with few trials.
  • Tune regularization strength of logistic regression.

Code Example: 

https://scikit-optimize.github.io/stable

Additional MLOps Patterns

Blue/Green Deployment

Runs two identical environments (blue and green) where one serves production traffic while the other is updated and tested, enabling smooth switchovers with minimal downtime.

Examples:

  • Deploy new recommendation model in green and switch traffic after validation.
  • Test updated fraud detection model in green environment before cutover.
  • Roll back to blue environment if green causes issues.

Code Example: 

https://martinfowler.com/bliki/BlueGreenDeployment.html

Model Drift Detection

Monitors for changes in data distribution or model performance over time to trigger retraining or alerts when drift occurs.

Examples:

  • Detect input feature distribution shift in fraud detection model.
  • Monitor drop in accuracy on validation set for image classifier.
  • Identify drift in click-through rate for ad model.

Code Example: 

https://github.com/evidentlyai/evidently
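One widely used drift statistic is the Population Stability Index (PSI). It compares the binned distribution of a feature at training time with its live distribution. Below is a plain-Python sketch using equal-width bins over the reference range; the bin count and the 1e-6 floor are assumptions. A commonly quoted rule of thumb treats PSI above roughly 0.25 as significant drift, but thresholds should be tuned per feature.

```python
import math


def psi(reference: list[float], current: list[float], n_bins: int = 10) -> float:
    """Population Stability Index of `current` relative to `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor empty bins at a tiny proportion so the log term stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


baseline = [i / 100 for i in range(1000)]  # hypothetical training-time feature values
shifted = [v + 5 for v in baseline]        # the same feature after a large shift
print(psi(baseline, baseline), psi(baseline, shifted))
```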

Model Rollback

Provides a mechanism to quickly revert to a previous stable model if a new deployment causes issues or degraded performance.

Examples:

  • Revert to previous spam detection model after drop in precision.
  • Roll back recommendation model after surge in customer complaints.
  • Restore prior fraud model after excessive false positives.

Code Example: 

https://mlflow.org/docs/latest/model-registry.html

Automated Model Promotion

Automatically advances a model through testing and deployment stages when it meets predefined performance criteria.

Examples:

  • Promote model to production after passing A/B test thresholds.
  • Auto-deploy best model from nightly training pipeline.
  • Advance fraud model if validation AUC exceeds benchmark.

Code Example: 

https://mlflow.org/docs/latest/model-registry.html

Canary Rollback

Monitors canary deployment metrics in real time and automatically rolls back if anomalies or degradations are detected.

Examples:

  • Auto-rollback after spike in prediction latency.
  • Revert canary fraud model after precision drops below threshold.
  • Undo canary deployment if customer error rate rises.

Code Example: 

https://cloud.google.com/architecture/canary-deployments-with-spinnaker-on-gke

Traffic Mirroring

Duplicates live production traffic to a new model or system without affecting users, for testing and validation.

Examples:

  • Mirror traffic to test fraud model on real transactions.
  • Send real user queries to new search ranking model in parallel.
  • Validate recommendation model with mirrored e-commerce traffic.

Code Example: 

https://istio.io/latest/docs/tasks/traffic-management/mirroring

Security Patterns

Adversarial Training

Strengthens model robustness by including adversarially perturbed examples during training to make it resilient against small, malicious input changes.

Examples:

  • Train image classifier with adversarial examples to resist FGSM attacks.
  • Harden NLP model against input word substitutions.
  • Improve malware detection model by adding adversarial samples.

Code Example: 

https://pytorch.org/tutorials/beginner/fgsm_tutorial.html
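To show the idea without a deep-learning framework, the sketch below applies FGSM to a logistic-regression model. There, the input gradient has the closed form (p - y)·w. The model trains on each clean example alongside its perturbed copy. The `eps`, learning rate, and epoch count are illustrative assumptions. The PyTorch tutorial linked above covers the attack itself on a CNN.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(x: list[float], y: int, w: list[float], b: float, eps: float) -> list[float]:
    """FGSM for logistic regression: move each feature by eps in the sign of dLoss/dx."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad_x = [(p - y) * wi for wi in w]  # gradient of binary cross-entropy w.r.t. the input
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


def adversarial_train(data, n_features: int, eps: float = 0.1, lr: float = 0.1, epochs: int = 50):
    """Plain SGD on every clean example and its FGSM-perturbed copy."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in data:
            # Craft the adversarial copy against the current weights, then train on both.
            for sample in (x, fgsm(x, y, w, b, eps)):
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, sample)) + b)
                err = p - y  # dLoss/dlogit for binary cross-entropy
                w = [wi - lr * err * xi for wi, xi in zip(w, sample)]
                b -= lr * err
    return w, b


data = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-2.0, -1.0], 0), ([-1.0, -2.0], 0)]
w, b = adversarial_train(data, n_features=2)
```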

Data Encryption in Serving

Ensures that input data sent to and predictions from an ML model are encrypted during transmission and processing to protect sensitive information.

Examples:

  • Encrypt medical images during API transmission to model server.
  • Secure user text inputs sent for sentiment analysis.
  • Encrypt financial transactions scored for fraud detection.

Code Example: 

https://cloud.google.com/security/encryption-in-transit

Differential Privacy

Adds calibrated noise to data, query results, or gradients during training so that the model's outputs reveal little about any individual data point, giving a formal, quantifiable privacy guarantee controlled by a privacy budget (ε).

Examples:

  • Train language model on user data with DP-SGD.
  • Apply differential privacy to location-based recommendation model.
  • Share aggregate analytics without leaking user-level data.

Code Example:

https://pytorch.org/tutorials/intermediate/differential_privacy_tutorial.html
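The sketch below shows the Laplace mechanism for a counting query, as in the aggregate-analytics example. It is simpler than the DP-SGD used for model training. Adding or removing one person changes a count by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy. DP-SGD instead clips per-example gradients and adds Gaussian noise.

```python
import random


def private_count(values: list[bool], epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1), giving epsilon-DP."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # noise scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(values) + noise


rng = random.Random(42)
print(private_count([True] * 30 + [False] * 70, epsilon=1.0, rng=rng))  # 30 plus Laplace noise
```

Smaller ε means larger noise and stronger privacy. Each released answer also consumes part of the overall privacy budget.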

Homomorphic Encryption Inference

Enables running inference on encrypted data so that the server never sees the plaintext input, preserving privacy end-to-end.

Examples:

  • Run encrypted credit scoring on sensitive user data.
  • Infer disease risk from encrypted genomic data.
  • Predict spending patterns from encrypted transaction histories.

Code Example: 

https://github.com/microsoft/SEAL

Federated & Distributed Learning

Federated Learning Pattern

Enables training machine learning models across multiple decentralized devices or servers holding local data samples, without exchanging their data, improving privacy and data security.

Examples:

  • Train predictive text models on smartphones without uploading text data.
  • Learn personalized health models from hospitals without centralizing patient data.
  • Build recommendation models using local user behavior on IoT devices.

Code Example: 

https://www.tensorflow.org/federated
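The server-side aggregation step of Federated Averaging (FedAvg) fits in a few lines. Each client trains locally and sends only its parameters, which the server averages weighted by local dataset size. The flat parameter lists and client sizes below are hypothetical.

```python
def federated_average(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """FedAvg aggregation: average client parameters weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients; only parameters (never raw data) reach the server.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], client_sizes=[100, 300]))  # [2.5, 3.5]
```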

Split Learning

Trains parts of a model separately on different devices or sites, so no single device has access to the full model or data, improving privacy while enabling collaborative learning.

Examples:

  • Train early layers of medical model at hospitals, later layers at a central server.
  • Edge devices compute initial embeddings, cloud server completes training.
  • Collaborative training where data owners only see model fragments.

Code Example: 

https://github.com/mit-han-lab/split-learning

Decentralized Learning

Distributes both data and model updates across multiple nodes in a peer-to-peer fashion without relying on a central server, improving robustness and privacy.

Examples:

  • Train blockchain-based ML model across peer nodes.
  • Decentralized IoT device collaboration for anomaly detection.
  • Collaborative training of intrusion detection systems across organizations.

Code Example: 

https://arxiv.org/abs/1902.01046

Agent Memory Patterns

Episodic Memory

Stores information about individual interactions or episodes, allowing an agent to recall specific past conversations or events during future interactions to provide contextually relevant responses.

Examples:

  • Chatbot recalls user’s name and preferences during a session.
  • Virtual assistant remembers steps taken in a multi-turn task.
  • Game AI remembers player’s last move in the current game session.

Code Example:

https://docs.langchain.com/docs/components/memory/episodic-memory

Long-Term Memory

Persists knowledge across sessions or episodes, allowing a model or agent to retain facts, user data, or previous interactions over time for lifelong learning or personalization.

Examples:

  • AI tutor remembers student’s weaknesses across lessons.
  • Customer service bot recalls past issues a user reported.
  • Personal assistant remembers recurring appointments and contacts.

Code Example:

https://docs.langchain.com/docs/components/memory/long-term-memory

Scratchpad Memory

A temporary working memory that the model uses to store intermediate reasoning steps or calculations during task execution, enabling more complex reasoning or problem solving.

Examples:

  • LLM writes intermediate steps when solving math problems.
  • Agent records partial results in a multi-part query.
  • AI planner keeps track of subtasks completed in a complex workflow.

Code Example: 

https://arxiv.org/abs/2112.00114

Responsible AI

Bias Detection in Embeddings

Identifies and quantifies unintended biases in learned embeddings, such as gender or racial stereotypes, by analyzing vector relationships or clustering behavior.

Examples:

  • Detect gender bias in word embeddings where ‘man’ is closer to ‘programmer’ than ‘woman’.
  • Analyze racial bias in name embeddings using clustering.
  • Assess bias in customer embeddings affecting credit decisions.

Code Example: 

https://github.com/stanfordnlp/SemBias

Bias Mitigation via Reweighting

Applies different sample weights during training to reduce bias by emphasizing underrepresented or disadvantaged groups, promoting fairness in model outputs.

Examples:

  • Reweight minority class samples in a hiring model.
  • Assign higher weights to underrepresented demographic groups in medical diagnosis model.
  • Balance gender representation in resume screening classifier.

Code Example: 

https://fairlearn.org/main/user_guide/mitigation.html#reweighing
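Below is a sketch of the classic reweighing scheme of Kamiran and Calders. Each (group, label) combination gets the weight P(A=a)·P(Y=y) / P(A=a, Y=y), so group membership and label look statistically independent to the learner. The resulting list can be passed as `sample_weight` to estimators that accept it, such as many scikit-learn models.

```python
from collections import Counter


def reweighing_weights(groups: list[str], labels: list[int]) -> list[float]:
    """Per-sample weights P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[a] / n) * (label_counts[y] / n) / (joint_counts[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]


# Hypothetical data: group "a" is over-represented among positive labels.
print(reweighing_weights(["a", "a", "a", "b"], [1, 1, 0, 0]))  # [0.75, 0.75, 1.5, 0.5]
```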

Post-hoc Fairness Correction

Adjusts model outputs after training to improve fairness without changing the underlying model, often by modifying decision thresholds or applying fairness constraints at prediction time.

Examples:

  • Shift thresholds to equalize true positive rates across groups.
  • Calibrate fraud model scores to reduce disparate impact.
  • Apply fairness constraints to loan approval outputs.

Code Example: 

https://fairlearn.org/main/user_guide/mitigation.html#post-processing

AutoML Patterns

Automated Feature Engineering

Automatically generates, selects, and transforms features from raw data to improve model performance, often using domain knowledge or search algorithms.

Examples:

  • Use Featuretools to create features from transactional data.
  • Auto-generate time-based features in time series forecasting.
  • Automatically derive cross-features in e-commerce clickstream data.

Code Example: 

https://github.com/alteryx/featuretools

Auto-Ensembling

Automatically builds and combines multiple models or pipelines to form an ensemble that outperforms individual models, typically using stacking, bagging, or blending strategies.

Examples:

  • Auto-sklearn generates and ensembles pipelines for tabular data.
  • AutoGluon creates stacked models for text classification.
  • H2O AutoML combines GBMs, RFs, and DNNs into an ensemble.

Code Example: 

https://automl.github.io/auto-sklearn/stable

Auto-Augmentation

Automatically searches for and applies optimal data augmentation policies during training to improve model generalization and robustness.

Examples:

  • AutoAugment discovers best image augmentation strategies for CIFAR-10.
  • Apply RandAugment on ImageNet to boost accuracy.
  • Auto-select text augmentation rules for sentiment analysis.

Code Example:

https://github.com/tensorflow/models/tree/master/research/autoaugment

Few-shot / Zero-shot Patterns

Few-shot Learning

Enables a model to learn a task using only a small number of labeled examples by leveraging prior knowledge or pretraining on related tasks.

Examples:

  • Classify rare species of animals with only 5 labeled images.
  • Personalize a chatbot’s tone with a few dialogue examples.
  • Fine-tune a sentiment classifier on a small set of domain-specific reviews.

Code Example:

https://huggingface.co/transformers/main_classes/pipelines.html#zero-shot-classification

Zero-shot Learning

Allows a model to generalize to new tasks or classes it has never seen before by using semantic information like descriptions, labels, or embeddings.

Examples:

  • Classify text into new categories using a language model without training on those categories.
  • Identify objects in images by leveraging textual descriptions of classes.
  • Apply sentiment analysis on a new language using multilingual embeddings.

Code Example:

https://huggingface.co/transformers/main_classes/pipelines.html#zero-shot-classification

Time Series & Forecasting

Sliding Window Forecasting

Transforms time series data into supervised learning format by creating overlapping windows of historical data points as input features to predict future values.

Examples:

  • Forecast daily sales using the past 30 days as input features.
  • Predict machine temperature using a 60-minute rolling window of sensor readings.
  • Estimate stock prices by feeding past 10-day price windows into a regression model.

Code Example:

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
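The windowing transformation needs no library. Here is a sketch that turns a univariate series into lagged `(X, y)` pairs for any regressor. `n_lags` and `horizon` are illustrative parameter names.

```python
def make_supervised(series: list[float], n_lags: int, horizon: int = 1):
    """Turn a series into (X, y) pairs: n_lags past values -> the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])      # the window of past observations
        y.append(series[t + horizon - 1])   # the target `horizon` steps after the window
    return X, y


X, y = make_supervised([10, 11, 12, 13, 14, 15], n_lags=3)
print(X)  # [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(y)  # [13, 14, 15]
```

When splitting the resulting pairs into train and test sets, keep them in chronological order rather than shuffling. Otherwise the model can train on the future.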

Multi-horizon Forecasting

Predicts multiple future time points at once (e.g., next 3, 7, or 30 days) rather than a single next-step prediction, supporting planning and decision-making.

Examples:

  • Forecast weekly energy consumption for the next 4 weeks.
  • Predict 7-day demand for ride-sharing services.
  • Estimate future weather conditions at multiple horizons (1hr, 6hr, 24hr).

Code Example: 

https://github.com/jdb78/pytorch-forecasting

Seasonality Decomposition

Separates a time series into trend, seasonal, and residual components to aid in modeling, interpretation, and improved forecasting.

Examples:

  • Decompose retail sales data to isolate holiday season effects.
  • Extract yearly seasonality from temperature data for climate studies.
  • Remove weekly cycles from website traffic before modeling residual patterns.

Code Example: 

https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html

Vision-Specific Patterns

Patch Embeddings

Divides an image into fixed-size patches, then flattens and linearly embeds each one as an input token for transformer-based vision models; position embeddings are added so the model retains each patch's spatial location for further processing.

Examples:

  • Split 224×224 image into 16×16 patches and embed as 1D vectors.
  • Use patch embeddings for input to Vision Transformer (ViT).
  • Convert medical image patches into embeddings for tumor detection.

Code Example: 

https://keras.io/examples/vision/image_classification_with_vision_transformer/
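The patch-splitting step can be written with nested lists, assuming an H×W×C image whose sides are divisible by the patch size. A 224×224 RGB image with 16×16 patches yields (224/16)² = 196 patches of 16·16·3 = 768 values each. The learned linear projection and position embeddings that follow in a ViT are omitted.

```python
def patchify(image: list[list[list[float]]], patch_size: int) -> list[list[float]]:
    """Split an H x W x C image into flattened, non-overlapping patches in row-major order."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # Flatten rows, then pixels, then channels of this patch into one vector.
            patches.append([
                channel
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for channel in pixel
            ])
    return patches


image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]  # blank 224x224 RGB image
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 196 768
```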

Vision Transformer (ViT)

Applies transformer architectures, originally developed for NLP, to image classification by using patch embeddings as input tokens, allowing global self-attention across the image.

Examples:

  • Classify ImageNet images using ViT architecture.
  • Use ViT for medical imaging classification (e.g., X-rays).
  • Apply ViT to satellite imagery for land cover classification.

Code Example: 

https://github.com/lucidrains/vit-pytorch

Multi-Scale Feature Fusion

Combines feature maps from different levels of a neural network hierarchy to leverage both coarse and fine details, improving performance in tasks like segmentation and detection.

Examples:

  • Fuse features at multiple scales for object detection (e.g., FPN).
  • Enhance segmentation of small and large objects by multi-scale fusion.
  • Merge low-level edges and high-level semantics in scene parsing.

Code Example: 

https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn.html

Audio/NLP Patterns

Audio Embedding

Transforms raw audio signals into fixed-length numerical representations (embeddings) that capture meaningful characteristics such as pitch, tone, and speaker identity.

Examples:

  • Generate speaker embeddings for voice authentication.
  • Extract audio embeddings for emotion recognition.
  • Produce embeddings from audio clips for search and retrieval.

Code Example: 

https://github.com/pyannote/pyannote-audio

Spectrogram Augmentation

Applies transformations (e.g., masking, warping) to spectrogram representations of audio to improve model robustness and generalization during training.

Examples:

  • Apply frequency masking to speech spectrograms for ASR models.
  • Use time masking on spectrograms for noise-robust speaker ID.
  • Warp spectrograms to simulate pitch variation in training data.

Code Example: 

https://github.com/espnet/espnet/blob/master/espnet2/train/spec_augment.py
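Here is a simplified sketch of SpecAugment-style masking on a spectrogram stored as a `[frequency][time]` grid of floats. It zeroes one frequency band and one block of frames. Time warping, multiple masks per axis, and mean-value masking (which some implementations use) are left out. The mask-size limits are caller-supplied assumptions.

```python
import random


def spec_augment(
    spec: list[list[float]], max_freq_mask: int, max_time_mask: int, rng: random.Random
) -> list[list[float]]:
    """Zero one random frequency band and one random block of frames in a [freq][time] grid."""
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input spectrogram is not modified

    f = rng.randint(0, min(max_freq_mask, n_freq))  # band height (may be 0)
    f0 = rng.randint(0, n_freq - f)                 # first masked frequency bin
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    t = rng.randint(0, min(max_time_mask, n_time))  # block width in frames (may be 0)
    t0 = rng.randint(0, n_time - t)                 # first masked frame
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


spec = [[1.0] * 20 for _ in range(8)]  # dummy 8-bin x 20-frame spectrogram
augmented = spec_augment(spec, max_freq_mask=2, max_time_mask=5, rng=random.Random(0))
```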

Subword Tokenization

Breaks text into subword units (e.g., Byte Pair Encoding, WordPiece) to handle rare words, improve vocabulary efficiency, and generalize better to words unseen during training.

Examples:

  • Tokenize text using BPE for neural machine translation.
  • Use WordPiece for encoding text in BERT models.
  • Apply SentencePiece for multilingual text processing.

Code Example: 

https://github.com/google/sentencepiece
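The core of Byte Pair Encoding training fits in a short function. It starts from characters, repeatedly counts adjacent symbol pairs weighted by word frequency, and merges the most frequent pair. This sketch omits the word-boundary markers and byte-level handling used by production tokenizers, and it breaks ties by first occurrence.

```python
from collections import Counter


def merge_word(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules from a word-frequency table, starting from characters."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {merge_word(s, best): c for s, c in vocab.items()}
    return merges


print(learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=2))  # [('l', 'o'), ('lo', 'w')]
```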

Character-Level CNNs

Processes text at the character level using convolutional neural networks to capture subword patterns, handle typos, and improve robustness to spelling variations.

Examples:

  • Build character-level CNN for sentiment analysis.
  • Apply char-CNNs for text classification with noisy inputs.
  • Use char-CNNs for language ID from short text snippets.

Code Example: 

https://github.com/zhangxiangxiao/Crepe

Generative Patterns

GAN Pattern

Uses two neural networks, a generator and a discriminator, that are trained together in a minimax game: the generator tries to create realistic data while the discriminator tries to distinguish real data from generated data.

Examples:

  • Generate realistic human faces using StyleGAN.
  • Produce synthetic handwritten digits using DCGAN.
  • Create new product images for e-commerce catalog augmentation.

Code Example: 

https://github.com/pytorch/examples/tree/main/dcgan

Style Transfer

Applies the artistic style of one image (e.g., a painting) to the content of another image by optimizing or training a model to blend style and content representations.

Examples:

  • Transfer Van Gogh’s painting style onto a personal photo.
  • Apply Picasso-style rendering to product images for marketing.
  • Create artistic variations of landscape photos.

Code Example: 

https://github.com/pytorch/examples/tree/main/fast_neural_style

Conditional Generative Models

Generative models (e.g., GANs, VAEs) that are conditioned on auxiliary information such as class labels or text descriptions, enabling controlled data generation.

Examples:

  • Generate images of specific digits using conditional GANs (cGANs).
  • Create faces of people with specified attributes (e.g., smiling, glasses).
  • Synthesize images from text descriptions using conditional diffusion models.

Code Example: 

https://github.com/eriklindernoren/Keras-GAN#conditional-gan

Explainability & Debugging

Saliency Map

A technique for visualizing which parts of the input data (e.g., pixels in an image) contribute most to a model’s prediction, typically by computing gradients of the output with respect to the input.

Examples:

  • Highlight regions of an X-ray that led to a disease diagnosis.
  • Visualize pixels influencing an image classification decision.
  • Show key areas of handwritten digits important for recognition.

Code Example: 

https://github.com/utkuozbulak/pytorch-cnn-visualizations

LIME

Local Interpretable Model-agnostic Explanations (LIME) approximates complex models locally with simple interpretable models to explain individual predictions.

Examples:

  • Explain a single prediction of a black-box text classifier.
  • Understand why an image classifier labeled an object as a cat.
  • Interpret fraud detection results for a specific transaction.

Code Example: 

https://github.com/marcotcr/lime

SHAP Integration

SHapley Additive exPlanations (SHAP) provides a unified framework for explaining model outputs based on cooperative game theory, attributing each feature’s contribution fairly.

Examples:

  • Explain feature contributions in a credit scoring model.
  • Understand feature importance for an XGBoost prediction.
  • Visualize SHAP values for a customer churn prediction.

Code Example: 

https://github.com/slundberg/shap

Robustness Patterns

Noise Injection

Adds random noise to model inputs, intermediate activations, or gradients during training to improve generalization and robustness by making the model less sensitive to small perturbations.

Examples:

  • Inject Gaussian noise into input images for robust image classification.
  • Add noise to word embeddings during training to improve text model generalization.
  • Add noise to actions or policy parameters in reinforcement learning to encourage exploration.

Code Example: 

https://keras.io/api/layers/regularization_layers/gaussian_noise/
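A small sketch of input noise injection, assuming a one-parameter linear model trained with SGD in place of a neural network. Fresh Gaussian noise is added to each input at every training step, and only at training time, much as a Keras `GaussianNoise` layer is active only during training. Function names and hyperparameters are illustrative.

```python
import random


def add_gaussian_noise(values, sigma, rng):
    """Return a noisy copy of `values` (used only during training)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def train_linear(xs, ys, sigma=0.1, lr=0.01, epochs=200, seed=0):
    """SGD for y ~ w * x, drawing new input noise at every step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            (x_noisy,) = add_gaussian_noise([x], sigma, rng)
            grad = 2 * (w * x_noisy - y) * x_noisy  # d/dw of the squared error
            w -= lr * grad
    return w


data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x for x in xs]
w_clean = train_linear(xs, ys, sigma=0.0)
w_noisy = train_linear(xs, ys, sigma=0.3)
```

For linear regression, input noise acts much like an L2 penalty: `w_clean` converges to the true slope 2.0, while `w_noisy` is shrunk toward zero (to roughly 2 * E[x^2] / (E[x^2] + sigma^2)). That shrinkage is the regularizing effect the pattern relies on.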

Adversarial Detection

Implements methods to identify inputs designed to fool models (adversarial examples) by analyzing unusual patterns in activations or prediction confidence.

Examples:

  • Detect adversarial images that attempt to trick image classifiers.
  • Identify perturbed text inputs targeting NLP models.
  • Flag adversarial transactions in fraud detection systems.

Code Example: 

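https://github.com/anishathalye/obfuscated-gradients  (attacks that circumvent several published adversarial defenses; useful for stress-testing a detector)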
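One concrete detection idea is feature squeezing (Xu et al., NDSS 2018), sketched here under simplifying assumptions: a hypothetical linear classifier over ten "pixels" in [0, 1], 1-bit depth reduction as the squeezer, and a hand-picked L1 threshold. The input is flagged when predictions on the raw and squeezed versions disagree strongly, because small carefully placed perturbations tend to be erased by squeezing.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model(x):
    """Hypothetical binary classifier over 10 pixel values in [0, 1]."""
    weights = [1.0] * 5 + [-1.0] * 5
    s = sum(w * xi for w, xi in zip(weights, x))
    return softmax([0.0, 2.0 * (s - 3.0)])


def squeeze(x, bits=1):
    """Bit-depth reduction: snap each value to one of 2**bits levels in [0, 1]."""
    levels = 2 ** bits - 1
    return [round(xi * levels) / levels for xi in x]


def is_adversarial(x, threshold=0.5):
    # A large L1 gap between raw and squeezed predictions is suspicious.
    p_raw, p_squeezed = model(x), model(squeeze(x))
    return sum(abs(a - b) for a, b in zip(p_raw, p_squeezed)) > threshold


clean = [1.0] * 5 + [0.0] * 5
attacked = [0.7] * 5 + [0.3] * 5  # every pixel nudged by 0.3, flipping the prediction
```

The attacked input flips the model's decision, but squeezing restores the clean input, so the prediction gap exposes it. Detectors like this can be evaded by adaptive attacks, which is why the linked repository is worth testing against.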

Confidence Calibration

Adjusts a model’s output probabilities so they better reflect the true likelihood of correctness, often using techniques like temperature scaling or isotonic regression.

Examples:

  • Calibrate output probabilities of a medical diagnosis model.
  • Improve the confidence scores of a speech recognition system.
  • Ensure that the confidence scores of a self-driving car’s perception model match its observed detection rates.

Code Example: 

https://github.com/gpleiss/temperature_scaling
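Temperature scaling fits one scalar T on held-out validation logits so that softmax(logits / T) minimizes negative log-likelihood. The linked repository optimizes T with a gradient-based optimizer. This sketch uses a plain grid search to stay dependency-free, and the validation data is a hypothetical, deliberately overconfident example.

```python
import math


def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood of the labels under softmax(logits / T)."""
    total = 0.0
    for logits, y in zip(logits_list, labels):
        scaled = [z / temperature for z in logits]
        m = max(scaled)
        log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def fit_temperature(logits_list, labels, t_min=0.05, t_max=10.0, step=0.01):
    """Grid-search the single temperature that minimizes validation NLL."""
    best_t, best_loss = 1.0, nll(logits_list, labels, 1.0)
    for k in range(int(round((t_max - t_min) / step)) + 1):
        t = t_min + k * step
        loss = nll(logits_list, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t


# Overconfident model: always ~98% sure of class 0, but right only 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
```

The fitted T is about 4 / ln 3 (roughly 3.64), the value at which the predicted confidence drops to 0.75 and matches the observed accuracy. Because dividing all logits by one T never changes the argmax, accuracy is unaffected.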

Emerging Agent Patterns

Plan-and-Execute Agent

An AI agent architecture that explicitly separates high-level planning from low-level execution, improving reasoning in complex multi-step tasks by structuring goals and sub-tasks.

Examples:

  • An agent plans and books travel: first finding flights, then hotels, then car rentals.
  • A robot plans a sequence of actions to clean a room, then executes each step.
  • A coding assistant plans code structure before generating detailed functions.

Code Example:

https://github.com/langchain-ai/langchain/tree/master/libs/experimental/langchain_experimental/plan_and_execute
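A framework-free sketch of the plan/execute split, assuming a hypothetical rule-based `plan` function where a real agent would ask an LLM for the plan, and stub tools where it would call booking APIs. The planner produces the whole step list up front. The executor then runs the steps in order, passing shared state so that later steps can use earlier results.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, argument)


def plan(goal: str) -> List[Step]:
    """Hypothetical planner; an LLM would normally produce this step list."""
    if "trip" in goal:
        city = goal.split()[-1]
        return [("flights", city), ("hotel", city), ("car", city)]
    return []


def execute(steps: List[Step], tools: Dict[str, Callable[[str, dict], str]]) -> dict:
    """Run each planned step in order, sharing results through `state`."""
    state: dict = {"log": []}
    for tool_name, arg in steps:
        result = tools[tool_name](arg, state)
        state[tool_name] = result
        state["log"].append(f"{tool_name}: {result}")
    return state


# Hypothetical stub tools standing in for real booking APIs.
tools = {
    "flights": lambda city, s: f"flight to {city}",
    "hotel": lambda city, s: f"hotel in {city} (arriving via {s['flights']})",
    "car": lambda city, s: f"car in {city}",
}

result = execute(plan("book a trip to Lisbon"), tools)
```

Keeping the plan as data makes it easy to inspect, or to re-plan when a step fails, before any tool is called.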

Modular Tool-Using Agent

An agent that dynamically selects and uses external tools or APIs based on task requirements, allowing flexible problem solving by composing specialized modules.

Examples:

  • A chatbot that queries APIs for weather, news, or stock prices as needed.
  • An assistant that switches between translation, summarization, and search tools.
  • An agent for data analysis that calls plotting, regression, or clustering tools.

Code Example:

https://python.langchain.com/docs/modules/agents/agent_types/openai_functions
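A minimal sketch of a tool registry with dynamic selection. It assumes regex triggers as a stand-in for the LLM function-calling step that real agents use to pick a tool. The `ToolAgent` class, tool names, and stub outputs are hypothetical.

```python
import re


class ToolAgent:
    """Hypothetical agent: registered tools plus a selector that picks one per query."""

    def __init__(self):
        self.tools = {}     # name -> callable(query) -> str
        self.triggers = {}  # name -> compiled regex

    def register(self, name, pattern, fn):
        self.tools[name] = fn
        self.triggers[name] = re.compile(pattern, re.IGNORECASE)

    def select_tool(self, query):
        # Stand-in for an LLM choosing a function: the first matching trigger wins.
        for name, trigger in self.triggers.items():
            if trigger.search(query):
                return name
        return None

    def run(self, query):
        name = self.select_tool(query)
        if name is None:
            return "No suitable tool."
        return self.tools[name](query)


agent = ToolAgent()
agent.register("weather", r"\b(weather|forecast)\b", lambda q: "Sunny, 22 C (stub)")
agent.register("stocks", r"\b(stock|share price)\b", lambda q: "ACME: 101.5 (stub)")
```

New capabilities are added by registering another tool, without changing the agent loop itself.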

Reflection-Augmented Agent

An AI agent enhanced with self-reflection capabilities, where it reviews, critiques, and improves its own outputs iteratively to increase accuracy and alignment with goals.

Examples:

  • A writing assistant generates an outline, critiques it, and revises it.
  • A math-solving agent checks its solution for correctness and revises if needed.
  • A chatbot reflects on its prior turn to improve clarity in conversation.

Code Example:

https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/experimental/generative_agents
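The draft, critique, revise cycle can be sketched as a small loop with a round limit. The `generate`, `critique`, and `revise` functions below are hypothetical toy stand-ins for LLM calls. The critic here verifies a math answer by substitution, so the loop stops as soon as no issues remain.

```python
from typing import Callable, List, Tuple


def reflect_and_revise(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], List[str]],
    revise: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft, self-critique, revise; return (final draft, revisions made)."""
    draft = generate(task)
    for round_no in range(max_rounds):
        issues = critique(task, draft)
        if not issues:
            return draft, round_no
        draft = revise(task, draft, issues)
    return draft, max_rounds


# Hypothetical toy components standing in for LLM calls.
def generate(task):
    return "6"


def critique(task, draft):
    x = int(draft)
    return [] if x * x == 49 else [f"{x}^2 = {x * x}, not 49"]


def revise(task, draft, issues):
    return str(int(draft) + 1)


answer, rounds = reflect_and_revise("find x with x^2 = 49", generate, critique, revise)
```

The `max_rounds` cap matters in practice: an LLM critic may never be fully satisfied, so the loop needs a fixed budget.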

Data Engineering Patterns

Streaming Feature Engineering

A pattern where features are computed in real time as new data arrives, enabling low-latency predictions and immediate model updates with fresh information.

Examples:

  • Compute rolling averages of sensor data for anomaly detection.
  • Create session-level features on-the-fly for online advertising models.
  • Generate real-time aggregates (e.g., click counts) for recommendation systems.

Code Example:

https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/streaming
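A single-key sketch of a streaming feature, assuming events arrive in order and a count-based window is acceptable. Apache Beam expresses the same idea with windowing transforms over event time. Each reading is compared against the rolling mean of the previous readings before the window is updated, so the anomaly flag never uses the value it is judging.

```python
from collections import deque


class RollingMean:
    """Mean of the last `window` values, updated in O(1) per event."""

    def __init__(self, window):
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, value):
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


def stream_features(events, window=3, tolerance=8.0):
    """Yield (value, rolling_mean, is_anomaly) as each reading arrives."""
    rolling = RollingMean(window)
    for value in events:
        previous_mean = rolling.total / len(rolling.values) if rolling.values else value
        is_anomaly = abs(value - previous_mean) > tolerance
        yield value, rolling.update(value), is_anomaly


out = list(stream_features([10, 11, 12, 30, 12]))
```

Only the spike (30) is flagged. The features are available the moment each event arrives, with no batch recomputation.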

Batch Feature Join

A process of joining large static datasets or pre-aggregated features in batch mode, usually during offline model training or nightly prediction runs.

Examples:

  • Join user profiles with historical transaction data for fraud model training.
  • Combine clickstream data with user demographics for ad targeting.
  • Aggregate purchase histories and join with customer records for churn prediction.

Code Example: 

https://github.com/feast-dev/feast
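A key detail in batch feature joins for training is point-in-time correctness: each label row must only see feature values that were known at its timestamp. Feature stores such as Feast perform this kind of join when building training sets. This plain-Python sketch assumes numeric timestamps and tuple-shaped rows, and it treats a feature stamped at exactly the event time as already known.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each label row the latest feature value known at its timestamp.

    labels:       list of (entity_id, event_ts, label)
    feature_rows: list of (entity_id, feature_ts, value)
    Feature rows stamped after a label's event are ignored, which keeps
    future information out of the training set.
    """
    timestamps, values = defaultdict(list), defaultdict(list)
    for entity, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        timestamps[entity].append(ts)
        values[entity].append(value)
    joined = []
    for entity, event_ts, label in labels:
        idx = bisect_right(timestamps.get(entity, []), event_ts)
        value = values[entity][idx - 1] if idx > 0 else None
        joined.append((entity, event_ts, value, label))
    return joined


features = [("u1", 1, 100.0), ("u1", 5, 250.0), ("u2", 2, 40.0)]
labels = [("u1", 3, 0), ("u1", 6, 1), ("u2", 1, 0)]
joined = point_in_time_join(labels, features)
```

A naive join on entity ID alone would give the first `u1` row the later value 250.0, leaking information from the future into training.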

Incremental Model Update

Updates the model continuously or periodically as new data becomes available, without requiring retraining from scratch, to adapt to changing data distributions.

Examples:

  • Update a recommendation model with each new user interaction.
  • Incrementally train a spam detection model with new labeled emails.
  • Refresh predictive maintenance models as new sensor data streams in.

Code Example:

https://scikit-learn.org/stable/modules/scaling_strategies.html#incremental-learning
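Multinomial naive Bayes is a natural fit for incremental updates because its parameters are just counts. scikit-learn's `MultinomialNB.partial_fit` is one of the estimators listed on the linked page. This stdlib-only sketch of the same idea assumes whitespace tokenization and Laplace smoothing. The class name and the emails are hypothetical.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts are updated batch by batch."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def partial_fit(self, docs, labels):
        # Add the new batch's counts; earlier batches are never revisited.
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNaiveBayes()
model.partial_fit(["win money now", "meeting at noon"], ["spam", "ham"])
model.partial_fit(["claim your prize now", "lunch meeting tomorrow"], ["spam", "ham"])
```

After the second batch the model recognizes "claim prize" as spam without retraining on the first batch.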

Meta Patterns

Meta-Prompting

A pattern where prompts themselves are generated, selected, or adapted dynamically by an LLM or another system, enabling flexible task formulation and improved performance across diverse scenarios.

Examples:

  • Automatically generate tailored prompts for different user intents in a chatbot.
  • Create context-specific prompts for document summarization based on document type.
  • Adapt prompts on-the-fly to match the tone (e.g., formal vs. casual) of a conversation.

Code Example: 

https://github.com/langchain-ai/langchain  (see meta-prompting examples in LangChain docs)
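A sketch of the "selected or adapted by another system" case, assuming a rule-based router instead of an LLM that writes prompts. The system first decides which instructions fit the input and tone, and only then assembles the task prompt that would be sent to the model. All templates, keywords, and function names are hypothetical.

```python
TEMPLATES = {
    "legal": "Summarize the contract below. List parties, obligations, and deadlines.",
    "scientific": "Summarize the paper below: research question, method, key result.",
    "news": "Summarize the article below in three sentences, most important fact first.",
}
TONES = {
    "formal": "Use a formal, neutral register.",
    "casual": "Use a friendly, conversational tone.",
}


def detect_doc_type(text):
    """Crude keyword router (hypothetical); an LLM classifier could replace it."""
    lowered = text.lower()
    if any(k in lowered for k in ("hereinafter", "party", "agreement")):
        return "legal"
    if any(k in lowered for k in ("abstract", "we propose", "experiment")):
        return "scientific"
    return "news"


def build_prompt(text, tone="formal"):
    """Meta step: choose and assemble the task prompt before any model call."""
    instructions = TEMPLATES[detect_doc_type(text)]
    return f"{instructions}\n{TONES[tone]}\n\n---\n{text}"


prompt = build_prompt("This Agreement is made between Acme Corp and ...", tone="casual")
```

Replacing `detect_doc_type` or `TEMPLATES` with LLM calls turns this into fully model-driven meta-prompting without changing the calling code.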

Auto-Prompt Tuning

An automated process of learning effective prompt templates or soft-prompt embeddings for guiding LLMs, typically using gradient-based optimization or reinforcement learning, to maximize task-specific performance.

Examples:

  • Tune soft prompts to improve sentiment classification accuracy with minimal labeled data.
  • Learn optimized prompts for code generation tasks to reduce errors in generated code.
  • Automatically discover effective prompts for question-answering benchmarks.

Code Example: 

https://github.com/bojone/Prompt-Tuning
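The sketch below shows a gradient-free variant: discrete prompt search. It is swapped in for the gradient-based and RL methods named above because those need a trainable model. Each candidate template is scored on a small labeled dev set and the best one is kept. Soft prompt tuning instead learns continuous prompt embeddings by backpropagation while the model's weights stay frozen. `toy_model`, the templates, and the dev set are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def accuracy(template: str, model: Callable[[str], str], dev: Sequence[Tuple[str, str]]) -> float:
    hits = sum(model(template.format(text=x)) == y for x, y in dev)
    return hits / len(dev)


def search_prompts(candidates: List[str], model, dev) -> Tuple[str, float]:
    """Score every candidate template on the dev set and keep the best."""
    scored = [(accuracy(t, model, dev), t) for t in candidates]
    best_score, best_template = max(scored, key=lambda s: s[0])
    return best_template, best_score


def toy_model(prompt: str) -> str:
    """Hypothetical LLM stub: uses the bare label format only when asked explicitly."""
    text = prompt.rsplit("\n", 1)[-1].lower()
    sentiment = "positive" if "great" in text or "love" in text else "negative"
    if "positive or negative" in prompt.lower():
        return sentiment
    return f"I think it is {sentiment}."


candidates = [
    "Review: {text}",
    "Classify the sentiment as positive or negative.\n{text}",
    "What do you think?\n{text}",
]
dev = [("I love it", "positive"), ("Terrible service", "negative"), ("Great value", "positive")]
best, best_acc = search_prompts(candidates, toy_model, dev)
```

Only the template that elicits the exact label format scores well, the kind of effect automated prompt search is meant to discover.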

Other Advanced MLOps

ML-as-Code (GitOps for ML)

Treats machine learning pipelines, models, data configurations, and deployments as code, managed through version-controlled repositories (e.g., Git), enabling automated, reproducible, and auditable ML operations.

Examples:

  • Use Git pull requests to trigger CI/CD pipelines that retrain and redeploy models.
  • Track data preprocessing code, model architecture, and hyperparameters in Git.
  • Automate model promotion to production upon code approval in Git.

Code Example: 

https://mlflow.org/docs/latest/model-registry.html  (MLflow with GitOps principles)
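Two small building blocks of an ML-as-code workflow, under stated assumptions: a deterministic fingerprint of a pipeline config that lives in the repository, usable as a run or model version ID, and a promotion gate a CI job could evaluate after retraining. The config keys, metric name, and `min_gain` threshold are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Stable ID for a pipeline config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def should_promote(candidate: dict, production: dict, metric="auc", min_gain=0.005) -> bool:
    """CI gate: promote only if the candidate beats production by `min_gain`."""
    return candidate[metric] >= production[metric] + min_gain


# Hypothetical config that would be committed (e.g., as YAML) and
# reviewed through a pull request like any other code change.
config = {
    "data": {"table": "events_v3", "window_days": 30},
    "model": {"type": "gbdt", "max_depth": 6, "learning_rate": 0.1},
}
run_id = config_fingerprint(config)
```

Because the fingerprint depends only on the committed config, the same commit always maps to the same ID, which makes a model in the registry traceable back to the exact reviewed change.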

Cross-Region Serving

Deploys and serves ML models in multiple geographic regions to ensure low-latency access, high availability, and resilience against regional failures.

Examples:

  • Serve recommendation models from data centers in the US, Europe, and Asia for global users.
  • Deploy fraud detection models in different regions to meet data sovereignty requirements.
  • Distribute NLP models across regions to reduce API latency for users worldwide.

Code Example: 

https://cloud.google.com/vertex-ai/docs/general/mlops-overview  (Vertex AI MLOps for cross-region)
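In production, a global load balancer or DNS-based routing usually makes the routing decision. The sketch below only shows the selection logic: filter by health and by allowed jurisdictions (for data-sovereignty rules), then pick the lowest measured latency. Region names and latencies are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Region:
    name: str
    jurisdiction: str  # e.g., "EU", "US"
    healthy: bool = True


def pick_region(
    regions: List[Region],
    latency_ms: Dict[str, float],
    allowed_jurisdictions: Optional[set] = None,
) -> Optional[str]:
    """Lowest-latency healthy region, optionally restricted by data residency."""
    candidates = [
        r for r in regions
        if r.healthy and (allowed_jurisdictions is None or r.jurisdiction in allowed_jurisdictions)
    ]
    if not candidates:
        return None  # no compliant region available: fail closed
    return min(candidates, key=lambda r: latency_ms.get(r.name, float("inf"))).name


regions = [Region("us-east", "US"), Region("eu-west", "EU"), Region("ap-south", "APAC")]
latency_from_paris = {"us-east": 85.0, "eu-west": 12.0, "ap-south": 140.0}
```

If the nearest region fails its health check, traffic fails over to the next-best region. A residency constraint can instead make the request fail rather than leave the allowed jurisdiction.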

Edge Model Serving

Deploys lightweight or optimized ML models directly on edge devices (e.g., smartphones, IoT sensors), enabling local inference without relying on cloud connectivity.

Examples:

  • Run object detection models on drones for obstacle avoidance.
  • Deploy speech recognition models on smart speakers for offline commands.
  • Serve anomaly detection models on factory sensors for real-time alerts.

Code Example: 

https://www.tensorflow.org/lite/guide  (TensorFlow Lite for edge deployment)
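A common optimization for edge serving is post-training quantization, which the TensorFlow Lite converter can apply. The sketch shows only the underlying arithmetic of a simple symmetric, per-tensor int8 scheme. Production converters use more elaborate variants, such as per-channel scales and zero points. The weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ~ scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each weight now takes 1 byte instead of 4 (float32), and the rounding error per weight is at most `scale / 2`. Small weights such as 0.003 collapse to 0, which is why accuracy should be re-checked after quantization.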

Large Language Model (LLM) 

Encoder-Decoder (Seq2Seq)

This architectural pattern maps an input sequence to an output sequence of potentially different length. It consists of two main components:

  • Encoder: Compresses the input sequence into a fixed-size context (or ‘thought’) vector that represents its meaning.
  • Decoder: Uses the context vector to generate the output sequence, producing one element at a time.

Examples:

  • A machine translation model encodes the English sentence ‘The cat sat on the mat’ and decodes it into the French sentence ‘Le chat s’est assis sur le tapis’.
  • A summarization model reads a lengthy article and generates a concise summary.
  • A chatbot reads the user’s input (‘How’s the weather?’) and generates an appropriate reply.

Code Example: 

https://www.tensorflow.org/text/tutorials/nmt_with_attention
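The data flow of an encoder-decoder can be shown with a tiny untrained RNN. The weights are random, so the output tokens are meaningless; a real model learns them from parallel data and usually uses LSTM or GRU cells. The sketch shows that the encoder compresses any input length into a fixed-size context vector, and that the decoder emits one token at a time until an end token or a length limit. Class name, vocabulary size, and token IDs are assumptions.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinySeq2Seq:
    """Untrained RNN encoder-decoder (hypothetical) illustrating the data flow only."""

    def __init__(self, vocab_size, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.embed = rand_matrix(vocab_size, hidden, rng)
        self.w_enc = rand_matrix(hidden, hidden, rng)
        self.w_dec = rand_matrix(hidden, hidden, rng)
        self.w_out = rand_matrix(vocab_size, hidden, rng)

    def step(self, w, h, x):
        # Simple RNN cell: h_t = tanh(W h_{t-1} + x_t)
        return [math.tanh(a + b) for a, b in zip(matvec(w, h), x)]

    def encode(self, tokens):
        h = [0.0] * len(self.w_enc)
        for t in tokens:
            h = self.step(self.w_enc, h, self.embed[t])
        return h  # fixed-size context vector, whatever the input length

    def decode(self, context, bos=0, eos=1, max_len=10):
        h, token, output = context, bos, []
        for _ in range(max_len):
            h = self.step(self.w_dec, h, self.embed[token])
            logits = matvec(self.w_out, h)
            token = max(range(self.vocab_size), key=logits.__getitem__)  # greedy choice
            if token == eos:
                break
            output.append(token)
        return output


model = TinySeq2Seq(vocab_size=12)
short_ctx = model.encode([2, 3])
long_ctx = model.encode([2, 3, 4, 5, 6, 7])
out = model.decode(long_ctx)
```

Squeezing a long sentence into one fixed-size vector is the bottleneck that the attention mechanism in the next pattern was introduced to relieve.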

Attention Mechanism

Enhances the Seq2Seq architecture by allowing the decoder to focus on different parts of the input sequence at each generation step. Instead of relying on a single context vector, the model dynamically weights input tokens based on their relevance to the current output step.

Examples:

  • During translation, the decoder attends to the source words most relevant to each output word as it is generated (e.g., the subject or verb), enabling better handling of long sentences.
  • In summarization, highlights key parts of a source document to inform the generated summary.
  • In image captioning, the model attends to the image region corresponding to each generated word (e.g., the region showing a dog when generating the word ‘dog’).

Code Example: 

https://www.tensorflow.org/text/tutorials/nmt_with_attention
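One attention step can be sketched as follows, assuming scaled dot-product scoring; additive (Bahdanau-style) scoring is another common choice. The decoder's current state acts as the query, the encoder states serve as keys and values, and the softmax weights say how much each input position contributes to this step's context vector. The vectors are hypothetical.

```python
import math


def dot_product_attention(query, keys, values):
    """Return (context, weights) with weights = softmax(q . k_i / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return context, weights


# Three encoder states (used as both keys and values); the query resembles the second.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_query = [0.0, 2.0]
context, weights = dot_product_attention(decoder_query, encoder_states, encoder_states)
```

The weights are recomputed for every output step, so each generated word gets its own context vector instead of sharing one fixed summary.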

Mixture of Experts (MoE)

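A model structure in which some layers (in Transformer LLMs, typically the feed-forward layers) are replaced by sparse mixtures of ‘expert’ sub-networks. For each input token, a learned routing (gating) network selects a small subset of experts to process it. This allows very high total parameter counts while compute per token grows only with the number of experts actually activated.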

Examples:

  • An LLM routes each token to a few experts, which can come to specialize, for example in code, history, or science text.
  • A multimodal model uses text experts for text inputs and image experts for image inputs.
  • A translation model activates specific experts based on the language pair (e.g., English-French or English-German).

Code Example: 

https://github.com/tensorflow/mesh
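A sketch of a sparse top-k MoE layer, assuming a linear router and toy experts that just scale their input. Only the k highest-scoring experts are evaluated, and their outputs are mixed with softmax weights renormalized over the selected experts. Real implementations also add load-balancing losses and per-expert capacity limits, and some, such as the Switch Transformer, route to a single expert (k = 1). Router weights and experts are hypothetical.

```python
import math
from typing import Callable, List


def top_k_moe(x: List[float], gate_weights: List[List[float]],
              experts: List[Callable[[List[float]], List[float]]], k: int = 2):
    """Route x to the top-k experts and mix their outputs with gate weights."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts only; the others are never evaluated.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    gates = {i: e / total for i, e in exps.items()}
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]
    return out, gates


calls = []  # records which experts actually ran


def make_expert(idx, scale):
    def expert(v):
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]  # hypothetical router
out, gates = top_k_moe([2.0, 1.0], gate_weights, experts, k=2)
```

With four experts and k = 2, half of the expert parameters sit idle for this input, which is where the compute savings come from.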

World Model

A model that learns a predictive simulation of the environment. Agents can run internal simulations (‘imagined rollouts’) before acting, enabling safer and more efficient planning.

Examples:

  • A robot stacks blocks using a world model to simulate physics and avoid collapsing the tower.
  • A game-playing agent simulates opponent moves and plans its own strategy.
  • A trading agent simulates market responses to test strategies before live trading.

Code Example: 

https://github.com/danijar/dreamer
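A minimal model-based planner that illustrates imagined rollouts. It is not Dreamer's latent-space method. It assumes a 1-D environment where the true dynamics are `next = state + action`. The agent first fits a world model from collected transitions, then scores every short action sequence inside that model, takes only the first action of the best one, and replans on the next step (model-predictive control).

```python
from itertools import product


def collect_transitions():
    """Experience from the real environment, whose dynamics are next = state + action."""
    return [(s, a, s + a) for s in range(-3, 4) for a in (-1, 0, 1)]


def fit_world_model(transitions):
    """Learn next = s + k * a by least squares (k is the only parameter)."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    k = num / den
    return lambda s, a: s + k * a


def plan(model, state, goal, horizon=3, actions=(-1, 0, 1)):
    """Imagined rollouts: try every action sequence inside the model, act on the best."""
    best_seq, best_return = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:
            s = model(s, a)
            ret -= abs(goal - s)  # reward: stay close to the goal
        if ret > best_return:
            best_seq, best_return = seq, ret
    return best_seq[0]  # execute only the first action, then replan


world = fit_world_model(collect_transitions())
```

All trial and error happens inside `world`, so no real actions are wasted on bad sequences. Exhaustive search only works for tiny action spaces, and larger problems use sampling-based or learned planners.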

Large Concept Model (LCM)

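An alternative to token-by-token language models that reasons at the concept level (in the published LCM work, a concept is a sentence). The model encodes input into semantic concept vectors, predicts the next concept in that embedding space, and decodes the result back into text. Working with larger semantic units is intended to improve coherence, reasoning, and multilingual performance.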

Examples:

  • Summarizing long technical documents while preserving argument structure.
  • Generating story outlines at the concept level before expanding into prose.

Code Example: 

https://github.com/facebookresearch/large_concept_model
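A toy illustration of the encode, predict-next-concept, decode loop, where every component is a stand-in. The published LCM work uses a pretrained multilingual sentence-embedding space (SONAR) and a Transformer that predicts the next sentence embedding. Here a hashed bag-of-words vector plays the encoder, a 1-nearest-neighbour lookup over (sentence, next sentence) pairs plays the concept predictor, and retrieval from a sentence bank plays the decoder. All names and data are hypothetical.

```python
import math
import zlib


def encode(sentence, dim=256):
    """Toy sentence -> concept vector (hashed bag of words, L2-normalized)."""
    v = [0.0] * dim
    for word in sentence.lower().strip(".").split():
        v[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # inputs are already unit vectors


class ConceptPredictor:
    """Predicts the next concept vector from the current one (1-nearest neighbour)."""

    def __init__(self, documents):
        self.pairs = []
        for sentences in documents:
            vecs = [encode(s) for s in sentences]
            self.pairs.extend(zip(vecs, vecs[1:]))

    def next_concept(self, concept):
        _, nxt = max(self.pairs, key=lambda p: cosine(p[0], concept))
        return nxt


def decode(concept, sentence_bank):
    """Map a concept vector back to text by nearest-neighbour retrieval."""
    return max(sentence_bank, key=lambda s: cosine(encode(s), concept))


docs = [["The cat sat on the mat.", "Then it fell asleep."],
        ["Prices rose sharply.", "Investors sold their shares."]]
bank = [s for d in docs for s in d]
predictor = ConceptPredictor(docs)
next_text = decode(predictor.next_concept(encode("The cat sat on the mat.")), bank)
```

The prediction step never sees individual tokens, only whole-sentence vectors. That is the defining property of concept-level modeling, even though this toy cannot generate new sentences.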