Glossary
A
- Agents
- AI systems that can autonomously plan and execute multi-step tasks by calling tools, querying data sources, and making decisions without human intervention at each step.
- API (Application Programming Interface)
- A standardized interface that allows software systems to communicate. In AI, APIs let your applications send prompts to a model and receive generated responses programmatically.
C
- Chunking
- The process of splitting large documents into smaller, overlapping segments before generating embeddings. Chunk size and overlap strategy directly affect retrieval quality in RAG systems.
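A minimal sketch of fixed-size chunking with overlap, using character counts for simplicity (production systems typically chunk by tokens, sentences, or document structure; all names here are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based for simplicity)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "".join(str(i % 10) for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
# Each chunk's last 10 characters repeat as the next chunk's first 10,
# so context spanning a boundary is preserved in at least one chunk.
```

The overlap is the key tuning knob: too little and sentences get cut mid-thought at chunk boundaries; too much and the index stores redundant text, raising cost.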
- Context Window
- The maximum amount of text (measured in tokens) a model can process in a single request. Larger context windows allow more information but increase cost and latency.
E
- Embeddings
- Numerical representations (vectors) of text that capture semantic meaning. Similar concepts produce vectors that are close together, enabling machines to understand relationships between words, sentences, or documents.
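"Close together" is usually measured with cosine similarity. A sketch with tiny hand-made vectors standing in for real embedding output (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; the values are made up for illustration.
loan = [0.9, 0.1, 0.2]
mortgage = [0.85, 0.15, 0.25]
weather = [0.1, 0.9, 0.1]

sim_related = cosine_similarity(loan, mortgage)    # high: related concepts
sim_unrelated = cosine_similarity(loan, weather)   # low: unrelated concepts
```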
F
- Fine-Tuning
- The process of further training a pre-trained model on a specific dataset to specialize its behavior for a particular domain or task, such as banking compliance language.
- Foundation Model
- A large AI model trained on broad data that can be adapted to many tasks. Examples include GPT-4, Claude, and Gemini. Banks evaluate these for capabilities, safety, and regulatory fit.
G
- Guardrails
- Safety mechanisms that constrain AI model outputs to prevent harmful, off-topic, or non-compliant responses. Critical in banking for regulatory adherence and brand safety.
H
- Hallucination
- When an AI model generates plausible-sounding but factually incorrect information. A critical risk in banking where inaccurate outputs could lead to regulatory violations or financial losses.
I
- Inference
- The process of running a trained model to generate predictions or outputs from new input data. Inference cost, latency, and throughput are key factors in enterprise AI deployment.
L
- Large Language Model (LLM)
- A neural network trained on vast amounts of text data that can understand and generate human language. LLMs power chatbots, document analysis, code generation, and many enterprise AI applications.
M
- Model Risk Management
  - The regulatory framework (e.g., Federal Reserve SR 11-7 and OCC Bulletin 2011-12, "Supervisory Guidance on Model Risk Management") governing how banks validate, monitor, and control models, including AI models. Ensures models perform as expected and risks are identified and mitigated.
O
- Orchestration Framework
- Software that coordinates LLMs, tools, and data sources into complex workflows. Frameworks like LangChain and LangGraph manage prompt chains, memory, and tool calling for multi-step AI tasks.
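At its core, orchestration is running steps in sequence while passing shared state between them. A bare-bones sketch (real frameworks add branching, memory, retries, and tool calling; every step name here is invented for illustration):

```python
from typing import Callable

# Each step transforms a shared state dictionary and passes it along.
Step = Callable[[dict], dict]

def run_pipeline(state: dict, steps: list[Step]) -> dict:
    """Run each step in order, threading the state through the chain."""
    for step in steps:
        state = step(state)
    return state

def retrieve_context(state: dict) -> dict:
    # Stand-in for a real retrieval call against a knowledge base.
    state["context"] = f"docs about {state['question']}"
    return state

def build_prompt(state: dict) -> dict:
    state["prompt"] = f"{state['context']}\n\nQ: {state['question']}"
    return state

result = run_pipeline({"question": "wire limits"}, [retrieve_context, build_prompt])
```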
P
- Prompt Engineering
- The practice of crafting effective instructions (prompts) to guide AI model behavior. Techniques include few-shot examples, chain-of-thought reasoning, and role-based system instructions.
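A sketch combining two of those techniques, a role-based system instruction plus few-shot examples, in the role/content message format most chat-style LLM APIs accept (the classification task and all message text are invented for illustration):

```python
def build_classification_prompt(ticket: str) -> list[dict]:
    """Assemble a chat-format prompt: system role + few-shot examples + task."""
    return [
        # Role-based system instruction constraining behavior and output format.
        {"role": "system",
         "content": "You classify bank support tickets as BILLING, FRAUD, or OTHER. "
                    "Reply with exactly one word."},
        # Few-shot examples demonstrating the expected input/output pattern.
        {"role": "user", "content": "I was charged twice for my statement fee."},
        {"role": "assistant", "content": "BILLING"},
        {"role": "user", "content": "Someone used my card in a country I've never visited."},
        {"role": "assistant", "content": "FRAUD"},
        # The actual task.
        {"role": "user", "content": ticket},
    ]

messages = build_classification_prompt("My mobile app won't load.")
```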
R
- Retrieval-Augmented Generation (RAG)
- A pattern that combines document retrieval with LLM generation. The system searches a knowledge base for relevant context, then feeds it to the model to produce grounded, accurate answers.
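The retrieve-then-generate flow can be sketched end to end. Here word overlap stands in for embedding-based similarity so the example stays self-contained; the knowledge-base text is invented:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query
    (a stand-in for embedding similarity search)."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Feed the retrieved context to the model alongside the question."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [
    "Wire transfers over $10,000 require a compliance review.",
    "The cafeteria opens at 8 am.",
    "International wire transfers settle in 1-3 business days.",
]
prompt = build_rag_prompt("How long do wire transfers take?", kb)
```

The resulting prompt would then be sent to the LLM; because the answer must come from the supplied context, the model is less likely to hallucinate.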
S
- Semantic Search
- Search that understands meaning rather than just matching keywords. Uses embeddings to find conceptually similar documents even when they use different terminology.
- System Instructions
- Persistent instructions provided to an LLM that define its role, behavior constraints, and output format. System instructions shape every response without being visible to end users.
T
- Temperature
- A parameter controlling the randomness of model outputs. Lower temperature (0.0-0.3) produces focused, deterministic responses; higher temperature (0.7-1.0) produces more creative, varied outputs.
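Mechanically, temperature divides the model's raw scores (logits) before they are converted to token probabilities. A sketch of that softmax step with made-up logits:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                         # illustrative values
cold = softmax_with_temperature(logits, 0.2)     # top token dominates
hot = softmax_with_temperature(logits, 1.5)      # probabilities spread out
```

At low temperature the highest-scoring token is chosen almost every time, which is why compliance-sensitive applications typically run near 0.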
- Tokens
- The basic units of text that LLMs process. A token is roughly 3/4 of an English word. Token counts determine cost, speed, and context window limits for every API call.
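Exact counts require the model's own tokenizer, but the common ~4-characters-per-token heuristic gives a useful budget estimate. A sketch (the pricing figure is a placeholder, not any vendor's real rate):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters (~3/4 of a word) per English token.
    Use the model's actual tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, usd_per_1k_tokens: float) -> float:
    """Back-of-envelope cost estimate for a prompt."""
    return estimate_tokens(text) / 1000 * usd_per_1k_tokens

n = estimate_tokens("The quick quick brown fox jumps over the lazy dog."[:44])
```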
- Transformer
- The neural network architecture underlying modern LLMs. Transformers use self-attention mechanisms to process relationships between all parts of the input simultaneously, enabling powerful language understanding.
V
- Vector Database
- A specialized database optimized for storing and querying high-dimensional vectors (embeddings). Enables fast similarity search across millions of documents for RAG and recommendation systems.
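Conceptually, the query path is "find the k stored vectors closest to the query vector." A brute-force sketch over a tiny in-memory index (a real vector database replaces the full scan with approximate indexes such as HNSW; document IDs and vectors are made up):

```python
import math

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k vectors most similar to the query (full scan)."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    ranked = sorted(index, key=lambda doc_id: cos(query_vec, index[doc_id]),
                    reverse=True)
    return ranked[:k]

# Toy 2-dimensional index; real embeddings have hundreds of dimensions.
index = {
    "doc_loans": [0.9, 0.1],
    "doc_cards": [0.7, 0.3],
    "doc_hr":    [0.1, 0.9],
}
results = top_k([0.8, 0.2], index, k=2)
```

The full scan is O(n) per query, which is exactly what specialized vector indexes exist to avoid at the scale of millions of documents.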
Z
- Zero-Shot / Few-Shot Learning
- The ability of LLMs to perform tasks with no examples (zero-shot) or just a few examples (few-shot) provided in the prompt, without requiring model retraining.
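The difference is purely in what the prompt contains. A sketch contrasting the two for a sentiment task (the reviews and labels are invented for illustration):

```python
def zero_shot_prompt(text: str) -> str:
    """No examples: the model relies entirely on its pre-training."""
    return f"Classify the sentiment of this review as POSITIVE or NEGATIVE:\n{text}"

def few_shot_prompt(text: str) -> str:
    """A few labeled examples in the prompt teach the pattern without retraining."""
    examples = [
        ("Great branch staff, very helpful.", "POSITIVE"),
        ("The app crashed during my transfer.", "NEGATIVE"),
    ]
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return f"{shots}\nReview: {text}\nSentiment:"

p = few_shot_prompt("Fees were refunded quickly.")
```

Few-shot prompting often improves accuracy on format-sensitive tasks at the cost of extra tokens per request; fine-tuning is the heavier-weight alternative when examples no longer fit in the prompt.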