Glossary
A
- Agents
- AI systems that can autonomously plan and execute multi-step tasks by calling tools, querying data sources, and making decisions without human intervention at each step.
- API (Application Programming Interface)
- A standardized interface that allows software systems to communicate. In AI, APIs let your applications send prompts to a model and receive generated responses programmatically.
C
- Chunking
- The process of splitting large documents into smaller, overlapping segments before generating embeddings. Chunk size and overlap strategy directly affect retrieval quality in RAG systems.
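A minimal sketch of fixed-size chunking with overlap, using character counts for simplicity (production systems typically chunk by tokens, sentences, or document structure; all names here are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based for simplicity)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "".join(str(i % 10) for i in range(120))
chunks = chunk_text(doc, chunk_size=50, overlap=10)
# Each chunk's last 10 characters repeat as the next chunk's first 10,
# so context spanning a boundary is preserved in at least one chunk.
```

The overlap is the key tuning knob: too little and sentences get cut mid-thought at chunk boundaries; too much and the index stores redundant text, raising cost.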
- Context Window
- The maximum amount of text (measured in tokens) a model can process in a single request. Larger context windows allow more information but increase cost and latency.
E
- Embeddings
- Numerical representations (vectors) of text that capture semantic meaning. Similar concepts produce vectors that are close together, enabling machines to understand relationships between words, sentences, or documents.
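"Close together" is usually measured with cosine similarity. A sketch with tiny hand-made vectors standing in for real embedding output (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; the values are made up for illustration.
loan = [0.9, 0.1, 0.2]
mortgage = [0.85, 0.15, 0.25]
weather = [0.1, 0.9, 0.1]

sim_related = cosine_similarity(loan, mortgage)    # high: related concepts
sim_unrelated = cosine_similarity(loan, weather)   # low: unrelated concepts
```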
F
- Fine-Tuning
- The process of further training a pre-trained model on a specific dataset to specialize its behavior for a particular domain or task, such as banking compliance language.
- Foundation Model
- A large AI model trained on broad data that can be adapted to many tasks. Examples include GPT-4, Claude, and Gemini. Banks evaluate these for capabilities, safety, and regulatory fit.
G
- Guardrails
- Safety mechanisms that constrain AI model outputs to prevent harmful, off-topic, or non-compliant responses. Critical in banking for regulatory adherence and brand safety.
H
- Hallucination
- When an AI model generates plausible-sounding but factually incorrect information. A critical risk in banking where inaccurate outputs could lead to regulatory violations or financial losses.
I
- Inference
- The process of running a trained model to generate predictions or outputs from new input data. Inference cost, latency, and throughput are key factors in enterprise AI deployment.
L
- Large Language Model (LLM)
- A neural network trained on vast amounts of text data that can understand and generate human language. LLMs power chatbots, document analysis, code generation, and many enterprise AI applications.
M
- Model Risk Management
  - The regulatory framework (e.g., Federal Reserve SR 11-7 and OCC Bulletin 2011-12, "Supervisory Guidance on Model Risk Management") governing how banks validate, monitor, and control models, including AI models. Ensures models perform as expected and risks are identified and mitigated.
O
- Orchestration Framework
- Software that coordinates LLMs, tools, and data sources into complex workflows. Frameworks like LangChain and LangGraph manage prompt chains, memory, and tool calling for multi-step AI tasks.
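At its core, orchestration is running steps in sequence while passing shared state between them. A bare-bones sketch (real frameworks add branching, memory, retries, and tool calling; every step name here is invented for illustration):

```python
from typing import Callable

# Each step transforms a shared state dictionary and passes it along.
Step = Callable[[dict], dict]

def run_pipeline(state: dict, steps: list[Step]) -> dict:
    """Run each step in order, threading the state through the chain."""
    for step in steps:
        state = step(state)
    return state

def retrieve_context(state: dict) -> dict:
    # Stand-in for a real retrieval call against a knowledge base.
    state["context"] = f"docs about {state['question']}"
    return state

def build_prompt(state: dict) -> dict:
    state["prompt"] = f"{state['context']}\n\nQ: {state['question']}"
    return state

result = run_pipeline({"question": "wire limits"}, [retrieve_context, build_prompt])
```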
P
- Prompt Engineering
- The practice of crafting effective instructions (prompts) to guide AI model behavior. Techniques include few-shot examples, chain-of-thought reasoning, and role-based system instructions.
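A sketch combining two of those techniques, a role-based system instruction plus few-shot examples, in the role/content message format most chat-style LLM APIs accept (the classification task and all message text are invented for illustration):

```python
def build_classification_prompt(ticket: str) -> list[dict]:
    """Assemble a chat-format prompt: system role + few-shot examples + task."""
    return [
        # Role-based system instruction constraining behavior and output format.
        {"role": "system",
         "content": "You classify bank support tickets as BILLING, FRAUD, or OTHER. "
                    "Reply with exactly one word."},
        # Few-shot examples demonstrating the expected input/output pattern.
        {"role": "user", "content": "I was charged twice for my statement fee."},
        {"role": "assistant", "content": "BILLING"},
        {"role": "user", "content": "Someone used my card in a country I've never visited."},
        {"role": "assistant", "content": "FRAUD"},
        # The actual task.
        {"role": "user", "content": ticket},
    ]

messages = build_classification_prompt("My mobile app won't load.")
```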
R
- Retrieval-Augmented Generation (RAG)
- A pattern that combines document retrieval with LLM generation. The system searches a knowledge base for relevant context, then feeds it to the model to produce grounded, accurate answers.
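The retrieve-then-generate flow can be sketched end to end. Here word overlap stands in for embedding-based similarity so the example stays self-contained; the knowledge-base text is invented:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query
    (a stand-in for embedding similarity search)."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Feed the retrieved context to the model alongside the question."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

kb = [
    "Wire transfers over $10,000 require a compliance review.",
    "The cafeteria opens at 8 am.",
    "International wire transfers settle in 1-3 business days.",
]
prompt = build_rag_prompt("How long do wire transfers take?", kb)
```

The resulting prompt would then be sent to the LLM; because the answer must come from the supplied context, the model is less likely to hallucinate.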
S
- Semantic Search
- Search that understands meaning rather than just matching keywords. Uses embeddings to find conceptually similar documents even when they use different terminology.
- System Instructions
- Persistent instructions provided to an LLM that define its role, behavior constraints, and output format. System instructions shape every response without being visible to end users.
T
- Temperature
- A parameter controlling the randomness of model outputs. Lower temperature (0.0-0.3) produces focused, deterministic responses; higher temperature (0.7-1.0) produces more creative, varied outputs.
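Mechanically, temperature divides the model's raw scores (logits) before they are converted to token probabilities. A sketch of that softmax step with made-up logits:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                         # illustrative values
cold = softmax_with_temperature(logits, 0.2)     # top token dominates
hot = softmax_with_temperature(logits, 1.5)      # probabilities spread out
```

At low temperature the highest-scoring token is chosen almost every time, which is why compliance-sensitive applications typically run near 0.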
- Tokens
- The basic units of text that LLMs process. A token is roughly 3/4 of an English word. Token counts determine cost, speed, and context window limits for every API call.
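Exact counts require the model's own tokenizer, but the common ~4-characters-per-token heuristic gives a useful budget estimate. A sketch (the pricing figure is a placeholder, not any vendor's real rate):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters (~3/4 of a word) per English token.
    Use the model's actual tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, usd_per_1k_tokens: float) -> float:
    """Back-of-envelope cost estimate for a prompt."""
    return estimate_tokens(text) / 1000 * usd_per_1k_tokens

n = estimate_tokens("The quick quick brown fox jumps over the lazy dog."[:44])
```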
- Transformer
- The neural network architecture underlying modern LLMs. Transformers use self-attention mechanisms to process relationships between all parts of the input simultaneously, enabling powerful language understanding.
V
- Vector Database
- A specialized database optimized for storing and querying high-dimensional vectors (embeddings). Enables fast similarity search across millions of documents for RAG and recommendation systems.
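Conceptually, the query path is "find the k stored vectors closest to the query vector." A brute-force sketch over a tiny in-memory index (a real vector database replaces the full scan with approximate indexes such as HNSW; document IDs and vectors are made up):

```python
import math

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k vectors most similar to the query (full scan)."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    ranked = sorted(index, key=lambda doc_id: cos(query_vec, index[doc_id]),
                    reverse=True)
    return ranked[:k]

# Toy 2-dimensional index; real embeddings have hundreds of dimensions.
index = {
    "doc_loans": [0.9, 0.1],
    "doc_cards": [0.7, 0.3],
    "doc_hr":    [0.1, 0.9],
}
results = top_k([0.8, 0.2], index, k=2)
```

The full scan is O(n) per query, which is exactly what specialized vector indexes exist to avoid at the scale of millions of documents.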
Z
- Zero-Shot / Few-Shot Learning
- The ability of LLMs to perform tasks with no examples (zero-shot) or just a few examples (few-shot) provided in the prompt, without requiring model retraining.
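The difference is purely in what the prompt contains. A sketch contrasting the two for a sentiment task (the reviews and labels are invented for illustration):

```python
def zero_shot_prompt(text: str) -> str:
    """No examples: the model relies entirely on its pre-training."""
    return f"Classify the sentiment of this review as POSITIVE or NEGATIVE:\n{text}"

def few_shot_prompt(text: str) -> str:
    """A few labeled examples in the prompt teach the pattern without retraining."""
    examples = [
        ("Great branch staff, very helpful.", "POSITIVE"),
        ("The app crashed during my transfer.", "NEGATIVE"),
    ]
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return f"{shots}\nReview: {text}\nSentiment:"

p = few_shot_prompt("Fees were refunded quickly.")
```

Few-shot prompting often improves accuracy on format-sensitive tasks at the cost of extra tokens per request; fine-tuning is the heavier-weight alternative when examples no longer fit in the prompt.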