AI Foundations for Bankers

Model Selection Framework for Banking

Intermediate · 12 min read · Tags: model-selection, framework, decision-making, compliance, cost-optimization

From Model Awareness to Model Strategy

You now understand the major foundation models -- their strengths, trade-offs, and deployment options. But understanding individual models is not the same as having a model strategy. This unit provides the framework your institution needs to make structured, defensible model selection decisions for each banking use case.

The framework addresses four dimensions: Capability (can the model do the job?), Cost (at what total cost of ownership?), Compliance (does it satisfy regulatory requirements?), and Control (what level of oversight and customization do you retain?). Every model selection decision requires balancing these four dimensions against the specific requirements of the use case.

The Four-Dimension Framework

Dimension 1: Capability

Not every banking task requires the most powerful model. Match model capability to task complexity:

Task Complexity | Example Banking Tasks | Model Tier
Low | Document classification, email routing, FAQ response | Small/fast models (Claude Haiku, Llama 3 8B, Mistral 7B)
Medium | Document summarization, draft generation, data extraction | Mid-tier models (Claude Sonnet, GPT-4o mini, Llama 3 70B)
High | Regulatory interpretation, complex risk analysis, multi-step reasoning | Top-tier models (Claude Opus, GPT-4o, Command R+ for RAG)

Over-provisioning capability wastes money. A simple email classification task does not need GPT-4o -- a model 10x cheaper will perform equally well. Under-provisioning capability creates risk. A regulatory interpretation task needs the strongest available reasoning.
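The capability-matching rule above can be sketched as a simple routing table. This is an illustrative Python sketch, not a production router; the model names are placeholders standing in for whichever small, mid-tier, and top-tier models your institution has approved.

```python
# Hypothetical sketch: route a task to a model tier by complexity.
# Model names are illustrative placeholders, not recommendations.

TIER_BY_COMPLEXITY = {
    "low": "claude-haiku",      # classification, routing, FAQ response
    "medium": "claude-sonnet",  # summarization, drafting, extraction
    "high": "claude-opus",      # regulatory interpretation, risk analysis
}

def select_model(task_complexity: str) -> str:
    """Return the model tier matched to the task's complexity."""
    try:
        return TIER_BY_COMPLEXITY[task_complexity.lower()]
    except KeyError:
        raise ValueError(f"Unknown complexity level: {task_complexity!r}")

# A simple email-classification task routes to the cheapest tier.
print(select_model("Low"))
```

The point of making the mapping explicit is governance: the complexity-to-tier table becomes a documented, auditable policy rather than an ad hoc per-project choice.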

BANKING ANALOGY

This is the same principle you apply to staffing decisions. You do not assign a Managing Director to process routine wire transfers, and you do not assign a junior analyst to structure a complex syndicated loan. Each task has an appropriate expertise level. AI model selection follows the same logic -- match the model's capability to the task's complexity, and deploy your most capable (and expensive) resources only where they add the most value.

Dimension 2: Cost

Total cost of ownership varies dramatically based on deployment approach:

Per-query pricing (cloud APIs):

  • Best for: variable workloads, lower volumes, rapid prototyping
  • Cost driver: token volume. Price typically ranges from $1 to $30 per million tokens depending on model tier
  • Watch out for: costs scaling linearly with usage -- a successful deployment can become expensive quickly

Managed infrastructure (e.g., Together AI, Amazon Bedrock):

  • Best for: steady-state workloads, moderate volumes, when you want model variety without infrastructure management
  • Cost driver: provisioned capacity. You pay for reserved compute regardless of utilization
  • Watch out for: over-provisioning capacity during low-demand periods

Self-hosted (on-premises or dedicated cloud):

  • Best for: high-volume workloads, maximum data control, predictable costs at scale
  • Cost driver: hardware, operations, talent. Fixed costs regardless of query volume
  • Watch out for: underestimating the operational burden and talent requirements

KEY TERM

Total Cost of Ownership (TCO): The complete cost of deploying and operating an AI model, including API fees or hardware costs, engineering time for integration and maintenance, monitoring and governance overhead, and the opportunity cost of model management versus other IT priorities. For banking AI decisions, TCO analysis should span at least a 3-year horizon.
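A 3-year TCO comparison like the one the KEY TERM describes can be sketched with simple arithmetic. All figures below are assumptions chosen for illustration, not vendor quotes; substitute your own volumes, prices, and operating costs.

```python
# Illustrative 3-year TCO comparison. Every number here is an assumption.

def api_tco(monthly_tokens_m: float, price_per_m: float, years: int = 3) -> float:
    """Per-query pricing: cost scales linearly with token volume."""
    return monthly_tokens_m * price_per_m * 12 * years

def self_hosted_tco(hardware: float, annual_ops: float, years: int = 3) -> float:
    """Self-hosted: fixed costs regardless of query volume."""
    return hardware + annual_ops * years

# Hypothetical workload: 500M tokens/month at $5 per million tokens,
# versus $400K hardware plus $150K/year operations and talent.
api = api_tco(monthly_tokens_m=500, price_per_m=5.0)
hosted = self_hosted_tco(hardware=400_000, annual_ops=150_000)

print(f"3-year API TCO:         ${api:,.0f}")
print(f"3-year self-hosted TCO: ${hosted:,.0f}")
```

At this assumed volume the API is far cheaper; raising the workload by an order of magnitude flips the comparison, which is exactly the crossover analysis a TCO review should surface.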

Dimension 3: Compliance

For banking, compliance is not optional -- it is a gating criterion that can eliminate model options entirely:

Data residency: Where is data processed? Can you guarantee geographic boundaries? Cloud APIs may route to servers in any region. VPC and on-premises deployments provide geographic control.

Audit trails: Can you log every prompt, response, and model version for regulatory examination? Enterprise API agreements typically include logging. Self-hosted deployments require building this capability.

Model risk management: Does your selection process satisfy SR 11-7 requirements? Can you document model selection rationale, validation testing, and ongoing monitoring? Open-source models with published training data provide more transparency for MRM documentation.

Data handling: What happens to your data after processing? Is it used for model training? Enterprise agreements should explicitly prohibit training on your data. Self-hosted models eliminate this concern entirely.

Tip

Build a compliance checklist specific to your institution's regulatory requirements and apply it consistently to every model evaluation. This transforms model selection from a subjective technical preference into a structured, auditable governance process -- exactly what examiners expect under SR 11-7.
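The checklist the Tip describes can be implemented as a hard gate that runs before any capability or cost scoring. This is a minimal sketch; the criterion names are illustrative examples drawn from the four compliance questions above, and a real checklist would reflect your institution's specific regulatory requirements.

```python
# A minimal compliance-gate sketch: every criterion must pass before a
# model is scored on capability, cost, or control. Criteria are examples.

REQUIRED_CRITERIA = [
    "data_residency_guaranteed",   # geographic processing boundaries
    "audit_logging_available",     # prompt/response/version logging
    "no_training_on_our_data",     # explicit contractual prohibition
    "version_pinning_supported",   # reproducibility for examination
]

def passes_compliance_gate(model_profile: dict) -> bool:
    """Return True only if every gating criterion is satisfied."""
    return all(model_profile.get(c, False) for c in REQUIRED_CRITERIA)

candidate = {
    "data_residency_guaranteed": True,
    "audit_logging_available": True,
    "no_training_on_our_data": True,
    "version_pinning_supported": False,  # fails the gate
}
print(passes_compliance_gate(candidate))  # False: eliminated before scoring
```

Treating compliance as a boolean gate rather than a weighted score matches the text's point: a model that fails any gating criterion is eliminated entirely, no matter how capable or cheap it is.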

Dimension 4: Control

Control encompasses customization, vendor dependency, and operational autonomy:

Customization: Can you fine-tune the model on your data? Open-source models offer full fine-tuning. Some proprietary models offer limited fine-tuning. Others offer none.

Vendor dependency: What happens if the vendor changes pricing, deprecates your model version, or modifies data handling terms? Proprietary models create higher vendor dependency. Open-source models you host eliminate vendor dependency for the model itself (but create dependency on your own operations team).

Version control: Can you lock to a specific model version for reproducibility? Regulatory environments may require reproducing prior outputs. Version pinning is essential.

Portability: How difficult is it to switch models if your current choice becomes suboptimal? Architectures that abstract the model behind an API interface make switching easier.
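The version-pinning and portability points above are usually addressed together with an adapter layer. The sketch below is illustrative only: the class and adapter names are invented for this example and do not correspond to any real vendor SDK, and the `complete` methods are stubs standing in for actual API calls.

```python
# A minimal portability sketch: hide the vendor behind one interface so
# switching models is a configuration change, not a code rewrite.
# Adapter names and behavior are hypothetical, not a real SDK.

from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter(ChatModel):
    def __init__(self, model_version: str):
        # Pin an exact version string for reproducibility under examination.
        self.model_version = model_version
    def complete(self, prompt: str) -> str:
        return f"[{self.model_version}] response to: {prompt}"  # stub

class LlamaAdapter(ChatModel):
    def __init__(self, model_version: str):
        self.model_version = model_version
    def complete(self, prompt: str) -> str:
        return f"[{self.model_version}] response to: {prompt}"  # stub

def get_model(config: dict) -> ChatModel:
    """Resolve provider and pinned version from configuration."""
    adapters = {"claude": ClaudeAdapter, "llama": LlamaAdapter}
    return adapters[config["provider"]](config["version"])

model = get_model({"provider": "llama", "version": "llama-3-70b"})
```

Because every caller depends only on `ChatModel`, swapping providers means changing the config entry and validating the new model, leaving application code untouched.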

Model Comparison for Banking

Model Family | Capability | Cost (relative) | Compliance Readiness | Control
Claude (Anthropic) | Top-tier reasoning, 200K context, strong safety | $$$ | Enterprise agreements, no data training, Bedrock integration | Limited fine-tuning, cloud-only
GPT-4o (OpenAI) | Top-tier general capability, multimodal | $$$ | Enterprise agreements, Azure integration | Limited fine-tuning, cloud-only
Command R+ (Cohere) | Best-in-class RAG, multilingual | $$ | VPC/on-prem options, data residency controls | Moderate fine-tuning options
Llama 3 (Meta) | Strong open-source, flexible deployment | $ (self-hosted) | Full data control, on-premises capable | Full fine-tuning, full control
Mistral (Mistral AI) | Strong open-source, EU-based | $ (self-hosted) | GDPR alignment, on-premises capable | Full fine-tuning, full control

Warning

This comparison reflects the landscape at publication time. Model capabilities, pricing, and compliance features evolve rapidly. The framework itself is the durable asset -- apply these four dimensions to any new model that enters the market. Do not treat this table as a permanent ranking.

Matching Use Cases to Models

Use Case | Data Sensitivity | Complexity | Recommended Approach
Regulatory document analysis | Medium | High | Top-tier proprietary (Claude, GPT-4o) via enterprise API
Internal policy Q&A (RAG) | High | Medium | Command R+ (VPC) or Llama 3 (on-prem) with RAG pipeline
Customer email classification | Medium | Low | Small open-source model (Llama 3 8B) on-premises
Credit memo draft generation | High | High | Top-tier proprietary via enterprise API, with human review
Market research summarization | Low | Medium | Mid-tier proprietary model via standard API
Code generation for data teams | Low | Medium | Mid-tier model (Claude Sonnet, GPT-4o mini) via API
Customer complaint analysis | High | Medium | On-premises open-source model with fine-tuning
Board presentation drafting | Low | High | Top-tier proprietary model via enterprise API

Building Your Model Strategy

A mature banking AI strategy does not rely on a single model. It builds a tiered architecture:

Tier 1 -- Strategic analysis: Top-tier proprietary models for complex reasoning, regulatory interpretation, and high-stakes analysis. Low volume, high value per query.

Tier 2 -- Operational intelligence: Mid-tier or specialized models (Command R+ for RAG, Sonnet for general tasks) for steady-state business applications. Moderate volume, moderate value per query.

Tier 3 -- High-volume automation: Small, efficient models (open-source, fine-tuned) for classification, routing, extraction, and other high-volume tasks. High volume, lower value per query.

This tiered approach ensures you deploy the right capability at the right cost for each use case, while maintaining compliance and control where they matter most.
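The three tiers above can be expressed as a routing table that maps each approved use case to a tier, and each tier to a model class. The sketch below is illustrative; the use-case names and model labels are placeholders for whatever your institution's inventory actually contains.

```python
# A sketch of the tiered architecture described above.
# Tier assignments and model labels are illustrative placeholders.

TIERS = {
    1: "top-tier proprietary",          # strategic analysis: low volume, high value
    2: "mid-tier or RAG-specialized",   # operational intelligence: moderate/moderate
    3: "small fine-tuned open-source",  # high-volume automation: high volume, low value
}

USE_CASE_TIER = {
    "regulatory_interpretation": 1,
    "policy_qa_rag": 2,
    "email_classification": 3,
}

def route(use_case: str) -> str:
    """Return the model class assigned to an approved use case."""
    if use_case not in USE_CASE_TIER:
        raise KeyError(f"Use case not in approved inventory: {use_case!r}")
    return TIERS[USE_CASE_TIER[use_case]]

print(route("email_classification"))  # small fine-tuned open-source
```

Raising a `KeyError` for unlisted use cases is deliberate: under a governance regime, a new use case should trigger the evaluation process, not silently default to some model.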

BANKING ANALOGY

This tiered model strategy mirrors how your institution manages its investment portfolio. You do not put all assets in a single instrument. You maintain a diversified portfolio: high-conviction, high-return positions (top-tier models for strategic analysis), core holdings for steady returns (mid-tier models for operational use), and efficient, low-cost positions for broad market exposure (small models for high-volume automation). The portfolio is managed holistically, with risk and return balanced across tiers.

Governance Integration

Model selection is not a one-time decision. It is an ongoing governance process:

  1. Initial evaluation: Apply the four-dimension framework to select models for each use case
  2. Validation testing: Test selected models against banking-specific benchmarks before deployment
  3. Ongoing monitoring: Track model performance, cost, and hallucination rates in production
  4. Periodic re-evaluation: Review model selections quarterly as new models and capabilities emerge
  5. Deprecation planning: Maintain migration paths so you can switch models without disrupting operations
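Step 3 of the process above (ongoing monitoring) can be sketched as a threshold check that flags a deployed model for re-evaluation. The metric names and threshold values below are assumptions for illustration; your Model Risk Management framework would define the real ones.

```python
# Monitoring sketch for step 3: flag production metrics that breach
# their thresholds. Metric names and limits are illustrative assumptions.

THRESHOLDS = {
    "hallucination_rate": 0.02,   # maximum acceptable error rate
    "monthly_cost_usd": 50_000,   # budget ceiling for this use case
    "p95_latency_s": 3.0,         # responsiveness target
}

def needs_review(metrics: dict) -> list[str]:
    """Return the metrics that breached their threshold, if any."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

production = {
    "hallucination_rate": 0.035,  # breach: triggers re-evaluation
    "monthly_cost_usd": 41_000,
    "p95_latency_s": 2.1,
}
print(needs_review(production))  # ['hallucination_rate']
```

A non-empty result feeds step 4 (periodic re-evaluation): the flagged model goes back through the four-dimension framework against current alternatives.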

Quick Recap

  • The four-dimension framework evaluates models on Capability, Cost, Compliance, and Control -- balancing all four for each use case
  • Match model capability to task complexity -- over-provisioning wastes money, under-provisioning creates risk
  • Total cost of ownership varies dramatically between cloud APIs, managed infrastructure, and self-hosted deployments
  • A mature banking model strategy uses a tiered approach: top-tier for strategic analysis, mid-tier for operations, efficient models for high-volume automation
  • Model selection is an ongoing governance process, not a one-time decision -- integrate it with your Model Risk Management framework

KNOWLEDGE CHECK

A bank needs to classify 2 million customer emails per month by topic. Using the four-dimension framework, which model approach is most appropriate?

Why does this framework recommend different models for regulatory document analysis versus customer email classification?

What is the primary risk of a banking institution relying on a single foundation model for all AI use cases?