AI Foundations for Bankers

Model Selection Framework for Banking

Intermediate · 12 min read · Tags: model-selection, framework, decision-making, compliance, cost-optimization

From Model Awareness to Model Strategy

You now understand the major foundation models -- their strengths, trade-offs, and deployment options. But understanding individual models is not the same as having a model strategy. This unit provides the framework your institution needs to make structured, defensible model selection decisions for each banking use case.

The framework addresses four dimensions: Capability (can the model do the job?), Cost (at what total cost of ownership?), Compliance (does it satisfy regulatory requirements?), and Control (what level of oversight and customization do you retain?). Every model selection decision requires balancing these four dimensions against the specific requirements of the use case.

The Four-Dimension Framework

Dimension 1: Capability

Not every banking task requires the most powerful model. Match model capability to task complexity:

Task Complexity | Example Banking Tasks | Model Tier
Low | Document classification, email routing, FAQ response | Small/fast models (Claude Haiku, Llama 3 8B, Mistral 7B)
Medium | Document summarization, draft generation, data extraction | Mid-tier models (Claude Sonnet, GPT-4o mini, Llama 3 70B)
High | Regulatory interpretation, complex risk analysis, multi-step reasoning | Top-tier models (Claude Opus, GPT-4o, Command R+ for RAG)

Over-provisioning capability wastes money. A simple email classification task does not need GPT-4o -- a model 10x cheaper will perform equally well. Under-provisioning capability creates risk. A regulatory interpretation task needs the strongest available reasoning.
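The capability-matching rule above can be sketched as a simple routing table. This is an illustrative Python sketch, not a production router; the model names are placeholders standing in for whichever small, mid-tier, and top-tier models your institution has approved.

```python
# Hypothetical sketch: route a task to a model tier by complexity.
# Model names are illustrative placeholders, not recommendations.

TIER_BY_COMPLEXITY = {
    "low": "claude-haiku",      # classification, routing, FAQ response
    "medium": "claude-sonnet",  # summarization, drafting, extraction
    "high": "claude-opus",      # regulatory interpretation, risk analysis
}

def select_model(task_complexity: str) -> str:
    """Return the model tier matched to the task's complexity."""
    try:
        return TIER_BY_COMPLEXITY[task_complexity.lower()]
    except KeyError:
        raise ValueError(f"Unknown complexity level: {task_complexity!r}")

# A simple email-classification task routes to the cheapest tier.
print(select_model("Low"))
```

The point of making the mapping explicit is governance: the complexity-to-tier table becomes a documented, auditable policy rather than an ad hoc per-project choice.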

BANKING ANALOGY

This is the same principle you apply to staffing decisions. You do not assign a Managing Director to process routine wire transfers, and you do not assign a junior analyst to structure a complex syndicated loan. Each task has an appropriate expertise level. AI model selection follows the same logic -- match the model's capability to the task's complexity, and deploy your most capable (and expensive) resources only where they add the most value.

Dimension 2: Cost

Total cost of ownership varies dramatically based on deployment approach:

Per-query pricing (cloud APIs):

  • Best for: variable workloads, lower volumes, rapid prototyping
  • Cost driver: token volume. Price typically ranges from $1 to $30 per million tokens depending on model tier
  • Watch out for: costs scaling linearly with usage -- a successful deployment can become expensive quickly

Managed infrastructure (e.g., Together AI, Amazon Bedrock):

  • Best for: steady-state workloads, moderate volumes, when you want model variety without infrastructure management
  • Cost driver: provisioned capacity. You pay for reserved compute regardless of utilization
  • Watch out for: over-provisioning capacity during low-demand periods

Self-hosted (on-premises or dedicated cloud):

  • Best for: high-volume workloads, maximum data control, predictable costs at scale
  • Cost driver: hardware, operations, talent. Fixed costs regardless of query volume
  • Watch out for: underestimating the operational burden and talent requirements

KEY TERM

Total Cost of Ownership (TCO): The complete cost of deploying and operating an AI model, including API fees or hardware costs, engineering time for integration and maintenance, monitoring and governance overhead, and the opportunity cost of model management versus other IT priorities. For banking AI decisions, TCO analysis should span at least a 3-year horizon.
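A 3-year TCO comparison like the one the KEY TERM describes can be sketched with simple arithmetic. All figures below are assumptions chosen for illustration, not vendor quotes; substitute your own volumes, prices, and operating costs.

```python
# Illustrative 3-year TCO comparison. Every number here is an assumption.

def api_tco(monthly_tokens_m: float, price_per_m: float, years: int = 3) -> float:
    """Per-query pricing: cost scales linearly with token volume."""
    return monthly_tokens_m * price_per_m * 12 * years

def self_hosted_tco(hardware: float, annual_ops: float, years: int = 3) -> float:
    """Self-hosted: fixed costs regardless of query volume."""
    return hardware + annual_ops * years

# Hypothetical workload: 500M tokens/month at $5 per million tokens,
# versus $400K hardware plus $150K/year operations and talent.
api = api_tco(monthly_tokens_m=500, price_per_m=5.0)
hosted = self_hosted_tco(hardware=400_000, annual_ops=150_000)

print(f"3-year API TCO:         ${api:,.0f}")
print(f"3-year self-hosted TCO: ${hosted:,.0f}")
```

At this assumed volume the API is far cheaper; raising the workload by an order of magnitude flips the comparison, which is exactly the crossover analysis a TCO review should surface.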

Dimension 3: Compliance

For banking, compliance is not optional -- it is a gating criterion that can eliminate model options entirely:

Data residency: Where is data processed? Can you guarantee geographic boundaries? Cloud APIs may route to servers in any region. VPC and on-premises deployments provide geographic control.

Audit trails: Can you log every prompt, response, and model version for regulatory examination? Enterprise API agreements typically include logging. Self-hosted deployments require building this capability.

Model risk management: Does your selection process satisfy SR 11-7 requirements? Can you document model selection rationale, validation testing, and ongoing monitoring? Open-source models with published training data provide more transparency for MRM documentation.

Data handling: What happens to your data after processing? Is it used for model training? Enterprise agreements should explicitly prohibit training on your data. Self-hosted models eliminate this concern entirely.

Tip

Build a compliance checklist specific to your institution's regulatory requirements and apply it consistently to every model evaluation. This transforms model selection from a subjective technical preference into a structured, auditable governance process -- exactly what examiners expect under SR 11-7.
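The checklist the Tip describes can be implemented as a hard gate that runs before any capability or cost scoring. This is a minimal sketch; the criterion names are illustrative examples drawn from the four compliance questions above, and a real checklist would reflect your institution's specific regulatory requirements.

```python
# A minimal compliance-gate sketch: every criterion must pass before a
# model is scored on capability, cost, or control. Criteria are examples.

REQUIRED_CRITERIA = [
    "data_residency_guaranteed",   # geographic processing boundaries
    "audit_logging_available",     # prompt/response/version logging
    "no_training_on_our_data",     # explicit contractual prohibition
    "version_pinning_supported",   # reproducibility for examination
]

def passes_compliance_gate(model_profile: dict) -> bool:
    """Return True only if every gating criterion is satisfied."""
    return all(model_profile.get(c, False) for c in REQUIRED_CRITERIA)

candidate = {
    "data_residency_guaranteed": True,
    "audit_logging_available": True,
    "no_training_on_our_data": True,
    "version_pinning_supported": False,  # fails the gate
}
print(passes_compliance_gate(candidate))  # False: eliminated before scoring
```

Treating compliance as a boolean gate rather than a weighted score matches the text's point: a model that fails any gating criterion is eliminated entirely, no matter how capable or cheap it is.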

Dimension 4: Control

Control encompasses customization, vendor dependency, and operational autonomy:

Customization: Can you fine-tune the model on your data? Open-source models offer full fine-tuning. Some proprietary models offer limited fine-tuning. Others offer none.

Vendor dependency: What happens if the vendor changes pricing, deprecates your model version, or modifies data handling terms? Proprietary models create higher vendor dependency. Open-source models you host eliminate vendor dependency for the model itself (but create dependency on your own operations team).

Version control: Can you lock to a specific model version for reproducibility? Regulatory environments may require reproducing prior outputs. Version pinning is essential.

Portability: How difficult is it to switch models if your current choice becomes suboptimal? Architectures that abstract the model behind an API interface make switching easier.
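The version-pinning and portability points above are usually addressed together with an adapter layer. The sketch below is illustrative only: the class and adapter names are invented for this example and do not correspond to any real vendor SDK, and the `complete` methods are stubs standing in for actual API calls.

```python
# A minimal portability sketch: hide the vendor behind one interface so
# switching models is a configuration change, not a code rewrite.
# Adapter names and behavior are hypothetical, not a real SDK.

from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter(ChatModel):
    def __init__(self, model_version: str):
        # Pin an exact version string for reproducibility under examination.
        self.model_version = model_version
    def complete(self, prompt: str) -> str:
        return f"[{self.model_version}] response to: {prompt}"  # stub

class LlamaAdapter(ChatModel):
    def __init__(self, model_version: str):
        self.model_version = model_version
    def complete(self, prompt: str) -> str:
        return f"[{self.model_version}] response to: {prompt}"  # stub

def get_model(config: dict) -> ChatModel:
    """Resolve provider and pinned version from configuration."""
    adapters = {"claude": ClaudeAdapter, "llama": LlamaAdapter}
    return adapters[config["provider"]](config["version"])

model = get_model({"provider": "llama", "version": "llama-3-70b"})
```

Because every caller depends only on `ChatModel`, swapping providers means changing the config entry and validating the new model, leaving application code untouched.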

Model Comparison for Banking

Model Family | Capability | Cost (relative) | Compliance Readiness | Control
Claude (Anthropic) | Top-tier reasoning, 200K context, strong safety | $$$ | Enterprise agreements, no data training, Bedrock integration | Limited fine-tuning, cloud-only
GPT-4o (OpenAI) | Top-tier general capability, multimodal | $$$ | Enterprise agreements, Azure integration | Limited fine-tuning, cloud-only
Command R+ (Cohere) | Best-in-class RAG, multilingual | $$ | VPC/on-prem options, data residency controls | Moderate fine-tuning options
Llama 3 (Meta) | Strong open-source, flexible deployment | $ (self-hosted) | Full data control, on-premises capable | Full fine-tuning, full control
Mistral (Mistral AI) | Strong open-source, EU-based | $ (self-hosted) | GDPR alignment, on-premises capable | Full fine-tuning, full control

Warning

This comparison reflects the landscape at publication time. Model capabilities, pricing, and compliance features evolve rapidly. The framework itself is the durable asset -- apply these four dimensions to any new model that enters the market. Do not treat this table as a permanent ranking.

Matching Use Cases to Models

Use Case | Data Sensitivity | Complexity | Recommended Approach
Regulatory document analysis | Medium | High | Top-tier proprietary (Claude, GPT-4o) via enterprise API
Internal policy Q&A (RAG) | High | Medium | Command R+ (VPC) or Llama 3 (on-prem) with RAG pipeline
Customer email classification | Medium | Low | Small open-source model (Llama 3 8B) on-premises
Credit memo draft generation | High | High | Top-tier proprietary via enterprise API, with human review
Market research summarization | Low | Medium | Mid-tier proprietary model via standard API
Code generation for data teams | Low | Medium | Mid-tier model (Claude Sonnet, GPT-4o mini) via API
Customer complaint analysis | High | Medium | On-premises open-source model with fine-tuning
Board presentation drafting | Low | High | Top-tier proprietary model via enterprise API

Building Your Model Strategy

A mature banking AI strategy does not rely on a single model. It builds a tiered architecture:

Tier 1 -- Strategic analysis: Top-tier proprietary models for complex reasoning, regulatory interpretation, and high-stakes analysis. Low volume, high value per query.

Tier 2 -- Operational intelligence: Mid-tier or specialized models (Command R+ for RAG, Sonnet for general tasks) for steady-state business applications. Moderate volume, moderate value per query.

Tier 3 -- High-volume automation: Small, efficient models (open-source, fine-tuned) for classification, routing, extraction, and other high-volume tasks. High volume, lower value per query.

This tiered approach ensures you deploy the right capability at the right cost for each use case, while maintaining compliance and control where they matter most.
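The three tiers above can be expressed as a routing table that maps each approved use case to a tier, and each tier to a model class. The sketch below is illustrative; the use-case names and model labels are placeholders for whatever your institution's inventory actually contains.

```python
# A sketch of the tiered architecture described above.
# Tier assignments and model labels are illustrative placeholders.

TIERS = {
    1: "top-tier proprietary",          # strategic analysis: low volume, high value
    2: "mid-tier or RAG-specialized",   # operational intelligence: moderate/moderate
    3: "small fine-tuned open-source",  # high-volume automation: high volume, low value
}

USE_CASE_TIER = {
    "regulatory_interpretation": 1,
    "policy_qa_rag": 2,
    "email_classification": 3,
}

def route(use_case: str) -> str:
    """Return the model class assigned to an approved use case."""
    if use_case not in USE_CASE_TIER:
        raise KeyError(f"Use case not in approved inventory: {use_case!r}")
    return TIERS[USE_CASE_TIER[use_case]]

print(route("email_classification"))  # small fine-tuned open-source
```

Raising a `KeyError` for unlisted use cases is deliberate: under a governance regime, a new use case should trigger the evaluation process, not silently default to some model.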

BANKING ANALOGY

This tiered model strategy mirrors how your institution manages its investment portfolio. You do not put all assets in a single instrument. You maintain a diversified portfolio: high-conviction, high-return positions (top-tier models for strategic analysis), core holdings for steady returns (mid-tier models for operational use), and efficient, low-cost positions for broad market exposure (small models for high-volume automation). The portfolio is managed holistically, with risk and return balanced across tiers.

Governance Integration

Model selection is not a one-time decision. It is an ongoing governance process:

  1. Initial evaluation: Apply the four-dimension framework to select models for each use case
  2. Validation testing: Test selected models against banking-specific benchmarks before deployment
  3. Ongoing monitoring: Track model performance, cost, and hallucination rates in production
  4. Periodic re-evaluation: Review model selections quarterly as new models and capabilities emerge
  5. Deprecation planning: Maintain migration paths so you can switch models without disrupting operations
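Step 3 of the process above (ongoing monitoring) can be sketched as a threshold check that flags a deployed model for re-evaluation. The metric names and threshold values below are assumptions for illustration; your Model Risk Management framework would define the real ones.

```python
# Monitoring sketch for step 3: flag production metrics that breach
# their thresholds. Metric names and limits are illustrative assumptions.

THRESHOLDS = {
    "hallucination_rate": 0.02,   # maximum acceptable error rate
    "monthly_cost_usd": 50_000,   # budget ceiling for this use case
    "p95_latency_s": 3.0,         # responsiveness target
}

def needs_review(metrics: dict) -> list[str]:
    """Return the metrics that breached their threshold, if any."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

production = {
    "hallucination_rate": 0.035,  # breach: triggers re-evaluation
    "monthly_cost_usd": 41_000,
    "p95_latency_s": 2.1,
}
print(needs_review(production))  # ['hallucination_rate']
```

A non-empty result feeds step 4 (periodic re-evaluation): the flagged model goes back through the four-dimension framework against current alternatives.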

Quick Recap

  • The four-dimension framework evaluates models on Capability, Cost, Compliance, and Control -- balancing all four for each use case
  • Match model capability to task complexity -- over-provisioning wastes money, under-provisioning creates risk
  • Total cost of ownership varies dramatically between cloud APIs, managed infrastructure, and self-hosted deployments
  • A mature banking model strategy uses a tiered approach: top-tier for strategic analysis, mid-tier for operations, efficient models for high-volume automation
  • Model selection is an ongoing governance process, not a one-time decision -- integrate it with your Model Risk Management framework

KNOWLEDGE CHECK

A bank needs to classify 2 million customer emails per month by topic. Using the four-dimension framework, which model approach is most appropriate?

Why does this framework recommend different models for regulatory document analysis versus customer email classification?

What is the primary risk of a banking institution relying on a single foundation model for all AI use cases?