Analytics-Adjacent AI on Data Platform
Overview
Most banks have spent the last decade building modern data platforms: data warehouses on Snowflake, data lakehouses on Databricks, BI dashboards in Tableau or Power BI, and data governance frameworks that satisfy regulatory requirements. This infrastructure represents a massive investment -- not just in technology, but in data quality, governance processes, and institutional knowledge.
Analytics-adjacent AI takes a pragmatic approach: instead of building a separate AI platform and moving data to it, you layer AI capabilities directly onto your existing data infrastructure. The AI models run where your data already lives, governed by the same access controls, and producing outputs that feed into the same dashboards and reports your stakeholders already use.
This architecture is particularly compelling for banking because it minimizes incremental governance burden. Your data is already governed, your access controls are already audited, and your BI tools are already trusted by the business. Adding AI as a layer on top of this foundation is fundamentally less risky than introducing an entirely new data movement pattern.
BANKING ANALOGY
Think of analytics-adjacent AI the way you think about adding online banking to your existing core banking system in the late 1990s. The smart banks did not rip out their cores and rebuild from scratch -- they added a digital channel layer on top of the existing account processing, transaction engines, and general ledger. The core remained the system of record; online banking was the new capability layer. Analytics-adjacent AI follows the same pattern: your data platform remains the system of record, and AI becomes a new capability layer that enriches analysis without disrupting what already works.
Architecture Components
Data Lakehouse Layer
The data lakehouse combines the structured query capabilities of a data warehouse with the flexibility of a data lake. In banking, this means your structured data (transaction records, financial metrics, customer demographics) and unstructured data (loan documents, emails, call transcripts, market research reports) coexist in one platform.
For analytics-adjacent AI, the lakehouse is critical because AI applications frequently need both structured and unstructured data. A credit risk model that combines traditional financial ratios with sentiment analysis of earnings call transcripts needs both data types in a single query environment.
Platforms like Snowflake and Databricks serve as the lakehouse layer, with Delta Lake or Apache Iceberg providing the open table format that ensures data portability.
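The value of the single query environment is that one SQL statement can span traditional financial metrics and AI-derived signals from unstructured content. A toy in-memory sketch of that pattern (table and column names are hypothetical stand-ins, not a real lakehouse schema):

```python
import sqlite3

# Illustrative only: an in-memory stand-in for a lakehouse query that
# joins structured financials with AI-derived sentiment extracted from
# unstructured documents such as earnings call transcripts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borrower_financials (borrower_id TEXT, dscr REAL);
CREATE TABLE call_sentiment (borrower_id TEXT, sentiment REAL);
INSERT INTO borrower_financials VALUES ('B001', 1.10), ('B002', 1.45);
INSERT INTO call_sentiment VALUES ('B001', -0.6), ('B002', 0.2);
""")

# One query spans both data types: low debt service coverage AND
# negative call sentiment flags the borrower for review.
rows = conn.execute("""
    SELECT f.borrower_id
    FROM borrower_financials f
    JOIN call_sentiment s ON s.borrower_id = f.borrower_id
    WHERE f.dscr < 1.25 AND s.sentiment < 0
""").fetchall()
print(rows)  # [('B001',)]
```

On an actual lakehouse the same join runs as plain SQL over Delta or Iceberg tables; the point is that neither data type has to leave the platform.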
Feature Store
The feature store is a centralized repository of curated, versioned data features used by AI models. In banking, features might include customer lifetime value scores, transaction velocity metrics, account concentration ratios, and text embeddings of customer communications.
Feature stores serve two critical functions. First, they ensure consistency -- the same feature definition is used in model training and production inference, eliminating the "training-serving skew" that causes models to behave differently in production than in development. Second, they enable feature reuse -- a customer risk score computed for credit underwriting can also be used by the marketing team for campaign targeting, without redundant computation.
Databricks Feature Store, Feast (open-source), and Tecton are the leading options. For banks already on Databricks, the integrated Feature Store minimizes additional infrastructure.
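The consistency guarantee can be made concrete with a toy sketch (this is not the Feast or Databricks Feature Store API, just the underlying idea): one registered, versioned definition is the single source of truth for both training and serving, so the two paths cannot drift apart.

```python
# Toy feature registry: one (name, version) key maps to one definition,
# and both training and serving compute through the same function.
FEATURE_REGISTRY = {}

def register_feature(name, version, fn):
    FEATURE_REGISTRY[(name, version)] = fn

def compute_feature(name, version, row):
    # Both the training pipeline and the serving endpoint call this,
    # so there is no opportunity for training-serving skew.
    return FEATURE_REGISTRY[(name, version)](row)

# Hypothetical banking feature: 30-day transaction velocity.
register_feature("txn_velocity_30d", "v1",
                 lambda row: row["txn_count_30d"] / 30)

row = {"txn_count_30d": 90}
training_value = compute_feature("txn_velocity_30d", "v1", row)
serving_value = compute_feature("txn_velocity_30d", "v1", row)
print(training_value, serving_value)  # 3.0 3.0
```

Real feature stores add materialization, point-in-time correctness, and metadata on top of this core contract, but the contract itself is what eliminates skew.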
Model Serving Layer
The model serving layer hosts trained AI models and exposes them as API endpoints that other systems can call. This includes both traditional ML models (credit scoring, fraud detection, churn prediction) and foundation model capabilities (document summarization, natural language queries, classification).
Key capabilities include: auto-scaling to handle variable load, A/B testing to compare model versions, canary deployments to gradually roll out new models, and model monitoring to detect performance degradation over time.
For banking, the serving layer must also enforce access controls (which applications and users can call which models) and rate limiting (preventing a runaway process from consuming all inference capacity).
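A minimal sketch of those two controls, assuming a per-model allow-list and a fixed-window rate limiter (managed serving platforms and API gateways provide both as configuration; the model, caller, and limit values here are illustrative):

```python
import time

# Per-model allow-list: which applications may call which models.
MODEL_ACL = {"credit_score_v3": {"loan_origination", "portfolio_monitoring"}}
RATE_LIMIT = 2  # calls per caller per window; deliberately tiny for the demo
_calls = {}

def invoke(model, caller, features, window=60.0):
    # Access control: reject callers not on the model's allow-list.
    if caller not in MODEL_ACL.get(model, set()):
        raise PermissionError(f"{caller} may not call {model}")
    # Fixed-window rate limit: stop a runaway process from consuming
    # all inference capacity.
    now = time.time()
    start, count = _calls.get((model, caller), (now, 0))
    if now - start >= window:
        start, count = now, 0
    if count >= RATE_LIMIT:
        raise RuntimeError("rate limit exceeded")
    _calls[(model, caller)] = (start, count + 1)
    return {"model": model, "score": sum(features)}  # stand-in for inference

result = invoke("credit_score_v3", "loan_origination", [0.2, 0.3])
print(result["score"])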
AI-Powered Analytics Layer
This layer integrates AI outputs into the analytics workflows stakeholders already use. Instead of building a separate AI application, model predictions and insights appear as new columns in existing tables, new visualizations in existing dashboards, and new capabilities in existing BI tools.
Examples in banking: a credit portfolio dashboard that now includes an AI-generated risk trend narrative alongside traditional charts; a customer segmentation report that incorporates LLM-analyzed behavioral patterns; a compliance monitoring dashboard with AI-flagged transaction anomalies appearing alongside rule-based alerts.
Governance and Lineage Layer
The governance layer tracks the complete lineage of AI-generated outputs: which data was used, which model version produced the output, when it was generated, and what access controls apply. This is essential for model risk management under OCC SR 11-7 and for responding to regulatory inquiries about any AI-influenced decision.
Fine-tuned models have additional lineage requirements: what training data was used, what hyperparameters were set, what evaluation metrics were achieved, and what validation was performed before production deployment.
Unity Catalog (Databricks), Snowflake Horizon, and Apache Atlas are governance platforms that can provide this lineage tracking within the data platform.
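A sketch of the record a governance layer might persist per prediction -- the field names are illustrative, and platforms like Unity Catalog capture equivalent metadata natively rather than requiring hand-rolled records like this:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical lineage record: enough metadata to answer a regulatory
# inquiry about which inputs and which model version produced an output.
@dataclass
class LineageRecord:
    model_name: str
    model_version: str
    input_features: dict
    output: float
    generated_at: str

def record_prediction(model_name, model_version, features, output):
    rec = LineageRecord(model_name, model_version, features, output,
                        datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(rec))  # stand-in for a governance-table write

line = record_prediction("portfolio_risk", "1.4.2",
                         {"dscr": 1.1, "sentiment": -0.6}, 72.5)
print(line)
```

The essential property is that the record is written at inference time, automatically, for every prediction -- lineage reconstructed after the fact rarely satisfies an examiner.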
Data Flow
1. Data ingestion: Structured data (transaction records, financial statements, market data) and unstructured data (documents, emails, call transcripts) are ingested into the lakehouse through existing ETL/ELT pipelines
2. Feature engineering: Raw data is transformed into model-ready features -- customer risk scores, transaction velocity metrics, document embeddings -- and stored in the feature store with versioning and metadata
3. Model training: Data scientists train models using features from the feature store, with experiment tracking (MLflow) recording every training run, hyperparameter set, and evaluation result
4. Model deployment: Validated models are deployed to the serving layer as API endpoints, with A/B testing and canary deployment controls managing the rollout
5. Inference execution: Analytics pipelines call deployed models -- either in batch (scoring the entire loan portfolio overnight) or in real-time (scoring individual transactions as they occur)
6. Output integration: Model predictions and AI-generated insights are written back to the lakehouse as new columns or tables, making them available through standard SQL queries and existing BI tools
7. Dashboard surfacing: Existing dashboards and reports incorporate AI outputs alongside traditional analytics -- portfolio risk trends with AI narrative, customer segments with behavioral analysis, compliance flags with confidence scores
8. Governance tracking: Every model prediction is recorded with full lineage: input features, model version, timestamp, and output. Governance dashboards monitor model performance, data drift, and access patterns
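The inference-execution and output-integration steps above can be sketched end to end: an overnight batch job scores every portfolio row and writes the score back as a new column. The scoring function, column names, and thresholds below are hypothetical stand-ins for a real model and schema.

```python
# Stand-in for a portfolio table already in the lakehouse.
portfolio = [
    {"loan_id": "L-100", "dscr": 1.05, "sentiment": -0.4},
    {"loan_id": "L-101", "dscr": 1.60, "sentiment": 0.3},
]

def score(row):
    # Toy model: thin debt service coverage and negative sentiment
    # both push the risk score up; floor at zero.
    return round(max(0.0, (1.25 - row["dscr"]) * 100
                          - row["sentiment"] * 50), 1)

# Overnight batch job: score every row and write the result back as a
# new column, where existing SQL queries and BI tools can read it.
for row in portfolio:
    row["ai_risk_score"] = score(row)

print([(r["loan_id"], r["ai_risk_score"]) for r in portfolio])
```

In production the loop would be a distributed job (Spark, Snowpark) writing to a governed table rather than an in-memory list, but the write-back-as-column pattern is the same.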
Banking Use Case
Scenario: A regional bank's credit risk team wants to identify emerging portfolio risks earlier. Their current process relies on quarterly financial statement reviews and lagging indicators. They want to incorporate unstructured data signals -- news sentiment about borrower industries, earnings call analysis, and market trend data -- into their risk assessment process.
Without analytics-adjacent AI: The credit risk team manually monitors industry news, reads analyst reports, and relies on quarterly financial statement updates. By the time deteriorating conditions appear in financial statements, the risk has already materialized. The team discusses "leading indicators" in committee meetings but lacks a systematic way to incorporate them.
With analytics-adjacent AI: The lakehouse ingests industry news feeds, earnings call transcripts, and market data alongside traditional financial metrics. Foundation model capabilities generate sentiment scores and topic classifications for unstructured content. The feature store combines these AI-generated signals with traditional risk metrics (debt service coverage ratios, concentration levels, payment trends) into enriched risk features.
A monitoring model consumes these enriched features and produces daily portfolio risk scores with explainability -- "Jones Manufacturing risk score increased 15 points due to: negative earnings call sentiment in the aerospace supply chain sector (contributing factor: 40%), declining payment velocity over the last 60 days (contributing factor: 35%), and CRE appraisal decline in their operating region (contributing factor: 25%)."
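The explainability arithmetic behind that narrative is straightforward: a total score change is decomposed into per-factor point deltas. The factor names and contribution shares below come from the illustrative example above, not from any real model.

```python
# Decompose a 15-point score increase into per-factor point deltas,
# using the contribution shares from the narrative (40% / 35% / 25%).
score_delta = 15
factors = {
    "negative earnings call sentiment": 0.40,
    "declining payment velocity": 0.35,
    "CRE appraisal decline": 0.25,
}
# Contributions should be exhaustive: shares sum to 100%.
assert abs(sum(factors.values()) - 1.0) < 1e-9

breakdown = {name: round(score_delta * share, 2)
             for name, share in factors.items()}
print(breakdown)
```

Real attribution methods (SHAP values, for instance) produce these shares from the model itself; the presentation layer then renders them as the plain-language narrative the risk team reads.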
These scores appear on the existing credit portfolio dashboard that the risk team already reviews daily. No new application to learn, no new data to manage -- just richer, more timely risk intelligence layered onto the existing workflow.
Tip
The fastest path to analytics-adjacent AI is adding LLM-powered natural language capabilities to your existing BI environment. Both Snowflake Cortex Analyst and Databricks AI/BI Genie allow business users to query data in plain English. This delivers immediate, visible value without model training, feature engineering, or infrastructure changes -- and builds organizational confidence in AI before more complex use cases are pursued.
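The interaction these tools enable looks roughly like the stub below -- a plain-English question in, SQL against governed tables out. This is emphatically not the Cortex Analyst or AI/BI Genie API (both are managed services backed by LLMs); the template lookup here is just a toy that shows the shape of the exchange, with hypothetical table and column names.

```python
# Toy stand-in for a natural-language-to-SQL service. Real services use
# an LLM plus a semantic model of the schema; this stub hard-codes two
# question-to-query mappings purely for illustration.
def nl_to_sql(question):
    templates = {
        "total exposure by industry":
            "SELECT industry, SUM(exposure) FROM loans GROUP BY industry",
        "loans past due over 30 days":
            "SELECT loan_id FROM loans WHERE days_past_due > 30",
    }
    return templates.get(question.lower().strip())

sql = nl_to_sql("Total exposure by industry")
print(sql)
```

The governance win is that the generated SQL runs under the asking user's existing access controls -- the AI layer adds a new interface, not a new security boundary.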
Key Architectural Decisions
| Decision | Options | Recommendation | Why |
|---|---|---|---|
| AI integration point | Separate AI application consuming data exports; AI layer within the data platform; hybrid with some AI on-platform and some standalone | AI layer within the data platform | Keeping AI processing within the data platform eliminates data movement, extends existing governance, and makes AI outputs available through standard SQL queries |
| Vector storage | Standalone vector database (Pinecone, Weaviate); platform-native vector capabilities (Snowflake Cortex Search, Databricks Vector Search); PostgreSQL pgvector | Platform-native vector capabilities | Using the data platform's native vector support avoids introducing another service to manage. Embeddings live alongside structured data with unified governance |
| Batch vs. real-time inference | All batch (overnight scoring); all real-time (per-transaction scoring); hybrid based on use case latency requirements | Hybrid based on use case | Portfolio risk scoring can run overnight (batch). Transaction fraud detection must run in milliseconds (real-time). Matching inference pattern to use case requirements optimizes cost and complexity |
| Model governance tool | Custom-built model registry; MLflow (open-source); platform-native governance (Unity Catalog, Snowflake Horizon) | Platform-native governance | Platform-native tools integrate lineage tracking, access controls, and model versioning with the same governance infrastructure used for data -- one compliance framework instead of two |
Quick Recap
- Analytics-adjacent AI layers AI capabilities onto existing data platform infrastructure rather than building a separate AI system
- The architecture consists of five layers: data lakehouse, feature store, model serving, AI-powered analytics, and governance/lineage
- This approach minimizes incremental governance burden by keeping AI processing within the same security and compliance boundary as existing data
- AI outputs integrate into existing dashboards and BI tools, requiring no new applications for business stakeholders
- Start with natural language query capabilities (Cortex Analyst, AI/BI Genie) for immediate visible value before pursuing custom model development
KNOWLEDGE CHECK
What is the PRIMARY advantage of analytics-adjacent AI for banks with established data platforms?
Why is a feature store important for banking AI deployments?
A bank wants to add AI-powered risk signals to their existing credit portfolio dashboard. Which architecture approach best serves this goal?