Every week I talk to engineering leaders who want to “add AI” to their platform. They’re thinking about models, frameworks, and which LLM provider to use. Those are all valid questions. But they’re the wrong first questions.
The first question is: can your current architecture support AI workloads at all?
The data layer problem
AI models need data. Not just any data. They need consistent, well-structured, accessible data with clear lineage and governance. Most enterprise architectures weren’t designed with this in mind.
If your customer data lives in six different databases with six different schemas and no single source of truth, no model is going to save you. If your event data gets dropped during peak load because your message queue wasn’t sized for it, your real-time ML pipeline is useless.
Before choosing a model, audit your data layer. Can you answer these questions confidently?
Where does each critical data entity live? Who owns it? How fresh is it? What transformations happen between source and consumption? Is there a schema registry or are contracts implicit?
If you can’t, start here, before any model selection.
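One lightweight way to run this audit is to make the inventory explicit in code. The sketch below is illustrative: the entity names, systems, and field names are hypothetical, not a standard.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical record for one critical data entity. The fields mirror
# the audit questions: where it lives, who owns it, how fresh it is,
# and whether its schema is an explicit contract or an implicit one.
@dataclass
class DataEntity:
    name: str
    system_of_record: str      # where the authoritative copy lives
    owner: str                 # team accountable for it
    freshness_sla: timedelta   # how stale it is allowed to get
    schema_registered: bool    # explicit contract vs. implicit schema

inventory = [
    DataEntity("customer", "crm_db", "platform-team", timedelta(minutes=5), True),
    DataEntity("purchase", "orders_db", "commerce-team", timedelta(seconds=30), False),
]

# Entities with implicit contracts are the risky inputs for any ML
# pipeline: nothing prevents a silent upstream schema change.
gaps = [e.name for e in inventory if not e.schema_registered]
```

Even a spreadsheet works; the point is that every critical entity has an answer on record for each question, and the gaps are visible.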
Events as a foundation
The most AI-ready architectures I’ve worked with share a common trait: they’re event-driven. Not because event-driven is trendy, but because events create a natural audit trail, enable real-time processing, and decouple producers from consumers.
When you have a well-designed event backbone, adding an ML consumer is just another subscriber. Your recommendation engine can listen to the same purchase events as your analytics pipeline. Your fraud detection model can process the same transaction events as your ledger service.
If your services communicate exclusively through synchronous REST calls with no event layer, adding AI will require rewriting integration points. Build the event infrastructure first.
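To make the “just another subscriber” point concrete, here is a minimal in-memory event bus, a stand-in for Kafka, SNS, or whatever backbone you actually run. The topic and event shapes are invented for illustration.

```python
from collections import defaultdict

# Minimal in-memory pub/sub bus. In production this is a durable log
# (e.g. Kafka); the decoupling property it demonstrates is the same.
class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
analytics_log, feature_log = [], []

# The analytics pipeline and the ML feature pipeline consume the same
# purchase events; neither knows the other exists, and adding the ML
# consumer required no change to the producer.
bus.subscribe("purchase", analytics_log.append)
bus.subscribe("purchase", lambda e: feature_log.append(e["amount"]))

bus.publish("purchase", {"user_id": 42, "amount": 19.99})
```

Adding a fraud model later is one more `subscribe` call, not a new integration point in the producer.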
API design for inference
Most existing APIs weren’t designed for the latency patterns that AI introduces. A synchronous API that calls an LLM with a 3-second response time breaks user experience assumptions built around 200ms responses.
Design your AI integration points with these patterns in mind:
Async-first. Submit a request, get a job ID, poll or subscribe for results. This works for any inference that takes more than a second.
Streaming. For generative AI use cases, stream tokens as they’re produced rather than waiting for complete responses.
Fallback paths. What happens when the model is slow, wrong, or unavailable? Every AI-powered feature needs a graceful degradation path.
Caching. Many AI queries are semantically similar. Embedding-based caching can dramatically reduce inference costs and latency for repeated patterns.
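The async-first pattern above can be sketched as a submit/poll pair. This is a skeleton only: the in-process `jobs` dict stands in for a queue plus durable store, and the echo function stands in for real inference.

```python
import uuid

# Async-first inference: submit returns a job ID immediately; the
# caller polls (or subscribes) for the result instead of blocking.
jobs = {}

def submit(prompt: str) -> str:
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "prompt": prompt, "result": None}
    return job_id

def run_model(job_id: str) -> None:
    # Placeholder for slow GPU inference; here it just echoes the prompt.
    job = jobs[job_id]
    job["result"] = f"echo: {job['prompt']}"
    job["status"] = "done"

def poll(job_id: str) -> dict:
    job = jobs[job_id]
    return {"status": job["status"], "result": job["result"]}

job_id = submit("summarize this document")
run_model(job_id)  # in reality, a worker picks this up off a queue
```

The caller’s contract never mentions latency: whether inference takes 300 ms or 30 s, the API shape is the same.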
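A toy version of embedding-based caching makes the idea tangible. Real systems use learned embeddings and a vector index; here a bag-of-words vector and cosine similarity stand in, and the threshold is an arbitrary placeholder.

```python
from collections import Counter
from math import sqrt

# Toy "embedding": a word-count vector. Production systems use a real
# embedding model plus a vector database for lookup.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

cache = []  # list of (embedding, cached response) pairs

def cached_inference(query: str, model, threshold: float = 0.8):
    vec = embed(query)
    for stored_vec, response in cache:
        if cosine(vec, stored_vec) >= threshold:
            return response  # semantic cache hit: skip the model call
    response = model(query)  # cache miss: pay for inference once
    cache.append((vec, response))
    return response
```

Two queries phrased slightly differently can land on the same cached response, which is exactly what turns repeated patterns into near-zero-cost lookups.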
Cost architecture
AI workloads can be expensive. GPU inference, embedding generation, and vector database queries add cost categories that most teams haven’t budgeted for.
Build cost observability into your AI architecture from the start. Track cost per inference, cost per user session, cost per feature. Set up alerts before you get a surprise bill.
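As a sketch of what “cost as a first-class metric” looks like in code: a small meter that prices each inference and alerts on a budget. The per-token rates below are made-up placeholders, not any provider’s actual pricing.

```python
# Hypothetical rates -- substitute your provider's real pricing.
COST_PER_1K_INPUT_TOKENS = 0.003
COST_PER_1K_OUTPUT_TOKENS = 0.015

class CostMeter:
    def __init__(self, alert_threshold: float):
        self.total = 0.0
        self.alert_threshold = alert_threshold

    def record(self, input_tokens: int, output_tokens: int) -> float:
        # Price this inference and accumulate it into the session total.
        cost = (input_tokens / 1000 * COST_PER_1K_INPUT_TOKENS
                + output_tokens / 1000 * COST_PER_1K_OUTPUT_TOKENS)
        self.total += cost
        if self.total >= self.alert_threshold:
            # In production: emit a metric / page someone, don't print.
            print(f"ALERT: session cost ${self.total:.4f} over budget")
        return cost

meter = CostMeter(alert_threshold=1.00)
meter.record(input_tokens=2000, output_tokens=500)
```

Tagging each recorded cost with a user, session, and feature label is what lets you answer “which feature is burning the budget?” before the invoice does.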
The organizations deploying AI sustainably are the ones that treat inference cost as a first-class metric alongside latency and error rate.
Start with the boring stuff
The unglamorous truth is that AI readiness is mostly about getting your fundamentals right. Clean data. Reliable events. Well-designed APIs. Cost visibility. These aren’t AI-specific capabilities. They’re the foundations of any good architecture.
Get these right, and adding AI becomes an incremental capability rather than a transformational risk.