The most common question I hear from engineering leaders right now is some version of “how do we add AI to our platform?” It is a reasonable question. It is also usually the wrong first question.
The right first question is: can your current architecture support AI workloads reliably? Not just technically, but operationally. In practice, most enterprise architectures were not designed for the access patterns, latency profiles, or cost structures that AI workloads require. Adding AI on top of them produces something that works in a demo and degrades in production.
The data layer problem that predates the model question
AI models need data. Not just any data. They need consistent, well-structured, accessible data with clear lineage and governance. Most enterprise architectures were not built with this in mind, because until recently there was no production workload that required it at this granularity.
The specific problems that appear are predictable. Customer data lives in six different databases with six different schemas and no single source of truth, which means every training run requires a custom extraction and reconciliation step that someone has to own and maintain. Event data gets dropped during peak load because the message queue was sized for the original ingestion rate, not for the rate at which the new AI consumer needs to process it. Historical data exists but has no documented lineage, so the transformations between source and the format your model expects are implicit and fragile.
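A back-of-the-envelope model shows why the queue-sizing problem only bites at peak. If events arrive faster than the slowest consumer drains them, a bounded queue fills at the difference between the two rates. The sketch below assumes drop-on-full behavior and a queue that is empty when the peak starts. The function names and numbers are illustrative and not tied to any particular broker.

```python
from typing import Optional


def seconds_until_overflow(capacity: int, arrival_per_s: float, drain_per_s: float) -> Optional[float]:
    """Time until a bounded queue is full and starts dropping events.

    Returns None when the consumer keeps up (drain >= arrival), because the
    backlog never grows in that case.
    """
    growth_per_s = arrival_per_s - drain_per_s
    if growth_per_s <= 0:
        return None
    return capacity / growth_per_s


def dropped_events(capacity: int, arrival_per_s: float, drain_per_s: float, peak_s: float) -> float:
    """Events lost during a peak, assuming the queue starts empty and drops on full."""
    growth_per_s = arrival_per_s - drain_per_s
    if growth_per_s <= 0:
        return 0.0
    return max(0.0, float(growth_per_s * peak_s - capacity))


# Hypothetical peak: 2,000 events/s arrive, the slowest consumer drains 1,500/s.
print(seconds_until_overflow(10_000, 2_000, 1_500))  # 20.0
print(dropped_events(10_000, 2_000, 1_500, 60))      # 20000.0
```

A queue that never overflowed at the original ingestion rate can lose a third of a one-minute peak once a slower consumer is attached.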
Before choosing a model, audit your data layer. Three questions: Where does each critical data entity live, who owns it, and how fresh is it? What transformations happen between source and consumption, and are those transformations documented? Is there a schema registry, or are producer-consumer contracts implicit and enforced only by convention?
If the answer to any of these is “we are not sure” or “we would have to check,” the data layer work is the first project.
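One way to make the audit concrete is to record the answers per entity and treat every unknown as a gap. This is a minimal sketch, assuming a hand-maintained inventory. `DataEntity`, its fields, and the staleness threshold are hypothetical names chosen for illustration, not a standard catalog schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DataEntity:
    # None means "we are not sure", which the audit treats as a gap.
    name: str
    location: Optional[str]                     # where the entity lives
    owner: Optional[str]                        # accountable team
    last_updated: Optional[datetime]            # freshness
    transformations_documented: Optional[bool]  # source-to-consumption steps written down?
    schema_registered: Optional[bool]           # contract held in a schema registry?


def audit(entity: DataEntity, max_staleness: timedelta, now: datetime) -> List[str]:
    """Return the audit questions this entity cannot answer with confidence."""
    gaps = []
    if entity.location is None:
        gaps.append("location unknown")
    if entity.owner is None:
        gaps.append("no owner")
    if entity.last_updated is None:
        gaps.append("freshness unknown")
    elif now - entity.last_updated > max_staleness:
        gaps.append("stale")
    if entity.transformations_documented is not True:
        gaps.append("transformations undocumented")
    if entity.schema_registered is not True:
        gaps.append("no registered schema")
    return gaps


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
customer = DataEntity("customer", "crm_db", None,
                      datetime(2024, 1, 1, tzinfo=timezone.utc), True, None)
print(audit(customer, timedelta(hours=6), now))
# ['no owner', 'stale', 'no registered schema']
```

Any entity with a non-empty gap list belongs in that first data layer project.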
Events as a prerequisite, not a nice-to-have
The most AI-ready architectures I have worked with share a common trait: they were already event-driven before any AI work started. Not because event-driven is the right architecture for every system, but because events create a natural audit trail, enable real-time processing without polling, and decouple producers from consumers in a way that makes adding a new consumer, like an ML model, genuinely incremental.
In a system where services communicate exclusively through synchronous REST calls, adding an AI consumer means adding integration points that are synchronous by nature. That creates two problems. The first is latency: a synchronous call chain that includes an LLM inference step with a two-second response time is fundamentally different from a chain where the slowest step takes 200 milliseconds. The second is coupling: every service that calls the AI system directly now has a dependency on its availability and response time.
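Both problems reduce to simple arithmetic. In a synchronous chain, latencies add up, and the request succeeds only if every step succeeds, so availabilities multiply. The sketch below assumes independent failures, and the per-step figures are hypothetical apart from the 200 ms and two-second values used above.

```python
import math


def chain_latency_ms(step_latencies_ms):
    """Synchronous chain: each caller blocks on the next, so latencies add up."""
    return sum(step_latencies_ms)


def chain_availability(step_availabilities):
    """Synchronous chain: the request succeeds only if every step succeeds.

    Assumes failures are independent of each other.
    """
    return math.prod(step_availabilities)


# Hypothetical per-step figures.
before = [50, 80, 200]            # slowest step: 200 ms
after = before + [2_000]          # add a 2 s LLM inference step
print(chain_latency_ms(before), chain_latency_ms(after))   # 330 2330
print(round(chain_availability([0.999] * 3), 4))           # 0.997
print(round(chain_availability([0.999] * 3 + [0.99]), 4))  # 0.987
```

The inference step dominates the latency budget, and a 99 percent available AI endpoint drags every caller's availability down with it.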
In an event-driven architecture, the ML consumer subscribes to existing events. A fraud detection model can consume the same transaction events as the ledger service. A recommendation engine can listen to the same purchase events as the analytics pipeline. The integration is additive, not structural.
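The additive property can be shown with a toy in-process bus. This is a sketch only: `EventBus`, the `transactions` topic, and the threshold-based fraud score are hypothetical stand-ins, and a real deployment would use a message broker that delivers events asynchronously rather than calling handlers inline.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Toy in-process pub/sub: every subscriber to a topic sees every event."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


ledger = []
fraud_scores = []


def ledger_service(event: dict) -> None:
    ledger.append((event["id"], event["amount"]))


def fraud_model(event: dict) -> None:
    # Placeholder rule standing in for a real model's score.
    fraud_scores.append((event["id"], 0.9 if event["amount"] > 10_000 else 0.1))


bus = EventBus()
bus.subscribe("transactions", ledger_service)
# Adding the ML consumer is one new subscription; the producer and ledger are untouched.
bus.subscribe("transactions", fraud_model)
bus.publish("transactions", {"id": "t1", "amount": 25_000})
print(ledger, fraud_scores)  # [('t1', 25000)] [('t1', 0.9)]
```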
If your services communicate synchronously and you want to add AI capabilities, you have two paths: add the event layer first and do AI second, or accept that your AI integration will require rework when the system grows. Both are valid choices. Make the choice deliberately.
The API patterns that actually work
Most existing APIs were not designed for the latency profiles that AI introduces. An API that normally responds in 150 milliseconds does not gracefully accommodate a feature that adds three seconds of LLM inference to the request path.
Three patterns matter here.
Async-first for anything slow. For AI workloads that take more than half a second, design the integration as: submit a request, receive a job ID, poll or subscribe for the result. This is more work to build and more work to test. It is also the pattern that does not break the user experience when inference runs slow, which it will.
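A minimal sketch of the submit, job ID, poll pattern, assuming a thread pool as the execution backend. `JobService` and `slow_model` are hypothetical names, and the in-memory job table would need to be durable storage in a real service.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class JobService:
    """Async-first wrapper: submit returns a job ID at once, clients poll for the result."""

    def __init__(self, infer: Callable[[str], str], workers: int = 4):
        self._infer = infer                      # hypothetical slow inference function
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, prompt: str) -> str:
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(self._infer, prompt)
        return job_id

    def poll(self, job_id: str) -> dict:
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        error = future.exception()               # does not block: the future is done
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "result": future.result()}


def slow_model(prompt: str) -> str:
    time.sleep(0.2)                              # stands in for seconds of inference
    return prompt.upper()


service = JobService(slow_model)
job_id = service.submit("hello")                 # returns immediately
status = service.poll(job_id)
while status["status"] == "pending":
    time.sleep(0.05)                             # a real client might subscribe instead
    status = service.poll(job_id)
print(status)  # {'status': 'done', 'result': 'HELLO'}
```

The request path never waits on inference, so a slow model shows up as a longer "pending" period rather than a timed-out page.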
Streaming for generative outputs. If you are building generative AI features, design for streaming from the start. Retrofitting streaming into a synchronous response pattern requires changing every layer between the model and the client. Starting with streaming and adding a synchronous wrapper when you need it is far easier, because the wrapper only has to buffer the stream until it completes.
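A sketch of the asymmetry, assuming the model client is exposed as a Python generator of text chunks. `generate_stream` is a hypothetical stand-in that yields canned words rather than calling a real model.

```python
from typing import Iterator


def generate_stream(prompt: str) -> Iterator[str]:
    """Hypothetical streaming model client: yields chunks as they are produced."""
    words = f"Summary of: {prompt}".split(" ")
    for i, word in enumerate(words):
        # Re-attach the separator so chunks concatenate back to the full text.
        yield word if i == len(words) - 1 else word + " "


def generate_sync(prompt: str) -> str:
    """The synchronous wrapper is just a buffer over the stream."""
    return "".join(generate_stream(prompt))


# Streaming client: render each chunk as soon as it arrives.
for chunk in generate_stream("quarterly report"):
    print(chunk, end="", flush=True)
print()

# Synchronous client: same source, one extra function.
print(generate_sync("quarterly report"))  # Summary of: quarterly report
```

Going the other way, from a function that returns one string to one that yields chunks, means changing the model call, every intermediate service, and the client protocol.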
Explicit fallback paths. Every AI-powered feature needs a defined behavior for when the model is unavailable, slow, or wrong. “Return an error” is a fallback path. “Return the most recent cached result” is a better one for most cases. “Fall back to the rule-based system this model was built to replace” is better still, if the rule-based system still exists.
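The three fallback paths can be chained in order of preference. This is a minimal sketch, assuming the model client raises an exception when it is unavailable or exceeds its latency budget. `answer_with_fallbacks`, `flaky_model`, and `rule_based` are hypothetical, and detecting a wrong answer would need a separate validation step that is not shown.

```python
from typing import Callable, Dict, Optional, Tuple


def answer_with_fallbacks(
    key: str,
    model: Callable[[str], str],
    cache: Dict[str, str],
    rules: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Model first, then the most recent cached result, then the rule-based system.

    Returns (answer, source) so callers and dashboards can see which path
    served the request.
    """
    try:
        answer = model(key)
        cache[key] = answer          # refresh the cache on every successful call
        return answer, "model"
    except Exception:                # unavailable or over its latency budget
        pass
    if key in cache:
        return cache[key], "cache"
    if rules is not None:
        return rules(key), "rules"
    raise RuntimeError(f"no fallback available for {key!r}")  # the "return an error" path


def flaky_model(key: str) -> str:
    raise TimeoutError("inference exceeded its latency budget")


def rule_based(key: str) -> str:
    return "bestsellers"


cache = {"user-1": "cached-recs-for-user-1"}
print(answer_with_fallbacks("user-1", flaky_model, cache, rule_based))  # ('cached-recs-for-user-1', 'cache')
print(answer_with_fallbacks("user-2", flaky_model, cache, rule_based))  # ('bestsellers', 'rules')
```

Returning the source alongside the answer also tells you how often production traffic is actually served by the fallback.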
The fallback path question is useful as a design forcing function. If you cannot answer it, you are not ready to ship the feature.
Cost as a first-class metric from day one
AI workloads introduce cost categories that most teams have not budgeted for or instrumented. GPU inference, embedding generation for vector search, and token-priced LLM calls do not map cleanly to the infrastructure cost models most engineering organizations already use.
The teams deploying AI sustainably are treating inference cost as a first-class operational metric from day one: cost per inference, cost per active user, cost per unit of business outcome. With that instrumentation, you can see when a model update that improves quality by four percent increases inference cost by forty percent, and decide whether that is a trade-off worth making.
Without it, you find out about the cost profile in the monthly bill review, after the behavior is already in production.
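The three unit-cost metrics and the quality-versus-cost comparison are simple ratios once spend and usage are recorded per period. This sketch assumes those totals are already collected. `CostPeriod`, the "outcomes" unit, and all figures are hypothetical, and it treats "four percent" and "forty percent" as relative changes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CostPeriod:
    # Counts are assumed nonzero for the reporting period.
    total_cost: float        # inference spend, in your billing currency
    inferences: int
    active_users: int
    outcomes: int            # hypothetical business unit, e.g. resolved tickets

    def per_inference(self) -> float:
        return self.total_cost / self.inferences

    def per_active_user(self) -> float:
        return self.total_cost / self.active_users

    def per_outcome(self) -> float:
        return self.total_cost / self.outcomes


def relative_change(old: float, new: float) -> float:
    return (new - old) / old


def tradeoff(old_quality: float, new_quality: float,
             old_cost: float, new_cost: float) -> Tuple[float, float]:
    """Relative quality gain and relative cost change for a model update."""
    return relative_change(old_quality, new_quality), relative_change(old_cost, new_cost)


month = CostPeriod(total_cost=1_200.0, inferences=100_000, active_users=4_000, outcomes=600)
print(month.per_inference(), month.per_active_user(), month.per_outcome())

# Hypothetical update: quality 0.80 -> 0.832, cost per inference 0.010 -> 0.014.
quality_gain, cost_increase = tradeoff(0.80, 0.832, 0.010, 0.014)
print(f"quality {quality_gain:+.0%}, cost {cost_increase:+.0%}")  # quality +4%, cost +40%
```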
The boring conclusion
AI readiness is mostly about getting fundamentals right. Clean data with documented lineage. Reliable events with explicit contracts. API patterns designed for the latency profiles that AI workloads actually have. Cost instrumentation from the first deployment.
None of this is AI-specific. All of it is the foundation of any system that will be operated at production quality for more than a year. The organizations that are doing AI well right now built most of this before they needed it for AI, because they built it to run good software. The ones struggling are discovering that they skipped it.