Context
The Deutsche Fußball Liga (DFL), which operates the Bundesliga and 2. Bundesliga, runs a digital products group, DFL Digital Sports, that licenses real-time match data and insights to broadcasters, clubs, and betting partners worldwide.
They wanted to add machine-learned match signals, such as expected goals, shot quality, and pressing intensity, to the live data feed. The requirement was sub-two-second latency from event on the pitch to signal on the broadcast graphic. At matchday peak, the platform had to score dozens of match events per second across nine concurrent matches, with zero tolerance for dropouts.
The real problem
The stated problem was latency. The actual problem was that the ML team, the data-engineering team, and the broadcast-platform team reported into three different groups, with three different release cadences. Any feature that touched all three stacks took a season to ship.
Real-time ML is not a modeling problem. It is a release-cadence problem dressed up as a modeling problem.
Approach
We built a single deployable boundary around the inference path. Models trained offline shipped as SageMaker inference endpoints, wrapped by a thin Lambda router that the broadcast platform already knew how to call. Model versioning happened behind that boundary. Nothing about the broadcast stack changed when a new model shipped.
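The router's job is easy to sketch. The version below is illustrative, not the production code: the signal names, endpoint names, and routing table are assumptions, and the SageMaker runtime call is injected as a plain function so the routing logic stands alone. The point is the boundary: the caller names a signal, and which model version answers is decided entirely behind the route table.

```python
import json

# Hypothetical route table. Swapping a model version means changing
# the endpoint name here; nothing the broadcast platform calls changes.
MODEL_ROUTES = {
    "expected-goals": "xg-model-prod",
    "pressing-intensity": "pressing-model-prod",
}

def route_event(event, invoke_endpoint):
    """Dispatch one match event to the right model endpoint.

    `invoke_endpoint` stands in for the SageMaker runtime invocation
    (e.g. boto3's invoke_endpoint); it is injected so the routing
    logic can run and be tested without AWS.
    """
    endpoint = MODEL_ROUTES[event["signal"]]
    payload = json.dumps(event["features"])
    return invoke_endpoint(endpoint, payload)

# A fake endpoint for illustration: reports which model was called.
def fake_invoke(endpoint, payload):
    return {"endpoint": endpoint, "score": 0.42}

result = route_event(
    {"signal": "expected-goals", "features": {"shot_distance": 11.0}},
    fake_invoke,
)
```

In production the broadcast platform only ever sees the Lambda's interface; the route table is the entire model-versioning surface.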
On the data side, we replaced the per-team feature-engineering code with a shared Feast-based feature store hooked directly to the live event stream. One source of truth for both training and inference. Feature engineers could ship a new signal on Tuesday and see it on the air by Saturday.
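The "one source of truth" idea is the part worth sketching. The minimal in-memory store below is a stand-in for the Feast deployment, with made-up entity and feature names; what it shows is that the live stream writes into the same store, and the same read path, that training reads from, so the two cannot drift apart.

```python
from collections import defaultdict

class FeatureStore:
    """Toy sketch of a shared online feature store (Feast-like)."""

    def __init__(self):
        # entity_id -> {feature_name: latest value}
        self._rows = defaultdict(dict)

    def push(self, entity_id, features):
        """The live event stream writes features as the match unfolds."""
        self._rows[entity_id].update(features)

    def get_features(self, entity_id, names):
        """One read path serves both training-set builds and live inference."""
        row = self._rows[entity_id]
        return {n: row.get(n) for n in names}

store = FeatureStore()
store.push("player:9", {"sprints_90": 21, "touches_box": 7})

# Inference requests exactly the features training was built from.
feats = store.get_features("player:9", ["sprints_90", "touches_box"])
```

A feature engineer adding a new signal only has to add a new `push` producer; every consumer, offline and online, picks it up through the same `get_features` call.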
The platform’s cultural contribution was bigger than its architectural one. We moved the release boundary so that a feature change no longer required three-team coordination.
What worked
- Latency came in at 1.2 seconds at the 95th percentile for the signals that were in scope. Dropouts during peak matchday stayed under 0.4%.
- Fan engagement on products that used the new signals grew by about 30% in-season, measured by time-on-graphic and return visits.
- Ship cadence for ML features went from one per season to one per week. The limit became how fast feature engineers could write new ideas, not how fast the stack could ship them.
What did not
- The first version of the feature store used the wrong cache granularity. We cached at the feature level; we should have cached at the entity level. This showed up three months in as a slow-burn increase in p99 latency that was hard to diagnose.
- We built too much custom tooling around model rollbacks before SageMaker’s own endpoint versioning was mature. Six months later we threw that custom tooling away. If I had been less confident, we would not have written it.
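The cache-granularity mistake above is easier to see in code. This is a schematic reconstruction with hypothetical names, not our implementation: keying the cache per feature means one lookup (and one potential miss) per feature per event, while keying per entity fetches the whole row at once.

```python
# Feature-level keys: (entity, feature) -> value. N lookups per event.
feature_cache = {
    ("player:9", "sprints_90"): 21,
    ("player:9", "touches_box"): 7,
}
# Entity-level keys: entity -> whole row. One lookup per event.
entity_cache = {
    "player:9": {"sprints_90": 21, "touches_box": 7},
}

def read_feature_level(entity, names):
    lookups, out = 0, {}
    for n in names:
        lookups += 1  # each feature is its own round trip
        out[n] = feature_cache[(entity, n)]
    return out, lookups

def read_entity_level(entity, names):
    row = entity_cache[entity]  # one round trip for the whole row
    return {n: row[n] for n in names}, 1

wanted = ["sprints_90", "touches_box"]
_, n_feat = read_feature_level("player:9", wanted)
_, n_ent = read_entity_level("player:9", wanted)
```

`n_feat` grows with every signal added; `n_ent` stays at 1. That growth is why the p99 regression was a slow burn: each new feature added a little tail latency, and no single change looked guilty.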
What I would do differently
I would start the feature-store work before the inference-path work. We did it in the other order because the inference path looked like the harder problem. In hindsight, the inference path was mostly well-understood AWS plumbing. The feature store was the place where the team’s domain knowledge needed to live, and starting it later meant it was under-specified for longer.
I would also push harder, earlier, on organizational alignment. The architectural boundary we drew eventually worked. It would have worked faster if the three teams had aligned on it before we shipped the first endpoint, not after the first three incidents forced the conversation.
Outcome
The platform is in its fifth season. It has added six more ML signals beyond the original scope, including two that now ship to betting partners under a separate licensing agreement. Latency at p95 is under 900 ms, below the original target. The annual operating cost of the platform is a fraction of what the replaced patchwork cost, and it takes two engineers to keep it running.
DFL has since extended the contract twice and added broadcast-rights analytics as a new workstream on the same platform.