How to Create an AI Strategy When You Have No Plan Yet
The fastest path to a working AI strategy is a ranked backlog of real business problems, not a vision document. Pick the top problem on that backlog, decide whether to build or buy, ship a working system inside ninety days, and let that first result shape everything that follows.
I am Mahmoud Zalt, an independent senior AI systems architect with 16 years of experience building production software (since 2010). I founded Sista AI, where building a production workforce of autonomous agents over the past year taught me to start strategy from a problem, not a mandate. I now work with founding teams and growth-stage companies as a Fractional AI Officer. I have watched dozens of companies attempt AI strategies, and the pattern that kills them is always the same: they start with a mandate instead of a problem. This article gives you the process I use to go from zero to a defensible, scoped, and executable AI plan in under four weeks.
Why the 'AI Everywhere' Mandate Wastes Your First Six Months
When leadership issues a blanket directive to 'add AI to everything,' teams respond by doing three things simultaneously: they run vendor demos, they form a committee, and they prototype ten features at once. None of those ten features ship. Six months later the company has a deck, a few abandoned Jupyter notebooks, and a growing skepticism inside the engineering team.
The structural problem is that an 'AI everywhere' mandate is a solution in search of a problem. You cannot evaluate vendors, estimate costs, or write acceptance criteria until you know precisely which business outcome you are trying to move. Every hour spent on a use case you have not yet validated is pure waste.
There is also a morale cost. Engineers who build throwaway prototypes for six months stop believing in the initiative. The first real win, shipped and measured, is worth more than any number of internal demos. It resets the culture around AI from skeptical to curious, which is the only culture in which serious AI work gets done.
Step 1: Build a Ranked Problem Backlog Before Touching Any Tool
The first artifact in any AI strategy engagement I run is not a technology map. It is a problem backlog. I spend the first week interviewing department heads, operations leads, and customer-facing staff with one consistent question: 'What task do you do repeatedly that takes more time than it should, or where you make decisions with incomplete information?'
Every answer goes into a spreadsheet with four columns:
- Problem statement: written as a measurable gap ('support tier-1 resolution takes 22 minutes average; industry benchmark is 8 minutes')
- Business value: annual cost of the gap in dollars, hours, or churn percentage
- Data availability: is the required data already captured, structured, and accessible, or does it need work?
- AI fit: is this a pattern-matching problem (high AI fit), a rules problem (low AI fit), or a judgment problem that needs a human in the loop?
After one week of interviews you typically have fifteen to thirty problems. You score each one on value and feasibility. The top three to five items on that ranked list become your strategy. Everything else is a parking lot, reviewed quarterly.
A concrete example: a 60-person SaaS company I worked with identified 27 candidate use cases. The top item on the ranked list was not an LLM chatbot. It was automated classification of inbound support tickets into billing, technical, and feature-request categories, with routing to the right queue. It had a clean training set (four years of resolved tickets), a clear success metric (first-response SLA), and a six-figure annual value if resolution time dropped by half. We shipped a working classifier in six weeks. That result funded the next three projects politically and financially.
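The scoring step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 1-to-5 scales, the rule that feasibility is capped by the weaker of data availability and AI fit, and the sample entries are all hypothetical, not data from a real engagement.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    statement: str
    value: int       # business value, 1 (low) to 5 (high); assumed scale
    data_ready: int  # data availability, 1 to 5; assumed scale
    ai_fit: int      # AI fit, 1 to 5; assumed scale

    @property
    def score(self) -> int:
        # Feasibility is limited by the weaker of data readiness and AI fit.
        return self.value * min(self.data_ready, self.ai_fit)


def rank(backlog: list[Problem], top_n: int = 5) -> list[Problem]:
    """Return the top_n problems, highest score first."""
    return sorted(backlog, key=lambda p: p.score, reverse=True)[:top_n]


# Hypothetical backlog entries, for illustration only.
backlog = [
    Problem("Contract clause review", value=4, data_ready=2, ai_fit=3),
    Problem("Ticket classification and routing", value=5, data_ready=5, ai_fit=5),
    Problem("Invoice approval rules", value=3, data_ready=4, ai_fit=1),
]

for p in rank(backlog, top_n=3):
    print(f"{p.score:>3}  {p.statement}")
```

Any scoring formula will do as long as it is applied consistently. The point is that the ranking is written down and can be challenged, rather than decided by whoever argues loudest.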
Step 2: Apply the Build-vs-Buy Filter Before Any Architecture Decision
Once you have a ranked problem, the next decision is not which model to use. It is whether to build at all. Most teams default to building because it feels like ownership. Most of the time that is the wrong call for a first project.
I use a four-question filter:
| Question | Build signal | Buy signal |
|---|---|---|
| Is this core competitive differentiation? | Yes, it is a moat | No, it is infrastructure |
| Does existing tooling cover 80% of the use case? | No | Yes, with reasonable config |
| Do you have the data and the team to maintain a custom model? | Yes, both | No, either one |
| What is the switching cost if the vendor changes pricing or quality? | Unacceptably high | Low enough to survive |
For the ticket classifier example above, the answer was 'buy with light customization.' We used a hosted classification API, fine-tuned on the company's own ticket history via a small adapter, and wrapped it in a thin service the team owned. Total custom code: under 400 lines. Vendor lock-in risk: low, because the training data and the integration logic lived in the company's own repository and the model could be swapped to an open-weight alternative in a sprint.
Where I recommend building: when the use case requires proprietary retrieval over internal documents (RAG over your own knowledge base), when the latency or cost profile of hosted models is incompatible with your workload, or when the output quality of general models is materially worse than a fine-tuned specialist. Even then, 'build' usually means 'build the application layer on top of an existing model,' not 'train from scratch.'
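One way to keep the filter honest is to record the four answers and tally them. The sketch below is hypothetical: the question keys, the thresholds, and the wording of the recommendations are assumptions for illustration, not a fixed rule. Each answer is passed in as an already-judged build or buy signal, so the judgment itself stays with the people in the room.

```python
from enum import Enum


class Signal(Enum):
    BUILD = "build"
    BUY = "buy"


# Hypothetical keys, one per question in the filter.
QUESTIONS = (
    "core_differentiation",
    "existing_tooling_coverage",
    "data_and_team",
    "vendor_switching_cost",
)


def recommend(answers: dict[str, Signal]) -> str:
    """Tally the four answers; mixed results lean toward buying."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    builds = sum(1 for q in QUESTIONS if answers[q] is Signal.BUILD)
    if builds == len(QUESTIONS):
        return "build"
    if builds >= 3:
        return "build the application layer on an existing model"
    if builds == 0:
        return "buy"
    return "buy with light customization"


# Illustrative answers only, not the actual answers from any engagement.
answers = {
    "core_differentiation": Signal.BUY,
    "existing_tooling_coverage": Signal.BUY,
    "data_and_team": Signal.BUILD,
    "vendor_switching_cost": Signal.BUY,
}
print(recommend(answers))
```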
Step 3: Write a Scoped 90-Day Roadmap, Not a Three-Year Vision
A three-year AI vision document is not a strategy. It is a request for patience. Nobody can evaluate it, nobody can execute against it, and it will be wrong by month four when the model landscape shifts again.
A 90-day roadmap has three properties that make it useful. First, it is short enough that the underlying model capabilities will not change so drastically as to invalidate the plan. Second, it forces a single primary outcome per cycle, which creates accountability. Third, it is long enough to actually ship something through design, integration, evaluation, and production hardening.
The structure I use:
- Days 1-14: foundation. Data audit, tooling selection, environment setup, eval framework defined. The eval framework is non-negotiable. You need baseline numbers before you ship anything so you can prove whether the system is working.
- Days 15-45: first working version. The system runs end-to-end in a staging environment. Evals run on a held-out test set. You know your precision, recall, or task-completion rate before a single user touches it.
- Days 46-70: production hardening. Guardrails added (input sanitization, output validation, rate limiting, cost caps). Observability wired (trace every LLM call: prompt, response, latency, token count, model version). Human-in-the-loop review queue for low-confidence outputs.
- Days 71-90: measured rollout. Shadow mode or limited rollout. Compare against baseline. Document what broke and why. Decide whether to scale, iterate, or stop.
That last decision, 'stop,' has to be on the table. The most expensive AI project is one that continues past the point where the data shows it is not working.
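Writing the scale, iterate, or stop rule down before the rollout makes it harder to rationalize a failing project afterward. A minimal sketch, assuming a single metric where higher is better and a target agreed during the foundation phase; the function name and the sample numbers are hypothetical.

```python
def rollout_decision(baseline: float, measured: float, target: float) -> str:
    """Return 'scale', 'iterate', or 'stop' for a metric where higher is better."""
    if measured >= target:
        return "scale"    # the system hit the number agreed up front
    if measured > baseline:
        return "iterate"  # better than before, but not yet good enough
    return "stop"         # no improvement over the baseline


# Hypothetical numbers: share of tickets answered within the first-response SLA.
print(rollout_decision(baseline=0.62, measured=0.71, target=0.85))
```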
Evals, Guardrails, and Observability: The Infrastructure You Cannot Skip
Teams building their first AI system almost always underinvest in three areas: evals, guardrails, and observability. They treat them as post-launch polish. They are not. They are the foundation that determines whether you can trust the system, debug it, and improve it.
Evals are your testing framework for AI behavior. For a classification task, that means a labeled test set you never train on, with pass/fail thresholds defined before you ship ('precision must exceed 0.88 or we do not launch'). For a generation task (summaries, drafts, answers), you need at minimum: factual accuracy checks against a reference set, format compliance checks, and a sample of human-reviewed outputs scored against a rubric. LLM-as-judge pipelines (using a separate model to score outputs at scale) are a reasonable complement but not a replacement for a human-reviewed gold set.
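For the classification case, the launch gate can be as small as the sketch below. It computes precision and recall for one class on a held-out set and compares precision with the 0.88 threshold used in the example above. The labels and predictions are made up for illustration; a real test set would be far larger.

```python
def precision_recall(
    y_true: list[str], y_pred: list[str], positive: str
) -> tuple[float, float]:
    """Precision and recall for one class on a held-out test set."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def launch_gate(precision: float, threshold: float = 0.88) -> bool:
    """Precision must exceed the threshold, so equality does not pass."""
    return precision > threshold


# Tiny hypothetical held-out sample.
y_true = ["billing", "billing", "technical", "feature", "billing"]
y_pred = ["billing", "technical", "technical", "feature", "billing"]

precision, recall = precision_recall(y_true, y_pred, positive="billing")
print(f"precision={precision:.2f} recall={recall:.2f} launch={launch_gate(precision)}")
```

Running a check like this in CI against a frozen test set means a prompt or model change that drops precision fails the build instead of reaching users.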
Guardrails mean validating inputs and outputs at the application boundary, not trusting the model to self-limit. For a customer-facing LLM: block prompt injection patterns at the input layer, validate that outputs match an expected schema before rendering them in the UI, set hard token limits, and route anything the model flags as uncertain to a human review queue. For tool-calling or MCP-based agents, require explicit approval for any action that writes data, spends money, or sends a message outside the organization.
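As an illustration of output validation at the application boundary, the sketch below parses a model response, checks it against an expected shape, and routes anything malformed or low-confidence to human review. The JSON shape, the category names, and the 0.75 confidence floor are assumptions made for the example, not a standard.

```python
import json

ALLOWED_CATEGORIES = {"billing", "technical", "feature_request"}  # assumed schema
CONFIDENCE_FLOOR = 0.75  # assumed threshold; tune it against your evals


def review(reason: str) -> dict:
    return {"route": "human_review", "reason": reason}


def validate_output(raw: str) -> dict:
    """Treat the model's text as untrusted: parse, check, then route."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return review("not valid JSON")
    if not isinstance(data, dict):
        return review("not a JSON object")

    category = data.get("category")
    confidence = data.get("confidence")

    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return review("unknown category")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return review("missing or non-numeric confidence")
    if not 0 <= confidence <= 1:
        return review("confidence out of range")
    if confidence < CONFIDENCE_FLOOR:
        return review("low confidence")
    return {"route": "auto", "category": category}


print(validate_output('{"category": "billing", "confidence": 0.93}'))
print(validate_output('{"category": "refund", "confidence": 0.99}'))
print(validate_output("Sure! The category is billing."))
```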
Observability means logging every LLM call with enough context to reproduce and debug it: the exact prompt template version, the model and version, the full response, latency, token counts, and the downstream action taken. I use structured logs (JSON) with a correlation ID so I can trace a single user interaction across the entire chain. Cost attribution per use case is also mandatory: you need to know which workflow is consuming 80% of your inference budget within the first month of production traffic, or you will get a surprise invoice.
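A structured log line for one call might look like the sketch below. The field names, the sample values, and the model identifier are illustrative assumptions; what matters is that every field listed above is captured and that the correlation ID is shared by every step of the same user interaction.

```python
import json
import uuid
from datetime import datetime, timezone


def llm_call_record(
    correlation_id: str,
    use_case: str,
    prompt_template_version: str,
    model: str,
    prompt: str,
    response_text: str,
    latency_ms: float,
    prompt_tokens: int,
    completion_tokens: int,
    downstream_action: str,
) -> str:
    """Serialize one LLM call as a single JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlation_id": correlation_id,
        "use_case": use_case,  # enables cost attribution per workflow
        "prompt_template_version": prompt_template_version,
        "model": model,
        "prompt": prompt,
        "response": response_text,
        "latency_ms": latency_ms,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "downstream_action": downstream_action,
    }
    return json.dumps(record)


# One ID per user interaction, reused across every call in the chain.
correlation_id = str(uuid.uuid4())
line = llm_call_record(
    correlation_id=correlation_id,
    use_case="ticket_routing",               # hypothetical workflow name
    prompt_template_version="classify-v3",   # hypothetical version tag
    model="example-model-2024-01",           # placeholder, not a real model ID
    prompt="Classify this ticket: ...",
    response_text='{"category": "billing", "confidence": 0.93}',
    latency_ms=412.0,
    prompt_tokens=380,
    completion_tokens=18,
    downstream_action="routed_to_billing_queue",
)
print(line)
```

Summing the token counts grouped by `use_case` is then enough to answer the cost-attribution question without a separate billing pipeline.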
Retrieval, Tool-Calling, and When Agents Are Actually Worth It
Two capabilities come up in almost every AI strategy conversation: RAG (retrieval-augmented generation) and agents (tool-calling systems). Both are real and useful. Both are also over-applied.
RAG is the right pattern when your use case requires the model to answer questions about documents that were not in its training data, that change frequently, or that are proprietary. A support bot that answers questions about your product's configuration options is a good RAG candidate. A chatbot that handles general FAQ that any model already knows is not. The failure mode I see constantly is teams standing up a vector database and chunking every document in the company on day one, before they have verified that retrieval quality is actually the bottleneck. Start with a small, curated document set for your target use case. Measure retrieval precision. Add breadth only when quality is validated.
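Measuring retrieval precision does not require any infrastructure. A minimal sketch of precision at k, assuming you have a small set of test questions with hand-labeled relevant document IDs; the IDs below are made up.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are actually relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k


# Hypothetical result for one test question.
retrieved = ["doc3", "doc7", "doc1", "doc9"]  # ranked output of the retriever
relevant = {"doc1", "doc3"}                   # hand-labeled ground truth

print(f"precision@3 = {precision_at_k(retrieved, relevant, k=3):.2f}")
```

Averaging this over a few dozen representative questions gives a baseline to compare against each time you add documents or change the chunking.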
Tool-calling and MCP-based agents are worth the complexity when the task genuinely requires taking actions across multiple systems: reading a CRM record, calling an API, updating a row, then sending a summary. They are not worth the complexity for single-step lookups or for tasks where a deterministic script would do the same job reliably. My rule: if you can specify the full logic as a decision tree, write the decision tree. Reach for an agent when the branching is too dynamic or contextual to enumerate in advance.
Human-in-the-loop is not a weakness in an agentic system. For any agent that takes irreversible actions, a human approval step for low-confidence or high-stakes operations is an architectural requirement, not a fallback. Design it in from the start.
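An approval gate of this kind can be sketched as a single check that runs before any tool call executes. The action taxonomy, the tool names, and the 0.8 confidence floor below are assumptions for illustration; a real system would also need to persist held calls and record who approved them.

```python
from dataclasses import dataclass

# Action kinds that change state outside the agent (assumed taxonomy).
HIGH_STAKES_KINDS = {"write_data", "spend_money", "send_external_message"}


@dataclass
class ToolCall:
    tool: str          # hypothetical tool name, e.g. "crm.update_record"
    kind: str          # "read" or one of HIGH_STAKES_KINDS
    confidence: float  # agent's self-reported confidence, 0.0 to 1.0


def needs_human_approval(call: ToolCall, confidence_floor: float = 0.8) -> bool:
    """High-stakes or low-confidence calls wait for a person."""
    return call.kind in HIGH_STAKES_KINDS or call.confidence < confidence_floor


# Hypothetical plan produced by an agent.
plan = [
    ToolCall("crm.read_record", "read", 0.95),
    ToolCall("crm.update_record", "write_data", 0.97),
    ToolCall("email.send_summary", "send_external_message", 0.90),
    ToolCall("kb.search", "read", 0.55),
]

for call in plan:
    status = "HOLD for approval" if needs_human_approval(call) else "run"
    print(f"{call.tool:<22} {status}")
```

Note that the write and the outbound email are held even at high confidence: the gate keys on what the action does, not only on how sure the model claims to be.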
Cost and Security: Two Things That Blow Up AI Projects in Production
Cost. LLM inference costs are non-trivial at scale and highly variable depending on prompt length, model selection, and call frequency. The teams that get burned are those who prototype with GPT-4-class models, measure acceptable quality, and then discover the production cost at their actual request volume is 40 times their budget. Cost management is part of the architecture, not a finance problem. Use the smallest model that meets your quality bar, measure that bar with evals, and document the quality-cost tradeoff explicitly. Cache deterministic outputs aggressively. Set hard monthly spending caps with alerting before you hit them.
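The production cost is simple arithmetic, so it can be estimated before the prototype is finished. In the sketch below every number is a placeholder: the prices are not any vendor's actual rates, and the request volume, token counts, and budget are hypothetical.

```python
def monthly_inference_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimated monthly spend in USD for one workflow."""
    # tokens * (USD per million tokens) gives millionths of a dollar per request.
    per_request_micro_usd = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    return requests_per_day * days * per_request_micro_usd / 1_000_000


# Placeholder figures for illustration only.
cost = monthly_inference_cost(
    requests_per_day=50_000,
    input_tokens=1_500,
    output_tokens=300,
    usd_per_million_input=10.0,
    usd_per_million_output=30.0,
)
budget = 900.0  # hypothetical monthly budget in USD

print(f"${cost:,.0f} per month, {cost / budget:.0f}x the budget")
```

Rerunning the same function with a smaller model's prices, or with a shorter prompt, shows quickly which lever moves the total most.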
Security. The security surface for an AI system is different from a traditional application, but the discipline is the same: validate inputs, never trust external data, and treat the model's output as untrusted until it has been validated against your schema and your policy. Specific risks: prompt injection (an attacker supplying input designed to override your system prompt or exfiltrate data), training data exposure (the model revealing information it was fine-tuned on), and insecure tool invocation (an agent being manipulated into calling a destructive API endpoint). For customer-facing systems, run a red-team exercise before launch. It does not need to be elaborate, just two hours with a few people who are adversarially creative.
On data privacy: if your use case involves customer PII, medical records, or financial data, the model provider's data processing terms, your data retention policy, and your legal team's sign-off are not optional. Resolve them before you write a line of production code.
What Teams Most Often Get Wrong the First Time
After running this process across multiple companies, the failure modes cluster into five categories:
- Starting with the model instead of the problem. 'We want to use GPT-4o' is not a strategy. The model choice follows from the problem requirements, not the other way around.
- No eval framework before launch. The team ships, traffic comes in, and they have no idea whether the system is performing well or poorly. They are flying blind. Every meaningful improvement after that point is a guess.
- Underestimating data quality work. The most common reason a first AI project takes twice as long as estimated is not the model integration. It is discovering that the training or retrieval data is inconsistently formatted, incomplete, or stored in a system that requires six weeks of access negotiation to reach.
- Building an agent when a script would do. Agents are cool. They are also non-deterministic, harder to test, and more expensive to run. If your use case has a predictable flow, deterministic logic is faster to build, cheaper to operate, and easier to debug.
- Skipping the human-in-the-loop for 'efficiency.' The value of the human review queue is not just catching errors. It is generating the labeled data you need to improve the system. Remove the queue and you lose your feedback loop.
Frequently Asked Questions
How do I create an AI strategy for a company that has no AI plan yet?
Start with a one-week listening exercise across department heads and operations staff, collecting every repeated task or incomplete-information problem they face. Score those problems by business value and data availability. The top three to five items become your strategy. Pick one, define a measurable outcome, decide build vs. buy in a single meeting, and ship a working system inside ninety days. That first shipped result is your strategy proof-of-concept and it will shape every decision that follows.
How long does it take to build an AI strategy from scratch?
A defensible, scoped, executable AI strategy for a company starting from zero takes three to four weeks to produce: one week of problem discovery interviews, one week of scoring and filtering, one week to write the 90-day roadmap with an eval framework and a build-vs-buy recommendation per use case. The strategy document itself should be short enough to fit on two pages. If it is longer, it is not a strategy, it is a research paper.
Do I need a dedicated AI team, or can existing engineers do this?
For a first use case, existing engineers with API integration experience and good software discipline can deliver more value than a dedicated AI team hired cold. The skills gap is usually in evals and observability, not in model calling. A fractional AI advisor who has done this before can close that gap in days, not months, by providing the eval framework, the guardrail patterns, and the architectural guidance your team needs to avoid the common pitfalls.
What is the difference between an AI strategy and an AI roadmap?
The strategy answers 'which problems, in which order, build or buy, and what does success look like.' The roadmap answers 'who does what by when.' You need both. The mistake most teams make is skipping the strategy and going straight to a roadmap, which means the roadmap is built on unvalidated assumptions about which use cases are worth pursuing.
How do I get leadership buy-in for an AI initiative?
Ship one thing that moves a number they care about, measure it, and present the result in one slide. Leadership buy-in for AI initiatives is almost never won by a vision deck. It is won by a working system with a before-and-after metric. That is why the first 90-day cycle is the most important investment you will make.
When should a company hire a Fractional AI Officer instead of a full-time AI lead?
When the company needs senior AI systems judgment immediately but does not yet have a validated roadmap, a defined team structure, or the recurring workload to justify a full-time hire at a senior level. A Fractional AI Officer compresses the strategy phase, avoids the six-month recruiting cycle, and gives you an experienced operator who has made these mistakes before, on other companies' time and money.
Work With Me on Your AI Strategy
If your company is past the 'should we do AI' conversation and into 'how do we actually start without wasting the next six months,' that is exactly the work I do as a Fractional AI Officer. I come in for a defined engagement, run the problem backlog process, set the build-vs-buy filters, design the eval and observability infrastructure, and get your team shipping a working system before the engagement ends. No six-month retainer required to find out whether this is useful.
You can read more about my background and the systems I have built on the about page and in the projects section. If you want to discuss where your company is and whether this approach fits, the contact page is the fastest way to reach me.