From Feasibility to Functionality: Running a Successful Generative AI Readiness Assessment

If you plan to read "Architecting AI-First Business Processes: Technical Considerations in Generative AI Strategy" or "Choosing the Right Generative AI Framework: A Technical Comparison of OpenAI, Anthropic, and Stability AI APIs," it helps to begin here: both topics depend on the same prior questions about workflow fit, governance boundaries, and operating constraints.[4][5]
Executive Summary
Many generative AI programs fail before deployment not because the model is weak, but because the organization has not validated workflow fit, data quality, governance, or operating readiness. A readiness assessment should answer one business question with technical evidence: can we move from experimentation to repeatable production value?
NIST frames AI risk management around four core functions: Govern, Map, Measure, and Manage.[1][2] Its Generative AI Profile extends that foundation to generative workloads, focusing on the distinct risks of GenAI systems and the actions organizations should take to manage them.[1][3] That structure is useful because it prevents the common mistake of treating a GenAI pilot as only a model-selection exercise.
For executives, the assessment should produce an investment decision, a prioritized use-case portfolio, and a risk posture. For engineers, it should produce architectural constraints, evaluation criteria, integration requirements, security controls, and operational ownership.
What a Readiness Assessment Should Actually Prove
A serious readiness assessment does not ask whether employees like a chatbot demo. It asks whether a target workflow can be improved with acceptable risk, cost, latency, and operational complexity.
The assessment should prove six things:
- There is a measurable business outcome worth improving.
- The workflow contains reasoning or content-generation steps that are suitable for probabilistic systems.
- The organization can ground the model with current, authorized enterprise context.
- The platform can enforce security, observability, evaluation, and rollback.
- Human decision rights remain explicit where the cost of error is high.
- A team exists that can own the system after the pilot ends.
If any of those remain unresolved, the right decision is usually not "launch later." It is "narrow the scope until the system becomes governable."
The Seven Assessment Domains
1. Business Value and Workflow Fit
Start with a workflow, not with a model. Good candidates usually contain one or more of the following:
- High cognitive load and low physical complexity
- Repetitive knowledge retrieval or synthesis
- Large documentation surfaces
- Many handoffs caused by missing context
- Long cycle times caused by drafting, review, or triage
Poor candidates usually involve hard real-time control, strict deterministic correctness, or decisions that are regulated but currently undocumented.
The first deliverable should be a ranked use-case list with clear baseline metrics such as cycle time, cost per transaction, first-response time, escalation rate, analyst hours saved, or content throughput.
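A ranked list is easier to defend when each candidate carries its baseline and a stated hypothesis. The following is a minimal sketch, not a standard method: the `UseCase` fields, the example figures, and the "saving per week of effort" ranking rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    baseline_metric: str    # what is measured, e.g. "handle time (min)"
    baseline_value: float   # measured before any AI change
    expected_value: float   # hypothesized value after the change (assumption)
    monthly_volume: int     # how often the workflow runs per month
    effort_weeks: float     # rough delivery estimate

    def monthly_saving(self) -> float:
        """Units of the baseline metric saved per month if the hypothesis holds."""
        return (self.baseline_value - self.expected_value) * self.monthly_volume


def rank_use_cases(cases: list[UseCase]) -> list[UseCase]:
    """Rank by expected monthly saving per week of effort, highest first."""
    return sorted(cases, key=lambda c: c.monthly_saving() / c.effort_weeks, reverse=True)


# Hypothetical candidates with made-up numbers, all measured in minutes.
candidates = [
    UseCase("contract summary", "review time (min)", 45.0, 30.0, 200, 4),
    UseCase("support reply drafting", "handle time (min)", 12.0, 8.0, 5000, 6),
    UseCase("incident triage notes", "triage time (min)", 20.0, 15.0, 800, 8),
]

for case in rank_use_cases(candidates):
    print(f"{case.name}: {case.monthly_saving():.0f} minutes/month saved")
```

The ranking only compares like with like when every candidate uses the same unit; mixed metrics would need a common denominator such as cost.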
2. Process Design and Human Work Allocation
NIST notes that AI risk management requires a broad set of actors across the AI lifecycle and that AI risks differ from traditional software risks.[2] In practice, that means the assessment must examine who reviews outputs, who overrides them, who owns failure, and what happens when the model is uncertain.
For each workflow, define:
- Which steps remain human-owned
- Which steps can be model-assisted
- Which steps can be model-executed under policy
- What confidence or business rules trigger escalation
- What evidence must be stored for audit or review
This step matters because many failed pilots automate text generation but leave the approval path unchanged, which adds cost without reducing cycle time.
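The allocation above can be written down as data rather than left in a slide. The sketch below is illustrative: `Owner`, `StepPolicy`, `route`, the thresholds, and the refund rule are hypothetical names and values, and it assumes the system can produce some usable confidence score, which not every model or pipeline does.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Owner(Enum):
    HUMAN = "human-owned"
    ASSISTED = "model-assisted"
    EXECUTED = "model-executed under policy"


@dataclass(frozen=True)
class StepPolicy:
    step: str
    owner: Owner
    min_confidence: float = 0.0          # below this score, escalate
    max_amount: Optional[float] = None   # example business rule (hypothetical)


audit_log: list[dict] = []  # evidence kept for audit or review


def route(policy: StepPolicy, confidence: float, amount: float = 0.0) -> str:
    """Decide who handles one instance of a step and record why."""
    if policy.owner is Owner.HUMAN:
        outcome = "human"
    elif confidence < policy.min_confidence:
        outcome = "escalate: low confidence"
    elif policy.max_amount is not None and amount > policy.max_amount:
        outcome = "escalate: business rule"
    elif policy.owner is Owner.EXECUTED:
        outcome = "model executes"
    else:
        outcome = "model drafts, human approves"
    audit_log.append({"step": policy.step, "confidence": confidence,
                      "amount": amount, "outcome": outcome})
    return outcome


refund = StepPolicy("issue refund", Owner.EXECUTED, min_confidence=0.8, max_amount=100.0)
reply = StepPolicy("draft reply", Owner.ASSISTED, min_confidence=0.5)
appeal = StepPolicy("decide appeal", Owner.HUMAN)

print(route(refund, confidence=0.95, amount=40.0))
print(route(refund, confidence=0.95, amount=500.0))
print(route(reply, confidence=0.3))
print(route(appeal, confidence=0.99))
```

Writing the policy this way makes the approval path itself reviewable, which is where the cycle-time saving is won or lost.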
3. Data and Knowledge Readiness
Generative AI systems are only as useful as the context you can safely provide them. Data readiness is not just a vector database question. It includes:
- Source system quality
- Access controls and entitlements
- Document freshness and duplication
- Metadata quality
- Content segmentation strategy
- PII, PHI, IP, and regulated content handling
- Citation and provenance expectations
Ask the engineering team to prove that the model can retrieve the right context for at least twenty representative tasks. If retrieval quality is weak, the problem is usually content architecture, metadata, or permissions, not prompt wording.
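A retrieval check of this kind can be very small. The sketch below assumes a hypothetical in-memory corpus and a toy keyword retriever (`CORPUS`, `toy_retrieve`) standing in for whatever search or vector store is actually used; only three tasks are shown, where a real check would use twenty or more.

```python
# Hypothetical corpus: each document lists the groups allowed to read it.
CORPUS = {
    "kb-1": {"text": "reset password portal", "groups": {"support", "hr"}},
    "kb-2": {"text": "refund policy enterprise", "groups": {"support"}},
    "kb-3": {"text": "salary bands", "groups": {"hr"}},
}


def toy_retrieve(query: str, group: str) -> list[str]:
    """Permission-aware keyword overlap; a stand-in for a real retriever."""
    words = set(query.lower().split())
    scored = []
    for doc_id, doc in CORPUS.items():
        if group not in doc["groups"]:
            continue  # entitlement filter runs before ranking
        overlap = len(words & set(doc["text"].split()))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


def retrieval_hit_rate(tasks, retrieve, k=5):
    """Share of tasks where at least one expected document is in the top k."""
    misses = []
    for task in tasks:
        top_k = retrieve(task["query"], task["group"])[:k]
        if not set(top_k) & task["expected"]:
            misses.append(task["query"])
    hits = len(tasks) - len(misses)
    return hits / len(tasks), misses


tasks = [
    {"query": "how to reset password", "group": "support", "expected": {"kb-1"}},
    {"query": "enterprise refund policy", "group": "support", "expected": {"kb-2"}},
    {"query": "salary bands for engineers", "group": "support", "expected": {"kb-3"}},
]

rate, misses = retrieval_hit_rate(tasks, toy_retrieve)
print(f"hit rate: {rate:.2f}, misses: {misses}")
```

The third task misses because the requesting group is not entitled to the document, which is a permissions finding rather than a prompt problem.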
4. Model and Evaluation Readiness
The NIST AI RMF emphasizes measurement as a core function.[2] In a GenAI program, that means you need an evaluation system before you need a scaling plan.
Assessment questions should include:
- What constitutes a good answer for this workflow?
- Can quality be judged with deterministic tests, human review rubrics, or model-based graders?
- How will hallucination, omission, policy violation, and unsafe tool use be detected?
- What is the acceptable error budget for the workflow?
- How will regression be detected when prompts, retrieval logic, or models change?
A lightweight but real evaluation harness should include:
- A gold set of representative tasks
- Expected answer characteristics or reference outputs
- Rubrics for groundedness, completeness, policy compliance, and task success
- Pass-fail thresholds tied to business impact
Without this, the pilot becomes a subjective demo process and every stakeholder forms a different view of quality.
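As a rough illustration, the harness components listed above fit in a few dozen lines. Everything here is an assumption made for the sketch: the `GoldTask` shape, the phrase-matching rubric, the thresholds, and the `fake_generate` stub that stands in for a real model call. Real rubrics for groundedness usually need human or model-based grading rather than string checks.

```python
from dataclasses import dataclass, field


@dataclass
class GoldTask:
    prompt: str
    must_include: list[str]                                    # completeness proxy
    must_not_include: list[str] = field(default_factory=list)  # policy proxy


def score(answer: str, task: GoldTask) -> dict[str, bool]:
    """Deterministic rubric: required phrases present, forbidden phrases absent."""
    text = answer.lower()
    return {
        "complete": all(p.lower() in text for p in task.must_include),
        "policy_ok": not any(p.lower() in text for p in task.must_not_include),
    }


def run_eval(tasks, generate, thresholds):
    """Return per-rubric pass rates and whether every threshold is met."""
    totals = {name: 0 for name in thresholds}
    for task in tasks:
        result = score(generate(task.prompt), task)
        for name in totals:
            totals[name] += result[name]
    rates = {name: totals[name] / len(tasks) for name in totals}
    passed = all(rates[name] >= thresholds[name] for name in thresholds)
    return rates, passed


# Stub model with canned answers (hypothetical).
CANNED = {
    "What is the refund window?": "Refunds are accepted within 30 days of purchase.",
    "How do I reset my password?": "Please contact support.",
}


def fake_generate(prompt: str) -> str:
    return CANNED[prompt]


gold_set = [
    GoldTask("What is the refund window?", ["30 days"], ["guarantee"]),
    GoldTask("How do I reset my password?", ["reset link"], ["ssn"]),
]

rates, passed = run_eval(gold_set, fake_generate, {"complete": 0.9, "policy_ok": 1.0})
print(rates, "release" if passed else "block")
```

Rerunning the same gold set after every prompt, retrieval, or model change is what turns this into a regression check.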
5. Platform and Integration Readiness
This is the domain where platform engineers, DevOps, and AI engineers usually uncover the real blockers. A production-capable GenAI platform needs more than an API key.
Minimum questions:
- How will identity propagate from user to model call?
- Where will prompts, policies, and tool schemas be versioned?
- How will secrets be managed?
- What logging is permitted and where will redaction occur?
- What are the latency budgets for retrieval, model inference, tool execution, and post-processing?
- How will rate limits, retries, fallbacks, and circuit breakers be handled?
- How will environments differ across development, test, and production?
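The retry and fallback question in particular benefits from a concrete answer early. Below is a minimal sketch of retry with exponential backoff followed by a fallback; it is not a full circuit breaker, and `call_with_fallback`, the delays, and the simulated `flaky_model` are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try primary up to retries + 1 times with exponential backoff, then fall back."""
    for attempt in range(retries + 1):
        try:
            return primary()
        except TimeoutError:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    return fallback()


# Simulated model call that times out twice before succeeding.
calls = {"n": 0}


def flaky_model() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "primary answer"


delays: list[float] = []  # record delays instead of actually sleeping
print(call_with_fallback(flaky_model, lambda: "cached answer", sleep=delays.append))
print(delays)
```

Injecting `sleep` keeps the behavior testable without real waiting; the total worst-case delay should be checked against the latency budget for the workflow.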
If the use case depends on business actions, also define the tool boundary clearly. Models should propose structured actions; deterministic services should execute them.
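One way to picture that boundary: the model emits JSON, and ordinary code decides whether anything happens. This is a sketch under stated assumptions; the action names, argument schemas, and handlers are hypothetical, and a production version would also check the caller's role before executing.

```python
import json

# Allow-list of actions and the argument types each one accepts (hypothetical).
ALLOWED_ACTIONS = {
    "create_ticket": {"title": str, "priority": str},
    "issue_refund": {"order_id": str, "amount": (int, float)},
}


def validate_proposal(raw: str) -> dict:
    """Parse a model-proposed action and reject anything outside the allow-list."""
    proposal = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(proposal, dict):
        raise ValueError("proposal must be a JSON object")
    action = proposal.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    schema = ALLOWED_ACTIONS[action]
    args = proposal.get("args")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError(f"arguments must be exactly: {sorted(schema)}")
    for name, expected_type in schema.items():
        if not isinstance(args[name], expected_type):
            raise ValueError(f"wrong type for argument {name!r}")
    return {"action": action, "args": args}


def create_ticket(title: str, priority: str) -> str:
    return f"ticket created: {title} ({priority})"


def issue_refund(order_id: str, amount: float) -> str:
    return f"refund of {amount} queued for {order_id}"


HANDLERS = {"create_ticket": create_ticket, "issue_refund": issue_refund}


def execute(raw: str) -> str:
    """Deterministic execution path: validate first, then dispatch."""
    proposal = validate_proposal(raw)
    return HANDLERS[proposal["action"]](**proposal["args"])


print(execute('{"action": "create_ticket", "args": {"title": "VPN down", "priority": "high"}}'))
```

The model never calls a handler directly, so a prompt-injected request for an unlisted action fails at validation instead of at the target system.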
6. Risk, Security, and Compliance Readiness
NIST states that the AI RMF is intended to help organizations incorporate trustworthiness considerations into the design, development, use, and evaluation of AI systems.[1] For GenAI specifically, NIST's profile is intended to help organizations identify GenAI-specific risks and choose actions aligned to their goals and priorities.[1][3]
Translate that into concrete controls:
- Prompt injection and indirect instruction handling
- Data leakage prevention
- Tenant isolation
- Output moderation and policy screening
- Role-based tool access
- Human approval gates for consequential actions
- Retention and deletion policies
- Incident response ownership
Executives should insist on a short risk register before approving broader rollout. Engineers should insist that every risk has an owner, a detector, and a mitigation path.
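The "owner, detector, mitigation" rule is simple enough to check mechanically. The sketch below is illustrative only; the `Risk` fields mirror the sentence above, while the example entries and team names are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    owner: str = ""
    detector: str = ""
    mitigation: str = ""


def incomplete_risks(register: list[Risk]) -> dict[str, list[str]]:
    """Map each risk name to the required fields it is still missing."""
    gaps = {}
    for risk in register:
        missing = [
            field_name
            for field_name in ("owner", "detector", "mitigation")
            if not getattr(risk, field_name).strip()
        ]
        if missing:
            gaps[risk.name] = missing
    return gaps


# Hypothetical register entries.
register = [
    Risk("prompt injection", owner="platform security",
         detector="input classifier alerts", mitigation="strip untrusted instructions"),
    Risk("data leakage", owner="data governance", detector="", mitigation="output redaction"),
    Risk("unapproved tool use"),
]

for name, missing in incomplete_risks(register).items():
    print(f"{name}: missing {', '.join(missing)}")
```

A check like this can run in CI against the register file so that an unowned risk blocks a release rather than surfacing during an incident.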
7. Operating Model and Skills Readiness
Many organizations have enough budget for a pilot but no team for production. Readiness depends on whether you can establish a durable operating model across product, security, platform, and domain operations.
You need named owners for:
- Use-case portfolio prioritization
- Prompt and workflow design
- Retrieval and knowledge quality
- Evaluation and release approval
- Platform reliability and cost control
- Responsible AI governance
- End-user training and feedback capture
This is also where cross-functional experience matters. A senior engineer who works across Java, Python, and AI/ML and who also understands tools such as Copilot, VS Code, and AI Foundry can bridge product ambition and engineering reality. That combination is especially useful in readiness work because it narrows the gap between executive strategy and implementation detail.
A Practical Assessment Sequence
Use a four-stage sequence.
Stage 1: Frame the Decision
Define the business outcome, workflow boundary, risk tolerance, and the decision you want to make at the end of the assessment.
Example decision statements:
- "Proceed to a production pilot for customer-support knowledge drafting."
- "Do not proceed until permissions-aware retrieval is available."
- "Proceed only for internal analyst assistance, not for customer-facing automation."
Stage 2: Baseline the Current Workflow
Capture the current-state process in enough detail to measure improvement later:
- Trigger
- Inputs
- Systems touched
- Human roles
- Decision points
- Outputs
- Failure points
- Metrics
Do not skip this step. If you cannot describe the current process clearly, you cannot prove improvement.
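The link between baseline and proof can be shown with a small calculation. This sketch assumes hypothetical metric names and values; the point is only that a pilot metric with no recorded baseline cannot be compared at all.

```python
def relative_change(baseline: dict[str, float], pilot: dict[str, float]) -> dict[str, float]:
    """Fractional change per metric; negative means the value went down."""
    missing = set(pilot) - set(baseline)
    if missing:
        raise KeyError(f"no baseline recorded for: {sorted(missing)}")
    return {name: (pilot[name] - baseline[name]) / baseline[name] for name in pilot}


# Made-up numbers for one workflow.
baseline = {"cycle_time_hours": 10.0, "escalation_rate": 0.20}
pilot = {"cycle_time_hours": 7.5, "escalation_rate": 0.25}

for name, change in relative_change(baseline, pilot).items():
    print(f"{name}: {change:+.0%}")
```

In this invented example cycle time improves while escalations get worse, which is exactly the kind of trade-off a single headline metric would hide.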
Stage 3: Build a Narrow Working Prototype
The prototype should test one bounded workflow with real enterprise context and realistic permissions. It should include:
- Retrieval or context loading
- Prompt and policy logic
- At least one evaluation loop
- Logging and traceability
- A simple human-review step for high-risk cases
Keep the scope narrow enough that a failed prototype teaches you something useful within two to four weeks.
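The prototype components listed above can be wired together in very little code. The following is a sketch, not a reference design: `run_prototype` and its parameters are hypothetical, and the three lambdas are stubs standing in for a real retriever, a real model call, and a real risk rule.

```python
import uuid


def run_prototype(question, user, retrieve, generate, is_high_risk, trace_log):
    """One pass through the workflow: retrieve, draft, gate, and record a trace."""
    trace_id = str(uuid.uuid4())
    context = retrieve(question, user)      # context loading with the user's permissions
    draft = generate(question, context)     # prompt and policy logic live behind this call
    needs_review = is_high_risk(question, draft)
    trace_log.append({
        "trace_id": trace_id,
        "user": user,
        "question": question,
        "context_ids": [doc["id"] for doc in context],
        "draft": draft,
        "needs_review": needs_review,
    })
    status = "pending_review" if needs_review else "released"
    return {"trace_id": trace_id, "status": status, "draft": draft}


# Stubs (hypothetical) so the sketch runs without external services.
def retrieve(question, user):
    return [{"id": "kb-7", "text": "Refunds over 100 USD need manager approval."}]


def generate(question, context):
    return "Draft based on " + ", ".join(doc["id"] for doc in context)


def is_high_risk(question, draft):
    return "refund" in question.lower()


trace_log: list[dict] = []
result = run_prototype("Can I get a refund?", "agent-42", retrieve, generate, is_high_risk, trace_log)
print(result["status"], result["draft"])
```

Because every run appends a trace with the retrieved document ids, the same log can later feed the evaluation loop.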
Stage 4: Score and Decide
Score each use case across the seven domains using a simple Green, Yellow, or Red rubric.
Suggested scoring criteria:
| Domain | Green looks like | Yellow looks like | Red looks like |
| Business value | Clear KPI and sponsor | Useful but not tied to metric | Novelty project |
| Process fit | Clear assistive or agentic boundary | Human workflow unclear | Workflow unsuitable |
| Data | Authorized, fresh, structured enough | Retrieval possible with cleanup | Data access or quality broken |
| Evaluation | Gold set and thresholds exist | Rubric exists but weak coverage | Demo-only judgment |
| Platform | Secure path to production is known | Some controls missing | No viable production path |
| Risk | Risks documented with controls | Controls partial | Unknown or unowned risks |
| Operating model | Named owners and budget | Temporary staffing only | No durable ownership |
The output should be one of four decisions:
- Stop
- Re-scope
- Pilot
- Scale
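How seven colors become one decision is a policy choice that should be written down before scoring starts. The mapping below is only one possible rule, stated as an assumption: a Red on business value stops the work, any other Red forces a re-scope, any Yellow limits the outcome to a pilot, and all Green allows scaling.

```python
DOMAINS = (
    "business_value", "process_fit", "data", "evaluation",
    "platform", "risk", "operating_model",
)
VALID_SCORES = {"green", "yellow", "red"}


def decide(scores: dict[str, str]) -> str:
    """Map a seven-domain scorecard to Stop, Re-scope, Pilot, or Scale."""
    missing = set(DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"unscored domains: {sorted(missing)}")
    values = [scores[domain].lower() for domain in DOMAINS]
    if not set(values) <= VALID_SCORES:
        raise ValueError("scores must be green, yellow, or red")
    if scores["business_value"].lower() == "red":
        return "Stop"        # no outcome worth improving
    if "red" in values:
        return "Re-scope"    # a blocker exists; narrow until it is governable
    if "yellow" in values:
        return "Pilot"       # viable, with known gaps to close
    return "Scale"


scorecard = {domain: "green" for domain in DOMAINS}
scorecard["data"] = "yellow"
print(decide(scorecard))
```

Teams may reasonably choose a stricter rule, for example treating a Red on risk as a Stop; what matters is that the rule is agreed before the scores are known.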
Common Failure Modes
The most common readiness mistakes are predictable:
- Starting with a broad enterprise chatbot instead of a specific workflow
- Confusing model quality with system quality
- Ignoring retrieval and permissions until late in the project
- Running pilots without evaluation baselines
- Treating governance as an approval step instead of a design input
- Assuming human review fixes a broken process automatically
- Funding experimentation without funding operational ownership
What the Final Assessment Package Should Contain
If the assessment is complete, it should produce a package that an executive steering group and an engineering team can both use:
- Prioritized use-case portfolio
- Current-state and target-state workflow map
- Readiness scorecard across the seven domains
- Technical architecture sketch
- Evaluation plan and sample benchmark set
- Risk register with owners
- Cost and latency assumptions
- Recommendation: stop, re-scope, pilot, or scale
Closing View
The real purpose of a readiness assessment is not to prove that generative AI is exciting. It is to prove that a specific business outcome can be improved with controlled risk and repeatable operations. If you do that rigorously, feasibility turns into functionality. If you skip it, functionality turns back into theater.
That sequencing matters. Workflow redesign, the subject of "Architecting AI-First Business Processes: Technical Considerations in Generative AI Strategy," only becomes credible after readiness work has identified which use cases are viable, governable, and measurable.[4]
References
[1] NIST, "AI Risk Management Framework," https://www.nist.gov/itl/ai-risk-management-framework
[2] NIST AI RMF Knowledge Base, "AI Risk Management Framework," https://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF
[3] NIST, "Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile," https://doi.org/10.6028/NIST.AI.600-1
[4] Series Part 2, "Architecting AI-First Business Processes: Technical Considerations in Generative AI Strategy," ./ai-first-business-processes-strategy.md
[5] Series Part 3, "Choosing the Right Generative AI Framework: A Technical Comparison of OpenAI, Anthropic, and Stability AI APIs," ./generative-ai-framework-comparison-openai-anthropic-stability.md















