Methodology · a primer

How an assessment is actually produced.

Investors are right to be sceptical of AI-generated readiness scores. This page lays out exactly how a run is constructed, what we measure with confidence, what we treat as advisory, and what we don't touch at all. Read it before you read the report.

Pipeline

Four phases. Auditable end-to-end.

  1. Phase 01

    Collect

    A read-only GitHub connector, authenticated with a Personal Access Token scoped to read access, walks the repository tree at any depth and pulls relevant source files, CI configs, branch-protection rules, dependency manifests, and repository metadata. Connector credentials are encrypted at rest with AES-256-GCM, zeroed on revoke, and never returned in API responses.

    Read-only. Revocable. Never cloned.
  2. Phase 02

    Evaluate

    100+ deterministic rules across four frameworks evaluate the collected evidence. Each rule returns pass, fail, or unknown; no model is involved at this stage, so there is nothing to hallucinate. Same evidence in, same findings out, every run.

    Reproducible. Auditable. Pure functions.
  3. Phase 03

    Reason

    30 AI specialist agents read the evidence and produce qualitative findings: architecture critique, IP provenance, ops maturity, support readiness. Each output is schema-validated before persistence; malformed responses become a low-confidence record flagged for human review, never a silent finding.

    Schema-validated. Bounded. Human-flagged.
  4. Phase 04

    Report

    Findings are deduplicated by SHA-256 fingerprint across runs, scored, and assembled into seven report formats. Every finding cites a control code and an evidence trail; code-pattern findings additionally cite a file location, with a line range when the AI can pinpoint it precisely. Sticky resolutions persist so you don't re-resolve the same issue every quarter.

    Fingerprinted. Cited. Sticky-resolved.
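
Fingerprint deduplication is easier to reason about with a concrete shape. The sketch below is a minimal illustration, not the platform's implementation: it assumes a hypothetical `Finding` record and hashes only its identity fields (control code, producing rule or agent, file path) with SHA-256, so the same issue found on two runs collapses to one entry even when its wording changes. Which fields the real fingerprint covers is not stated on this page; the control code `SEC-01` is made up.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; field names are illustrative only.
    control_code: str
    source_id: str             # rule ID or agent name that raised the finding
    file_path: Optional[str]   # None for findings without a code location
    message: str               # human-readable text, excluded from the fingerprint


def fingerprint(f: Finding) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of the identity fields.

    The message is left out so that rewording the same issue on a later run
    does not produce a new fingerprint.
    """
    identity = {"control": f.control_code, "source": f.source_id, "path": f.file_path}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(findings: list[Finding]) -> dict[str, Finding]:
    """Keep the first finding seen for each fingerprint."""
    unique: dict[str, Finding] = {}
    for f in findings:
        unique.setdefault(fingerprint(f), f)
    return unique


# Usage: the same issue reported on two runs with different wording.
run_1 = [Finding("SEC-01", "branch-protection", None, "main is unprotected")]
run_2 = [Finding("SEC-01", "branch-protection", None, "Branch protection missing on main")]
print(len(dedupe(run_1 + run_2)))  # 1
```

Leaving the message text out of the hash is what keeps fingerprints stable across reruns; the trade-off is that two genuinely different issues raised by the same source against the same control and path would collapse together.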

Coverage

What we measure. What we don't.

Inside scope

What we measure

  • Architecture quality: separation of concerns, technical debt signals
  • Security baseline: authentication, authorization, secrets, dependency provenance
  • Operational maturity: monitoring, alerting, rollback, release governance
  • Documentation completeness: runbooks, API contracts
  • Testing maturity: coverage signals, test categories, CI integration
  • Multi-tenant isolation and tenancy model fitness
  • IP / licensing: dependency audit, SPDX provenance
  • Code quality: linting, structure, complexity heuristics

Outside scope

What we don't

  • Customer references, churn, NPS, or revenue concentration
  • Live penetration testing, exploit chaining, runtime fuzzing
  • Financial DD: books, runway, cap table, options pool
  • Legal review of customer contracts, MSAs, NDAs, employment IP
  • Patent / trademark / IP search beyond connected source
  • Operational runtime: production incidents, live observability data, on-call
  • Founder background checks or reference calls
  • The judgement of an experienced operator at the table

Reproducibility

What you can re-run. What you can't.

The platform mixes deterministic measurement with AI reasoning. We label each layer so you know what to weight.

Rule layer

Fully deterministic

Same evidence in, same findings out, every run. The 100+ rules are pure functions over collected evidence. Re-run an assessment and diff the results to see exactly what changed in the codebase.
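
To make "pure functions over collected evidence" concrete, here is a minimal sketch under stated assumptions: evidence is a plain read-only mapping, and the two rules (`branch_protection_enabled`, `has_ci_config`) and their evidence keys are hypothetical, not rules from the actual rule set. Because no rule reads the network, the clock, or a random source, evaluating the same evidence twice yields identical results, and diffing two runs isolates what changed.

```python
from enum import Enum
from typing import Callable, Mapping


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"


# Assumption: evidence is a read-only mapping of collected facts.
Evidence = Mapping[str, object]
Rule = Callable[[Evidence], Status]


def branch_protection_enabled(ev: Evidence) -> Status:
    """Hypothetical rule: is the default branch protected?"""
    protected = ev.get("default_branch_protected")
    if protected is None:
        return Status.UNKNOWN  # evidence missing: report unknown rather than guess
    return Status.PASS if protected else Status.FAIL


def has_ci_config(ev: Evidence) -> Status:
    """Hypothetical rule: does the repository contain at least one CI config?"""
    files = ev.get("ci_config_files")
    if files is None:
        return Status.UNKNOWN
    return Status.PASS if files else Status.FAIL


RULES: dict[str, Rule] = {
    "branch-protection": branch_protection_enabled,
    "ci-config": has_ci_config,
}


def evaluate(ev: Evidence) -> dict[str, Status]:
    # No network, clock, or randomness: the output depends only on `ev`.
    return {rule_id: rule(ev) for rule_id, rule in RULES.items()}


def diff(before: dict[str, Status], after: dict[str, Status]) -> dict[str, tuple]:
    """Rule IDs whose status differs between two runs, as (before, after)."""
    return {
        rule_id: (before.get(rule_id), after.get(rule_id))
        for rule_id in before.keys() | after.keys()
        if before.get(rule_id) != after.get(rule_id)
    }


# Usage: two quarterly runs where only branch protection changed.
q1 = evaluate({"default_branch_protected": False, "ci_config_files": [".github/workflows/ci.yml"]})
q2 = evaluate({"default_branch_protected": True, "ci_config_files": [".github/workflows/ci.yml"]})
print(sorted(diff(q1, q2)))  # ['branch-protection']
```

Returning `UNKNOWN` for missing evidence, instead of defaulting to pass or fail, is what keeps the three-valued result honest when the collector could not see something.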

AI agent layer

Advisory, schema-validated

Agent outputs are reasoning, not measurement. Two runs against the same SHA can produce different finding text and small score deltas. We validate every output against a schema before persisting, and flag low-confidence runs for human review. Treat agent findings as expert hints, not verdicts.
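
The validate-then-flag behaviour can be sketched with the standard library alone. Everything here is an assumption for illustration: the field names in `AGENT_FINDING_SCHEMA`, the severity vocabulary, and the 0.5 `REVIEW_THRESHOLD` are hypothetical. What it shows is the behaviour described above: malformed output is never discarded or promoted to a finding; it is stored at zero confidence with its validation errors attached and flagged for a human, and valid but low-confidence output is flagged too.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema: required field -> accepted Python types after json.loads.
AGENT_FINDING_SCHEMA: dict[str, tuple[type, ...]] = {
    "title": (str,),
    "severity": (str,),
    "confidence": (int, float),
    "evidence": (list,),
}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}
REVIEW_THRESHOLD = 0.5  # hypothetical cut-off below which a valid finding still goes to a human


@dataclass
class AgentRecord:
    agent: str
    payload: dict[str, Any]
    confidence: float
    needs_human_review: bool
    errors: list[str] = field(default_factory=list)


def validate(raw: str) -> list[str]:
    """Return a list of schema violations; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors: list[str] = []
    for key, expected in AGENT_FINDING_SCHEMA.items():
        if key not in data:
            errors.append(f"missing field: {key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            # bool is a subclass of int in Python, so reject it explicitly
            errors.append(f"wrong type for field: {key}")
    severity = data.get("severity")
    if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
        errors.append(f"unknown severity: {severity!r}")
    confidence = data.get("confidence")
    if isinstance(confidence, (int, float)) and not isinstance(confidence, bool):
        if not 0.0 <= confidence <= 1.0:
            errors.append("confidence must be between 0 and 1")
    return errors


def to_record(agent: str, raw: str) -> AgentRecord:
    errors = validate(raw)
    if errors:
        # Malformed output is kept, not dropped: zero confidence, errors attached, flagged.
        return AgentRecord(agent, {"raw": raw}, 0.0, True, errors)
    data = json.loads(raw)
    confidence = float(data["confidence"])
    return AgentRecord(agent, data, confidence, confidence < REVIEW_THRESHOLD)


# Usage: one well-formed agent response, one that ignored the output format.
ok = to_record("security", '{"title": "No rate limiting", "severity": "high", '
                           '"confidence": 0.8, "evidence": ["api/server.py"]}')
print(ok.needs_human_review)  # False
print(to_record("security", "Sure! Here is my review.").needs_human_review)  # True
```

A production system would more likely validate against a JSON Schema document with a dedicated library; the hand-rolled checker just keeps the sketch dependency-free.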

Cross-source dedup

Probabilistic

When a rule and an agent surface what looks like the same issue, we collapse them into one finding using a concept-overlap heuristic. Because it is a heuristic, it can over-suppress, merging distinct issues that share generic wording, or under-suppress, leaving the same issue in place twice when it is phrased differently. We err on the side of showing more, not less.
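
This page does not specify how concept overlap is computed. As a stand-in, the sketch below uses Jaccard similarity over lower-cased keyword sets, with a hypothetical stopword list and a deliberately high merge threshold that biases it toward keeping both findings when in doubt. Its three usage lines reproduce the behaviour described above: a near-paraphrase merges, a rewording with no shared keywords does not, and a short generic title merges with a different issue.

```python
import re

# Hypothetical stopword list and threshold; the platform's actual heuristic is not documented.
STOPWORDS = frozenset({"the", "a", "an", "is", "are", "of", "in", "on", "for", "to", "and", "with"})
MERGE_THRESHOLD = 0.6  # a high bar: when unsure, keep both findings visible


def concepts(text: str) -> frozenset[str]:
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return frozenset(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """len(a & b) / len(a | b), defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_same_issue(rule_text: str, agent_text: str, threshold: float = MERGE_THRESHOLD) -> bool:
    return jaccard(concepts(rule_text), concepts(agent_text)) >= threshold


# Same issue, similar wording: merged (overlap 4/5 = 0.8).
print(is_same_issue("Default branch has no branch protection",
                    "No branch protection on the default branch"))  # True
# Same issue, no shared keywords: not merged (under-suppression).
print(is_same_issue("Main is unprotected", "Default branch has no branch protection"))  # False
# Different issues, generic wording: merged anyway (over-suppression, overlap 2/3).
print(is_same_issue("Missing tests", "Missing integration tests"))  # True
```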

Human loop
“The platform replaces the prep work for the meeting. Not the meeting itself.”
On where you take the wheel
  1. 01

    Resolve findings deliberately

    Mark each finding with a resolution category: FIXED, ACCEPTED_RISK, FALSE_POSITIVE, NOT_APPLICABLE, or WON'T_FIX. Sticky resolutions persist across re-runs so you don't re-resolve the same issue every quarter. A richer reviewer UI (agree / dispute / not relevant) is on the roadmap.

  2. 02

    Re-run on the same SHA

    The first sanity check on any AI-driven assessment: re-run it. Rules will be identical. Agent findings should overlap heavily. A wide delta is a signal, not a feature.

  3. 03

    Pair with a human DD lead

    Treat the report as the brief for a 60-minute conversation with the target's CTO. Use the findings as your agenda. The platform replaces the prep work, not the meeting.
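
Sticky resolutions follow naturally from stable fingerprints: store the resolution against the fingerprint, and reapply it whenever a later run produces the same fingerprint. The sketch below is a hypothetical in-memory version; `ResolutionStore`, `ReportedFinding`, and the placeholder fingerprints are illustrative names, not the product's API. The five resolution categories come from step 01 above; since `WON'T_FIX` is not a legal Python identifier, the enum member is spelled `WONT_FIX` and keeps the original label as its value.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Resolution(Enum):
    FIXED = "FIXED"
    ACCEPTED_RISK = "ACCEPTED_RISK"
    FALSE_POSITIVE = "FALSE_POSITIVE"
    NOT_APPLICABLE = "NOT_APPLICABLE"
    WONT_FIX = "WON'T_FIX"  # an apostrophe is not legal in a Python identifier


@dataclass(frozen=True)
class ReportedFinding:
    fingerprint: str                          # e.g. a SHA-256 hex digest
    title: str
    resolution: Optional[Resolution] = None   # None means still open


class ResolutionStore:
    """Hypothetical in-memory store: fingerprint -> resolution."""

    def __init__(self) -> None:
        self._by_fingerprint: dict[str, Resolution] = {}

    def resolve(self, fingerprint: str, resolution: Resolution) -> None:
        self._by_fingerprint[fingerprint] = resolution

    def apply(self, findings: list[ReportedFinding]) -> list[ReportedFinding]:
        """Return copies of a new run's findings with stored resolutions attached.

        Fingerprints never resolved before come back with resolution=None (open).
        """
        return [
            replace(f, resolution=self._by_fingerprint.get(f.fingerprint))
            for f in findings
        ]


# Usage: placeholder fingerprints stand in for real SHA-256 digests.
store = ResolutionStore()
store.resolve("fp-tenant-db", Resolution.ACCEPTED_RISK)  # resolved last quarter

next_run = [
    ReportedFinding("fp-tenant-db", "Shared database across tenants"),
    ReportedFinding("fp-rollback", "No rollback runbook"),
]
reviewed = store.apply(next_run)
print([f.title for f in reviewed if f.resolution is None])  # ['No rollback runbook']
```

One design question the sketch leaves open: if a finding marked FIXED reappears on a later run, a real system might reopen it as a regression rather than carry the resolution over silently.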

Let's talk

Still have a methodology question?

We'd rather field your hardest question now than have you discover it three weeks into a deal.