Methodology · a primer

How an assessment is actually produced.

Investors are right to be sceptical of AI-generated readiness scores. This page lays out exactly how a run is constructed, what we measure with confidence, what we treat as advisory, and what we don't touch at all. Read it before you read the report.

Pipeline

Four phases. Auditable end-to-end.

Phase 01
Collect
A read-only GitHub connector (Personal Access Token, scoped to read) walks the repo tree at any depth and pulls relevant files, CI configs, branch-protection rules, dependency manifests, and metadata. Connector credentials are encrypted at rest with AES-256-GCM, zeroed on revoke, and never returned in API responses.
Read-only. Revocable. Never cloned.
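As a rough illustration of "pulls relevant files at any depth", the selection step might look like the Python sketch below. This is not the connector's actual logic: the manifest names, the CI path patterns, and the `select_evidence` helper are all assumptions chosen for the example, and the input is simply a flat list of repo paths.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical selection criteria; the real connector's list is not documented here.
MANIFEST_NAMES = {"package.json", "requirements.txt", "go.mod", "Cargo.toml", "pom.xml"}
CI_PATTERNS = (".github/workflows/*.yml", ".github/workflows/*.yaml")


def select_evidence(paths: list[str]) -> list[str]:
    """Return the paths worth collecting, regardless of directory depth."""
    selected = []
    for path in paths:
        name = PurePosixPath(path).name  # match manifests by file name only
        if name in MANIFEST_NAMES or any(fnmatch(path, p) for p in CI_PATTERNS):
            selected.append(path)
    return sorted(selected)  # stable order keeps downstream steps repeatable
```

Matching manifests by file name rather than full path is what lets a deeply nested `package.json` be picked up without listing every possible directory.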
Phase 02
Evaluate
100+ deterministic rules across four frameworks evaluate the collected evidence. Pass / fail / unknown. Never a hallucination. Same evidence in, same findings out, every run.
Reproducible. Auditable. Pure functions.
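To make "pure functions" concrete, here is a minimal Python sketch of one rule. It is illustrative only, not the platform's rule engine: the `Finding` shape, the `EX-BP-01` control code, and the `branch_protection` / `required_reviews` evidence keys are assumptions invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Finding:
    control_code: str  # hypothetical control identifier
    outcome: Outcome


# Evidence is modelled as a read-only mapping; the keys are assumptions.
Evidence = Mapping[str, object]
Rule = Callable[[Evidence], Finding]


def branch_protection_rule(evidence: Evidence) -> Finding:
    """Pass if at least one review is required; unknown if nothing was collected."""
    protection = evidence.get("branch_protection")
    if protection is None:
        # Missing evidence is reported as unknown, never guessed.
        return Finding("EX-BP-01", Outcome.UNKNOWN)
    required = isinstance(protection, dict) and protection.get("required_reviews", 0) >= 1
    return Finding("EX-BP-01", Outcome.PASS if required else Outcome.FAIL)


def evaluate(rules: list[Rule], evidence: Evidence) -> list[Finding]:
    # No clock, network, or randomness: output depends only on the inputs.
    return [rule(evidence) for rule in rules]
```

Because each rule reads only its `evidence` argument, calling `evaluate` twice with the same evidence yields equal results, which is the property the phase description relies on.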
Phase 03
Reason
30 AI specialist agents read the evidence and produce qualitative findings: architecture critique, IP provenance, ops maturity, support readiness. Each output is schema-validated before persistence; malformed responses become a low-confidence record flagged for human review, never a silent finding.
Schema-validated. Bounded. Human-flagged.
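The validate-or-flag behaviour can be sketched in a few lines of Python. This is a simplified stand-in, assuming agents return JSON; the field names (`title`, `severity`, `confidence`), the allowed severities, and the `needs_human_review` flag are hypothetical and do not describe the real schema.

```python
import json
from typing import Any

# Hypothetical schema: field name -> required Python type after JSON parsing.
REQUIRED_FIELDS = {"title": str, "severity": str, "confidence": float}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_agent_output(raw: str) -> dict[str, Any]:
    """Return a validated finding, or a low-confidence record flagged for review."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("top-level value must be an object")
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(data.get(name), expected):
                raise ValueError(f"field {name!r} missing or wrong type")
        if data["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError("unknown severity")
        if not 0.0 <= data["confidence"] <= 1.0:
            raise ValueError("confidence out of range")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        # Malformed output is kept, but as an explicit low-confidence record.
        return {
            "title": "Unparseable agent response",
            "severity": "low",
            "confidence": 0.0,
            "needs_human_review": True,
            "error": str(exc),
        }
    return {**data, "needs_human_review": False}
```

The point of the design is that a bad response never disappears and never masquerades as a normal finding: it is persisted with its error and routed to a person.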
Phase 04
Report
Findings are deduplicated by SHA-256 fingerprint across runs, scored, and assembled into seven report formats. Every finding cites a control code and an evidence trail; code-pattern findings additionally cite a file location, with a line range when the AI can pinpoint it precisely. Sticky resolutions persist so you don't re-resolve the same issue every quarter.
Fingerprinted. Cited. Sticky-resolved.
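Fingerprint-based deduplication can be approximated as below. Which fields go into the hash is not specified on this page, so the choice of control code, file path, and summary, along with the normalisation applied to each, is an assumption made purely for illustration.

```python
import hashlib


def fingerprint(control_code: str, file_path: str, summary: str) -> str:
    """SHA-256 over normalised fields; the field choice is an assumption."""
    parts = (
        control_code.strip().upper(),
        file_path.strip(),
        " ".join(summary.lower().split()),  # collapse case and whitespace
    )
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(findings: list[dict]) -> list[dict]:
    """Keep the first finding seen for each fingerprint, preserving order."""
    seen: dict[str, dict] = {}
    for finding in findings:
        key = fingerprint(finding["control_code"], finding["file_path"], finding["summary"])
        seen.setdefault(key, finding)
    return list(seen.values())
```

Normalising before hashing matters: without it, a trailing space or a change in capitalisation would produce a new fingerprint and the same issue would reappear as a fresh finding.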

Coverage

What we measure. What we don't.

Inside scope

What we measure

Architecture quality: separation of concerns, technical debt signals
Security baseline: authentication, authorization, secrets, dependency provenance
Operational maturity: monitoring, alerting, rollback, release governance
Documentation completeness: runbooks, API contracts
Testing maturity: coverage signals, test categories, CI integration
Multi-tenant isolation and tenancy model fitness
IP / licensing: dependency audit, SPDX provenance
Code quality: linting, structure, complexity heuristics

Outside scope

What we don't

Customer references, churn, NPS, or revenue concentration
Live penetration testing, exploit chaining, runtime fuzzing
Financial DD: books, runway, cap table, options pool
Legal review of customer contracts, MSAs, NDAs, employment IP
Patent / trademark / IP search beyond connected source
Operational runtime: production incidents, observability, on-call
Founder background checks or reference calls
The judgement of an experienced operator at the table

Reproducibility

What you can re-run. What you can't.

The platform mixes deterministic measurement with AI reasoning. We label each layer so you know what to weight.

Rule layer

Fully deterministic

Same evidence in, same findings out, every run. The 100+ rules are pure functions over collected evidence. Re-run an assessment and diff the results to see exactly what changed in the codebase.
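Diffing two rule-layer runs could look something like this Python sketch. It assumes each run has been reduced to a mapping from control code to outcome string; that shape, and the `diff_runs` name, are hypothetical.

```python
def diff_runs(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two runs keyed by control code; values are pass / fail / unknown."""
    changed = sorted(c for c in before.keys() & after.keys() if before[c] != after[c])
    return {
        "added": sorted(after.keys() - before.keys()),    # controls only in the new run
        "removed": sorted(before.keys() - after.keys()),  # controls only in the old run
        "changed": changed,                               # same control, new outcome
    }
```

Since the rules are deterministic, an empty diff between two runs on the same commit is the expected result, and any non-empty diff across commits points at a change in the collected evidence.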

AI agent layer

Advisory, schema-validated

Agent outputs are reasoning, not measurement. Two runs against the same SHA can produce different finding text and small score deltas. We validate every output against a schema before persisting, and flag low-confidence runs for human review. Treat agent findings as expert hints, not verdicts.
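One simple way to quantify how far two agent runs agree is set overlap. The sketch below uses Jaccard similarity and assumes each finding has some stable identifier (for example a control code plus file path) so that differently worded text still matches; that identifier scheme is an assumption, not a description of the platform.

```python
def run_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard similarity of two sets of finding identifiers (1.0 = identical)."""
    if not run_a and not run_b:
        return 1.0  # two empty runs agree trivially
    return len(run_a & run_b) / len(run_a | run_b)
```

For instance, two runs that each report three findings and share two of them overlap at 2 / 4 = 0.5. Where to draw the line between normal variation and a worrying delta is a judgement call this sketch does not settle.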

Cross-source dedup

Probabilistic

When a rule and an agent surface what looks like the same issue, we collapse them via concept overlap. This is heuristic: it can over-suppress generic findings or under-suppress different phrasings of the same issue. We err on the side of showing more, not less.
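"Concept overlap" is not defined in detail here, so the following Python sketch swaps in a deliberately crude stand-in: token-set Jaccard similarity with a threshold. The stopword list, the 0.5 default, and the function names are all assumptions.

```python
import re

# Small illustrative stopword list; a real system would need something richer.
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "in", "of", "to", "and", "or"})


def concepts(text: str) -> frozenset[str]:
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return frozenset(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def same_issue(rule_text: str, agent_text: str, threshold: float = 0.5) -> bool:
    """Collapse two findings only when their token sets overlap enough."""
    a, b = concepts(rule_text), concepts(agent_text)
    if not a or not b:
        return False  # nothing to compare: keep both findings visible
    return len(a & b) / len(a | b) >= threshold
```

Even this toy version shows both failure modes: "Secrets committed to repository" and "Credentials checked into source control" share no tokens and are not collapsed (under-suppression), while two short, generic findings can clear the threshold by accident (over-suppression). Raising the threshold biases toward showing more.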

Human loop

“The platform replaces the prep work for the meeting. Not the meeting itself.”

On where you take the wheel

01
Resolve findings deliberately
Mark each finding with a resolution category: FIXED, ACCEPTED_RISK, FALSE_POSITIVE, NOT_APPLICABLE, or WON'T_FIX. Sticky resolutions persist across re-runs so you don't re-resolve the same issue every quarter. A richer reviewer UI (agree / dispute / not relevant) is on the roadmap.
02
Re-run on the same SHA
The first sanity check on any AI-driven assessment: re-run it. Rules will be identical. Agent findings should overlap heavily. A wide delta is a signal, not a feature.
03
Pair with a human DD lead
Treat the report as the brief for a 60-minute conversation with the target's CTO. Use the findings as your agenda. The platform replaces the prep work, not the meeting.
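For step 01, the resolution categories and the "sticky" carry-over can be sketched as follows. The enum values come from the list above; matching resolutions to new findings by a `fingerprint` key, and the `apply_sticky` helper itself, are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class Resolution(Enum):
    FIXED = "FIXED"
    ACCEPTED_RISK = "ACCEPTED_RISK"
    FALSE_POSITIVE = "FALSE_POSITIVE"
    NOT_APPLICABLE = "NOT_APPLICABLE"
    WONT_FIX = "WON'T_FIX"  # the apostrophe cannot appear in a Python identifier


def apply_sticky(findings: list[dict], resolved: dict[str, Resolution]) -> list[dict]:
    """Carry earlier resolutions onto a new run, matched by fingerprint."""
    out = []
    for finding in findings:
        resolution: Optional[Resolution] = resolved.get(finding["fingerprint"])
        out.append({**finding, "resolution": resolution})  # None means still open
    return out
```

Findings whose fingerprint was resolved in an earlier run arrive pre-labelled, so only genuinely new findings (those with `resolution` set to `None`) need a decision.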

Let's talk

Still have a methodology question?

We'd rather field your hardest question now than have you discover it three weeks into a deal.

