Glossary
Evaluation · Emerging

Pattern Consistency Score

The metric measuring how closely agent-generated code adheres to Golden Samples and established codebase patterns.

Definition

Unlike the Architectural Violation Rate, which checks for hard constraint failures, the Pattern Consistency Score evaluates softer dimensions of code quality: naming conventions, file organization, error handling patterns, logging approaches, and structural similarity to reference implementations.

The score is assessed through three complementary methods:

  1. Automated static analysis — linters and custom rules that check naming conventions, import ordering, file structure, and other mechanically verifiable patterns. These provide fast, deterministic scoring on dimensions that can be expressed as rules.
  2. LLM-as-a-Judge evaluation — a secondary LLM compares agent-generated code against Golden Samples using structured rubrics, scoring dimensions such as readability, idiomatic usage, and structural similarity. See LLM-as-a-Judge for details on this evaluation approach.
  3. Human review sampling — periodic manual review of a random sample of agent output, scored against the same rubric used by the LLM judge. This calibrates the automated scoring and catches dimensions that neither static analysis nor LLM evaluation captures reliably.
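
As a rough illustration of the first method, the sketch below scores one mechanically verifiable dimension: naming conventions. It assumes a Python codebase whose conventions are snake_case functions and PascalCase classes. The function name `naming_consistency` and both regular expressions are hypothetical choices for this example, not part of any defined tooling.

```python
import ast
import re

# Assumed conventions for this sketch: snake_case functions, PascalCase classes.
SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")
PASCAL_CASE = re.compile(r"^_*[A-Z][A-Za-z0-9]*$")


def naming_consistency(source: str) -> float:
    """Return the fraction of function and class names that match the assumed conventions."""
    tree = ast.parse(source)
    checked = 0
    passed = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            checked += 1
            passed += bool(SNAKE_CASE.match(node.name))
        elif isinstance(node, ast.ClassDef):
            checked += 1
            passed += bool(PASCAL_CASE.match(node.name))
    # With nothing to check, report full consistency rather than dividing by zero.
    return passed / checked if checked else 1.0
```

Because the result is already a fraction between 0 and 1, a rule-based check of this kind can feed a normalized score directly. Dimensions such as readability or idiomatic usage do not reduce to rules like these and are left to the LLM judge and human review.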

The score is normalized to a 0-to-1 scale. Target ranges:

  • Above 0.8 — agents are consistently following established patterns. Golden Samples are current and effectively guiding generation.
  • 0.7 to 0.8 — acceptable but with identifiable gaps. Review which specific pattern dimensions score lowest and update the relevant Golden Samples or Context Index entries.
  • Below 0.7 — Golden Samples need updating, or Context Packets are not consistently including them during agent execution. This score level typically indicates that reference material is stale or that the context assembly process is dropping pattern references.
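
The target ranges above can be sketched as a small classifier. This entry does not state how the three assessment methods are combined into one number, so the weighted average below is purely an assumption. The weights, the `MethodScores` type, and the band labels are all hypothetical. A score of exactly 0.8 is placed in the middle band, since the top band is described as "above 0.8".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodScores:
    """Per-method scores, each already normalized to the 0-to-1 scale."""
    static_analysis: float
    llm_judge: float
    human_review: float


# Hypothetical weights: the glossary entry does not define a combination rule.
WEIGHTS = {"static_analysis": 0.4, "llm_judge": 0.4, "human_review": 0.2}


def combined_score(scores: MethodScores) -> float:
    """Weighted average of the three method scores (an assumed combination rule)."""
    values = {
        "static_analysis": scores.static_analysis,
        "llm_judge": scores.llm_judge,
        "human_review": scores.human_review,
    }
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * value for name, value in values.items()) / total_weight


def target_band(score: float) -> str:
    """Map a normalized score to one of the three target ranges."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > 0.8:
        return "consistent"       # above 0.8
    if score >= 0.7:
        return "acceptable"       # 0.7 to 0.8
    return "needs_attention"      # below 0.7
```

For example, `target_band(combined_score(MethodScores(0.9, 0.8, 0.7)))` falls in the top band under these assumed weights. A team that uses human review only to calibrate the other two methods would set its weight to zero.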

The Pattern Consistency Score is reviewed during the monthly Boundary Audit alongside the Architectural Violation Rate. Together, these two metrics give the Principal Systems Architect a complete picture of structural integrity: hard violations (things that break rules) and soft drift (things that diverge from conventions).

Last updated: 3/11/2026