Spec Engineering
How to write effective Live Specs that produce high Spec-to-Code Ratios and minimize agent intervention.
Summary
Spec Engineering is the discipline of writing Live Spec documents that agents can execute with minimal intervention. A well-designed spec produces a high Spec To Code Ratio: the agent's output closely follows the specification instead of improvising. A poorly designed spec produces hallucinations, skipped edge cases, and loops, no matter how capable the model is.
This pattern provides a structured methodology for spec authoring. It takes teams from "write some requirements and hope the agent figures it out" to a repeatable process: start from executable acceptance criteria, decompose into testable behaviors, attach Golden Samples and architectural constraints, and iterate based on measured results. The Context Architect owns this process, but every engineer who writes specs benefits from the methodology.
Spec engineering is the highest-leverage activity in an agentic development workflow. A 30-minute investment in a precise spec saves hours of agent retries, Rescue Mission escalations, and human correction. Teams that measure the Spec To Code Ratio consistently find that spec quality, not model selection or prompt-engineering tricks, is the dominant factor in agent output quality.
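To make the metric concrete, here is one way a team might operationalize the Spec To Code Ratio. The formula below is an illustrative assumption of this sketch, not a definition prescribed by this pattern: the ratio of agent-delivered lines traceable to the spec versus the agent's total output.

```typescript
// Hypothetical metric sketch: one possible operationalization of
// Spec To Code Ratio. The formula and names are assumptions.
function specToCodeRatio(
  linesFollowingSpec: number, // agent output lines traceable to the spec
  totalAgentLines: number     // all lines the agent produced
): number {
  if (totalAgentLines === 0) return 0; // no output -> nothing followed the spec
  return linesFollowingSpec / totalAgentLines;
}

// e.g. 850 of 1000 delivered lines traceable to the spec -> 0.85
```

Tracked per spec over time, even a rough measure like this surfaces which specs the agent could follow and which forced it to improvise.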
Problem
Most teams write specs that are human-readable but ambiguous to agents:
- **Prose-heavy requirements.** Specs describe what the feature should do in narrative form. A human reader infers the unstated details from experience. An agent has no such inference: it either invents the details (hallucination) or asks clarifying questions (delays).
- **Untestable acceptance criteria.** Criteria like "the component must handle errors gracefully" or "the API must be performant" cannot be validated by an Eval Harness. The agent has no way to verify its own output against these criteria, and neither does automated checking.
- **Missing boundary conditions.** Specs describe the happy path but omit error handling, empty states, concurrent access, special-character edge cases, and limit values. Agents either ignore these cases or invent handling that does not match the team's expectations.
- **Implicit architectural knowledge.** The spec says "create a new service" but does not specify which service-layer pattern to follow, which directory the files belong in, which naming conventions to use, or which dependencies are allowed. The agent makes decisions that may violate architectural rules not captured in the spec.
- **No feedback loop.** Teams do not measure spec quality systematically. A spec that caused three rescue missions looks identical in the backlog to a spec the agent nailed on the first attempt. Without Spec To Code Ratio and Correction Ratio data tied to individual specs, there is no signal for improvement.
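The fix for untestable criteria is usually mechanical: rewrite each vague condition as a concrete, harness-checkable one. A hypothetical example (the field names follow the spec format used in the code examples below):

```yaml
# Before: not machine-verifiable
- criterion: "The component must handle errors gracefully"

# After: checkable by an Eval Harness
- id: AC-7
  criterion: "Returns 400 with code VALIDATION_ERROR when the payload fails schema validation"
  validation: integration-test
```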
Solution
Apply a six-step spec engineering methodology that produces machine-readable, testable, self-contained specifications.
The Six Steps
- Start with acceptance criteria — Write executable, testable conditions before anything else.
- Write the behavioral contract — Define inputs, outputs, edge cases, and error handling.
- Define the system constitution reference — Link to coding standards, architectural constraints, and security rules.
- Decompose into an actionable task map — Break the spec into ordered subtasks the agent can execute sequentially.
- Attach Golden Samples and context references — Provide concrete examples of the expected output quality.
- Review and validate before agent execution — Treat specs like code: review them, version them, and test them.
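The six steps above can be sketched as a data shape, which makes the "is this spec complete?" question checkable. This is a minimal illustration; the field names are assumptions of this sketch, not a prescribed schema:

```typescript
// Illustrative only: one possible shape for a Live Spec covering the
// six steps. All names are assumptions, not a standard.
interface AcceptanceCriterion {
  id: string;        // e.g. "AC-1"
  criterion: string; // a testable condition
  validation: "unit-test" | "integration-test" | "manual-check";
}

interface LiveSpec {
  acceptanceCriteria: AcceptanceCriterion[];          // Step 1
  behavioralContract: {                               // Step 2
    inputs: string[];
    outputsSuccess: string[];
    outputsError: string[];
  };
  constitutionRef: string;                            // Step 3: path to standards doc
  taskMap: { id: number; dependencies: number[] }[];  // Step 4
  goldenSamples: string[];                            // Step 5: paths to examples
  reviewed: boolean;                                  // Step 6
}

// A spec is ready for agent execution only when every step is present.
function isReady(spec: LiveSpec): boolean {
  return (
    spec.acceptanceCriteria.length > 0 &&
    spec.taskMap.length > 0 &&
    spec.goldenSamples.length > 0 &&
    spec.reviewed
  );
}
```

A gate like `isReady` can run in CI so that an agent is never dispatched against a spec that skips a step.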
Implementation
Code Examples
```ts
// scripts/score-spec.ts
interface SpecQualityScore {
  acceptanceCriteriaScore: number; // 0-25
  behavioralContractScore: number; // 0-25
  contextCompletenessScore: number; // 0-25
  taskDecompositionScore: number; // 0-25
  total: number; // 0-100
  grade: "A" | "B" | "C" | "D" | "F";
  suggestions: string[];
}

function scoreSpec(specContent: string): SpecQualityScore {
  const suggestions: string[] = [];

  // Score acceptance criteria
  let acScore = 0;
  const acTable = specContent.match(/\| AC-\d+.*\|/g);
  if (acTable && acTable.length > 0) {
    acScore += 10; // Has structured ACs
    if (acTable.every((row) => row.includes("-test") || row.includes("-check"))) {
      acScore += 10; // All ACs have validation methods
    } else {
      suggestions.push("Some acceptance criteria lack validation methods");
    }
    if (acTable.length >= 5) {
      acScore += 5; // Sufficient coverage
    } else {
      suggestions.push("Consider adding more acceptance criteria for edge cases");
    }
  } else {
    suggestions.push("Add structured acceptance criteria with IDs and validation methods");
  }

  // Score behavioral contract
  let bcScore = 0;
  if (specContent.includes("**Inputs:**")) bcScore += 5;
  else suggestions.push("Add an Inputs table to the behavioral contract");
  if (specContent.includes("**Outputs (success):**")) bcScore += 5;
  else suggestions.push("Add a success Outputs table");
  if (specContent.includes("**Outputs (error):**")) bcScore += 5;
  else suggestions.push("Add an error Outputs table");
  if (specContent.includes("**Behavior — Normal Flow:**")) bcScore += 5;
  else suggestions.push("Add normal flow behavior description");
  if (specContent.includes("**Behavior — Error Flow:**")) bcScore += 5;
  else suggestions.push("Add error flow behavior description");

  // Score context completeness
  let ctxScore = 0;
  if (specContent.includes("## System Constitution")) ctxScore += 5;
  else suggestions.push("Add System Constitution reference");
  if (specContent.includes("Golden Sample")) ctxScore += 10;
  else suggestions.push("Add golden sample references");
  if (specContent.includes("Context Packet")) ctxScore += 5;
  else suggestions.push("Add context packet file listing");
  if (specContent.includes("token") || specContent.includes("budget")) ctxScore += 5;
  else suggestions.push("Add token budget estimate");

  // Score task decomposition
  let tdScore = 0;
  const taskMatches = specContent.match(/### Task \d+/g);
  if (taskMatches && taskMatches.length > 0) {
    tdScore += 10;
    if (specContent.includes("**Dependencies:**")) tdScore += 5;
    if (specContent.includes("**Validates:**")) tdScore += 5;
    if (specContent.includes("**Golden Sample:**")) tdScore += 5;
  } else {
    suggestions.push("Decompose spec into ordered subtasks with dependencies");
  }

  const total = acScore + bcScore + ctxScore + tdScore;
  const grade =
    total >= 85 ? "A" : total >= 70 ? "B" : total >= 55 ? "C" : total >= 40 ? "D" : "F";

  return {
    acceptanceCriteriaScore: acScore,
    behavioralContractScore: bcScore,
    contextCompletenessScore: ctxScore,
    taskDecompositionScore: tdScore,
    total,
    grade,
    suggestions,
  };
}
```

```yaml
# BEFORE: Vague spec that causes agent failures
# ------------------------------------------------
feature: User Authentication
description: Add login functionality
requirements:
  - Users can log in with email and password
  - Show error on invalid credentials
  - Use JWT for sessions
notes: Follow existing patterns
```

```yaml
# AFTER: Engineered spec that agents execute reliably
# ------------------------------------------------
feature: User Authentication — Login Endpoint
spec_version: 1
status: ready
author: "@context-architect"
behavioral_contract:
  purpose: "Authenticate user by email/password, return signed JWT"
  inputs:
    - name: email
      type: string
      constraints: "RFC 5322 email format"
      required: true
    - name: password
      type: string
      constraints: "8-128 characters"
      required: true
  outputs_success:
    - name: token
      type: string
      description: "RS256-signed JWT, 24h expiry, contains userId/email/role"
    - name: user
      type: object
      description: "{ id: string, email: string, role: string }"
  outputs_error:
    - status: 401
      code: AUTH_INVALID_CREDENTIALS
      when: "Email exists but password does not match"
    - status: 401
      code: AUTH_USER_NOT_FOUND
      when: "No user with provided email"
    - status: 429
      code: AUTH_RATE_LIMITED
      when: "5+ failures from same IP in 15 minutes"
    - status: 400
      code: AUTH_INVALID_INPUT
      when: "Email or password fails format validation"
acceptance_criteria:
  - id: AC-1
    criterion: "Returns 200 with JWT on valid credentials"
    validation: integration-test
    priority: must-have
  - id: AC-2
    criterion: "Returns 401 AUTH_INVALID_CREDENTIALS on wrong password"
    validation: integration-test
    priority: must-have
  - id: AC-3
    criterion: "Returns 429 after 5 failed attempts from same IP in 15min"
    validation: integration-test
    priority: must-have
  - id: AC-4
    criterion: "JWT uses RS256, contains userId/email/role, 24h expiry"
    validation: unit-test
    priority: must-have
  - id: AC-5
    criterion: "bcrypt timing-safe comparison for passwords"
    validation: unit-test
    priority: must-have
system_constitution_ref: "context/constitution/coding-standards.md"
golden_samples:
  - path: "src/users/services/user-service.ts"
    demonstrates: "Service pattern, error handling"
  - path: "src/users/__tests__/user-service.test.ts"
    demonstrates: "Test structure, mocking"
context_packet:
  - "prisma/schema.prisma"
  - "src/shared/errors/app-error.ts"
  - "docs/api/auth-spec.yaml"
task_map:
  - id: 1
    name: "Input validation schema"
    deliverable: "src/auth/schemas/login-schema.ts"
    validates: [AC-4]
    dependencies: []
  - id: 2
    name: "Auth service login method"
    deliverable: "src/auth/services/auth-service.ts"
    validates: [AC-1, AC-2, AC-3, AC-4, AC-5]
    dependencies: [1]
  - id: 3
    name: "Auth controller login endpoint"
    deliverable: "src/auth/controllers/auth-controller.ts"
    validates: [AC-1, AC-2, AC-3]
    dependencies: [2]
  - id: 4
    name: "Rate limiting middleware"
    deliverable: "src/auth/middleware/rate-limiter.ts"
    validates: [AC-3]
    dependencies: []
  - id: 5
    name: "Tests for all acceptance criteria"
    deliverable: "src/auth/__tests__/"
    validates: [AC-1, AC-2, AC-3, AC-4, AC-5]
    dependencies: [1, 2, 3, 4]
token_budget: 12000
```

Considerations
- **Higher Spec-to-Code Ratio.** Engineered specs leave less room for agent improvisation. When the spec defines inputs, outputs, error cases, and task decomposition, the agent follows the specification rather than guessing. Teams that adopt this methodology typically see [[spec-to-code-ratio]] increase from 50-60% to 80-90%.
- **Fewer rescue missions.** The most common [[rescue-mission]] root cause is "ambiguous spec" (~40% of rescues). Engineered specs with testable acceptance criteria and explicit boundary conditions eliminate most of these. The [[correction-ratio]] drops because the agent gets it right on the first attempt.
- **Specs become institutional knowledge.** A well-engineered spec is not a throwaway document — it is a reusable reference. When a similar feature needs to be built in the future, the team can adapt an existing spec rather than starting from scratch. The spec library grows into a valuable asset.
- **Measurable quality improvement.** By tying [[spec-to-code-ratio]] and [[correction-ratio]] back to individual specs, teams can identify which spec patterns work and which do not. This creates a data-driven feedback loop for continuous improvement.
- **Reduced onboarding time for new engineers.** A library of well-engineered specs teaches new team members how the codebase works, what patterns to follow, and what quality looks like. The specs are living documentation of the team's engineering standards.
- **Initial slowdown perception.** Engineers accustomed to jumping straight into code perceive spec writing as overhead. The response is data: show that a 30-minute spec investment saves 2-3 hours of agent retries and human correction. Track the numbers to make the case.
- **Spec maintenance as code evolves.** Specs that reference specific file paths, database schemas, or API contracts become stale as the codebase changes. Include spec staleness checks in the monthly Context Hygiene ceremony. Flag specs where referenced files have changed since the spec was last updated.
- **Translation gap between business and technical language.** Product managers write requirements in business language. [[context-architect]] roles translate these into technical specs. The translation step introduces potential information loss. Pair the [[context-architect]] with the product manager during spec authoring to minimize gaps.
- **Over-engineering specs for simple tasks.** Not every task needs a full six-step spec. A simple CRUD endpoint following an existing pattern may need only acceptance criteria and a golden sample reference. Scale the spec effort to the task complexity — use the routing matrix from [[agent-task-routing]] to determine how thorough the spec needs to be.
- **Resistance to peer review of specs.** Engineers may view spec review as bureaucratic overhead. Frame it as the highest-leverage review activity: 15 minutes of spec review prevents hours of wasted agent execution. Start with lightweight reviews (a quick read-through) and formalize only for high-complexity tasks.
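The staleness check mentioned above can be sketched as a pure comparison of modification times. This is one possible heuristic, not part of the pattern itself; the function name and the mtime-based approach are assumptions of this sketch:

```typescript
// Hypothetical staleness check: flag referenced files modified after
// the spec was last updated. Inputs are timestamps in any consistent
// unit (e.g. epoch milliseconds from fs.statSync(path).mtimeMs).
function staleReferences(
  specUpdatedAt: number,
  referencedFiles: Record<string, number> // path -> last-modified time
): string[] {
  return Object.entries(referencedFiles)
    .filter(([, mtime]) => mtime > specUpdatedAt) // changed after the spec
    .map(([path]) => path);
}
```

A monthly Context Hygiene job could run this over each spec's `context_packet` and `golden_samples` entries and open a review task whenever the result is non-empty.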