A/B Test Setup
Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
1️⃣ Purpose & Scope
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
---
2️⃣ Pre-Requisites
You must have:
Hypothesis Quality Checklist
A valid hypothesis includes:
---
3️⃣ Hypothesis Lock (Hard Gate)
Before designing variants or metrics, you MUST:
Ask explicitly:
> “Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
---
4️⃣ Assumptions & Validity Check (Mandatory)
Explicitly list assumptions about:
If assumptions are weak or violated:
---
5️⃣ Test Type Selection
Choose the simplest valid test:
Default to A/B unless there is a clear reason otherwise.
---
6️⃣ Metrics Definition
#### Primary Metric (Mandatory)
#### Secondary Metrics
#### Guardrail Metrics
---
7️⃣ Sample Size & Duration
Define upfront:
Estimate:
Do NOT proceed without a realistic sample size estimate.
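As a rough illustration of what a "realistic sample size estimate" can look like, the sketch below uses the standard normal-approximation formula for a two-sided test comparing two conversion rates. The function names, the parameter names (`baseline_rate`, `mde_abs`, `daily_eligible_users`), and the defaults of 5% significance and 80% power are assumptions made for this example, not values prescribed by this guide.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant (two-sided two-proportion z-test).

    baseline_rate: current conversion rate of the control, e.g. 0.10
    mde_abs: minimum detectable effect as an absolute change, e.g. 0.02
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the effect must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_eligible_users, n_variants=2):
    """Days needed to fill all variants, assuming traffic is split evenly."""
    return ceil(n_per_variant * n_variants / daily_eligible_users)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
    days = estimated_duration_days(n, daily_eligible_users=1000)
    print(f"users per variant: {n}, estimated duration: {days} days")
```

Detecting a 10% to 12% change needs a few thousand users per variant under these assumptions, and halving the detectable effect roughly quadruples that number. If the resulting duration is impractical, that is worth knowing before the test starts.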
---
8️⃣ Execution Readiness Gate (Hard Stop)
You may proceed to implementation only if all are true:
If any item is missing, stop and resolve it.
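One way to make this gate hard to skip is to encode it as an explicit checklist that blocks implementation until every item is true. The sketch below is illustrative only. The checklist items are not listed in this text, so the field names are assumptions drawn from the earlier sections (hypothesis lock, assumptions check, metrics, sample size), and `ReadinessChecklist`, `missing_items`, and `may_proceed` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessChecklist:
    # Hypothetical items, one per earlier gate in this guide.
    hypothesis_locked: bool = False
    assumptions_reviewed: bool = False
    primary_metric_defined: bool = False
    guardrails_defined: bool = False
    sample_size_estimated: bool = False


def missing_items(checklist):
    """Return the names of all checklist items that are not yet satisfied."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def may_proceed(checklist):
    """Implementation may start only when nothing is missing."""
    return not missing_items(checklist)


if __name__ == "__main__":
    checklist = ReadinessChecklist(hypothesis_locked=True, primary_metric_defined=True)
    if not may_proceed(checklist):
        print("Stop and resolve:", ", ".join(missing_items(checklist)))
```

Every field defaults to `False`, so an item counts as satisfied only after someone has confirmed it.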
---
Running the Test
During the Test
DO:
DO NOT:
---
Analyzing Results
Analysis Discipline
When interpreting results:
Interpretation Outcomes
| Result | Action |
| -------------------- | -------------------------------------- |
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
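The table above can be read as a small decision function in which the guardrail check takes precedence over the primary metric. The following is a minimal sketch under that reading. The function name `recommend_action` and its three inputs are assumptions for this example, and how significance is determined is left to the analysis itself.

```python
def recommend_action(significant, observed_lift, guardrail_failed):
    """Map a test result to the action listed in the outcomes table.

    significant: whether the primary metric result is statistically significant
    observed_lift: variant minus control on the primary metric
    guardrail_failed: whether any guardrail metric breached its limit
    """
    # Guardrails are checked first: a failure blocks shipping even on a win.
    if guardrail_failed:
        return "Do not ship, even if primary wins"
    if not significant:
        return "Consider more traffic or bolder change"
    if observed_lift > 0:
        return "Consider rollout"
    return "Reject variant, document learning"


if __name__ == "__main__":
    print(recommend_action(significant=True, observed_lift=0.015, guardrail_failed=True))
```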
---
Documentation & Learning
Test Record (Mandatory)
Document:
Store records in a shared, searchable location to avoid repeated failures.
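A searchable record can be as simple as one structured entry per test. The sketch below shows one possible shape. The list of fields to document is not given in this text, so `ExperimentRecord` and every field in it are hypothetical, chosen to mirror the sections of this guide.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    # Hypothetical fields; adapt them to what your team actually documents.
    name: str
    hypothesis: str
    primary_metric: str
    outcome: str
    decision: str
    learnings: list = field(default_factory=list)


def to_json_line(record):
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def search(records, keyword):
    """Return the records whose serialized content mentions the keyword."""
    needle = keyword.lower()
    return [r for r in records if needle in to_json_line(r).lower()]


if __name__ == "__main__":
    record = ExperimentRecord(
        name="checkout-button-copy",
        hypothesis="Clearer button copy increases checkout completion",
        primary_metric="checkout_completion_rate",
        outcome="Inconclusive",
        decision="Consider more traffic or bolder change",
        learnings=["Copy-only change was too small to detect"],
    )
    print(to_json_line(record))
```

Searching past records before proposing a new test is what lets the archive prevent repeated failures.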
---
Refusal Conditions (Safety)
Refuse to proceed if:
Explain why and recommend next steps.
---
Key Principles (Non-Negotiable)
---
Final Reminder
A/B testing is not about proving ideas right.
It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it”, that is the signal to slow down and re-check the design.