# A/B Test Setup

Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness.
## 1️⃣ Purpose & Scope
Ensure every A/B test is valid, rigorous, and safe before a single line of code is written.
---
## 2️⃣ Pre-Requisites

You must have:

### Hypothesis Quality Checklist

A valid hypothesis includes:
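The checklist items are not enumerated above, but as an illustrative sketch (the field names here are assumptions, not part of this guide), a hypothesis can be captured as a structured record so that each element is explicit and reviewable before the lock:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    """Illustrative hypothesis record; field names are assumptions."""
    change: str               # what will change in the product
    population: str           # who will see the change
    primary_metric: str       # the single metric expected to move
    expected_direction: str   # "increase" or "decrease"
    expected_effect: float    # minimum effect worth acting on (absolute)
    rationale: str            # evidence or reasoning behind the prediction

# Placeholder example, not a recommendation:
hypothesis = Hypothesis(
    change="Shorten the checkout form from 5 fields to 3",
    population="New visitors on mobile web",
    primary_metric="checkout_completion_rate",
    expected_direction="increase",
    expected_effect=0.02,
    rationale="Session recordings show drop-off on the address fields",
)
```

Freezing the dataclass makes the record immutable, which pairs naturally with the hypothesis lock in the next section.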
---
## 3️⃣ Hypothesis Lock (Hard Gate)
Before designing variants or metrics, you MUST:
Ask explicitly:
> “Is this the final hypothesis we are committing to for this test?”
Do NOT proceed until confirmed.
---
## 4️⃣ Assumptions & Validity Check (Mandatory)
Explicitly list assumptions about:
If assumptions are weak or violated:
---
## 5️⃣ Test Type Selection
Choose the simplest valid test:
Default to A/B unless there is a clear reason otherwise.
---
## 6️⃣ Metrics Definition
### Primary Metric (Mandatory)

### Secondary Metrics

### Guardrail Metrics
---
## 7️⃣ Sample Size & Duration
Define upfront:
Estimate:
Do NOT proceed without a realistic sample size estimate.
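As a rough sketch of such an estimate, the standard two-proportion formula for a two-sided test can be computed with the Python standard library alone; the baseline rate, target rate, and traffic numbers below are placeholders, not recommendations:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base: float, p_var: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_var) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_base * (1 - p_base)
                                      + p_var * (1 - p_var))) ** 2
    return math.ceil(numerator / (p_var - p_base) ** 2)

# Placeholder scenario: 10% baseline conversion, detect a lift to 12%.
n = sample_size_per_arm(0.10, 0.12)
print(n)  # per-arm sample size under these assumptions

# Duration estimate: total users needed divided by daily eligible traffic.
daily_traffic = 1_000  # placeholder
days = math.ceil(2 * n / daily_traffic)
print(days)
```

Note how quickly the required sample grows as the detectable effect shrinks; this is usually what forces the "bolder change" option in the inconclusive case later on.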
---
## 8️⃣ Execution Readiness Gate (Hard Stop)
You may proceed to implementation only if all are true:
If any item is missing, stop and resolve it.
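The gate can be made mechanical rather than left to memory. A minimal sketch (the checklist keys below are illustrative, not this guide's official list):

```python
def readiness_blockers(checks: dict[str, bool]) -> list[str]:
    """Return the names of any gate items that are not yet satisfied."""
    return [name for name, ok in checks.items() if not ok]

# Illustrative gate state; one item is deliberately unresolved.
checks = {
    "hypothesis_locked": True,
    "primary_metric_defined": True,
    "guardrails_defined": True,
    "sample_size_estimated": False,
}

blockers = readiness_blockers(checks)
if blockers:
    print("STOP: resolve before implementation:", blockers)
```

An empty blocker list is the only state in which implementation may begin.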
---
## Running the Test

### During the Test
DO:
DO NOT:
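One concrete check worth running while the test is live is a sample-ratio-mismatch (SRM) test: if the observed traffic split deviates from the planned split, the data is suspect regardless of what the metrics say. A minimal stdlib-only sketch (a chi-square test with one degree of freedom, using the identity that its survival function is `erfc(sqrt(x/2))`):

```python
import math

def srm_pvalue(n_control: int, n_variant: int,
               expected_ratio: float = 0.5) -> float:
    """Chi-square (df=1) test of the observed split vs. the planned split."""
    total = n_control + n_variant
    expected = (total * expected_ratio, total * (1 - expected_ratio))
    stat = ((n_control - expected[0]) ** 2 / expected[0]
            + (n_variant - expected[1]) ** 2 / expected[1])
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

# A healthy 50/50 split: high p-value, no alarm.
print(srm_pvalue(10_000, 10_050))
# A badly skewed split: tiny p-value, stop and investigate instrumentation.
print(srm_pvalue(10_000, 11_000))
```

A common convention is to flag SRM only below a very strict threshold (e.g. p < 0.001), since this check runs continuously; the exact threshold is a judgment call, not mandated here.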
---
## Analyzing Results

### Analysis Discipline
When interpreting results:
### Interpretation Outcomes
| Result | Action |
| -------------------- | -------------------------------------- |
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
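For a conversion-rate primary metric, the "significant" rows above typically come from a two-proportion z-test. A minimal stdlib-only sketch (the counts are placeholders):

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates.

    Returns (z statistic, p-value); positive z means B converts better.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Placeholder counts: control 200/2000 (10.0%), variant 260/2000 (13.0%).
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Even when p clears the threshold, the guardrail-failure row in the table still applies: a significant primary win does not override a guardrail breach.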
---
## Documentation & Learning

### Test Record (Mandatory)
Document:
Store records in a shared, searchable location to avoid repeated failures.
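A test record can be as simple as a JSON document written to the shared store. A sketch with illustrative fields (none of these names or values are mandated by this guide):

```python
import json

# Illustrative record; the id and field names are placeholders.
record = {
    "test_id": "2024-checkout-form-length",
    "hypothesis": "Shorter checkout form increases completion rate",
    "primary_metric": "checkout_completion_rate",
    "result": "significant positive",
    "decision": "rollout",
    "learnings": "Mobile users were most sensitive to form length",
}

# Serialize for the shared, searchable location (destination is up to you).
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Plain JSON keeps records grep-able, which is what makes past failures discoverable before someone reruns them.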
---
## Refusal Conditions (Safety)
Refuse to proceed if:
Explain why and recommend next steps.
---
## Key Principles (Non-Negotiable)
---
## Final Reminder
A/B testing is not about proving ideas right.
It is about learning the truth with confidence.
If you feel tempted to rush, simplify, or “just try it” —
that is the signal to slow down and re-check the design.