Translate A/B test results into a plain-English decision memo
A statistically significant A/B test result rarely translates directly into a shipping decision. This prompt converts raw test results into a clear recommendation that covers confidence level, practical significance, and the right caveats.
You are an experimentation analyst. I need help turning raw A/B test results into a decision memo for a non-technical stakeholder.
Test details:
- What was tested: {{TEST_DESCRIPTION}}
- Primary metric and goal: {{PRIMARY_METRIC}}
- Secondary metrics tracked: {{SECONDARY_METRICS}}
- Results (paste numbers, including sample sizes, conversion rates, p-value or confidence interval): {{TEST_RESULTS}}
- Test duration and traffic split: {{TEST_DURATION_AND_SPLIT}}
Follow these steps:
1. State in one sentence whether the test reached statistical significance and at what confidence level.
2. Calculate or confirm the relative and absolute lift on the primary metric. Flag if the lift is statistically significant but practically small (e.g., an absolute change under 0.5 percentage points).
3. Check the secondary metrics: did any move in a direction that would offset the primary metric gain? Flag any guardrail metric violations.
4. Identify at least two reasons to be cautious about generalizing this result (e.g., novelty effect, limited segment coverage, short test window).
5. Write a recommendation paragraph: ship, iterate, or do not ship — and state the single most important reason.
6. List two follow-up questions the stakeholder is likely to ask and pre-answer them.
This prompt assumes you have already run the statistical test elsewhere. It does not perform significance calculations from scratch.
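If you still need the numbers to paste into {{TEST_RESULTS}}, the sketch below shows one common way to get them for a conversion-rate test: a two-sided two-proportion z-test with a 95% confidence interval on the absolute lift. The function name `summarize_ab`, its parameters, and the example counts are illustrative assumptions, not part of the prompt, and the z-test itself is only appropriate for reasonably large samples of binary outcomes. Your experimentation platform may use a different method.

```python
import math


def summarize_ab(control_conv, control_n, variant_conv, variant_n):
    """Summarize a conversion-rate A/B test (hypothetical helper).

    Uses a two-sided two-proportion z-test (normal approximation).
    """
    if control_n <= 0 or variant_n <= 0 or control_conv <= 0:
        raise ValueError("need positive sample sizes and a non-zero control rate")

    p_c = control_conv / control_n
    p_v = variant_conv / variant_n

    absolute_lift = p_v - p_c            # difference in rates
    relative_lift = absolute_lift / p_c  # change relative to control

    # Pooled standard error for the hypothesis test.
    p_pool = (control_conv + variant_conv) / (control_n + variant_n)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
    z = absolute_lift / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

    # Unpooled standard error for the 95% interval on the absolute lift.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    ci_95 = (absolute_lift - 1.96 * se, absolute_lift + 1.96 * se)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "p_value": p_value,
        "ci_95": ci_95,
    }


# Made-up example counts: 500/10,000 control vs. 560/10,000 variant.
result = summarize_ab(500, 10_000, 560, 10_000)
for key, value in result.items():
    print(key, value)
```

Pasting both the absolute and the relative lift, along with the interval, gives step 2 of the prompt something concrete to confirm rather than recompute.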
How to use this prompt
- Copy the prompt above (Copy button on the top-right).
- Replace each {{VAR}} with your own value. Variables: {{TEST_DESCRIPTION}}, {{PRIMARY_METRIC}}, {{SECONDARY_METRICS}}, {{TEST_RESULTS}}, {{TEST_DURATION_AND_SPLIT}}.
- Paste it into one of the recommended tools below.
- Iterate: tighten constraints in the prompt if the output is generic.
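If you reuse the prompt often, the replace step can be scripted. The following is a minimal sketch, assuming the prompt is held in a Python string and the placeholders keep the {{UPPER_SNAKE_CASE}} form shown above. The helper name `fill_prompt` and the sample values are hypothetical.

```python
import re

# Matches placeholders such as {{TEST_DESCRIPTION}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_prompt(template, values):
    """Replace every {{VAR}} in template with values["VAR"].

    Raises KeyError if any placeholder has no value, so a half-filled
    prompt is never sent to the model by accident.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("no value for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened template and made-up values, for illustration only.
template = (
    "- What was tested: {{TEST_DESCRIPTION}}\n"
    "- Primary metric and goal: {{PRIMARY_METRIC}}"
)
filled = fill_prompt(
    template,
    {
        "TEST_DESCRIPTION": "New checkout button copy",
        "PRIMARY_METRIC": "Checkout conversion rate, goal is an increase",
    },
)
print(filled)
```

Failing loudly on a missing variable matches the point made in the next section: vague or empty inputs tend to produce vague outputs.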
Why this prompt is structured this way
The prompt is split into explicit steps because LLMs do better when the path is named, not implied. Each variable forces specificity at the input layer — vague inputs get vague outputs.
Pair this prompt with a tool
Notion AI
$8/user/mo add-on
AI baked into the docs/wiki/projects tool you already use.
Notion AI is unremarkable as a standalone writer but indispensable if Notion is your team's source of truth — it works on the docs and databases you already have.
Claude (Anthropic)
$0/mo (Pro at $20)
Frontier model with long context and strong reasoning.
Claude (Opus / Sonnet / Haiku tiers) is the assistant favored by writers and engineers who care about reasoning quality and tone. 1M token context on Opus.
ChatGPT (OpenAI)
$0/mo (Plus at $20)
The category-defining general-purpose AI assistant.
ChatGPT has the broadest feature surface: image gen, voice, custom GPTs, web browsing, code execution. Often the right default; sometimes beaten on specific tasks by Claude or Perplexity.
Perplexity
$0/mo (Pro at $20)
AI search engine with citations.
Perplexity is the answer engine Google would build if it weren't protecting search ad revenue. Cited answers, follow-up questions, focused source modes.