Statistical Significance
Statistical significance indicates that an observed difference between variants is unlikely to be due to random chance, typically when the p-value falls below a chosen threshold such as 0.05.
In depth
Statistical significance is assessed with a p-value: the probability of seeing a result at least as extreme as yours if there were truly no difference between variants. A common threshold is 0.05, meaning that when no real difference exists you accept a 5% chance of a false positive. Reaching it tells you the result is unlikely to be noise alone, but it says nothing about how large or business-relevant the effect is.
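The definition and decision rule can be written out for a two-sided test. Here $T$ is the test statistic (for instance a z-score comparing two completion rates), $t_{\mathrm{obs}}$ is its observed value, $H_0$ is the hypothesis of no difference, and $\alpha$ is the chosen threshold. This notation is introduced only for illustration. The last identity is what "a 5% chance of a false positive" means, and it holds exactly only for a continuous test statistic analyzed once, at a pre-planned point.

```latex
p = \Pr\bigl(|T| \ge |t_{\mathrm{obs}}| \;\big|\; H_0\bigr),
\qquad \text{declare significance if } p < \alpha,
\qquad \Pr\bigl(p < \alpha \;\big|\; H_0\bigr) = \alpha = 0.05
```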
The classic pitfall is peeking: checking a test repeatedly and stopping the moment the p-value drops below 0.05 inflates the false-positive rate well beyond 5%. In a quiz-funnel context, you might compare completion rates between two intros. Declaring a winner early, whether before the result is significant or at the first check that happens to cross 0.05, can lock in a variant that is no better, or even worse. Pairing significance with a pre-set sample size and a minimum meaningful effect keeps optimization decisions trustworthy.
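A quick A/A simulation makes the peeking problem concrete. This is a sketch under stated assumptions: both intros share a hypothetical true completion rate of 30%, traffic arrives in batches of 100 visitors per intro, the test is checked 10 times, and each check uses a pooled two-proportion z-test (one common choice, not the only one). Because there is no real difference, every significant result is a false positive. The function names are illustrative.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation yet: everyone or no one completed
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate(n_sims=1000, looks=10, batch=100, rate=0.30, alpha=0.05, seed=7):
    """A/A test: both intros share the same true completion rate (an assumption),
    so every significant result is a false positive. Returns the false-positive
    rate for (one analysis at the end, stopping at the first significant look)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        crossed = False
        for _ in range(looks):
            # Another batch of visitors arrives in each arm.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                crossed = True  # a peeker would stop here and declare a winner
        fixed_hits += p < alpha  # the final look is the single planned analysis
        peeking_hits += crossed
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed, peeking = simulate()
print(f"one analysis at the end:       {fixed:.1%} false positives")
print(f"stop at first p < 0.05 (peek): {peeking:.1%} false positives")
```

With these settings the single end-of-test analysis should land near 5%, while stopping at the first significant look comes out several times higher. Exact figures depend on the seed, batch size, and number of looks.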
Example in practice
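As a worked example, suppose intro A is completed by 1,200 of 4,000 quiz visitors and intro B by 1,300 of 4,000. These counts are hypothetical. The pooled two-proportion z-test below is one standard way to compare two completion rates. It relies on a normal approximation, which is reasonable here because both arms have hundreds of completions and non-completions. The helper name is illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift of B over A, z-score, two-sided p-value)
    using a pooled two-proportion z-test (normal approximation)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return rate_b - rate_a, z, p_value


# Hypothetical quiz-funnel data: completions / visitors for two intros.
lift, z, p = two_proportion_z_test(conv_a=1200, n_a=4000, conv_b=1300, n_b=4000)
print(f"lift = {lift * 100:+.1f} percentage points, z = {z:.2f}, p = {p:.4f}")
for alpha in (0.05, 0.01):
    verdict = "significant" if p < alpha else "not significant"
    print(f"at alpha = {alpha}: {verdict}")
```

With these numbers the 2.5-point lift clears the 0.05 threshold but not the stricter 0.01 one. Whether B "wins" therefore depends on the threshold chosen before the test. Whether a 2.5-point lift is worth shipping is a separate business question.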
Frequently asked questions
What p-value counts as statistically significant?
A p-value below 0.05 is the most common threshold, corresponding to a 5% risk of a false positive when no real difference exists. Some high-stakes tests use a stricter threshold such as 0.01 to further reduce the chance of acting on noise.
Does statistical significance mean the result matters for my business?
No. Significance only tells you the difference is unlikely to be pure chance, not that it is large or valuable. Always weigh it alongside the effect size and its practical impact before making a decision.
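To see why, consider a very large hypothetical test with 1,000,000 visitors per intro and completion rates of 30.0% and 30.2%. The sketch below computes a 95% Wald confidence interval for the lift. An interval that excludes zero corresponds, approximately, to p < 0.05. It then compares the interval with a minimum worthwhile lift of 1 percentage point. That threshold, the counts, and the helper names are illustrative assumptions, not recommendations.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal
MIN_WORTHWHILE_LIFT = 0.01  # hypothetical business threshold: 1 percentage point


def lift_confidence_interval(conv_a, n_a, conv_b, n_b, z_crit=Z_95):
    """Wald (unpooled) confidence interval for rate_b - rate_a."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    diff = rate_b - rate_a
    return diff - z_crit * se, diff + z_crit * se


def verdict(low, high, min_lift=MIN_WORTHWHILE_LIFT):
    """Combine statistical significance with a practical-impact threshold."""
    if low <= 0 <= high:
        return "not significant"
    if high < 0:
        return "significantly worse"
    if high < min_lift:
        return "significant but below the minimum worthwhile lift"
    if low >= min_lift:
        return "significant and worth acting on"
    return "significant; the lift may or may not be worthwhile"


# Hypothetical huge test: 30.0% vs 30.2% completion, 1,000,000 visitors per intro.
low, high = lift_confidence_interval(300_000, 1_000_000, 302_000, 1_000_000)
print(f"95% CI for lift: {low * 100:+.2f} to {high * 100:+.2f} percentage points")
print(verdict(low, high))
```

Here the whole interval sits above zero, so the result is statistically significant. The interval also sits well below the 1-point threshold, so the difference is probably real but too small to matter under this assumed threshold.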
Why shouldn't I stop a test as soon as it hits significance?
Repeatedly checking and stopping at the first significant moment, called peeking, dramatically inflates false positives. Predefine your sample size or use sequential testing methods to keep results valid.
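Predefining the sample size usually means running a power calculation before launch. Below is a minimal sketch for comparing two completion rates. It assumes a two-sided test, 80% power, and the common normal-approximation formula without continuity correction. The baseline rate, target lift, and function name are hypothetical and should be replaced with your own.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Visitors needed in each arm to detect p_control -> p_variant with a
    two-sided test at level alpha and the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_variant) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_variant * (1 - p_variant))
    ) ** 2
    return math.ceil(numerator / (p_variant - p_control) ** 2)


# Hypothetical plan: 30% baseline completion; smallest lift worth detecting is 3 points.
n = sample_size_per_arm(0.30, 0.33)
print(f"run until each intro has about {n} visitors, then analyze once")
```

The required sample grows quickly as the target lift shrinks or the threshold tightens. This is why the minimum meaningful effect should be fixed before the test starts, not chosen after looking at the data.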