The CFO's question made my stomach drop.
"If we turned off all marketing tomorrow, how many of these users would still come to us organically?"
I had a beautiful attribution dashboard. Last-click, multi-touch, view-through: I could show her any model she wanted. What I couldn't tell her was whether any of our marketing was actually working, or if we were just spending money to claim credit for conversions that would have happened anyway.
"I don't know," I admitted. "The attribution tells me who touched what. It doesn't tell me what actually mattered."
That conversation led me down a rabbit hole that fundamentally changed how I think about marketing measurement. It turns out, the difference between what we attribute and what we actually cause is enormous, and most companies never bother to find out.
The Uncomfortable Truth About Attribution
Incrementality measures the causal impact of your marketingโthe conversions that would NOT have happened without your advertising. It separates true lift from conversions that would have occurred organically whether you advertised or not.
Why Attribution Was Lying to Us
After we started running incrementality tests, we discovered our attribution was systematically misleading us:
- Correlation ≠ causation: Someone saw our ad and then converted. But did the ad cause the conversion, or did we just target people who were already going to buy?
- Selection bias: We targeted "high-intent" users. Of course they converted; they were going to anyway
- Attribution wars: Three channels all claiming credit for the same conversion. Math doesn't add up
- Privacy gaps: We increasingly can't track user journeys, so attribution is guessing more than ever
The Number That Shocked Us
Studies show that 20-60% of attributed conversions would have happened anyway without the ad exposure. When we ran our first incrementality test on our "best performing" retargeting campaign, we found our true incremental rate was 23%. In other words, 77% of the conversions we were paying for would have happened for free.
The Testing Methods That Work
Randomized Controlled Tests: The Gold Standard
The most rigorous method: randomly assign users to test groups (see ads) or control groups (don't see ads), then measure the difference.
How We Run Them
- Define your test hypothesis and success metrics before you start
- Calculate required sample size for statistical significance (usually thousands)
- Randomly split audience into test and control; true randomization is critical
- Run ads to test group only, while control sees nothing
- Measure conversion difference between groups
- Calculate lift and confirm statistical significance (see the sketch below)
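If you want to sanity-check the arithmetic yourself, here's a minimal sketch of that last step, assuming you already have user and conversion counts per group. The function name and the example numbers are mine, not from any particular platform.

```python
from math import sqrt
from statistics import NormalDist

def incrementality_readout(test_users, test_conversions,
                           control_users, control_conversions):
    """Lift and two-proportion z-test for a user-split holdout test."""
    p_test = test_conversions / test_users
    p_control = control_conversions / control_users

    # Relative lift: (test - control) / control
    lift = (p_test - p_control) / p_control

    # Pooled standard error for the difference in conversion rates
    p_pool = (test_conversions + control_conversions) / (test_users + control_users)
    se = sqrt(p_pool * (1 - p_pool) * (1 / test_users + 1 / control_users))

    # Two-sided p-value from the z statistic
    z = (p_test - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, z, p_value

# Illustrative numbers only: 100k users per arm, 2.3% vs. 2.0% conversion
lift, z, p = incrementality_readout(100_000, 2_300, 100_000, 2_000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p:.4f}")
```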
Ghost Bids: The Clever Workaround
For auction-based environments where you can't easily withhold ads, you can participate in auctions but not actually show ads to the control group:
- Bid on all eligible impressions normally
- Win auctions for both test and control groups
- Serve actual ads only to the test group
- Track conversions for both groups
This eliminates selection bias because both groups were "eligible" for the ad; only one actually saw it.
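Here's roughly what that bookkeeping looks like in code. This is a sketch of the idea only: serve_ad and log_exposure are placeholders for whatever your ad server and logging pipeline actually expose, and the 10% holdout rate is just an example.

```python
import hashlib

HOLDOUT_RATE = 0.10  # fraction of eligible users held out as control (example value)

def assign_group(user_id: str, test_name: str = "retargeting_holdout") -> str:
    """Deterministic user-level split so the same user always lands in the same arm."""
    bucket = int(hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return "control" if bucket < HOLDOUT_RATE * 100 else "test"

def handle_auction_win(user_id: str, auction_id: str, serve_ad, log_exposure) -> str:
    """Called for every auction we win, in both arms.

    serve_ad and log_exposure stand in for your ad server and logging hooks;
    they are not a real platform API.
    """
    group = assign_group(user_id)
    if group == "test":
        serve_ad(user_id, auction_id)  # real creative
    # Control: we won the auction but deliberately serve nothing (or a PSA),
    # and still log the "ghost impression" so both arms share the same eligibility.
    log_exposure(user_id, group, auction_id)
    return group

# Toy usage with print statements standing in for the real hooks
handle_auction_win(
    "user-123", "auction-456",
    serve_ad=lambda uid, aid: print(f"serve ad to {uid} ({aid})"),
    log_exposure=lambda uid, grp, aid: print(f"log {grp} exposure for {uid} ({aid})"),
)
```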
Geo Experiments: When User-Level Doesn't Work
Sometimes you can't randomize at the user level. Geographic experiments use regions as test and control groups:
The Key Requirements
- Match markets by size, demographics, and baseline behavior. Dallas vs. Houston, not New York vs. rural Montana
- Account for seasonal and regional variations
- Run for sufficient duration: 2-4 weeks minimum
- Use synthetic control methods for analysis (see the sketch after this list)
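To make the synthetic-control idea concrete, here's a stripped-down sketch: fit weights on the pre-period so a blend of control markets tracks the test market, then use that blend as the counterfactual once the campaign is live. Real geo tooling adds constraints (non-negative weights that sum to one), covariates, and proper inference; treat this as the core intuition, not a production analysis.

```python
import numpy as np

def synthetic_control_lift(test_series, control_matrix, treatment_start):
    """Estimate incremental conversions for a test geo against a weighted
    blend of control geos.

    test_series:     1-D array of daily conversions in the test market
    control_matrix:  2-D array, one column per control market, same days
    treatment_start: index of the first day the campaign was live
    """
    # Fit weights on the PRE-period so the blended controls track the test market.
    # (Plain least squares is a simplification of classic synthetic control.)
    pre_controls = control_matrix[:treatment_start]
    pre_test = test_series[:treatment_start]
    weights, *_ = np.linalg.lstsq(pre_controls, pre_test, rcond=None)

    # Counterfactual: what the test market would likely have done without the campaign
    counterfactual = control_matrix[treatment_start:] @ weights
    actual = test_series[treatment_start:]

    incremental = actual.sum() - counterfactual.sum()
    lift = incremental / counterfactual.sum()
    return incremental, lift
```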
"The best incrementality test is one you can actually act on. A statistically perfect result that's too expensive to implement helps nobody. Design for actionable insights, not academic purity."
How to Design Tests That Actually Work
Start With a Real Hypothesis
Not "does marketing work?" but something specific and testable:
- "Facebook retargeting drives X% incremental purchases among cart abandoners"
- "Upper-funnel video increases conversion rates Y% among users who later see our search ads"
- "Email reminders drive Z% lift compared to no reminder"
The Sample Size Problem
Most tests fail because they never collect enough users to reach statistical significance (the sketch after this list shows the standard calculation):
- Low baseline conversion rates need larger samples (if only 1% convert, you need tens of thousands of users)
- Smaller expected lifts need larger samples (detecting a 5% lift is harder than detecting a 50% lift)
- Confidence level: Usually target 95%
- Statistical power: Usually target 80%
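The standard two-proportion power calculation makes these trade-offs concrete. The numbers below are illustrative; plug in your own baseline rate and the smallest lift you would actually act on.

```python
from math import ceil, sqrt
from statistics import NormalDist

def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect a given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2

    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

# 1% baseline conversion, hoping to detect a 10% relative lift:
print(users_per_arm(0.01, 0.10))  # roughly 163,000 users per arm
```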
Duration Matters
- Minimum 1-2 weeks to capture full conversion cycles
- Account for day-of-week effects (weekends convert differently)
- Consider your typical time-to-convert
- Avoid major seasonality or events that could contaminate results
Reading Results Without Lying to Yourself
The Metrics That Matter
- Incremental lift: (Test - Control) / Control. This is your true impact percentage
- Incremental conversions: Total attributed × incremental rate. This is what you actually drove
- True CPA: Spend / Incremental conversions. Usually 2-5x higher than attributed CPA
- iROAS: Incremental revenue / Spend. The real return on your investment (worked through in the sketch below)
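Here are those four formulas in one place, with illustrative numbers. Note the assumption baked in: incremental revenue is estimated as attributed revenue times the incremental rate, which treats incremental and non-incremental buyers as spending the same amount.

```python
def incrementality_metrics(spend, attributed_conversions, attributed_revenue,
                           incremental_rate):
    """Translate attributed numbers into incrementality-adjusted ones."""
    incremental_conversions = attributed_conversions * incremental_rate
    incremental_revenue = attributed_revenue * incremental_rate  # simplifying assumption
    return {
        "incremental_conversions": incremental_conversions,
        "attributed_cpa": spend / attributed_conversions,
        "true_cpa": spend / incremental_conversions,
        "attributed_roas": attributed_revenue / spend,
        "iroas": incremental_revenue / spend,
    }

# Illustrative: $50k spend, 1,000 attributed conversions, $200k attributed revenue,
# and a 23% incremental rate from the holdout test
print(incrementality_metrics(50_000, 1_000, 200_000, 0.23))
# attributed CPA $50 vs. true CPA ~$217; attributed ROAS 400% vs. iROAS 92%
```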
The Significance Trap
- Calculate p-value for observed lift; you need p < 0.05 (95% confidence) to trust results
- Report confidence intervals, not just point estimates. "10% lift" is meaningless without context (see the sketch after this list)
- Be deeply skeptical of small sample sizes or short tests. They produce noise, not signal
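A quick way to put a range around the lift estimate is a normal-approximation interval on the rate difference, scaled by the control rate. It ignores uncertainty in the control rate itself, so treat it as a rough check; a bootstrap or the delta method is more careful.

```python
from math import sqrt
from statistics import NormalDist

def lift_confidence_interval(test_users, test_conversions,
                             control_users, control_conversions,
                             confidence=0.95):
    """Approximate confidence interval on relative lift (rough check only)."""
    p_t = test_conversions / test_users
    p_c = control_conversions / control_users
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    diff = p_t - p_c
    se_diff = sqrt(p_t * (1 - p_t) / test_users + p_c * (1 - p_c) / control_users)

    low = (diff - z * se_diff) / p_c
    high = (diff + z * se_diff) / p_c
    return low, high

# Same illustrative numbers as the lift sketch above
print(lift_confidence_interval(100_000, 2_300, 100_000, 2_000))
# roughly +9% to +21% around the 15% point estimate: report the range, not the point
```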
The Pitfalls That Ruined Our First Tests
Contamination
When your control group gets exposed to treatment anyway, your test is worthless:
- Cross-device exposure: User saw ad on phone, converted on laptop. Control is contaminated
- Geo spillover: People in "control" region saw your TV ads airing in "test" region
- Shared household: One person in control, another in test, sharing purchasing decisions
Selection Bias
When your test and control groups aren't truly comparable from the start:
- Non-random assignment (even subtle patterns matter)
- Pre-existing differences between groups
- Opt-in bias where engaged users select into treatment
What We Did When the Results Hurt
Our first real incrementality test showed that our "best" channel had a 23% incremental rate. The attribution dashboard said ROAS was 400%. True incremental ROAS was 92%. We were losing money on every dollar spent.
When Lift Is Lower Than Expected
- Reduce budget on that channel/campaign; don't throw good money after bad
- Reallocate to higher-incrementality tactics (often upper-funnel)
- Test different targeting approachesโmaybe you're just targeting the wrong people
- Consider if you're over-investing in bottom-funnel (often the case)
Building an Incrementality Culture
One test isn't enough. You need ongoing measurement:
- Run regular tests across channels, quarterly at minimum
- Build incrementality into planning models, not just attribution
- Train teams to ask "what's the incremental impact?" not just "what's the attributed return?"
- Invest in measurement infrastructure that makes testing easy
Incrementality testing requires investment. It requires patience. It often delivers uncomfortable truths. But it's the only way to know whether your marketing is actually working, or whether you're just paying to take credit for conversions that were going to happen anyway.
The CFO never had to ask me that question again. Now I have real answers.