"Experiments: before and after"

How to freeze a baseline for one query, record one action, and let the verdict compute itself from later scans.

An experiment is the platform's proof layer. You pick one prompt, freeze its current scores as a baseline, take one action (typically publishing the content draft aimed at that query), and let the verdict form automatically from later scans of the same product. The goal is a clean cause-and-effect pair: one lever, one measured outcome. See Content Studio for generating the draft that you link to an experiment.

Why one lever matters

If you change three things at once and the number moves, you cannot know which change caused it. The experiment form enforces this by design: it asks for one action note per experiment, not a list. The note is freeform text describing the single thing you did, such as "published comparison guide" or "added schema markup."

Starting an experiment

The product needs at least one completed scan before you can start an experiment. Pick a prompt from the product's prompt library, optionally link a content draft from Content Studio, add a note describing the one action you are taking, and click "Freeze baseline."

The platform records the current recommendation count, mention count, and total answer count for that prompt from the most recent live scan. Those numbers become the fixed baseline.
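As an illustration only, a frozen baseline can be thought of as an immutable record of those three counts. The sketch below is not the platform's implementation; the `Baseline` class, its field names, and `freeze_baseline` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen=True makes any later assignment raise an error
class Baseline:
    """Hypothetical record of one prompt's scores at freeze time."""
    prompt_id: str
    recommendations: int
    mentions: int
    total_answers: int
    frozen_at: datetime


def freeze_baseline(prompt_id: str, recommendations: int, mentions: int,
                    total_answers: int, now: Optional[datetime] = None) -> Baseline:
    # Copy the counts by value so that later scans cannot alter the baseline.
    return Baseline(prompt_id, recommendations, mentions, total_answers,
                    now or datetime.now(timezone.utc))
```

Making the record immutable mirrors the idea that the baseline is fixed: later scans produce new values to compare against it and never overwrite it.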

Awaiting re-scan

After the baseline is frozen, the experiment shows an "awaiting re-scan" state in the latest column. No verdict is possible until a subsequent scan runs. The experiment page notes how many days have passed since the experiment started.

Auto-scan, if active for the product, will eventually produce a later scan and the verdict will compute itself. You can also trigger a manual scan from the product page.

How the verdict is calculated

Once a later scan exists, the platform compares the latest scores for that same prompt against the baseline. The verdict uses the change in recommendation and mention rates:

If the recommendation or mention rate improved, the verdict shows "up," marking the direction of the change.
If the rates fell, the verdict shows "down."
If the rates are unchanged, the verdict shows "falsified": the lever was applied, but the number did not move. This is a real finding, not a failure of the platform.

The verdict reports whatever the scans show. An unchanged result is never suppressed or soft-labelled.
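The three rules can be sketched in a few lines of Python. Several things here are assumptions rather than documented behaviour: a rate is taken to be a count divided by the total answer count, a scan with no answers is treated as a rate of zero, the rules are applied in the order listed (so a mixed result, one rate up and one down, resolves to "up"), and the dictionary keys and label strings are illustrative.

```python
from fractions import Fraction


def rate(count: int, total: int) -> Fraction:
    # Assumption: rate = count / total answers; an empty scan counts as 0.
    return Fraction(count, total) if total else Fraction(0)


def verdict(baseline: dict, latest: dict) -> str:
    """Both arguments use the keys: recommendations, mentions, total_answers."""
    deltas = [
        rate(latest[key], latest["total_answers"])
        - rate(baseline[key], baseline["total_answers"])
        for key in ("recommendations", "mentions")
    ]
    # Rules are checked in the order the text lists them (an assumption
    # for the mixed case, which the text does not spell out).
    if any(d > 0 for d in deltas):
        return "up"
    if any(d < 0 for d in deltas):
        return "down"
    return "falsified"
```

Exact fractions are used instead of floats so that equal rates from scans of different sizes (for example 2 of 10 and 4 of 20) compare as unchanged.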

The falsified verdict

A falsified experiment means the action did not measurably change AI behaviour for that query. This is real information: the lever does not work for this query, did not have time to propagate, or was outweighed by other factors. Try a different lever for the next experiment on that prompt.

Questions

Can I run more than one experiment on the same prompt at the same time?

The form does not prevent it, but two overlapping experiments on the same prompt, each with a different action, leave both verdicts uninterpretable: you cannot tell which action moved the number. Apply the one-lever rule here as well and start a second experiment on a prompt only after the first has produced a verdict.

What if no re-scan has run yet?

The experiment sits in "awaiting re-scan" state and shows no verdict. It will update automatically after the next scan of that product completes. The day count shown in the experiment card tells you how long it has been since you froze the baseline.

Does the experiment track all surfaces or just one?

The baseline and latest values both aggregate across all surfaces that produced answers for that prompt in the respective scans. The recommendation and mention counts shown are totals across engines, not per-engine.
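A minimal sketch of that aggregation, assuming each surface reports its own counts for the prompt. The surface names and the dictionary layout are hypothetical and are not taken from the platform.

```python
from collections import Counter


def aggregate(per_surface: dict) -> dict:
    """Sum per-surface counts into one total per metric."""
    totals = Counter()
    for counts in per_surface.values():
        # Surfaces that produced no answers add nothing to the totals.
        if counts.get("total_answers", 0) > 0:
            totals.update(counts)  # Counter.update adds counts key by key
    return dict(totals)


# Hypothetical per-surface results for one prompt in one scan.
scan = {
    "engine_a": {"recommendations": 1, "mentions": 2, "total_answers": 4},
    "engine_b": {"recommendations": 2, "mentions": 3, "total_answers": 6},
    "engine_c": {"recommendations": 0, "mentions": 0, "total_answers": 0},
}
totals = aggregate(scan)
```

With these inputs the totals are 3 recommendations and 5 mentions across 10 answers, which is the level at which the baseline and latest values are compared.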

Can I delete an experiment?

The UI does not show a delete action for experiments. Once a baseline is frozen it stays in the list.

What does linking a content draft do?

Linking a draft connects the experiment card to the Content Studio entry for that draft. The card shows a link to the draft so you can navigate back to it. The connection is informational; it does not change how the verdict is calculated.

Does a demo scan count as a re-scan for verdict purposes?

No. Demo scans are excluded from measurements. Only live scans contribute to verdict calculations. See Scans: demo mode vs live mode for detail on how modes stay separated.
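To make the rule concrete, here is a sketch of choosing the scan a verdict would be computed from. It assumes each scan record carries a `mode` field and a `completed_at` timestamp, and that "later" means completed after the baseline was frozen; these names and that comparison are illustrative assumptions.

```python
from datetime import datetime


def latest_live_scan(scans: list, frozen_at: datetime):
    """Return the newest live scan completed after the baseline, or None."""
    eligible = [
        s for s in scans
        if s["mode"] == "live" and s["completed_at"] > frozen_at
    ]
    # None means the experiment is still awaiting a re-scan.
    return max(eligible, key=lambda s: s["completed_at"], default=None)
```

A demo scan that runs after the freeze is filtered out, so the experiment stays in the awaiting state until a live scan completes.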

What does the day count on an experiment card mean?

It shows how many days have passed since you started the experiment (froze the baseline). Use it to judge whether enough time has elapsed for your action to have propagated to the AI engines being measured.
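One way such a count could be derived is shown below. This sketch assumes the count is the number of whole 24-hour periods since the freeze; the platform may instead count calendar days, and `days_since` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def days_since(frozen_at: datetime, now: Optional[datetime] = None) -> int:
    # timedelta.days drops any partial day, so 2 days 23 hours gives 2.
    now = now or datetime.now(timezone.utc)
    return (now - frozen_at).days
```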