Backfilling incremental models in Dataform with a compilation variable

2026-05 · Andre Arias

Historical backfills in Dataform used to mean one of two bad options: a full table refresh (slow and expensive), or hardcoding a date directly into the SQL, running it, and hoping you remember to revert it before the next scheduled run.

There is a better way: a compilation variable, passed when you trigger the run, that reprocesses only the dates you need with no changes to your SQL.

[Screenshot: VS Code showing the BACKFILL_DATE pre-operations SQL on the left, workflow_settings.yaml in the center, and the Dataform Tools panel with the variable set, with a successful incremental run in the terminal below.]

-- the setup

My pipeline is built on top of GA4Dataform by Superform Labs, which uses a date_checkpoint pattern to handle GA4's 72-hour data latency window. I extended it by adding a BACKFILL_DATE compilation variable to my JS helpers.

In workflow_settings.yaml, the variable defaults to an empty string, so scheduled runs behave exactly as before. When I need to reprocess historical data, I pass a date via the Dataform CLI:

dataform run --vars=BACKFILL_DATE=2026-05-18 --actions=my_model

-- the logic

  • BACKFILL_DATE provided: use it as the start date
  • Not provided: fall back to GA4Dataform's incremental checkpoint
  • Empty table: fall back to the project START_DATE

Surgical reprocessing. No code changes. No hardcoded dates to revert.
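
The fallback order above can be sketched as a small JavaScript helper. This is an illustration under stated assumptions, not GA4Dataform's actual code: startDateSql, checkpointSql, and projectStartDate are hypothetical names, and the sketch assumes that a provided BACKFILL_DATE wins even when the target table is empty. The date format check is an extra guard so that a mistyped value fails at compile time instead of reaching the query.

```javascript
// Hypothetical helper: names and SQL shape are assumptions for illustration only.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

/**
 * Returns a SQL expression for the start date of an incremental run.
 * backfillDate:     value of the BACKFILL_DATE compilation variable ("" by default)
 * checkpointSql:    subquery yielding the incremental checkpoint (NULL on an empty table)
 * projectStartDate: the project START_DATE, as YYYY-MM-DD
 */
function startDateSql({ backfillDate = "", checkpointSql, projectStartDate }) {
  const trimmed = String(backfillDate).trim();

  // 1. BACKFILL_DATE provided: use it as the start date.
  if (trimmed !== "") {
    if (!ISO_DATE.test(trimmed)) {
      throw new Error(`BACKFILL_DATE must be YYYY-MM-DD, got "${trimmed}"`);
    }
    return `DATE '${trimmed}'`;
  }

  // 2. Not provided: use the checkpoint.
  // 3. Empty table: the checkpoint subquery returns NULL, so COALESCE
  //    falls through to the project START_DATE.
  return `COALESCE((${checkpointSql}), DATE '${projectStartDate}')`;
}

// Example: a backfill run versus a normal scheduled run.
console.log(startDateSql({
  backfillDate: "2026-05-18",
  checkpointSql: "SELECT MAX(event_date) FROM my_model",
  projectStartDate: "2024-01-01",
}));
console.log(startDateSql({
  backfillDate: "",
  checkpointSql: "SELECT MAX(event_date) FROM my_model",
  projectStartDate: "2024-01-01",
}));
```

Inside a Dataform project, the backfillDate argument would come from dataform.projectConfig.vars.BACKFILL_DATE, which is where compilation variables are exposed to JavaScript. The checkpoint subquery shown here is a placeholder; the real one comes from GA4Dataform's date_checkpoint logic.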

-- bonus: run it from VS Code

The Dataform Tools VS Code extension by Ashish Alex lets you set the compilation variable and run the compiled query directly from the editor. One click to preview exactly what will execute before you commit to a full run.

This pattern is part of the larger Unified GA4 analytics pipeline I own at Direct Wines, where reliable incremental processing across 8 brands makes full refreshes something I almost never need.
