For Azure Data Factory teams
Catch the missing rows and broken numbers ADF can't see — without a single row leaving your Azure. General Validation recomputes source against target inside your own environment and hands back run evidence you can put in front of anyone.
Onboarding a handful of design-partner teams this quarter. Your data never leaves your Azure tenant.
| Status | Function | Field A | Op | Field B |
|---|---|---|---|---|
| PASS | OUTER_VALUE | order_id | = | order_id |
| PASS | COUNT_ROWS | — | > | — |
| FAIL | VALUE | amount | = | amount |
metadata only · runs in your Azure · evidence in your storage
Runs inside your own Azure
Metadata & results only — never your rows
One reusable ADF pipeline
The problem
As long as the final metric looks positive, it ships — a thumbs-up, no further review.
Rows drop, types coerce silently, late files land half-written, and the run still reports green. The one engineer who asks “is this data actually right?” ends up with a target on their back — the question hands responsibility back to the business, so it's easier if nobody asks. The number that's off by two percent rides the dashboard for a week. And when someone finally recreates the total and lands on a different answer, dropping the conversation is easier than having it.
General Validation recreates the number for you — source against target, every run — and hands back evidence no one can wave away. The question stops being yours to carry. The data answers it.
The same run, two truths
ADF reported the first line. Only a data check reports the second.
How it works
Three steps, no agent in your data path. The product reads metadata and orchestrates; your Azure does the reading and writing.
Connect your Data Factory. We read metadata only — datasets, linked services, pipelines, and schemas — never your row data.
Pair a source and target, then declare the checks that matter: counts, sums, distinct values, value matches, set membership.
Checks compile into one reusable ADF Mapping Data Flow and run inside your Azure. Only results and diagnostics come back.
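To make the second step concrete, here is a minimal sketch of what declaring a pair and its checks could look like. It is an illustration only: the `Check` and `ValidationPair` classes, their field names, and the `sum` check name are assumptions made for this example, not the product's actual interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Check:
    """One declared check (hypothetical shape, not the product's schema)."""
    kind: str                            # e.g. "row_count", "distinct_count"
    source_field: Optional[str] = None
    target_field: Optional[str] = None
    tolerance: float = 0.0               # only meaningful for numeric aggregates


@dataclass
class ValidationPair:
    """A source dataset paired with a target dataset and its checks."""
    source: str
    target: str
    checks: List[Check] = field(default_factory=list)

    def add(self, check: Check) -> "ValidationPair":
        # Return self so declarations can be chained.
        self.checks.append(check)
        return self


pair = (
    ValidationPair(source="orders", target="orders_dw")
    .add(Check("row_count"))
    .add(Check("sum", "amount", "amount", tolerance=0.01))
    .add(Check("distinct_count", "order_id", "order_id"))
)
```

The point of the sketch is that a check is pure declaration: it names fields and a rule, and carries no row data.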
Why teams trust the result
Every run moves through four stages, so an infrastructure failure is never confused with a data failure. When something's wrong, you know exactly which — and where to look.
Resources and inputs are checked before anything runs.
Your Azure reads the source and target and runs the checks.
Results land in your own Delta storage and are read back.
Pass or fail on the data itself — the answer you came for.
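One way to picture the separation between infrastructure failures and data failures is the sketch below. The stage names (`preflight`, `execute`, `collect`, `verdict`), the `run_validation` function, and the `ERROR` outcome label are all assumptions chosen for illustration; the page does not name the stages.

```python
from typing import Any, Callable, Dict

Stage = Callable[[Any], Any]


def run_validation(preflight: Stage, execute: Stage, collect: Stage,
                   evaluate: Callable[[Any], bool]) -> Dict[str, str]:
    """Run three infrastructure stages, then judge the data itself."""
    state: Any = None
    for name, stage in (("preflight", preflight),
                        ("execute", execute),
                        ("collect", collect)):
        try:
            state = stage(state)
        except Exception as exc:
            # Anything that breaks before the verdict is an infrastructure
            # problem, so it is reported separately from a data FAIL.
            return {"outcome": "ERROR", "stage": name, "detail": str(exc)}
    outcome = "PASS" if evaluate(state) else "FAIL"
    return {"outcome": outcome, "stage": "verdict", "detail": ""}


def broken_collect(_state: Any) -> Any:
    raise IOError("results not readable")


# Same inputs, two different kinds of bad news.
infra = run_validation(lambda s: "ok", lambda s: {"src": 10, "tgt": 8},
                       broken_collect, lambda s: True)
data = run_validation(lambda s: "ok", lambda s: {"src": 10, "tgt": 8},
                      lambda s: s, lambda s: s["src"] == s["tgt"])
```

Here `infra` stops at the `collect` stage with an `ERROR`, while `data` reaches the verdict and reports `FAIL`, so the two are never mistaken for each other.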
Scope — today
Flat tabular data across the formats Azure Data Factory reads natively. This is what runs end-to-end right now, not a roadmap.
Runnable formats
Pair any two runnable datasets as source and target. ADF reads both sides in your Azure environment; General Validation stores result metadata, not source or target rows.
Parquet
Columnar files in ADF-managed storage paths.
CSV / delimited text
Flat delimited datasets with schema discovery.
Delta Lake
Lakehouse tables read and written inside Azure.
Azure SQL
Azure SQL Database, Azure SQL Managed Instance, and Synapse SQL.
Twelve validation checks
Grouped by the question they answer: are the records present, do the measures agree, do joined values match, and do sets reconcile?
tolerances · casts · evidence
Completeness
Prove the expected records actually landed.
- row_count
Total rows, source vs target
- count
Count of a chosen field
- distinct_count
Distinct values of a field
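As a rough picture of what the three completeness checks compare, the sketch below computes them over small in-memory lists. It is not how the product runs them (that happens in an ADF data flow); the `completeness` function is hypothetical, and treating `count` as "non-null values of the field", in the style of SQL `COUNT(column)`, is an assumption.

```python
from typing import Any, Dict, Iterable, Mapping


def completeness(rows: Iterable[Mapping[str, Any]], field_name: str) -> Dict[str, int]:
    """Compute the three completeness metrics for one side of a pair."""
    rows = list(rows)
    # Assumption: nulls are excluded from count and distinct_count.
    present = [r[field_name] for r in rows if r.get(field_name) is not None]
    return {
        "row_count": len(rows),
        "count": len(present),
        "distinct_count": len(set(present)),
    }


source = [{"order_id": 1}, {"order_id": 2}, {"order_id": 2}, {"order_id": None}]
target = [{"order_id": 1}, {"order_id": 2}, {"order_id": None}]

src_metrics = completeness(source, "order_id")
tgt_metrics = completeness(target, "order_id")

# Keep only the metrics that disagree, expressed as target minus source.
differences = {name: tgt_metrics[name] - src_metrics[name]
               for name in src_metrics
               if tgt_metrics[name] != src_metrics[name]}
```

In this toy data one duplicate row went missing, so the row count and field count each drop by one while the distinct count still agrees. That is why the three checks are worth running together.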
Aggregates
Catch numeric, date, and timestamp drift.
Sum of a numeric field
Average of a numeric field
Minimum value
Maximum value
Row Values
Compare records across joined pairs.
Row-level value match across a join
Value match, counting unmatched rows
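A small sketch of the idea behind the two row-value checks, assuming rows are joined on a unique key: values are compared where both sides have the key, and keys found on only one side are counted as unmatched. The `compare_row_values` function and its output shape are hypothetical, and the exact join semantics the product uses are not described on this page.

```python
from typing import Any, Dict, List, Mapping, Sequence


def compare_row_values(source: Sequence[Mapping[str, Any]],
                       target: Sequence[Mapping[str, Any]],
                       key: str, field_name: str) -> Dict[str, List[Any]]:
    """Compare one field across rows joined on `key` (keys assumed unique)."""
    src = {row[key]: row[field_name] for row in source}
    tgt = {row[key]: row[field_name] for row in target}

    joined = src.keys() & tgt.keys()
    # Keys on both sides whose values disagree.
    mismatched = sorted(k for k in joined if src[k] != tgt[k])
    # Keys that exist on exactly one side of the join.
    unmatched = sorted(src.keys() ^ tgt.keys())
    return {"mismatched": mismatched, "unmatched": unmatched}


source_rows = [{"order_id": 1, "amount": 10},
               {"order_id": 2, "amount": 20},
               {"order_id": 3, "amount": 30}]
target_rows = [{"order_id": 1, "amount": 10},
               {"order_id": 2, "amount": 25},
               {"order_id": 4, "amount": 40}]

report = compare_row_values(source_rows, target_rows, "order_id", "amount")
```

A plain value match would only surface order 2; counting unmatched rows also surfaces orders 3 and 4, which never met across the join.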
Sets
Check membership and equality of column sets.
Set A contained in B
Set B contained in A
Set equality
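The three set checks map directly onto ordinary set operations. The sketch below uses Python's built-in `set` type; the `set_checks` function and its result keys are made up for illustration.

```python
from typing import Any, Dict, Iterable


def set_checks(a_values: Iterable[Any], b_values: Iterable[Any]) -> Dict[str, Any]:
    """Evaluate containment both ways, equality, and the leftovers."""
    a, b = set(a_values), set(b_values)
    return {
        "a_in_b": a <= b,          # Set A contained in B
        "b_in_a": b <= a,          # Set B contained in A
        "equal": a == b,           # Set equality
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


source_countries = ["DE", "FR", "US"]
target_countries = ["DE", "FR", "US", "JP"]

result = set_checks(source_countries, target_countries)
```

Reporting the leftover values alongside the boolean makes a failed check actionable: here `JP` is the value the target has and the source does not.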
Numeric aggregates support tolerances.
Date, timestamp, and string checks are exact-match.
Casts are opt-in and validated before a run.
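To show how a tolerance changes the outcome of a numeric aggregate check, here is a minimal sketch. It assumes an absolute tolerance (the page does not say whether tolerances are absolute or relative), and the `aggregates` and `within_tolerance` helpers are hypothetical names.

```python
from decimal import Decimal
from typing import Dict, List


def aggregates(values: List[Decimal]) -> Dict[str, Decimal]:
    """Sum, average, minimum, and maximum of a non-empty numeric column."""
    total = sum(values, Decimal("0"))
    return {"sum": total, "avg": total / len(values),
            "min": min(values), "max": max(values)}


def within_tolerance(source_value: Decimal, target_value: Decimal,
                     tolerance: Decimal = Decimal("0")) -> bool:
    # Assumption: tolerance is an absolute difference, not a percentage.
    return abs(source_value - target_value) <= tolerance


source_amounts = [Decimal("10.00"), Decimal("20.00"), Decimal("30.00")]
target_amounts = [Decimal("10.00"), Decimal("20.00"), Decimal("30.004")]

src = aggregates(source_amounts)
tgt = aggregates(target_amounts)

tolerant = {name: within_tolerance(src[name], tgt[name], Decimal("0.01"))
            for name in src}
exact = {name: within_tolerance(src[name], tgt[name]) for name in src}
```

With a tolerance of 0.01 every aggregate agrees; with exact matching only the minimum does. `Decimal` is used so the comparison itself does not add floating-point noise.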
The evidence
Not another green checkmark — a small, repeatable record of what was compared, which rule failed, how far it drifted, and where to inspect the evidence.
When someone says the number's fine and you're not sure, you don't argue — you show the run.
VALIDATION RUN · run_2f9c… · pair: orders → orders_dw

| Test | Status |
|---|---|
| row_count | FAIL (Δ −2) |
| sum(amount) | PASS |
| distinct(id) | PASS |
| max(updated) | PASS |
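A run record like the one above could be represented as a small serializable structure. The sketch below is only a guess at the shape: the `CheckResult` and `RunRecord` classes, the `run_example` identifier, and the evidence path are placeholders invented for this example, not the product's real schema or storage layout.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    test: str
    status: str                     # "PASS" or "FAIL"
    delta: Optional[float] = None   # target minus source, where it applies


@dataclass
class RunRecord:
    run_id: str
    pair: str
    results: List[CheckResult]
    evidence_path: str              # pointer to evidence, never the rows

    @property
    def status(self) -> str:
        # A run fails as soon as any single check fails.
        failed = any(r.status == "FAIL" for r in self.results)
        return "FAIL" if failed else "PASS"


record = RunRecord(
    run_id="run_example",                    # placeholder identifier
    pair="orders -> orders_dw",
    results=[
        CheckResult("row_count", "FAIL", delta=-2),
        CheckResult("sum(amount)", "PASS"),
        CheckResult("distinct(id)", "PASS"),
        CheckResult("max(updated)", "PASS"),
    ],
    evidence_path="evidence/run_example/",   # placeholder location
)

payload = json.dumps(asdict(record), indent=2)
```

The record holds metrics and a pointer to evidence, which is the same split the page describes: results travel, rows stay put.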
Security & privacy
We store validation metadata and results — schemas, run status, metrics, diagnostics, and pointers to evidence — not your source or target rows. The data stays inside your Azure boundary, where it already lives.
Customer-owned Azure
The application, worker, runtime, storage, secrets, and evidence all sit inside infrastructure your team controls.
No vendor data lake
Source rows, target rows, validation outputs, and bad-record evidence stay in your Azure. We never copy them out.
Enterprise controls
OIDC / SSO sign-in, role-based access, tenant isolation, and audit logging — supported and tested, not a compliance badge.
Private beta
We're onboarding a limited number of design-partner teams this quarter — chosen for real Azure Data Factory workloads, not logos. Early partners get priority support and a direct line into the roadmap.
Get started
Request access, or book a call and we'll walk through your Azure Data Factory workload together: what to validate first, how it runs in your environment, and what your team gets back.
Private beta · metadata only · runs in your Azure.