Cross-Domain Identity Matching: Not Sharing Data, but Building Verifiable Correspondence

The essence of cross-domain matching isn't data synchronization.

It's establishing a conditional, explainable, revocable correspondence between two boundaries. Ad Cookie Sync, OAuth login, payment callbacks, conversion attribution — these seemingly unrelated scenarios all handle the same fundamental question: confirming that "this object here" and "that object there" are the same thing.

This post unpacks four scenarios, abstracts a general model, and identifies when you shouldn't be matching at all.

What Cross-Domain Matching Actually Matches

First, what "domain" means here.

It's not just a browser domain. It can be:

A business system
An organization
A data source
An account system
A device space
A third-party platform
An ad exchange
A payment channel
An analytics system

Each domain has its own ID system.

For example:

User in System A: user_id = A123
User in System B: user_id = B789

These two IDs have no natural relationship. A123 doesn't equal B789 just because both look unassuming. One of the most dangerous engineering mistakes is seeing two fields both named user_id and pretending they're semantically identical.

What cross-domain matching builds is a mapping:

A123 <-> B789

And usually this isn't enough. You also need:

A123 <-> B789
source = cookie_sync
partner = some_exchange
created_at = 2026-05-21
expire_at = 2026-07-20
consent_scope = ...
confidence = ...

Matching isn't just "storing two IDs." It carries at least four layers of meaning:

Who matches with whom
Why they're believed to match
Where this match came from
When this match should expire

Without the last three, the ID mapping eventually becomes a data garbage dump. The problem with a garbage dump isn't that it's unusable — it's that you never know whether you're stepping on data or a landmine.
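
As a minimal sketch of what carrying those four layers might look like in code, the record below keeps source, partner, and expiry next to the two IDs. The class name MatchRecord, the is_usable method, and the rule that a match without an expiry is treated as unusable are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MatchRecord:
    # Who matches with whom
    local_id: str
    external_id: str
    # Why and where: the mechanism and partner that produced the match
    source: str
    partner: str
    # When: creation and expiry bound the match's lifetime
    created_at: datetime
    expire_at: Optional[datetime]
    consent_scope: Optional[str] = None
    confidence: Optional[float] = None

    def is_usable(self, now: datetime) -> bool:
        """Assumption: a match with no expiry is treated as unusable."""
        return self.expire_at is not None and now < self.expire_at


record = MatchRecord(
    local_id="A123",
    external_id="B789",
    source="cookie_sync",
    partner="some_exchange",
    created_at=datetime(2026, 5, 21, tzinfo=timezone.utc),
    expire_at=datetime(2026, 7, 20, tzinfo=timezone.utc),
)
```

Calling record.is_usable with a date after 2026-07-20 returns False, so a consumer has to ask about lifetime before it can use the pair.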

Why You Can't Just Share IDs

The most intuitive thought: since both systems need to identify the user, why doesn't everyone use the same ID?

This sounds beautiful, like many architecture diagrams.

In reality, it's very hard.

First, system boundaries differ. An ad exchange can't use a DSP's internal user ID as its primary key. A payment platform can't adopt a merchant system's user ID as its own.

Second, permission boundaries differ. Being able to identify a user doesn't mean you can unconditionally transfer that identification capability to someone else. Especially in advertising, finance, healthcare, and content recommendation — the ID itself is a sensitive asset.

Third, lifecycle differs. An ID in System A might be long-stable; an ID in System B might rotate, expire, get cleaned up, or reset. Forcibly unifying IDs couples the two systems' lifecycles together.

Fourth, compliance boundaries differ. Some IDs can be used in one context but not another. Some data can be used for statistics but not personalization. A user consenting to A doesn't mean consenting to B.

So the more practical approach isn't sharing a global ID but establishing a controlled mapping.

local_id + partner + scope -> external_id

More cumbersome than "everyone shares one ID," but it preserves boundaries.
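
A controlled mapping of this shape could be sketched as a lookup keyed by the full (local_id, partner, scope) triple. ScopedIdMap and the scope names are hypothetical; the point of the sketch is only that a miss under one scope never falls back to another.

```python
from typing import Dict, Optional, Tuple

# Key: (local_id, partner, scope) -> external_id
ScopedKey = Tuple[str, str, str]


class ScopedIdMap:
    def __init__(self) -> None:
        self._map: Dict[ScopedKey, str] = {}

    def link(self, local_id: str, partner: str, scope: str, external_id: str) -> None:
        self._map[(local_id, partner, scope)] = external_id

    def resolve(self, local_id: str, partner: str, scope: str) -> Optional[str]:
        # No fallback to other partners or scopes: a miss stays a miss.
        return self._map.get((local_id, partner, scope))


ids = ScopedIdMap()
ids.link("A123", "some_exchange", "frequency_capping", "B789")
```

Here ids.resolve("A123", "some_exchange", "frequency_capping") returns "B789", while the same local ID under a different scope or partner returns None.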

Unified ID isn't always wrong. Internal microservices within the same organization, SSO within the same trust domain, user profiles within the same data platform — unified IDs are reasonable and simpler here. The problem isn't unified ID itself; it's thoughtlessly extending unified ID practices to cross-trust-domain, cross-organization, cross-compliance scenarios. The test isn't "can we unify" but "is this one trust domain."

These architectural choices seem self-inflicted, but their purpose is preserving boundaries. Boundaries are inconvenient in peacetime — and firewalls during incidents.

Cookie Sync in Advertising Is a Classic Case

In the ad delivery pipeline, Cookie Sync is a classic cross-domain matching implementation. It isn't bidding logic itself, but it affects frequency capping, audience matching, retargeting, attribution, and bid decisions.

If you're familiar with OpenRTB, this is straightforward: the SSP or Exchange has its user ID, the DSP has its own. Both can identify users "under their own domain" but can't directly read each other's domain cookies.

When a user visits a publisher's site, the page loads an ad request. The SSP or Exchange can identify this browser under its own domain:

exchange.com -> ex_uid = E123

The DSP might have identified this browser under its own domain too:

dsp.com -> dsp_uid = D789

The problem: browser cookies are isolated by domain:

exchange.com cannot directly read dsp.com's cookies
dsp.com cannot directly read exchange.com's cookies

So neither side can directly say:

E123 is D789

They need the browser to visit each domain in turn, typically through a pixel request followed by a 302 redirect. Each side reads only its own domain's cookie and passes its ID along as a URL parameter, which eventually builds a mapping:

exchange_user_id = E123
dsp_user_id      = D789
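
The redirect hop can be simulated without a browser. In the sketch below, the endpoint URL, the parameter names, and the choice to store the table on the DSP side are all assumptions; real partners define their own sync URLs, and in many deployments a further redirect hands the DSP's ID back so the exchange can hold the mapping.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; real partners publish their own sync URLs.
DSP_SYNC_ENDPOINT = "https://dsp.example/sync"


def exchange_build_redirect(ex_uid: str) -> str:
    """Exchange side: runs on the exchange's domain, so it knows only ex_uid."""
    query = urlencode({"partner": "exchange", "ex_uid": ex_uid})
    return DSP_SYNC_ENDPOINT + "?" + query


def dsp_handle_sync(url: str, dsp_cookies: Dict[str, str],
                    match_table: Dict[str, str]) -> Optional[str]:
    """DSP side: runs on the DSP's domain, so it reads only its own cookie."""
    params = parse_qs(urlparse(url).query)
    ex_uid = params.get("ex_uid", [None])[0]
    dsp_uid = dsp_cookies.get("dsp_uid")
    if ex_uid is None or dsp_uid is None:
        return None  # nothing to match on
    match_table[ex_uid] = dsp_uid
    return dsp_uid


match_table: Dict[str, str] = {}
redirect_url = exchange_build_redirect("E123")  # would be sent as a 302 Location
dsp_handle_sync(redirect_url, {"dsp_uid": "D789"}, match_table)
```

Neither function ever reads the other domain's cookie. The only thing that crosses the boundary is the ex_uid parameter in the URL.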

Afterwards, when the Exchange sends the DSP an OpenRTB bid request, it can include the buyer-side ID when appropriate:

{
  "user": {
    "id": "E123",
    "buyeruid": "D789"
  }
}

A critical point: user.id is the supply-side (Exchange or SSP) user ID. buyeruid is the buyer-specific ID that the exchange has mapped for this DSP, which is the DSP's own user ID once a sync has happened.

If the DSP uses user.id directly as its own user ID, that's classic field-name-driven development. The field name looks similar; the semantics are completely different. Frequency capping, attribution, audience matching — all go wrong.
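
On the DSP side, the distinction might be enforced with something as small as the function below. The function name is hypothetical, and the sketch assumes the request body is plain JSON shaped like the example above.

```python
import json
from typing import Optional


def dsp_user_from_bid_request(raw: str) -> Optional[str]:
    """Return the DSP's own user ID, or None when no sync has happened."""
    user = json.loads(raw).get("user", {})
    # user["id"] is the exchange's identifier; it is deliberately not a fallback.
    return user.get("buyeruid") or None


synced = '{"user": {"id": "E123", "buyeruid": "D789"}}'
unsynced = '{"user": {"id": "E123"}}'
```

An unsynced request yields None rather than "E123", so downstream frequency capping sees an unknown user instead of a wrong one.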

Don't write the match table first and say "we can delete it later." In privacy-related systems, "pollute now, govern later" typically upgrades engineering debt into compliance risk. Tech debt affects maintenance cost; compliance risk affects whether the business can continue using this data at all.

Account Linking Is Also Cross-Domain Matching

Stepping outside advertising, account linking is the same problem.

A user logs in with GitHub:

Your system user_id = U1001
GitHub user_id = G7788

After the user clicks "Sign in with GitHub," you obtain GitHub's user identity via OAuth and establish a binding locally:

provider = github
provider_user_id = G7788
local_user_id = U1001

This is cross-domain matching.

The only difference from Cookie Sync is a more "civilized" flow:

User explicitly clicks login
OAuth provides a standard authorization flow
Provider returns identity information
Local system stores the binding

But the essence hasn't changed.

You can't use GitHub's user ID as the local user ID. You can't assume the same email means the same person. You absolutely can't merge multiple identity sources without user confirmation.

The trickiest part of account linking is "merge strategy."

For example:

Google login email = a@example.com
GitHub login email = a@example.com

Can you auto-merge?

By default, no.

Because whether the email is verified, whether the provider is trusted, whether the user actually controls this email, and whether existing accounts already have data — all of these affect the security boundary. An account system isn't a string join. Merging two accounts wrongly is far worse than not merging.

Not merging means poor UX. Merging wrongly means unauthorized data access.

Both are bad, but at different magnitudes.

This is a critical principle in cross-domain matching:

Once a match affects permissions, assets, or identity, it must be conservative.
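
One way to encode that conservatism is a decision function that can never return "merge silently". The provider allowlist, the field names, and the two-outcome enum below are illustrative assumptions, not a description of any specific account system.

```python
from dataclasses import dataclass
from enum import Enum


class MergeDecision(Enum):
    KEEP_SEPARATE = "keep_separate"
    ASK_USER = "ask_user_to_confirm"


@dataclass(frozen=True)
class ExternalIdentity:
    provider: str
    provider_user_id: str
    email: str
    email_verified: bool


TRUSTED_PROVIDERS = {"google", "github"}  # hypothetical allowlist


def merge_decision(incoming: ExternalIdentity, existing_email: str,
                   existing_email_verified: bool) -> MergeDecision:
    same_email = incoming.email.lower() == existing_email.lower()
    strong = (incoming.provider in TRUSTED_PROVIDERS
              and incoming.email_verified
              and existing_email_verified)
    if same_email and strong:
        # Even the strongest signal only earns a prompt, never a silent merge.
        return MergeDecision.ASK_USER
    return MergeDecision.KEEP_SEPARATE
```

An unverified email, or an untrusted provider, leaves the two accounts separate even when the email strings are identical.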

A wrong match in advertising might degrade frequency capping and ad performance. A wrong match in accounts might show one person's data to another.

Both are bad. Not in the same league.

Payment Callbacks Are Also Cross-Domain Matching

Payment systems have similar issues.

Your order system has:

order_id = O123
user_id = U1001

The payment channel might have:

payment_id = P999
transaction_id = T888

When creating a payment, you link the local order to the payment channel's order:

local_order_id = O123
provider_payment_id = P999

On payment success, the payment platform calls back via webhook:

{
  "payment_id": "P999",
  "status": "success",
  "amount": 10000
}

You don't just see P999 and credit the user. You need to:

Find the local mapping
Verify signature
Verify amount
Verify currency
Verify order status
Handle idempotently
Then update the local order
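
The checks above can be sketched as one handler. Everything provider-specific here is an assumption: the HMAC-SHA256 signature scheme, the shared secret, the currency field (the sample payload in this post doesn't carry one), and the in-memory order store. The sketch also verifies the signature before any lookup, which is a choice of this example rather than a required order.

```python
import hashlib
import hmac
import json
from typing import Dict, Set

SECRET = b"demo-shared-secret"  # hypothetical; load from a secret store in practice

# Local mapping: provider_payment_id -> local order
orders: Dict[str, dict] = {
    "P999": {"order_id": "O123", "amount": 10000,
             "currency": "USD", "status": "pending"},
}
processed: Set[str] = set()  # payment IDs already applied (idempotency)


def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def handle_callback(body: bytes, signature: str) -> str:
    if not hmac.compare_digest(sign(body), signature):
        return "rejected: bad signature"
    event = json.loads(body)
    order = orders.get(event.get("payment_id"))
    if order is None:
        return "rejected: unknown payment"
    if event["payment_id"] in processed:
        return "ok: already processed"
    if (event.get("amount") != order["amount"]
            or event.get("currency") != order["currency"]):
        return "rejected: amount or currency mismatch"
    if order["status"] != "pending":
        return "rejected: order not payable"
    if event.get("status") != "success":
        return "ignored: not a success event"
    order["status"] = "paid"
    processed.add(event["payment_id"])
    return "ok: order paid"


payload = json.dumps({"payment_id": "P999", "status": "success",
                      "amount": 10000, "currency": "USD"}).encode()
```

Delivering the same signed payload twice changes the order once; the second delivery is acknowledged without touching state.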

This matching isn't user matching — it's order matching. But the underlying pattern is the same:

external_id -> local_id

And these scenarios are even less forgiving than advertising.

A wrong ID match in ads might dirty delivery and attribution data. A wrong ID match in payments affects funds, reconciliation, and audit — completely different risk level.

Conversion Attribution: Even More Subtle

Conversion attribution is also cross-domain matching.

An ad impression happens in one system:

impression_id = I123
user_id = U1
campaign_id = C1

The user later converts on the advertiser's site:

conversion_id = CV999
order_id = O888

The attribution system determines:

Should this conversion be attributed to a prior impression/click?

This matching isn't just ID equality. It also involves:

User ID consistency
Click ID existence
Time window satisfaction
Campaign consistency
Device consistency
Cross-device scenarios
View-through vs. click-through
Multiple ad touchpoints
Attribution model (last click or other rules)

This shows cross-domain matching isn't one-strength-fits-all.
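
To make one narrow slice of that list concrete, here is a sketch of last-click attribution with a time window. It assumes a deterministic user ID on both sides and ignores view-through, cross-device, and multi-touch cases entirely; the Click type and the last_click function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Click:
    click_id: str
    user_id: str
    campaign_id: str
    at: datetime


def last_click(clicks: List[Click], user_id: str, converted_at: datetime,
               window: timedelta) -> Optional[Click]:
    """Most recent click by the same user inside the window before conversion."""
    eligible = [
        c for c in clicks
        if c.user_id == user_id
        and timedelta(0) <= converted_at - c.at <= window
    ]
    return max(eligible, key=lambda c: c.at, default=None)


t0 = datetime(2026, 5, 1, 12, 0)
clicks = [
    Click("K1", "U1", "C1", t0),
    Click("K2", "U1", "C2", t0 + timedelta(days=2)),
    Click("K3", "U2", "C1", t0 + timedelta(days=3)),
]
winner = last_click(clicks, "U1", t0 + timedelta(days=5), timedelta(days=7))
```

With a seven-day window the conversion goes to click K2. Move the conversion a month out and the function returns None, which is the honest answer when no touchpoint qualifies.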

Roughly three categories:

Type                Example                                            Characteristics
Deterministic       OAuth account linking, payment order mapping       Explicit authorization or strong verification
Semi-deterministic  Cookie Sync, click ID attribution                  Relies on ID, time window, context
Probabilistic       Cross-device inference, similar-behavior matching  Has confidence, can't be treated as fact

Probabilistic matching requires particular caution. Cross-device inference, similar-behavior matching — these are fundamentally "guessing," not "confirming." Guessing is useful in ad delivery and growth analytics, but can't be consumed downstream as fact — for example, just because probabilistic matching suggests two devices belong to the same user doesn't mean one party's personal data can be shown to the other. As probabilistic match results propagate downstream, confidence should decay at each hop. The most dangerous practice is storing a 0.6-confidence match in the same field as deterministic matches, letting all downstream consumers assume it's fact.
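
A rough sketch of per-hop decay, under loudly stated assumptions: the 0.9 decay factor is arbitrary, and the rule that only a deterministic match at full confidence counts as fact is this example's policy, not an industry standard.

```python
from dataclasses import dataclass

DECAY_PER_HOP = 0.9   # hypothetical decay factor
FACT_THRESHOLD = 1.0  # assumption: only full-confidence deterministic matches are facts


@dataclass(frozen=True)
class Match:
    match_type: str   # "deterministic" or "probabilistic"
    confidence: float

    def forwarded(self) -> "Match":
        """Copy for the next downstream hop; only guesses lose confidence."""
        if self.match_type == "deterministic":
            return self
        return Match(self.match_type, self.confidence * DECAY_PER_HOP)


def usable_as_fact(m: Match) -> bool:
    return m.match_type == "deterministic" and m.confidence >= FACT_THRESHOLD


guess = Match("probabilistic", 0.6)
two_hops = guess.forwarded().forwarded()
```

After two hops the 0.6 guess is carried at roughly 0.486, and usable_as_fact rejects it at every stage because the type travels with the number.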

I lean toward separating deterministic and probabilistic matches in data structure. Don't cram both into one matched_user_id field. Saves a field short-term, destroys traceability long-term.

A clear structure should include:

match_type      = deterministic / probabilistic
match_source    = oauth / cookie_sync / click_id / device_graph
confidence      = 1.0 / 0.82 / ...
matched_at      = ...
expires_at      = ...

The scariest engineering pattern: "a field that looks simple but carries overloaded semantics." One field carrying five meanings, with everyone eventually saying "historical reasons." These historical reasons usually mean the original data modeling didn't express source, strength, and lifecycle clearly.

The Core Model of Cross-Domain Matching

Regardless of scenario, cross-domain matching abstracts to this model:

subject in domain A
        |
        |  evidence / protocol / consent / verification
        v
subject in domain B

In data structure terms:

CREATE TABLE identity_match (
    domain_a          VARCHAR(64)  NOT NULL,
    id_a              VARCHAR(256) NOT NULL,
    domain_b          VARCHAR(64)  NOT NULL,
    id_b              VARCHAR(256) NOT NULL,

    match_type        VARCHAR(32)  NOT NULL,
    match_source      VARCHAR(64)  NOT NULL,
    confidence        DECIMAL(5,4) NULL,

    consent_scope     VARCHAR(128) NULL,
    evidence_id       VARCHAR(256) NULL,

    created_at        TIMESTAMP    NOT NULL,
    updated_at        TIMESTAMP    NOT NULL,
    expires_at        TIMESTAMP    NULL,

    PRIMARY KEY (domain_a, id_a, domain_b, id_b)
);

This doesn't mean every system should build such a universal table. A universal identity graph easily becomes over-engineering, especially when the business hasn't reached that complexity. When is upgrading warranted? A few signals: you're maintaining three or more specialized matching tables simultaneously; cross-scenario matching queries require multi-table joins; or different business lines are independently building inconsistent matching logic. Before these signals appear, specialized mapping tables suffice.

But this model reminds us: a match relationship isn't a bare ID. It carries at minimum source, scope, strength, and lifecycle.
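
For a feel of how such a table gets queried, the sketch below loads a trimmed version of the schema into an in-memory SQLite database. The column subset, the TEXT and REAL types, the ISO-8601 string timestamps, and the sample rows are simplifications for the example; a production system would use its own database and types.

```python
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE identity_match (
        domain_a TEXT NOT NULL, id_a TEXT NOT NULL,
        domain_b TEXT NOT NULL, id_b TEXT NOT NULL,
        match_type TEXT NOT NULL, match_source TEXT NOT NULL,
        confidence REAL, expires_at TEXT,
        PRIMARY KEY (domain_a, id_a, domain_b, id_b)
    )
""")
conn.executemany(
    "INSERT INTO identity_match VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("github", "G7788", "local", "U1001",
         "deterministic", "oauth", 1.0, None),
        ("exchange_a", "E123", "dsp", "D789",
         "semi_deterministic", "cookie_sync", None, "2026-01-01T00:00:00"),
    ],
)


def lookup(domain_a: str, id_a: str, domain_b: str,
           now_iso: str) -> Optional[Tuple[str, str, str]]:
    """Return (id_b, match_type, match_source) for an unexpired match."""
    return conn.execute(
        """SELECT id_b, match_type, match_source FROM identity_match
           WHERE domain_a = ? AND id_a = ? AND domain_b = ?
             AND (expires_at IS NULL OR expires_at > ?)""",
        (domain_a, id_a, domain_b, now_iso),
    ).fetchone()
```

The lookup returns type and source along with the ID, so the caller cannot get a bare ID without also seeing how strong the match is. The expired cookie_sync row simply does not come back.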

In specific systems, it can be narrowed per scenario:

Advertising matching:

partner_id, partner_user_id, dsp_user_id, source, expires_at

OAuth binding:

provider, provider_user_id, local_user_id, verified_email, bound_at

Payment orders:

provider, provider_payment_id, local_order_id, status, signature_verified

Conversion attribution:

conversion_id, touchpoint_id, match_type, attribution_window, attributed_at

The point isn't making the table design pretty — it's not losing semantics.

When You Shouldn't Match

Not everything that can be matched should be.

Several scenarios warrant explicit refusal:

1. No Legal Authorization or User Consent

When dealing with user identity, behavior, advertising, or device identifiers without proper authorization, don't match.

Especially in advertising — just because you technically can send a sync pixel doesn't mean you should. Cookie Sync handles online identifiers; it shouldn't bypass privacy gates.

2. The Match Would Expand Permissions

If a match would let a user see more data, gain more permissions, or access more assets, stronger verification is required.

Account auto-merging, enterprise account linking, payment account binding — none of these can rely on weak signals like email or nickname.

3. Untrusted ID Source

Third-party IDs without signatures, source verification, or allowlisting shouldn't be written directly to the core mapping table.

Open callbacks, open redirects, accepting arbitrary partner parameters — these are incident gateways. Third-party IDs must pass source verification and permission boundary checks before entering the core mapping.
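
A gate in front of the core mapping might look like the sketch below. The allowlist contents, the ID format pattern, and the quarantine list are hypothetical, and a real gate would likely add signature verification next to these two checks.

```python
import re
from typing import Dict, List, Tuple

ALLOWED_PARTNERS = {"some_exchange"}            # hypothetical allowlist
ID_FORMAT = re.compile(r"[A-Za-z0-9_-]{1,64}")  # hypothetical ID shape

core_mapping: Dict[Tuple[str, str], str] = {}
# Rejected inputs are kept for review and never joined against.
quarantine: List[Tuple[str, str]] = []


def ingest(partner: str, external_id: str, local_id: str) -> bool:
    """Write to the core mapping only after source and format checks pass."""
    if partner not in ALLOWED_PARTNERS or not ID_FORMAT.fullmatch(external_id):
        quarantine.append((partner, external_id))
        return False
    core_mapping[(partner, external_id)] = local_id
    return True
```

Rejected input still gets recorded, which helps when a partner asks why their IDs never matched.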

4. Can't Explain the Match Basis

If something goes wrong in production and you can't answer:

Why does A match B?
When was it matched?
Who triggered it?
What's the basis?
Can it be revoked?

Then the matching system is un-operable.

An un-operable system will eventually upgrade "occasional issues" to "all-hands fire drills."

Engineering Principles for Matching Systems

1. IDs Need Namespaces

Don't just store:

user_id = 123

Know whose ID it is:

domain = exchange_a
user_id = 123

Otherwise, when two systems both generate 123, you get a free data crossover experience.
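
In code, the cheapest form of a namespace is a value type that refuses to compare equal across domains. ScopedId is a hypothetical name.

```python
from typing import NamedTuple


class ScopedId(NamedTuple):
    domain: str
    value: str


a = ScopedId("exchange_a", "123")
b = ScopedId("exchange_b", "123")

# Used as dictionary keys, the two never collide even though the raw value does.
seen = {a: "browser-1", b: "browser-2"}
```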

2. Mappings Have Direction

Some mappings are symmetric:

A <-> B

Some aren't:

external_id -> local_id

In payment callbacks, an external payment ID maps to a local order, but a local order can't unconditionally reverse-derive all external states.

Same in advertising. An Exchange user.id mapped to a DSP user ID is valid for that Exchange and that DSP only; it doesn't mean the ID can be reused in other partner scenarios.

3. Matches Have Lifecycles

Cookies expire, tokens expire, consent changes, devices reset, users unlink accounts.

Mapping relationships should expire too.

A never-expiring match table is tempting — so convenient. But convenience usually just stores problems in the future. The future won't thank you; it'll have a table so bloated nobody dares touch it.

4. Matches Must Be Revocable

User unlinks GitHub — the mapping must be deleted or invalidated. User withdraws consent — the ad ID mapping must stop being used. Payment order closes — subsequent success callbacks can't directly change state.

Cross-domain matching isn't only responsible for establishing relationships — it's responsible for dissolving them.
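
A sketch of revocation for the GitHub binding used earlier. It invalidates with a tombstone rather than deleting, which is an assumption of this example; some policies require hard deletion instead. The Binding type and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class Binding:
    local_user_id: str
    revoked_at: Optional[datetime] = None


bindings: Dict[Tuple[str, str], Binding] = {
    ("github", "G7788"): Binding("U1001"),
}


def revoke(provider: str, provider_user_id: str, now: datetime) -> bool:
    b = bindings.get((provider, provider_user_id))
    if b is None or b.revoked_at is not None:
        return False
    b.revoked_at = now  # kept as a tombstone so the history stays explainable
    return True


def resolve(provider: str, provider_user_id: str) -> Optional[str]:
    b = bindings.get((provider, provider_user_id))
    return b.local_user_id if b and b.revoked_at is None else None
```

Every read goes through resolve, so a revoked binding stops working the moment revoke returns, without waiting for a cleanup job.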

5. The Critical Path Can't Depend on Slow Matching

In low-latency scenarios like RTB, doing cross-database, cross-region, cross-service ID lookups after a bid request arrives is impractical.

Match relationships should be established in advance; the critical path does only low-latency lookups.

This applies beyond advertising. Payment callbacks, login authentication, permission checks — every remote dependency added to the critical path adds a new availability risk. Low-latency critical paths should rely on local indexes, caches, or pre-computed mapping results.
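
One possible shape for such a local lookup is an in-process table that a background job refreshes. The class name, the TTL policy, and the injectable clock are assumptions of this sketch; it only illustrates that the hot path does a dictionary read and nothing else.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LocalMatchCache:
    """In-process lookup table refreshed off the critical path."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def preload(self, mapping: Dict[str, str]) -> None:
        """Called by a background job, never by the request handler."""
        deadline = self._clock() + self._ttl
        for ex_uid, dsp_uid in mapping.items():
            self._entries[ex_uid] = (dsp_uid, deadline)

    def get(self, ex_uid: str) -> Optional[str]:
        """Hot path: a dict lookup only; a miss has no remote fallback."""
        entry = self._entries.get(ex_uid)
        if entry is None or self._clock() >= entry[1]:
            return None
        return entry[0]


cache = LocalMatchCache(ttl_seconds=60)
cache.preload({"E123": "D789"})
```

A miss returns None and the bid proceeds without a user match. That trades some match rate for a latency bound that does not depend on another service being up.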

6. Must Have Observability

At minimum, know:

match request volume
match success rate
match failure reasons
privacy block ratio
mapping write success rate
mapping query hit rate
expired cleanup count
partner-dimension anomalies
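
A few of these can be sketched with nothing more than a counter. The metric names and outcome labels below are made up for the example; a real system would emit them to whatever metrics backend it already runs.

```python
from collections import Counter

metrics: Counter = Counter()


def record_match_attempt(partner: str, outcome: str, reason: str = "") -> None:
    """outcome is one of 'success', 'failure', 'privacy_blocked' (labels assumed)."""
    metrics["match_requests"] += 1
    metrics[f"match_{outcome}"] += 1
    metrics[f"partner:{partner}:{outcome}"] += 1  # partner-dimension breakdown
    if reason:
        metrics[f"failure_reason:{reason}"] += 1


def success_rate() -> float:
    total = metrics["match_requests"]
    return metrics["match_success"] / total if total else 0.0


record_match_attempt("some_exchange", "success")
record_match_attempt("some_exchange", "failure", reason="missing_cookie")
record_match_attempt("other_exchange", "privacy_blocked")
record_match_attempt("some_exchange", "success")
```

Keeping privacy blocks as their own outcome, instead of folding them into failures, is what lets the privacy block ratio be read separately from the failure rate.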

Without these metrics, cross-domain matching incidents are pure guesswork. Guesswork isn't engineering.

My Current Understanding of Cross-Domain Matching

This field isn't stagnant. Privacy-enhancing matching technologies — Private Set Intersection (PSI), Google PAIR, data clean rooms — are changing matching's technical approach, shifting from "passing IDs around" to "computing intersections within each domain without exposing raw IDs." These solutions aren't silver bullets, but they point in the same direction: matching and privacy aren't opposed, provided the system design honestly addresses authorization, boundaries, and auditability.

Cross-domain matching fundamentally isn't "data synchronization."

It's more like establishing a conditional, explainable, revocable correspondence between two boundaries.

What matters most isn't the ID — it's the boundary.

Without boundaries, matching becomes data commingling. Without sources, matching becomes a black box. Without expiration, matching becomes pollution. Without compliance, matching becomes risk. Without observability, matching becomes mysticism.

Cookie Sync is just this problem's classic manifestation in ad systems. It mashes browser same-origin policy, ad trading, user identification, privacy compliance, low-latency systems, and data modeling together, so it looks complex.

But decomposed, it shares commonalities with many engineering problems:

Who am I?
Who are you?
How do we prove these two identities are related?
Where can this relationship be used?
When does it expire?
When something goes wrong, how do we explain it?

Answer these, and you truly understand cross-domain matching.

Without these answers, stitching together redirect URLs can make the system run. But it runs without knowing when it'll blow.

This article discusses engineering design principles and doesn't replace legal, privacy, or compliance judgments under specific jurisdictions. When dealing with user identity, advertising identifiers, payment assets, and cross-organizational data collaboration, the final solution still requires case-by-case evaluation against business context, contractual constraints, and local regulations.

References

IAB Tech Lab, OpenRTB 2.6 Specification.
IAB Tech Lab, Publisher Advertiser Identity Reconciliation.
Google Ad Manager Help, About PAIR.
IETF Datatracker, Private set intersection based on ECDH.
