Data Model

Complete reference for all tables in SQL Identity Resolution.

Schema Overview

erDiagram
    source_table ||--o{ identifier_mapping : "has"
    source_table ||--o{ run_state : "tracks"
    rule ||--o{ identifier_mapping : "applies_to"

    source_table {
        string table_id PK
        string table_fqn
        string entity_type
        string entity_key_expr
        string watermark_column
        int watermark_lookback_minutes
        boolean is_active
    }

    rule {
        string rule_id PK
        string identifier_type
        int priority
        boolean is_active
        int max_group_size
    }

    identifier_mapping {
        string table_id FK
        string identifier_type FK
        string column_expr
        boolean requires_normalization
    }

    run_state {
        string table_id FK
        timestamp last_watermark_value
        string last_run_id
        timestamp last_run_ts
    }

    identity_resolved_membership_current {
        string entity_key PK
        string resolved_id
        timestamp updated_ts
    }

    identity_clusters_current {
        string resolved_id PK
        int cluster_size
        timestamp updated_ts
    }

idr_meta Schema

source_table

Registry of source tables to process.

Column	Type	Description
`table_id`	VARCHAR	PK Unique identifier for this source
`table_fqn`	VARCHAR	Fully qualified table name
`entity_type`	VARCHAR	Entity type (e.g., PERSON, ACCOUNT)
`entity_key_expr`	VARCHAR	SQL expression for entity key
`watermark_column`	VARCHAR	Column for incremental processing
`watermark_lookback_minutes`	INT	Lookback buffer for late-arriving data
`is_active`	BOOLEAN	Include in processing

Example:

INSERT INTO idr_meta.source_table VALUES
  ('customers', 'crm.customers', 'PERSON', 'customer_id', 'updated_at', 0, TRUE),
  ('orders', 'ecom.orders', 'PERSON', 'user_id', 'order_date', 60, TRUE);

rule

Matching rules defining how identifiers are used.

Column	Type	Description
`rule_id`	VARCHAR	PK Unique rule identifier
`identifier_type`	VARCHAR	Type of identifier (EMAIL, PHONE, etc.)
`priority`	INT	Processing priority (lower = higher priority)
`is_active`	BOOLEAN	Include in processing
`max_group_size`	INT	Max entities per identifier group

Example:

INSERT INTO idr_meta.rule VALUES
  ('email_exact', 'EMAIL', 1, TRUE, 10000),
  ('phone_exact', 'PHONE', 2, TRUE, 5000),
  ('loyalty_id', 'LOYALTY', 3, TRUE, 1);

identifier_mapping

Maps source columns to identifier types.

Column	Type	Description
`table_id`	VARCHAR	FK → source_table
`identifier_type`	VARCHAR	Type of identifier
`column_expr`	VARCHAR	SQL expression to extract identifier
`requires_normalization`	BOOLEAN	Apply normalization (lowercase, etc.)

Example:

INSERT INTO idr_meta.identifier_mapping VALUES
  ('customers', 'EMAIL', 'email', TRUE),
  ('customers', 'PHONE', 'phone', TRUE),
  ('orders', 'EMAIL', 'customer_email', TRUE);

run_state

Tracks watermarks for incremental processing.

Column	Type	Description
`table_id`	VARCHAR	FK → source_table
`last_watermark_value`	TIMESTAMP	Last processed watermark
`last_run_id`	VARCHAR	ID of last run
`last_run_ts`	TIMESTAMP	Timestamp of last run

config

Key-value configuration settings.

Column	Type	Description
`config_key`	VARCHAR	PK Configuration key
`config_value`	VARCHAR	Configuration value
`description`	VARCHAR	Human-readable description
`updated_at`	TIMESTAMP	Last update time

Default values:

INSERT INTO idr_meta.config VALUES
  ('dry_run_retention_days', '7', 'Days to retain dry run results', NOW()),
  ('large_cluster_threshold', '5000', 'Threshold for large cluster warnings', NOW());

identifier_exclusion

Values to exclude from matching.

Column	Type	Description
`identifier_type`	VARCHAR	Type of identifier
`identifier_pattern`	VARCHAR	Value or pattern to exclude
`is_pattern`	BOOLEAN	TRUE if LIKE pattern
`reason`	VARCHAR	Reason for exclusion

Example:

INSERT INTO idr_meta.identifier_exclusion VALUES
  ('EMAIL', 'test@test.com', FALSE, 'Generic test email'),
  ('EMAIL', '%@example.com', TRUE, 'Example domain'),
  ('PHONE', '0000000000', FALSE, 'Invalid phone');

idr_out Schema

identity_resolved_membership_current

Maps each entity to its resolved identity.

Column	Type	Description
`entity_key`	VARCHAR	PK Unique entity identifier
`resolved_id`	VARCHAR	Cluster identifier (= entity_key of cluster anchor)
`updated_ts`	TIMESTAMP	Last update time

identity_clusters_current

Cluster metadata.

Column	Type	Description
`resolved_id`	VARCHAR	PK Cluster identifier
`cluster_size`	INT	Number of entities in cluster
`updated_ts`	TIMESTAMP	Last update time

golden_profile_current

Best-record golden profiles per cluster.

Column	Type	Description
`resolved_id`	VARCHAR	PK Cluster identifier
`email_primary`	VARCHAR	Best email
`phone_primary`	VARCHAR	Best phone
`first_name`	VARCHAR	Best first name
`last_name`	VARCHAR	Best last name
`updated_ts`	TIMESTAMP	Last update time

run_history

Audit log of all runs.

Column	Type	Description
`run_id`	VARCHAR	PK Unique run identifier
`run_mode`	VARCHAR	FULL or INCR
`started_at`	TIMESTAMP	Run start time
`ended_at`	TIMESTAMP	Run end time
`status`	VARCHAR	SUCCESS, SUCCESS_WITH_WARNINGS, FAILED, DRY_RUN_COMPLETE
`entities_processed`	INT	Number of entities processed
`edges_created`	INT	Number of edges created
`clusters_impacted`	INT	Number of clusters affected
`lp_iterations`	INT	Label propagation iterations
`duration_seconds`	INT	Total duration
`groups_skipped`	INT	Groups skipped (max_group_size)
`values_excluded`	INT	Values excluded (exclusion list)
`large_clusters`	INT	Clusters above threshold
`warnings`	VARCHAR	JSON array of warnings

dry_run_results

Per-entity changes from dry run.

Column	Type	Description
`run_id`	VARCHAR	Dry run identifier
`entity_key`	VARCHAR	Entity identifier
`current_resolved_id`	VARCHAR	Current cluster (NULL if new)
`proposed_resolved_id`	VARCHAR	Proposed cluster
`change_type`	VARCHAR	NEW, MOVED, UNCHANGED
`current_cluster_size`	INT	Current cluster size
`proposed_cluster_size`	INT	Proposed cluster size
`created_at`	TIMESTAMP	Record creation time

dry_run_summary

Aggregate dry run statistics.

Column	Type	Description
`run_id`	VARCHAR	PK Dry run identifier
`total_entities`	INT	Total entities analyzed
`new_entities`	INT	Entities with NEW change type
`moved_entities`	INT	Entities with MOVED change type
`unchanged_entities`	INT	Entities with UNCHANGED change type
`merged_clusters`	INT	Clusters that would merge
`split_clusters`	INT	Clusters that would split
`largest_proposed_cluster`	INT	Size of largest proposed cluster
`edges_would_create`	INT	Edges that would be created
`groups_would_skip`	INT	Groups that would be skipped
`values_would_exclude`	INT	Values that would be excluded
`execution_time_seconds`	INT	Processing time
`created_at`	TIMESTAMP	Record creation time

metrics_export

Exportable metrics for monitoring.

Column	Type	Description
`metric_id`	VARCHAR	PK Unique metric ID
`run_id`	VARCHAR	Associated run ID
`metric_name`	VARCHAR	Metric name
`metric_value`	DOUBLE	Metric value
`metric_type`	VARCHAR	gauge, counter
`dimensions`	VARCHAR	JSON dimensions
`recorded_at`	TIMESTAMP	Recording time
`exported_at`	TIMESTAMP	Export time (NULL if not exported)

skipped_identifier_groups

Audit log of skipped identifier groups.

Column	Type	Description
`run_id`	VARCHAR	Run identifier
`identifier_type`	VARCHAR	Type of identifier
`identifier_value_norm`	VARCHAR	Normalized identifier value
`group_size`	INT	Size of skipped group
`max_allowed`	INT	Configured max_group_size
`sample_entity_keys`	VARCHAR	Sample of affected entity keys
`reason`	VARCHAR	Reason for skip
`skipped_at`	TIMESTAMP	Skip time

Data Model

Schema Overview

idr_meta Schema

source_table

rule

identifier_mapping

run_state

config

identifier_exclusion

idr_out Schema

identity_resolved_membership_current

identity_clusters_current

golden_profile_current

run_history

dry_run_results

dry_run_summary

metrics_export

skipped_identifier_groups

Next Steps