Data Model
Complete reference for all tables in SQL Identity Resolution.
Schema Overview
erDiagram
source_table ||--o{ identifier_mapping : "has"
source_table ||--o{ run_state : "tracks"
rule ||--o{ identifier_mapping : "applies_to"
source_table {
string table_id PK
string table_fqn
string entity_type
string entity_key_expr
string watermark_column
int watermark_lookback_minutes
boolean is_active
}
rule {
string rule_id PK
string identifier_type
int priority
boolean is_active
int max_group_size
}
identifier_mapping {
string table_id FK
string identifier_type FK
string column_expr
boolean requires_normalization
}
run_state {
string table_id FK
timestamp last_watermark_value
string last_run_id
timestamp last_run_ts
}
identity_resolved_membership_current {
string entity_key PK
string resolved_id
timestamp updated_ts
}
identity_clusters_current {
string resolved_id PK
int cluster_size
timestamp updated_ts
}
idr_meta Schema
source_table
Registry of source tables to process.
| Column | Type | Description |
|---|---|---|
table_id |
VARCHAR | PK Unique identifier for this source |
table_fqn |
VARCHAR | Fully qualified table name |
entity_type |
VARCHAR | Entity type (e.g., PERSON, ACCOUNT) |
entity_key_expr |
VARCHAR | SQL expression for entity key |
watermark_column |
VARCHAR | Column for incremental processing |
watermark_lookback_minutes |
INT | Lookback buffer for late-arriving data |
is_active |
BOOLEAN | Include in processing |
Example:
INSERT INTO idr_meta.source_table VALUES
('customers', 'crm.customers', 'PERSON', 'customer_id', 'updated_at', 0, TRUE),
('orders', 'ecom.orders', 'PERSON', 'user_id', 'order_date', 60, TRUE);
rule
Matching rules defining how identifiers are used.
| Column | Type | Description |
|---|---|---|
rule_id |
VARCHAR | PK Unique rule identifier |
identifier_type |
VARCHAR | Type of identifier (EMAIL, PHONE, etc.) |
priority |
INT | Processing priority (lower = higher priority) |
is_active |
BOOLEAN | Include in processing |
max_group_size |
INT | Max entities per identifier group |
Example:
INSERT INTO idr_meta.rule VALUES
('email_exact', 'EMAIL', 1, TRUE, 10000),
('phone_exact', 'PHONE', 2, TRUE, 5000),
('loyalty_id', 'LOYALTY', 3, TRUE, 1);
identifier_mapping
Maps source columns to identifier types.
| Column | Type | Description |
|---|---|---|
table_id |
VARCHAR | FK → source_table |
identifier_type |
VARCHAR | Type of identifier |
column_expr |
VARCHAR | SQL expression to extract identifier |
requires_normalization |
BOOLEAN | Apply normalization (lowercase, etc.) |
Example:
INSERT INTO idr_meta.identifier_mapping VALUES
('customers', 'EMAIL', 'email', TRUE),
('customers', 'PHONE', 'phone', TRUE),
('orders', 'EMAIL', 'customer_email', TRUE);
run_state
Tracks watermarks for incremental processing.
| Column | Type | Description |
|---|---|---|
table_id |
VARCHAR | FK → source_table |
last_watermark_value |
TIMESTAMP | Last processed watermark |
last_run_id |
VARCHAR | ID of last run |
last_run_ts |
TIMESTAMP | Timestamp of last run |
config
Key-value configuration settings.
| Column | Type | Description |
|---|---|---|
config_key |
VARCHAR | PK Configuration key |
config_value |
VARCHAR | Configuration value |
description |
VARCHAR | Human-readable description |
updated_at |
TIMESTAMP | Last update time |
Default values:
INSERT INTO idr_meta.config VALUES
('dry_run_retention_days', '7', 'Days to retain dry run results', NOW()),
('large_cluster_threshold', '5000', 'Threshold for large cluster warnings', NOW());
identifier_exclusion
Values to exclude from matching.
| Column | Type | Description |
|---|---|---|
identifier_type |
VARCHAR | Type of identifier |
identifier_pattern |
VARCHAR | Value or pattern to exclude |
is_pattern |
BOOLEAN | TRUE if LIKE pattern |
reason |
VARCHAR | Reason for exclusion |
Example:
INSERT INTO idr_meta.identifier_exclusion VALUES
('EMAIL', 'test@test.com', FALSE, 'Generic test email'),
('EMAIL', '%@example.com', TRUE, 'Example domain'),
('PHONE', '0000000000', FALSE, 'Invalid phone');
idr_out Schema
identity_resolved_membership_current
Maps each entity to its resolved identity.
| Column | Type | Description |
|---|---|---|
entity_key |
VARCHAR | PK Unique entity identifier |
resolved_id |
VARCHAR | Cluster identifier (= entity_key of cluster anchor) |
updated_ts |
TIMESTAMP | Last update time |
identity_clusters_current
Cluster metadata.
| Column | Type | Description |
|---|---|---|
resolved_id |
VARCHAR | PK Cluster identifier |
cluster_size |
INT | Number of entities in cluster |
updated_ts |
TIMESTAMP | Last update time |
golden_profile_current
Best-record golden profiles per cluster.
| Column | Type | Description |
|---|---|---|
resolved_id |
VARCHAR | PK Cluster identifier |
email_primary |
VARCHAR | Best email |
phone_primary |
VARCHAR | Best phone |
first_name |
VARCHAR | Best first name |
last_name |
VARCHAR | Best last name |
updated_ts |
TIMESTAMP | Last update time |
run_history
Audit log of all runs.
| Column | Type | Description |
|---|---|---|
run_id |
VARCHAR | PK Unique run identifier |
run_mode |
VARCHAR | FULL or INCR |
started_at |
TIMESTAMP | Run start time |
ended_at |
TIMESTAMP | Run end time |
status |
VARCHAR | SUCCESS, SUCCESS_WITH_WARNINGS, FAILED, DRY_RUN_COMPLETE |
entities_processed |
INT | Number of entities processed |
edges_created |
INT | Number of edges created |
clusters_impacted |
INT | Number of clusters affected |
lp_iterations |
INT | Label propagation iterations |
duration_seconds |
INT | Total duration |
groups_skipped |
INT | Groups skipped (max_group_size) |
values_excluded |
INT | Values excluded (exclusion list) |
large_clusters |
INT | Clusters above threshold |
warnings |
VARCHAR | JSON array of warnings |
dry_run_results
Per-entity changes from dry run.
| Column | Type | Description |
|---|---|---|
run_id |
VARCHAR | Dry run identifier |
entity_key |
VARCHAR | Entity identifier |
current_resolved_id |
VARCHAR | Current cluster (NULL if new) |
proposed_resolved_id |
VARCHAR | Proposed cluster |
change_type |
VARCHAR | NEW, MOVED, UNCHANGED |
current_cluster_size |
INT | Current cluster size |
proposed_cluster_size |
INT | Proposed cluster size |
created_at |
TIMESTAMP | Record creation time |
dry_run_summary
Aggregate dry run statistics.
| Column | Type | Description |
|---|---|---|
run_id |
VARCHAR | PK Dry run identifier |
total_entities |
INT | Total entities analyzed |
new_entities |
INT | Entities with NEW change type |
moved_entities |
INT | Entities with MOVED change type |
unchanged_entities |
INT | Entities with UNCHANGED change type |
merged_clusters |
INT | Clusters that would merge |
split_clusters |
INT | Clusters that would split |
largest_proposed_cluster |
INT | Size of largest proposed cluster |
edges_would_create |
INT | Edges that would be created |
groups_would_skip |
INT | Groups that would be skipped |
values_would_exclude |
INT | Values that would be excluded |
execution_time_seconds |
INT | Processing time |
created_at |
TIMESTAMP | Record creation time |
metrics_export
Exportable metrics for monitoring.
| Column | Type | Description |
|---|---|---|
metric_id |
VARCHAR | PK Unique metric ID |
run_id |
VARCHAR | Associated run ID |
metric_name |
VARCHAR | Metric name |
metric_value |
DOUBLE | Metric value |
metric_type |
VARCHAR | gauge, counter |
dimensions |
VARCHAR | JSON dimensions |
recorded_at |
TIMESTAMP | Recording time |
exported_at |
TIMESTAMP | Export time (NULL if not exported) |
skipped_identifier_groups
Audit log of skipped identifier groups.
| Column | Type | Description |
|---|---|---|
run_id |
VARCHAR | Run identifier |
identifier_type |
VARCHAR | Type of identifier |
identifier_value_norm |
VARCHAR | Normalized identifier value |
group_size |
INT | Size of skipped group |
max_allowed |
INT | Configured max_group_size |
sample_entity_keys |
VARCHAR | Sample of affected entity keys |
reason |
VARCHAR | Reason for skip |
skipped_at |
TIMESTAMP | Skip time |
Next Steps
- Configuration Guide
- Architecture
- Schema Reference - Full DDL