Troubleshooting
Common issues and solutions for SQL Identity Resolution.
Run Failures
"No active source tables found"
Symptom: Run fails immediately with this error.
Cause: No tables registered or all disabled.
Solution:
-- Check what's registered
SELECT * FROM idr_meta.source_table;
-- Ensure at least one is active
UPDATE idr_meta.source_table SET is_active = TRUE WHERE table_id = 'customers';
"Source table does not exist"
Symptom: Run fails during entity extraction.
Cause: table_fqn in source_table doesn't match actual table.
Solution:
-- Verify table exists
SELECT * FROM information_schema.tables
WHERE table_schema = 'crm' AND table_name = 'customers';
-- Update if incorrect
UPDATE idr_meta.source_table
SET table_fqn = 'crm.customers'
WHERE table_id = 'customers';
"No edges created"
Symptom: Run completes but no matching happens.
Causes: 1. No identifier mappings defined 2. All identifiers are NULL 3. All groups exceed max_group_size
Diagnosis:
-- Check mappings exist
SELECT * FROM idr_meta.identifier_mapping;
-- Check if identifiers extracted
SELECT COUNT(*) FROM idr_work.identifiers;
-- Check if groups were skipped
SELECT COUNT(*) FROM idr_out.skipped_identifier_groups
WHERE run_id = 'run_xyz';
Performance Issues
Run takes too long
Causes: 1. Too many iterations 2. Large data volume 3. Missing indexes
Solutions:
-- Add indexes on source tables
CREATE INDEX idx_updated ON customers(updated_at);
-- Use incremental mode
-- (after first FULL run)
Out of memory
Causes: 1. Giant clusters 2. Too much data in single run
Solutions:
-- Lower max_group_size
UPDATE idr_meta.rule SET max_group_size = 5000;
-- Process in batches (reduce lookback)
UPDATE idr_meta.source_table
SET watermark_lookback_minutes = 0;
Data Quality Issues
Giant cluster forming
Symptom: One cluster has thousands of entities.
Diagnosis:
-- Find the giant cluster
SELECT resolved_id, cluster_size
FROM idr_out.identity_clusters_current
ORDER BY cluster_size DESC
LIMIT 5;
-- Find what's connecting them
SELECT
i.identifier_type,
i.identifier_value_norm,
COUNT(DISTINCT i.entity_key) as entity_count
FROM idr_out.identity_resolved_membership_current m
JOIN idr_work.identifiers i ON m.entity_key = i.entity_key
WHERE m.resolved_id = 'giant_cluster_id'
GROUP BY i.identifier_type, i.identifier_value_norm
ORDER BY entity_count DESC;
Solution: 1. Add the problematic identifier to exclusion list 2. Lower max_group_size for that identifier type
Entities not matching
Symptom: Entities with same identifier are in different clusters.
Diagnosis:
-- Check if identifiers match
SELECT entity_key, identifier_value_norm
FROM idr_work.identifiers
WHERE entity_key IN ('entity_a', 'entity_b');
-- Check normalization
-- Are they exactly equal?
Causes: 1. Normalization issues (different whitespace, case) 2. Character encoding differences 3. max_group_size exceeded
Wrong entities matching
Symptom: Unrelated entities are clustered together.
Diagnosis:
-- Find the connecting path
WITH cluster_members AS (
SELECT entity_key
FROM idr_out.identity_resolved_membership_current
WHERE resolved_id = 'problematic_cluster'
)
SELECT
i.entity_key,
i.identifier_type,
i.identifier_value_norm
FROM idr_work.identifiers i
JOIN cluster_members c ON i.entity_key = c.entity_key
ORDER BY i.identifier_value_norm, i.entity_key;
Solution: 1. Identify the bad identifier 2. Add to exclusion list 3. Lower max_group_size
Platform-Specific Issues
DuckDB
"database is locked"
Snowflake
"Warehouse is suspended"
"Insufficient privileges"
-- Grant required permissions
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE idr_executor;
GRANT ALL ON SCHEMA idr_meta TO ROLE idr_executor;
BigQuery
"Quota exceeded"
"Dataset not found"
Databricks
"Cluster terminated"
Dry Run Issues
"dry_run_results is empty"
Cause: No changes would be made.
Diagnosis:
-- Check if any entities processed
SELECT COUNT(*) FROM idr_work.entities_delta;
-- Check if any edges created
SELECT COUNT(*) FROM idr_work.edges_new;
Dry run shows unexpected changes
Diagnosis:
-- Review moved entities
SELECT *
FROM idr_out.dry_run_results
WHERE run_id = 'dry_run_xyz'
AND change_type = 'MOVED';
-- Check what identifier caused the move
-- (cross-reference with edges_new)
Recovery Procedures
Rollback a bad run
-- 1. Identify the bad run
SELECT * FROM idr_out.run_history ORDER BY started_at DESC;
-- 2. Note the previous good watermark
SELECT * FROM idr_meta.run_state;
-- 3. If you have backups, restore from them
-- 4. Otherwise, re-run in FULL mode after fixing config
Reset to clean state
This deletes all output data
Use only if you want to start from scratch.
-- Reset output tables
TRUNCATE TABLE idr_out.identity_resolved_membership_current;
TRUNCATE TABLE idr_out.identity_clusters_current;
TRUNCATE TABLE idr_out.golden_profile_current;
-- Reset watermarks
UPDATE idr_meta.run_state
SET last_watermark_value = '1900-01-01'::TIMESTAMP;
Getting Help
- Check run history:
SELECT * FROM idr_out.run_history ORDER BY started_at DESC - Check warnings: Look at the
warningscolumn - Check skipped groups:
SELECT * FROM idr_out.skipped_identifier_groups - Enable verbose logging: Platform-specific debug options
- Open an issue: GitHub Issues