Security
Security best practices for SQL Identity Resolution deployments.
Principle of Least Privilege
Grant only the minimum permissions needed for the IDR runner.
DuckDB
DuckDB runs locally with file permissions:
# Restrict file access
chmod 600 idr.duckdb
# Run with non-root user
useradd -r -s /bin/false idr-runner
chown idr-runner:idr-runner idr.duckdb
su - idr-runner -c "python sql/duckdb/idr_run.py --db=idr.duckdb"
Snowflake
-- Create dedicated role
CREATE ROLE IDR_EXECUTOR;
-- Grant only required permissions
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE IDR_EXECUTOR;
GRANT USAGE ON DATABASE analytics TO ROLE IDR_EXECUTOR;
-- Read-only on source schemas
GRANT USAGE ON SCHEMA crm TO ROLE IDR_EXECUTOR;
GRANT SELECT ON ALL TABLES IN SCHEMA crm TO ROLE IDR_EXECUTOR;
GRANT SELECT ON FUTURE TABLES IN SCHEMA crm TO ROLE IDR_EXECUTOR;
-- Full access on IDR schemas
GRANT ALL ON SCHEMA idr_meta TO ROLE IDR_EXECUTOR;
GRANT ALL ON SCHEMA idr_work TO ROLE IDR_EXECUTOR;
GRANT ALL ON SCHEMA idr_out TO ROLE IDR_EXECUTOR;
GRANT ALL ON ALL TABLES IN SCHEMA idr_meta TO ROLE IDR_EXECUTOR;
GRANT ALL ON ALL TABLES IN SCHEMA idr_work TO ROLE IDR_EXECUTOR;
GRANT ALL ON ALL TABLES IN SCHEMA idr_out TO ROLE IDR_EXECUTOR;
-- Create service user
CREATE USER idr_service_account
PASSWORD = 'CHANGE_ME'
DEFAULT_ROLE = IDR_EXECUTOR
DEFAULT_WAREHOUSE = compute_wh;
GRANT ROLE IDR_EXECUTOR TO USER idr_service_account;
BigQuery
# Create service account
gcloud iam service-accounts create idr-runner \
--display-name="IDR Runner Service Account"
# Grant minimal permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:idr-runner@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.jobUser"
# Grant dataset-level access
bq query --use_legacy_sql=false "
GRANT \`roles/bigquery.dataViewer\` ON SCHEMA crm
TO 'serviceAccount:idr-runner@PROJECT_ID.iam.gserviceaccount.com'
"
bq query --use_legacy_sql=false "
GRANT \`roles/bigquery.dataEditor\` ON SCHEMA idr_meta
TO 'serviceAccount:idr-runner@PROJECT_ID.iam.gserviceaccount.com'
"
bq query --use_legacy_sql=false "
GRANT \`roles/bigquery.dataEditor\` ON SCHEMA idr_out
TO 'serviceAccount:idr-runner@PROJECT_ID.iam.gserviceaccount.com'
"
Databricks
# Unity Catalog permissions
spark.sql("""
GRANT USAGE ON CATALOG main TO `idr-runner-group`
""")
spark.sql("""
GRANT SELECT ON SCHEMA main.crm TO `idr-runner-group`
""")
spark.sql("""
GRANT ALL PRIVILEGES ON SCHEMA main.idr_meta TO `idr-runner-group`
""")
spark.sql("""
GRANT ALL PRIVILEGES ON SCHEMA main.idr_out TO `idr-runner-group`
""")
Secrets Management
Never Hardcode Credentials
Environment Variables
# .env (never commit this file!)
SNOWFLAKE_ACCOUNT=xxx
SNOWFLAKE_USER=idr_service_account
SNOWFLAKE_PASSWORD=xxx
# Add to .gitignore
echo ".env" >> .gitignore
echo "*.pem" >> .gitignore
echo "*.json" >> .gitignore # For service account keys
Cloud Secret Managers
Data Protection
PII Handling
Identity resolution inherently deals with PII. Protect it:
- Encrypt at rest: Enable encryption on all databases/storage
- Encrypt in transit: Use TLS/SSL connections
- Mask in logs: Never log raw identifier values
- Limit retention: Delete old dry run data
-- Clean up old dry run data (contains PII)
DELETE FROM idr_out.dry_run_results
WHERE created_at < CURRENT_TIMESTAMP - INTERVAL '7 days';
Column-Level Encryption
If needed, encrypt sensitive columns:
-- Snowflake example
ALTER TABLE idr_out.golden_profile_current MODIFY COLUMN email_primary
SET MASKING POLICY email_mask;
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'IDR_EXECUTOR') THEN val
ELSE '***MASKED***'
END;
Network Security
Private Endpoints
- Use AWS PrivateLink or Azure Private Link
- Restrict network policies
- Use VPC Service Controls
- Configure Private Google Access
- Deploy in private subnet
- Use Private Link
Firewall Rules
-- Snowflake Network Policy
CREATE NETWORK POLICY idr_policy
ALLOWED_IP_LIST = ('10.0.0.0/8', '192.168.0.0/16')
BLOCKED_IP_LIST = ('0.0.0.0/0');
ALTER USER idr_service_account SET NETWORK_POLICY = idr_policy;
Audit Logging
Enable Platform Audit Logs
- Enable audit logging in Admin Console
- Logs go to cloud storage
Application-Level Logging
The IDR runner logs all activity to idr_out.run_history:
SELECT
run_id,
run_mode,
started_at,
ended_at,
status,
entities_processed
FROM idr_out.run_history
ORDER BY started_at DESC;
Compliance
GDPR
- Right to be forgotten: Delete entity from source tables, run FULL mode to remove from clusters
- Data portability: Query membership table for entity's cluster info
- Purpose limitation: Only use IDR for stated purposes
CCPA
- Similar to GDPR requirements
- Maintain audit trail of processing
SOC 2
- Enable all audit logging
- Implement access controls
- Document procedures
Security Checklist
Pre-Deployment
- [ ] Service accounts created with minimal permissions
- [ ] Secrets stored in secret manager (not code)
- [ ] Network policies configured
- [ ] Encryption enabled (at rest and in transit)
- [ ] Audit logging enabled
Operations
- [ ] Regular credential rotation (90 days)
- [ ] Review access logs monthly
- [ ] Clean up old dry run data
- [ ] Monitor for unusual activity
Incident Response
- [ ] Document data breach procedures
- [ ] Test rollback procedures
- [ ] Maintain contact list for security incidents