CLI Reference
Command-line interface reference for SQL Identity Resolution runners.
DuckDB CLI
Usage
python sql/duckdb/idr_run.py [OPTIONS]
Options
| Option |
Type |
Default |
Description |
--db |
STRING |
Required |
Path to DuckDB database file |
--run-mode |
ENUM |
INCR |
FULL or INCR |
--max-iters |
INT |
30 |
Max label propagation iterations |
--dry-run |
FLAG |
False |
Preview mode (no commits) |
Examples
# Full run
python sql/duckdb/idr_run.py --db=idr.duckdb --run-mode=FULL
# Incremental run
python sql/duckdb/idr_run.py --db=idr.duckdb --run-mode=INCR
# Dry run
python sql/duckdb/idr_run.py --db=idr.duckdb --run-mode=FULL --dry-run
# Custom max iterations
python sql/duckdb/idr_run.py --db=idr.duckdb --max-iters=50
Exit Codes
| Code |
Meaning |
0 |
Success |
1 |
Error |
BigQuery CLI
Usage
python sql/bigquery/idr_run.py [OPTIONS]
Options
| Option |
Type |
Default |
Description |
--project |
STRING |
Required |
GCP project ID |
--run-mode |
ENUM |
INCR |
FULL or INCR |
--max-iters |
INT |
30 |
Max label propagation iterations |
--dry-run |
FLAG |
False |
Preview mode (no commits) |
Environment Variables
| Variable |
Description |
GOOGLE_APPLICATION_CREDENTIALS |
Path to service account JSON |
Examples
# Set credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Full run
python sql/bigquery/idr_run.py --project=my-project --run-mode=FULL
# Dry run
python sql/bigquery/idr_run.py --project=my-project --dry-run
Snowflake Stored Procedure
Signature
CALL idr_run(
RUN_MODE VARCHAR, -- 'FULL' or 'INCR'
MAX_ITERS INT, -- Max iterations (e.g., 30)
DRY_RUN BOOLEAN -- TRUE = preview only
);
Examples
-- Full run
CALL idr_run('FULL', 30, FALSE);
-- Incremental run
CALL idr_run('INCR', 30, FALSE);
-- Dry run
CALL idr_run('FULL', 30, TRUE);
Return Value
Returns a VARCHAR with run summary:
SUCCESS: run_id=run_abc123, entities=1234, edges=5678, iterations=5, duration=12s
Or for dry runs:
DRY_RUN_COMPLETE: run_id=dry_run_abc123, new_entities=100, moved_entities=50, duration=8s | DRY RUN - NO CHANGES COMMITTED
| Widget |
Type |
Options |
Default |
Description |
RUN_MODE |
dropdown |
INCR, FULL |
INCR |
Processing mode |
MAX_ITERS |
text |
Integer |
30 |
Max iterations |
DRY_RUN |
dropdown |
true, false |
false |
Preview mode |
RUN_ID |
text |
String |
Auto-generated |
Custom run ID |
Programmatic Access
# Read widget values
run_mode = dbutils.widgets.get("RUN_MODE")
dry_run = dbutils.widgets.get("DRY_RUN") == "true"
# Set widget defaults
dbutils.widgets.dropdown("RUN_MODE", "INCR", ["INCR", "FULL"])
dbutils.widgets.dropdown("DRY_RUN", "false", ["true", "false"])
Running via Jobs API
{
"notebook_task": {
"notebook_path": "/Repos/org/repo/sql/databricks/notebooks/IDR_Run",
"base_parameters": {
"RUN_MODE": "INCR",
"DRY_RUN": "false",
"MAX_ITERS": "30"
}
}
}
Metrics Exporter CLI
Usage
python tools/metrics_exporter.py [OPTIONS]
Options
| Option |
Type |
Default |
Description |
--platform |
ENUM |
Required |
duckdb, snowflake, bigquery |
--connection |
STRING |
|
DuckDB: path; Others: connection string |
--exporter |
ENUM |
stdout |
stdout, prometheus, datadog, webhook |
--prometheus-port |
INT |
9090 |
Port for Prometheus metrics |
--datadog-api-key |
STRING |
|
DataDog API key |
--webhook-url |
STRING |
|
Webhook endpoint URL |
--run-id |
STRING |
|
Export specific run only |
Examples
# Print to stdout
python tools/metrics_exporter.py --platform=duckdb --connection=idr.duckdb
# Prometheus endpoint
python tools/metrics_exporter.py --platform=duckdb --connection=idr.duckdb \
--exporter=prometheus --prometheus-port=9090
# DataDog
python tools/metrics_exporter.py --platform=snowflake \
--connection="account=xxx;user=xxx;password=xxx" \
--exporter=datadog --datadog-api-key=$DD_API_KEY
# Webhook
python tools/metrics_exporter.py --platform=bigquery \
--connection="project=my-project" \
--exporter=webhook --webhook-url=https://hooks.slack.com/xxx
Dashboard Generator CLI
Usage
python tools/dashboard/generator.py [OPTIONS]
Options
| Option |
Type |
Default |
Description |
--platform |
ENUM |
Required |
duckdb, snowflake, bigquery, databricks |
--connection |
STRING |
Required |
Platform-specific connection string |
--output |
STRING |
dashboard.html |
Output file path |
--run-id |
STRING |
|
Focus on specific run |
Examples
# Generate from DuckDB
python tools/dashboard/generator.py \
--platform=duckdb \
--connection=idr.duckdb \
--output=dashboard.html
# Open in browser
open dashboard.html
Common Patterns
CI/CD Integration
#!/bin/bash
# ci-run.sh
# Dry run first
python sql/duckdb/idr_run.py --db=idr.duckdb --dry-run
if [ $? -ne 0 ]; then
echo "Dry run failed"
exit 1
fi
# Check for unexpected changes
MOVED=$(duckdb idr.duckdb -c "SELECT COUNT(*) FROM idr_out.dry_run_results WHERE change_type='MOVED'" | tail -1)
if [ "$MOVED" -gt 1000 ]; then
echo "Too many moved entities: $MOVED"
exit 1
fi
# Live run
python sql/duckdb/idr_run.py --db=idr.duckdb
Scheduled Run with Logging
#!/bin/bash
# scheduled-run.sh
LOG_FILE="/var/log/idr/$(date +%Y%m%d_%H%M%S).log"
python sql/duckdb/idr_run.py \
--db=/data/idr.duckdb \
--run-mode=INCR \
2>&1 | tee "$LOG_FILE"
# Check exit code
if [ ${PIPESTATUS[0]} -ne 0 ]; then
# Send alert
curl -X POST https://hooks.slack.com/xxx \
-d "{\"text\": \"IDR run failed. See $LOG_FILE\"}"
fi
Next Steps