JSON formatted output requires post-processing #1394

Open
nj1973 opened this issue Jan 13, 2025 · 1 comment
Labels
type: feature request 'Nice-to-have' improvement, new feature or different behavior or design.

Comments

Contributor

nj1973 commented Jan 13, 2025

When requesting JSON output via --format=json, the obvious JSON structure would be an array containing the validation results. Instead, what we get is a map keyed on the row number. For example:

data-validation validate column -sc ora -tc pg  --tables-list=pso_data_validator.dvt_core_types --sum="id" --format=json
{
 "0":{"validation_name":"count","validation_type":"Column","aggregation_type":"count","source_table_name":"pso_data_validator.dvt_core_types","source_column_name":null,"source_agg_value":"3","target_table_name":"pso_data_validator.dvt_core_types","target_column_name":null,"target_agg_value":"3","group_by_columns":null,"primary_keys":null,"num_random_rows":null,"difference":0.0,"pct_difference":0.0,"pct_threshold":0.0,"validation_status":"success","run_id":"3ba9b1ea-9192-4301-aa4d-070260175922","labels":[],"start_time":1736775679472,"end_time":1736775680025},
 "1":{"validation_name":"sum__id","validation_type":"Column","aggregation_type":"sum","source_table_name":"pso_data_validator.dvt_core_types","source_column_name":"id","source_agg_value":"6","target_table_name":"pso_data_validator.dvt_core_types","target_column_name":"id","target_agg_value":"6","group_by_columns":null,"primary_keys":null,"num_random_rows":null,"difference":0.0,"pct_difference":0.0,"pct_threshold":0.0,"validation_status":"success","run_id":"3ba9b1ea-9192-4301-aa4d-070260175922","labels":[],"start_time":1736775679472,"end_time":1736775680025}
}

The customer expected an array as below:

[
{"validation_name":"count","validation_type":"Column","aggregation_type":"count","source_table_name":"pso_data_validator.dvt_core_types","source_column_name":null,"source_agg_value":"3","target_table_name":"pso_data_validator.dvt_core_types","target_column_name":null,"target_agg_value":"3","group_by_columns":null,"primary_keys":null,"num_random_rows":null,"difference":0.0,"pct_difference":0.0,"pct_threshold":0.0,"validation_status":"success","run_id":"3ba9b1ea-9192-4301-aa4d-070260175922","labels":[],"start_time":1736775679472,"end_time":1736775680025},
{"validation_name":"sum__id","validation_type":"Column","aggregation_type":"sum","source_table_name":"pso_data_validator.dvt_core_types","source_column_name":"id","source_agg_value":"6","target_table_name":"pso_data_validator.dvt_core_types","target_column_name":"id","target_agg_value":"6","group_by_columns":null,"primary_keys":null,"num_random_rows":null,"difference":0.0,"pct_difference":0.0,"pct_threshold":0.0,"validation_status":"success","run_id":"3ba9b1ea-9192-4301-aa4d-070260175922","labels":[],"start_time":1736775679472,"end_time":1736775680025}
]
Contributor Author

nj1973 commented Jan 13, 2025

While we think about whether changing the format is a good idea, a simple workaround might be to post-process the output with Python, for example:

$ data-validation validate column -sc ora_local -tc pg_local   --tables-list=pso_data_validator.dvt_core_types --sum="id" --format=json|python -c 'import json, sys;j = json.load(sys.stdin);print(json.dumps([j[_] for _ in j.keys()]))'
[
{"validation_name": "sum__id", "validation_type": "Column", "aggregation_type": "sum", "source_table_name": "pso_data_validator.dvt_core_types", "source_column_name": "id", "source_agg_value": "6", "target_table_name": "pso_data_validator.dvt_core_types", "target_column_name": "id", "target_agg_value": "6", "group_by_columns": null, "primary_keys": null, "num_random_rows": null, "difference": 0.0, "pct_difference": 0.0, "pct_threshold": 0.0, "validation_status": "success", "run_id": "8603ef14-e393-4a3e-9d51-54c073a94be5", "labels": [], "start_time": 1736776662796, "end_time": 1736776663355},
{"validation_name": "count", "validation_type": "Column", "aggregation_type": "count", "source_table_name": "pso_data_validator.dvt_core_types", "source_column_name": null, "source_agg_value": "3", "target_table_name": "pso_data_validator.dvt_core_types", "target_column_name": null, "target_agg_value": "3", "group_by_columns": null, "primary_keys": null, "num_random_rows": null, "difference": 0.0, "pct_difference": 0.0, "pct_threshold": 0.0, "validation_status": "success", "run_id": "8603ef14-e393-4a3e-9d51-54c073a94be5", "labels": [], "start_time": 1736776662796, "end_time": 1736776663355}
]

This workaround is only valid when we can control the output, for example by piping it through another command. If we are running DVT in Cloud Run or GKE, it might not be quite so simple.
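
For anyone who wants something more reusable than the one-liner, here is a minimal sketch of the same post-processing as a function. It is not part of DVT; the name `rows_to_list` is hypothetical, and the sample payload is abbreviated to a single field per row. It assumes the keys are row numbers serialized as strings, as in the output above, and sorts them numerically so that row "10" does not land before row "2". A payload that is already a list is passed through unchanged, so the consumer would keep working if the format were changed later.

```python
import json


def rows_to_list(payload):
    """Convert a row-keyed JSON map into a list of result dicts.

    Hypothetical helper, not part of DVT. Assumes every key is a row
    number serialized as a string ("0", "1", ...).
    """
    if isinstance(payload, list):
        # Already an array: nothing to do.
        return payload
    # Sort keys numerically rather than lexically, then collect the rows.
    return [payload[key] for key in sorted(payload, key=int)]


# Abbreviated sample in the shape shown earlier in this issue.
sample = '{"1": {"validation_name": "sum__id"}, "0": {"validation_name": "count"}}'
print(json.dumps(rows_to_list(json.loads(sample))))
```

In a pipeline the same function could be fed from `json.load(sys.stdin)` instead of the sample string. This still only helps where we control how the output is consumed.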

@helensilva14 helensilva14 added the type: feature request 'Nice-to-have' improvement, new feature or different behavior or design. label Jan 17, 2025