Describe the bug
When running Elementary tests on a model:
dbt test -s foobar --vars "{'elementary_enabled': true}" -t prod
We get an error:
Numeric data overflow (scale float to decimal)
To Reproduce
When running the dbt command above, I tried to narrow down which query causes the error. In dbt.log I found this query:
09:56:43.769788 [debug] [Thread-2 (]: On test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6: /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "bigcorp", "target_name": "prod", "node_id": "test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6"} */
with anomaly_scores as (
select
id,
...
when column_name is not null then
'In column '|| column_name ||', the last '|| metric_name ||' value is '|| cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)) ||'. The average for this metric is '|| cast(round(cast(training_avg as numeric(28,6)), 3) as varchar(4096)) ||'.'
else null
end as anomaly_description
...
This is the problematic part: cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)), specifically the inner cast to numeric(28,6).
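For context, numeric(28,6) leaves only 22 digits before the decimal point, so any value of 1e22 or above overflows it. A minimal sketch of the failure, assuming a Redshift-style warehouse (the datediff(hour, ...) syntax and the overflow message both point that way):

-- Hypothetical repro: numeric(28,6) tops out just below 1e22,
-- so casting a larger float fails with the same overflow error.
select cast(round(cast(cast(1e23 as float) as numeric(28,6)), 3) as varchar(4096));
-- ERROR: Numeric data overflow (scale float to decimal)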
I saw that Elementary runs some inserts, and in one of the subqueries it computes variance(cast("custom_int" as float)):
insert into"prod"."elementary"."test_656a1159e6_elementary_all_columns_anomalies_foobar___metrics__tmp_20250110085236933131"
with tmp_table as (
...
column_monitors as (select
edr_bucket_start as bucket_start,
edr_bucket_end as bucket_end,
datediff(
hour,
edr_bucket_start,
edr_bucket_end
)
as bucket_duration_hours,
cast('custom_int'asvarchar(4096)) as edr_column_name,
coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
as null_count,
round(cast(cast(
coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
as float) / nullif(cast(count(*) as float), 0) *100.0asnumeric(28,6)), 3)
as null_percent,nullas not_null_percent,max(cast("custom_int"as float)) as max,min(cast("custom_int"as float)) as min,avg(cast("custom_int"as float)) as average,
coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int"as float) =0 then 1 else 0 end), 0)
as zero_count,
round(cast(cast(
coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int"as float) =0 then 1 else 0 end), 0)
as float) / nullif(cast(count(*) as float), 0) *100.0asnumeric(28,6)), 3)
as zero_percent,nullas not_zero_percent,stddev(cast("custom_int"as float)) as standard_deviation,variance(cast("custom_int"as float)) as variance,nullas max_length,nullas min_length,nullas average_length,nullas missing_count,nullas missing_percent,nullas count_true,nullas count_false,nullas not_missing_percent,nullas sum
from filtered_monitored_table
left join buckets on (edr_bucket_start = start_bucket_in_data)
group by1,2,3,4
That variance, in our case, yields a very large number, which cannot be cast to numeric(28,6) later.
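To see why variance is the one that blows up: variance grows with the square of the spread, so values that are themselves well within the numeric(28,6) range can still produce a variance far above it. A hedged illustration with made-up values:

-- Hypothetical illustration: two floats 1e15 apart have a sample
-- variance of 5e29, far above the ~1e22 ceiling of numeric(28,6).
with t as (
    select cast(0 as float) as custom_int
    union all
    select cast(1e15 as float)
)
select
    variance(custom_int) as v,                        -- 5e29: fine as float
    cast(variance(custom_int) as numeric(28,6)) as n  -- overflows
from t;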
Expected behavior
The test should use larger data types for these casts, so that very large metric values do not cause an overflow.
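One possible direction, sketched here as an assumption rather than Elementary's actual fix: Redshift decimals go up to 38 digits of precision, so widening the cast and guarding the remaining edge cases would avoid the overflow:

-- Hypothetical fix sketch: numeric(38,6) leaves 32 integer digits;
-- the guard nulls out anything even that cannot represent.
select
    case
        when abs(metric_value) < 1e32
            then round(cast(metric_value as numeric(38,6)), 3)
        else null
    end as rounded_metric_value
from anomaly_scores;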
Screenshots
N/A
Environment (please complete the following information):
Additional context
N/A
Would you be willing to contribute a fix for this issue?
Unlikely, unless it is easy?