Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elementary test tries to cast metric_value to numeric(28,6) which fails on our larger datatypes #1778

Open
jan-benisek opened this issue Jan 10, 2025 · 0 comments
Labels
Bug Something isn't working Triage 👀

Comments

@jan-benisek
Copy link

jan-benisek commented Jan 10, 2025

Describe the bug
When running Elementary tests on a model

dbt test -s foobar --vars "{'elementary_enabled': true}" -t prod

We get an error:

Numeric data overflow (scale float to decimal)

To Reproduce
When running the dbt command above, I tried to localise which queries cause it.
I saw in dbt.log in this query:

[0m09:56:43.769788 [debug] [Thread-2 (]: On test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6: /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "bigcorp", "target_name": "prod", "node_id": "test.bigcorp.elementary_all_columns_anomalies_foobar_.656a1159e6"} */

    with anomaly_scores as (
        select
            id,
...

when column_name is not null then 
    'In column ' || column_name || ', the last ' || metric_name || ' value is ' || cast(round(cast(metric_value as numeric(28,6)), 3) as varchar(4096)) ||
    '. The average for this metric is ' || cast(round(cast(training_avg as numeric(28,6)), 3) as varchar(4096)) || '.'

        else null
    end as anomaly_description
...

This is the problematic part: cast(round(cast(metric_value as numeric(28,6)), 3).

I saw that Elementary runs some inserts and in one of the subqueries, it runs variance(cast("custom_int" as float)).

 insert into "prod"."elementary"."test_656a1159e6_elementary_all_columns_anomalies_foobar___metrics__tmp_20250110085236933131"
        with tmp_table as (

...

column_monitors as (select
                        edr_bucket_start as bucket_start,
                        edr_bucket_end as bucket_end,

    datediff(
        hour,
        edr_bucket_start,
        edr_bucket_end
        )
 as bucket_duration_hours,
                    cast('custom_int' as varchar(4096)) as edr_column_name,

    coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
 as null_count,

      round(cast(cast(
    coalesce(sum(case when "custom_int" is null then 1 else 0 end), 0)
 as float) / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3)

 as null_percent,null  as not_null_percent,max(cast("custom_int" as float)) as max,min(cast("custom_int" as float)) as min,avg(cast("custom_int" as float)) as average,
    coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0)
 as zero_count,

      round(cast(cast(
    coalesce(sum(case when "custom_int" is null then 1 when cast("custom_int" as float) = 0 then 1 else 0 end), 0)
 as float) / nullif(cast(count(*) as float), 0) * 100.0 as numeric(28,6)), 3)

 as zero_percent,null  as not_zero_percent,stddev(cast("custom_int" as float)) as standard_deviation,variance(cast("custom_int" as float)) as variance,null  as max_length,null  as min_length,null  as average_length,null  as missing_count,null  as missing_percent,null  as count_true,null  as count_false,null  as not_missing_percent,null  as sum
                from filtered_monitored_table
                    left join buckets on (edr_bucket_start = start_bucket_in_data)

                    group by 1,2,3,4

That variance, in our case, yields very a large number:

select variance(cast("custom_int" as float)) from production.foobart where ts >= '2024-12-31' and ts <= '2025-01-10';
-- 3636013502843636000000

Which cannot be cast to numeric(28,6)) later.

Expected behavior
It should work with larger data types.

Screenshots
N/A

Environment (please complete the following information):

  • Elementary CLI (edr) version: 0.15.1
  • Elementary dbt package version: 0.16.0
  • dbt version you're using: 1.8.6
  • Data warehouse: Redshift
  • Infrastructure details: NA

Additional context
NA

Would you be willing to contribute a fix for this issue?
Unlikely, unless it is easy?

@jan-benisek jan-benisek added Bug Something isn't working Triage 👀 labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Triage 👀
Projects
None yet
Development

No branches or pull requests

1 participant