bug: Physical plan round trip fails in some cases #14679

Open
Tracked by #14123
milenkovicm opened this issue Feb 15, 2025 · 2 comments · May be fixed by #14685
Labels
bug Something isn't working

Comments

@milenkovicm
Contributor

milenkovicm commented Feb 15, 2025

Describe the bug

While testing Ballista builds against the latest main, I noticed tests failing with:

Error: Internal("Could not create `ExprBoundaries`: in `try_from_column` `col_index` \n                has gone out of bounds with a value of 3, the schema has 3 columns.")
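The message describes an off-by-one bounds violation: with zero-based indexing, a 3-column schema only has valid indices 0, 1 and 2, so index 3 points one past the end. A minimal sketch of that kind of check in plain Rust follows; `check_col_index` is a hypothetical helper for illustration, not DataFusion's actual `ExprBoundaries::try_from_column` code.

```rust
/// Hypothetical helper (not DataFusion's API): a column index is valid only
/// if it is strictly less than the number of columns in the schema.
fn check_col_index(col_index: usize, num_columns: usize) -> Result<usize, String> {
    if col_index < num_columns {
        Ok(col_index)
    } else {
        Err(format!(
            "col_index has gone out of bounds with a value of {col_index}, the schema has {num_columns} columns."
        ))
    }
}

fn main() {
    // A 3-column schema has valid indices 0, 1 and 2.
    for i in 0..3 {
        assert!(check_col_index(i, 3).is_ok());
    }
    // Index 3 is one past the end, the same situation as in the reported error.
    let err = check_col_index(3, 3).unwrap_err();
    println!("{err}");
}
```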

This was not the case with DataFusion 45, and the problem does not occur if the Ballista remote context is replaced with a DataFusion context.

To Reproduce

Apparently, the difference between DataFusion and Ballista execution lies in logical and physical plan serde (serialization/deserialization). After first looking in the wrong place (the logical plan), I managed to reproduce it with:

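        // Requires the `datafusion`, `datafusion-proto` and `tokio` crates.
        use datafusion::error::Result;
        use datafusion::prelude::*;
        use datafusion_proto::physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec};
        use datafusion_proto::protobuf::PhysicalPlanNode;

        #[tokio::main]
        async fn main() -> Result<()> {
            let ctx = SessionContext::new();
            ctx.register_parquet("test", "alltypes_plain.parquet", Default::default())
                .await?;

            // Projection on two columns plus a filter on a third column.
            let plan = ctx
                .sql("select string_col, timestamp_col from test where id > 4")
                .await?
                .create_physical_plan()
                .await?;

            // Serialize the physical plan to its protobuf representation...
            let node: PhysicalPlanNode = PhysicalPlanNode::try_from_physical_plan(
                plan,
                &DefaultPhysicalExtensionCodec {},
            )?;
            // ...and deserialize it again: this call fails with the error above.
            let plan = node.try_into_physical_plan(
                &ctx,
                &ctx.runtime_env(),
                &DefaultPhysicalExtensionCodec {},
            )?;

            let _stream = plan.execute(0, ctx.task_ctx())?;
            Ok(())
        }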

The Parquet file can be found at https://github.com/apache/datafusion-ballista/blob/46a67459e61467a2e86c23f0c1c2920dd49c877f/ballista/client/testdata/alltypes_plain.parquet

DataFusion commit used for testing: a104661

(For what it's worth, the issue was already present 15 to 16 commits earlier.)

Note that the following queries execute without any problems:

  • select string_col, timestamp_col from test
  • select * from test where id > 4

The failing query also executes without problems when plan serde is skipped.

Additional info:

  • CSV does not have this issue

Expected behavior

The round trip should succeed.
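"Round trip" here means that deserializing a serialized plan yields an equivalent, usable plan. The property can be sketched with a toy type under loudly stated assumptions: `ToyScan`, its text encoding and `round_trips` are all hypothetical stand-ins invented for illustration, not DataFusion's `PhysicalPlanNode` or its protobuf format. The decoder validates projection indices, so a plan carrying an index equal to the column count cannot survive the round trip; this only illustrates the property being tested and makes no claim about the actual cause of the bug.

```rust
/// Hypothetical toy stand-in for a scan node: the number of columns in the
/// file schema plus a projection (indices into that schema).
#[derive(Debug, Clone, PartialEq)]
struct ToyScan {
    file_schema_len: usize,
    projection: Vec<usize>,
}

impl ToyScan {
    /// Serialize as "<len>;<i0>,<i1>,...".
    fn encode(&self) -> String {
        let proj: Vec<String> = self.projection.iter().map(|i| i.to_string()).collect();
        format!("{};{}", self.file_schema_len, proj.join(","))
    }

    /// Parse the encoding back, rejecting any out-of-bounds projection index.
    fn decode(s: &str) -> Result<ToyScan, String> {
        let (len, proj) = s.split_once(';').ok_or("missing ';' separator")?;
        let file_schema_len = len.parse::<usize>().map_err(|e| format!("bad length: {e}"))?;
        let projection = if proj.is_empty() {
            Vec::new()
        } else {
            proj.split(',')
                .map(|p| p.parse::<usize>().map_err(|e| format!("bad index: {e}")))
                .collect::<Result<Vec<_>, _>>()?
        };
        if let Some(&bad) = projection.iter().find(|&&i| i >= file_schema_len) {
            return Err(format!("index {bad} out of bounds for {file_schema_len} columns"));
        }
        Ok(ToyScan { file_schema_len, projection })
    }
}

/// The round-trip property: decoding an encoded plan must give back an equal plan.
fn round_trips(plan: &ToyScan) -> bool {
    ToyScan::decode(&plan.encode()).map_or(false, |p| &p == plan)
}

fn main() {
    // Columns 0 and 2 of a 3-column schema: a valid plan that round-trips.
    let ok = ToyScan { file_schema_len: 3, projection: vec![0, 2] };
    assert!(round_trips(&ok));
    // Index 3 in a 3-column schema is rejected on decode, so the round trip fails.
    let bad = ToyScan { file_schema_len: 3, projection: vec![3] };
    assert!(!round_trips(&bad));
    println!("{}", ok.encode()); // prints "3;0,2"
}
```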

Additional context

#14631

@milenkovicm milenkovicm added the bug Something isn't working label Feb 15, 2025
@milenkovicm
Contributor Author

I've managed to reproduce the issue I saw last evening @alamb

@milenkovicm
Contributor Author

Update: commit 5e1e693 (#14224) breaks physical plan serde.

fyi @mertak-synnada

@milenkovicm milenkovicm changed the title Physical plan round trip fails in some cases bug: Physical plan round trip fails in some cases Feb 15, 2025
@blaginin blaginin linked a pull request Feb 15, 2025 that will close this issue