Feature for lazy evaluation #1673

eitsupi · 2023-02-04T12:24:04Z

eitsupi
Feb 4, 2023
Maintainer

Currently in the playground, when we write a query, it is evaluated immediately and the result is returned from DuckDB.
However, when executing a query on huge tables, we may not want the query to be evaluated until we finish writing it.
I feel it would be useful to be able to tell the compiler that the query is not yet complete in such cases.

For example, dplyr evaluates queries on data frames immediately, but queries against other backends (dbplyr, dtplyr, arrow, etc.¹) are not executed until the query is finished with collect() (or compute())².
Similarly, Polars (Python) has both DataFrame (eager) and LazyFrame (lazy).
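
To make the eager/lazy distinction concrete, here is a minimal pure-Python sketch. It is a toy, not the actual API of dplyr, dbplyr, or Polars: the `LazyQuery` class and its `filter`, `take`, and `collect` methods are hypothetical names chosen for illustration. Steps are only recorded while the query is being built, and nothing touches the rows until `collect()` is called.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Callable, Iterable, List


@dataclass
class LazyQuery:
    """Toy lazy pipeline (hypothetical): steps are recorded, not executed."""
    source: List[dict]
    steps: List[Callable[[Iterable[dict]], Iterable[dict]]] = field(default_factory=list)

    def filter(self, pred: Callable[[dict], bool]) -> "LazyQuery":
        # Record the step and return a new query; no rows are read yet.
        step = lambda rows: (r for r in rows if pred(r))
        return LazyQuery(self.source, self.steps + [step])

    def take(self, n: int) -> "LazyQuery":
        # islice stops pulling rows once n rows have been produced.
        step = lambda rows: islice(rows, n)
        return LazyQuery(self.source, self.steps + [step])

    def collect(self) -> List[dict]:
        # Only here is the recorded pipeline actually evaluated.
        rows: Iterable[dict] = self.source
        for step in self.steps:
            rows = step(rows)
        return list(rows)


employees = [{"id": 1, "age": 30}, {"id": 2, "age": 45}, {"id": 3, "age": 52}]
seen_ids = []  # records which rows the predicate actually inspected


def older_than_40(row: dict) -> bool:
    seen_ids.append(row["id"])
    return row["age"] > 40


query = LazyQuery(employees).filter(older_than_40).take(1)
seen_before_collect = list(seen_ids)  # still empty: nothing has run yet
result = query.collect()              # evaluation happens only now
```

The point of the sketch is just that building `query` is free; the cost is paid at `collect()`, which is the behavior being asked for while a query is still being typed.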

max-sixty · 2023-02-04T22:28:27Z

max-sixty
Feb 4, 2023
Maintainer

I agree, though my current mental model of this is that the querying interface sitting between PRQL and the DB could handle this. i.e. the PRQL lang is limited to the description of the data transformation, rather than how or when to evaluate it.

That would let the interface be really smart: it could do things like those in the "Native" description in #1672. This is also increasingly possible with DuckDB, fast CDWs with memory caching, burstable compute, etc. My mood-affiliation is to have machines handle the non-declarative parts.

That said, possibly a user wants more direct control of the query than that design would offer, and the language could offer that?

My vote would be to leave this open, and see how things evolve with various interfaces and tools. WDYT?

5 replies

eitsupi Feb 5, 2023
Maintainer Author

Thank you for sharing your thoughts.

Sorry I didn't express it well. My thought was that it would be useful to have a way to pass delayed-evaluation options to the system via the query editor, just as we can currently specify the compile target in the header (cf. #1167 (comment)).

And this should be optional, since it would have no value for some systems receiving the query (for example, UIs that have a submit-query button, tools that evaluate the query with a specific command, such as the current prql-query, and of course interfaces that can handle this well enough on their own, as you say).

max-sixty Feb 5, 2023
Maintainer

My concern is that having the language specify how much of the query should be executed would require fairly deep support in the language itself. For example, if we required collect whenever a result should be evaluated, then that would need to be in every query, which would be quite invasive.

But maybe there are ways of doing it that would be lighter. Do you have any thoughts on what this might look like? (tbc, no problem if not, it's a very sensible suggestion, which we can meditate on!)

eitsupi Feb 5, 2023
Maintainer Author

Yeah, I also think that requiring something like a collect at the end of every query is not good, because it adds a burden in cases where we don't need it.

So at the moment I am not sure what method would be best for this, but one option may be the ability to explicitly declare the beginning and end of a query, like a shell here-document.
This would flow from top to bottom, so there would be no need to go back to the beginning and re-edit it, as there is with a header.

max-sixty Feb 5, 2023
Maintainer

Yes, I agree if we could get nice ergonomics, the feature would be good. My current approach is commenting out the lines, which is not ideal!

Let's leave this open and if others have thoughts, they're very welcome...

eitsupi Feb 5, 2023
Maintainer Author

I should have mentioned this in my comment above, but forgot because I don't usually use Chain.jl...

# begin
from employees
select [id, first_name, age]
sort age
take 10
# end

begin and end should not be ordinary comments, but special comments or tags written with a string that distinguishes them from ordinary comments.
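
As a rough sketch of how an editor or playground could act on such markers, the function below returns only the queries that have been closed by an end marker, so an unfinished query is never sent for evaluation. Everything here is an assumption for illustration: the `# begin` / `# end` strings are simply taken from the example above (the real markers would presumably be a special comment or tag), and `complete_queries` is a hypothetical helper, not part of PRQL or its tooling.

```python
from typing import List, Optional

# Hypothetical markers, copied from the example above; a real design would
# likely use a special comment or tag rather than a plain comment.
BEGIN_MARK = "# begin"
END_MARK = "# end"


def complete_queries(editor_text: str) -> List[str]:
    """Return the text of every query that is closed by an end marker."""
    queries: List[str] = []
    current: Optional[List[str]] = None  # lines of the query being collected
    for line in editor_text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN_MARK:
            current = []  # start (or restart) collecting a query
        elif stripped == END_MARK:
            if current is not None:
                queries.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    # A block that was begun but never ended is deliberately not returned.
    return queries


draft = "# begin\nfrom employees\nselect [id, first_name, age]\n"
finished = draft + "sort age\ntake 10\n# end\n"

pending = complete_queries(draft)      # query still being written
ready = complete_queries(finished)     # query closed by the end marker
```

With this approach the text flows top to bottom: the query is evaluated only once the closing marker has been typed, without going back to edit a header.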
