
insertBatch to generate multiple INSERT statements #888

Open
samdark opened this issue Oct 17, 2024 · 14 comments
Labels
status:ready for adoption Feel free to implement this issue. type:feature New feature

Comments

@samdark
Member

samdark commented Oct 17, 2024

Right now, if the batch is too big, we're getting into limits of DBMS like the following:

[Screenshot from PostgreSQL showing the resulting error.]

I suggest doing three things:

  1. Introduce a configurable parameter for the maximum number of query parameters.
  2. Default it per DBMS. For PostgreSQL it could be 65535.
  3. If the maximum is reached, move the rest of the data to the next INSERT statement.
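The splitting in step 3 can be sketched in a few lines. This is a hypothetical illustration, not existing Yii DB API: `splitRowsByParamLimit()` and `$maxParams` are made-up names, and the 65535 default is the PostgreSQL limit mentioned above.

```php
<?php

// Hypothetical sketch of step 3: split the rows so that each INSERT stays
// under the DBMS parameter limit. splitRowsByParamLimit() and $maxParams
// are illustrative names, not existing Yii DB API.
function splitRowsByParamLimit(array $rows, int $columnCount, int $maxParams = 65535): array
{
    // Each row consumes one placeholder per column.
    $rowsPerStatement = max(1, intdiv($maxParams, $columnCount));

    return array_chunk($rows, $rowsPerStatement);
}

$rows = array_fill(0, 40000, ['a', 'b']); // 40000 rows × 2 columns = 80000 params
$chunks = splitRowsByParamLimit($rows, 2); // 32767 rows fit per statement
```

With two columns and the 65535 default, the 40000 rows end up in two chunks of 32767 and 7233 rows, each producible as a separate INSERT statement.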
@samdark samdark added status:ready for adoption Feel free to implement this issue. type:feature New feature labels Oct 17, 2024
@rob006
Contributor

rob006 commented Oct 18, 2024

That will change the nature of this operation: from atomic to non-atomic.

@samdark
Member Author

samdark commented Oct 18, 2024

You can't make it atomic anyway (except via transactions) when the max number of parameters is exceeded.

@rob006
Contributor

rob006 commented Oct 18, 2024

Yes, but implicitly changing an atomic operation into a non-atomic one is just a massive footgun. This should be explicitly enabled by the programmer who makes the call, so the developer knows that this batch insert may result in multiple queries.

Also, splitting one massive insert into multiple smaller ones is useful not only for handling the parameter limit. I generally avoid really big inserts as they may block the table for other queries, so the system may be more responsive if you do 10 small inserts instead of one big one. Having a method that would do this for you would be useful even when you don't exceed the parameter limit in your queries.

@samdark
Member Author

samdark commented Oct 18, 2024

Makes sense. The default value of such an option must then be null, to disable splitting.

@Tigrov
Member

Tigrov commented Oct 19, 2024

Good point, but

  1. It is better if it works by default. Currently it does not work at all.
  2. The documentation should mention this.

By default it is better to specify a driver-specific max value and allow the developer to change it.

@samdark
Member Author

samdark commented Oct 19, 2024

Either way is fine with me, as long as we put a warning in the docs about whether it is atomic or not.

@rob006
Contributor

rob006 commented Oct 19, 2024

By default it is better to specify a driver-specific max value and allow the developer to change it.

To be clear: I was talking about a limit based on the number of inserted rows, not the number of params used by the query. If I want to insert 50k records in 50 queries (1k records per query), then the limit I need to pass to the function should be 1000.
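For illustration, this row-based splitting is exactly what PHP's built-in `array_chunk()` does (the data here is a placeholder, not real Yii DB usage):

```php
<?php

// The row-based limit described above: 50k records, 1k rows per query.
// PHP's array_chunk() performs exactly this kind of splitting.
$rows = array_fill(0, 50000, ['value']);
$chunks = array_chunk($rows, 1000);

echo count($chunks); // 50 queries
```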

@samdark
Member Author

samdark commented Oct 20, 2024

Yes. That sounds useful as well.

@evil1

evil1 commented Dec 24, 2024

If it is driver-specific, shouldn't this be done at the DBMS level? That is, in the following packages:

yiisoft/db-mssql
yiisoft/db-mysql
yiisoft/db-oracle
yiisoft/db-pgsql
yiisoft/db-sqlite

@evil1

evil1 commented Dec 24, 2024

By default it is better to specify a driver-specific max value and allow the developer to change it.

To be clear: I was talking about a limit based on the number of inserted rows, not the number of params used by the query. If I want to insert 50k records in 50 queries (1k records per query), then the limit I need to pass to the function should be 1000.

My vision is to introduce a parameter, let's say batchInsertChunkSize, and default it to 0 (which means no limit); if you set it to any positive value, insert queries are split into chunks of that size.

@samdark
Member Author

samdark commented Dec 24, 2024

shouldn't this be done on DBMS level?

The implementation itself, yes. The interface should be common, though.

@evil1

evil1 commented Dec 25, 2024

shouldn't this be done on DBMS level?

The implementation itself, yes. The interface should be common, though.

Basically, we are discussing two similar questions at once here.
As for the original issue described in the very first message, I think we can encapsulate this logic inside the DBMS package itself.
For PostgreSQL we could add a property int $paramsLimit = 65535;. Same for the other PDO drivers.

We can make use of this property in prepareBatchInsertValues(string $table, iterable $rows, array $columns, array &$params, int $chunkSize = 0) in AbstractDMLQueryBuilder.
We can also introduce an int $chunkSize = 0 parameter, accept it in insertBatch(string $table, iterable $rows, array $columns = [], array &$params = [], int $chunkSize = 0), and pass it through. The method would return an array of chunks (by default, only one single chunk).
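A rough sketch of how such a chunk-aware insertBatch() could build one SQL statement per chunk. This is illustrative only: `insertBatchChunked()` is a made-up name, and real Yii DB builders quote identifiers and bind typed parameters rather than emitting bare `?` placeholders.

```php
<?php

// Illustrative sketch (not the actual Yii DB implementation) of returning
// one INSERT statement per chunk when $chunkSize > 0.
function insertBatchChunked(string $table, array $columns, array $rows, int $chunkSize = 0): array
{
    $chunks = $chunkSize > 0 ? array_chunk($rows, $chunkSize) : [$rows];
    $sqls = [];

    foreach ($chunks as $chunk) {
        // One "(?, ?, ...)" group per row in the chunk.
        $values = implode(', ', array_map(
            fn (array $row): string => '(' . implode(', ', array_fill(0, count($row), '?')) . ')',
            $chunk,
        ));
        $sqls[] = "INSERT INTO $table (" . implode(', ', $columns) . ") VALUES $values";
    }

    return $sqls; // by default, a single-element array
}

$sqls = insertBatchChunked('t', ['a', 'b'], [[1, 2], [3, 4], [5, 6]], 2);
// → two statements: "... VALUES (?, ?), (?, ?)" and "... VALUES (?, ?)"
```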

@evil1

evil1 commented Dec 26, 2024

This will require insertBatch() to return an array of SQL queries. Unfortunately, it will break BC.

@evil1

evil1 commented Dec 27, 2024

It seems like there is no easy way to implement these features.
I've reproduced the bug for pgsql:
[Screenshot taken 2024-12-27 at 16:09 showing the reproduced error.]

One of the ways is to create a method addRawSql() and something like addBindValues(), store $sql as an array, and iterate over these queries on the execute() call:

    public function batchInsert(string $table, array $columns, iterable $rows): static
    {
        // Quote the table and column names before building the SQL.
        $table = $this->getQueryBuilder()->quoter()->quoteSql($table);

        /** @psalm-var string[] $columns */
        foreach ($columns as &$column) {
            $column = $this->getQueryBuilder()->quoter()->quoteSql($column);
        }

        unset($column);

        // Build a single INSERT statement and bind its parameters.
        $params = [];
        $sql = $this->getQueryBuilder()->batchInsert($table, $columns, $rows, $params);

        $this->setRawSql($sql);
        $this->bindValues($params);

        return $this;
    }

Or maybe think of introducing a DTO, something like BatchInsertChunk, which would store $sql and $params. Add an addBatchInsertChunk() method, store the chunks as an array inside the Command, and iterate over them on the execute() call.
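The DTO idea above could look like the following. Everything here is a proposed name, not existing Yii DB API, and the sketch assumes PHP 8.1+ for readonly properties:

```php
<?php

// Hypothetical DTO from the comment above; BatchInsertChunk and
// addBatchInsertChunk() are proposed names, not existing Yii DB API.
final class BatchInsertChunk
{
    public function __construct(
        public readonly string $sql,
        public readonly array $params,
    ) {
    }
}

// The Command could collect chunks and iterate over them on execute():
$chunks = [
    new BatchInsertChunk('INSERT INTO t (a) VALUES (:p0)', [':p0' => 1]),
    new BatchInsertChunk('INSERT INTO t (a) VALUES (:p1)', [':p1' => 2]),
];

foreach ($chunks as $chunk) {
    // Inside the Command, roughly:
    // $this->setRawSql($chunk->sql);
    // $this->bindValues($chunk->params);
    // ...then run the statement.
}
```

Keeping $sql and $params together per chunk avoids the BC problem of having insertBatch() return an array of raw SQL strings.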
