
Insert thousands or millions of rows in bulk #343

Open
collimarco opened this issue Sep 6, 2022 · 3 comments

@collimarco

What is the fastest way to batch insert a large number of rows in SQLite using this gem?

Currently I use this:

db.execute('BEGIN TRANSACTION')
sql = 'INSERT INTO items (id, name) VALUES (?, ?)'
items.each_with_index { |name, id| db.execute(sql, [id, name]) }
db.execute('COMMIT')

The items come from large files and need to be inserted into the SQLite database as fast as possible.

  1. Is there any better / optimized solution?
  2. For example, I was considering keeping all the inserts in a Ruby array and then using execute_batch; however, I can't find a way to sanitize / interpolate the statements in that case. Is there a method like db.sanitize(sql, [id, name])?
@collimarco
Author

I found this optimization with prepared statements:

db.execute('BEGIN TRANSACTION')
sql = db.prepare('INSERT INTO items (id, name) VALUES (?, ?)')
items.each_with_index { |name, id| sql.execute([id, name]) }
db.execute('COMMIT')

I can reach 250,000 inserts/sec on Ruby 3.0 / macOS Monterey, 2.2 GHz i7. I was hoping for something faster, given the large volume of data that I would like to process this way.

Is there any better solution?

@flavorjones
Member

I don't think there's a better solution using this gem. I'll leave this open in case any of the more seasoned maintainers want to weigh in with advice.

@oldmoe

oldmoe commented Sep 12, 2023

Expand the statement to insert multiple rows at once

INSERT INTO items (id, name) VALUES (?, ?), (?, ?), ..., (?, ?);

This reduces the number of round trips between Ruby and SQLite.

Prepare two statements: the multi-row one above, and a single-row insert to capture any leftover rows (a sketch follows).
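
A minimal sketch of that two-statement setup, reusing the db handle and items array from the earlier snippets (BATCH_SIZE, insert_slice and the variable names are my own, not from this thread):

BATCH_SIZE = 100

# One statement with BATCH_SIZE placeholder groups, one with a single group.
# Keep BATCH_SIZE * columns under SQLite's bound-parameter limit (999 in older builds).
placeholders  = Array.new(BATCH_SIZE, '(?, ?)').join(', ')
multi_insert  = db.prepare("INSERT INTO items (id, name) VALUES #{placeholders}")
single_insert = db.prepare('INSERT INTO items (id, name) VALUES (?, ?)')

# Full batches go through the multi-row statement in a single call;
# anything smaller falls back to the single-row statement.
def insert_slice(slice, multi_insert, single_insert, batch_size)
  if slice.size == batch_size
    multi_insert.execute(slice.flatten)
  else
    slice.each { |row| single_insert.execute(row) }
  end
end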

Make sure your transactions fit in the page cache

Don't insert all the records in a single transaction; pick a reasonable number of inserts per transaction (it should be in the thousands) and make sure your page cache can hold all of those rows.
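
A rough sketch of committing in chunks, reusing insert_slice, the prepared statements and BATCH_SIZE from the sketch above; ROWS_PER_TRANSACTION is an arbitrary figure that you would tune so each chunk fits comfortably in the page cache:

ROWS_PER_TRANSACTION = 10_000

rows = items.each_with_index.map { |name, id| [id, name] }

rows.each_slice(ROWS_PER_TRANSACTION) do |chunk|
  db.transaction do                         # one commit per chunk, not one giant transaction
    chunk.each_slice(BATCH_SIZE) do |slice|
      insert_slice(slice, multi_insert, single_insert, BATCH_SIZE)
    end
  end
end

multi_insert.close
single_insert.close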

Use WAL mode with synchronous = 1

If you are not doing so already: in WAL mode, synchronous = 1 (NORMAL) avoids an fsync on every commit.
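
For reference, a hedged example of switching the same db handle to WAL with synchronous = 1 before the bulk insert:

db.execute('PRAGMA journal_mode = WAL')
db.execute('PRAGMA synchronous = 1')   # 1 is the same as NORMAL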

Try to insert ids in order

If that is possible with your data set, sort the records by id before feeding them to SQLite.
