Skip to content

Conversation

@pkhuong
Copy link

@pkhuong pkhuong commented Jul 18, 2021

https://avi.im/blag/2021/fast-sqlite-inserts/ mentions that even a fully in-memory db takes ~29 seconds to insert 100M rows. Increasing the size of btree nodes, from 4 KB to 64 KB, seems like the next low-hanging fruit to try.

Unfortunately, this change also impacts the cache size: it is specified in pages, so increasing the page size from 4 KB to 64 KB also grows the cache by 16x. Any speed-up from this commit should therefore be compared with a version that sets PRAGMA cache_size = 16000000;.

https://avi.im/blag/2021/fast-sqlite-inserts/ mentions that even a fully in-memory db takes ~29 seconds to insert 100M rows. Increasing the size of btree nodes, from 4 KB to 64 KB, seems like the next low-hanging fruit to try.

Unfortunately, this change also impacts the cache size: it is specified in pages, so increasing the page size from 4 KB to 64 KB also grows the cache by 16x. Any speed-up from this commit should therefore be compared with a version that sets `PRAGMA cache_size = 16000000;`.
@avinassh
Copy link
Owner

@pkhuong could you test, post the before and after numbers?

@lnicola
Copy link

lnicola commented Jun 2, 2024

Hello from the future! I took the time to brush up on the dependency versions and test with PRAGMA page_size = 65536;.

I'm on BTRFS with nodatacow, which doesn't actually help that much. I skipped the Python and SQLx versions because they were too slow.

# baseline
duminică 2 iunie 2024, 13:07:47 +0530 [RUST] basic.rs (100_000_000) inserts

real	2m1,665s
user	1m59,444s
sys	0m1,916s
duminică 2 iunie 2024, 13:09:49 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real	1m40,988s
user	1m34,132s
sys	0m6,518s
duminică 2 iunie 2024, 13:11:30 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real	1m33,722s
user	1m36,400s
sys	0m7,754s
duminică 2 iunie 2024, 13:13:04 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real	0m38,444s
user	0m36,426s
sys	0m1,928s
duminică 2 iunie 2024, 13:13:43 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real	0m17,025s
user	0m15,053s
sys	0m1,923s
duminică 2 iunie 2024, 13:14:00 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real	0m16,686s
user	0m15,555s
sys	0m11,237s

# page_size = 65536
duminică 2 iunie 2024, 13:24:31 +0530 [RUST] basic.rs (100_000_000) inserts

real	2m4,589s
user	2m3,121s
sys	0m1,177s
duminică 2 iunie 2024, 13:26:36 +0530 [RUST] basic_batched_wp.rs (100_000_000) inserts

real	1m41,781s
user	1m35,434s
sys	0m6,026s
duminică 2 iunie 2024, 13:28:18 +0530 [RUST] threaded_str_batched.rs (100_000_000) inserts

real	1m34,286s
user	1m37,408s
sys	0m7,274s
duminică 2 iunie 2024, 13:29:52 +0530 [RUST] basic_prep.rs (100_000_000) inserts

real	0m36,846s
user	0m35,585s
sys	0m1,174s
duminică 2 iunie 2024, 13:30:30 +0530 [RUST] basic_batched.rs (100_000_000) inserts

real	0m15,246s
user	0m14,084s
sys	0m1,121s
duminică 2 iunie 2024, 13:30:45 +0530 [RUST] threaded_batched.rs (100_000_000) inserts

real	0m14,880s
user	0m15,084s
sys	0m10,567s

Looks like it helps, but only a little. And please don't use the threaded version (take a look at the sys time).

PS: you can reduce the duplication by using Option<AreaCode>. It implements ToSql so None will be inserted as a NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants