Description
Hi, I just found this package, it looks cool and super useful!
One thing I noticed in the benchmarks is how low the memory usage is. I know Rust is more efficient in terms of speed and memory, but I also think the reported memory numbers might be inaccurate. From ?profmem:
[...] nearly all memory allocations done in R are logged. Neither memory deallocations nor garbage collection events are logged. Furthermore, allocations done by non-R native libraries or R packages that use native code Calloc() / Free() for internal objects are also not logged.
I suspect that a lot of the allocations happen not in R but in Rust, and that the actual memory usage is higher than reported. I ran into the same thing when benchmarking polars and tidypolars, so I'd be interested if you find a workaround 😉
Just to give you an example: in polars, when I take the mean of a column with 100_000, 1_000_000, 10_000_000, or 100_000_000 rows, R reports the same (tiny) memory usage, but I clearly see a peak in the Windows Task Manager:
library(polars)

bench::press(
  rows = c(1e5, 1e6, 1e7, 1e8),
  {
    dat <- pl$DataFrame(
      a = rnorm(rows),
      b = rnorm(rows),
      c = rnorm(rows)
    )
    bench::mark(
      dat$with_columns(y = pl$col("a")$mean())
    )
  }
)
#> # A tibble: 4 × 7
#> expression rows min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 "dat$with_columns(y = pl$… 1e5 557.8µs 739.5µs 1218. 3.81KB 0
#> 2 "dat$with_columns(y = pl$… 1e6 3.4ms 3.95ms 123. 3.81KB 0
#> 3 "dat$with_columns(y = pl$… 1e7 38ms 42.06ms 14.7 3.81KB 0
#> 4 "dat$with_columns(y = pl$… 1e8 526ms 526.01ms 1.90 3.81KB 0I was told there's a linux tool to make more accurate benchmarks when calling other languages from R but I don't remember the name, I'll update this post if I find it.