Accuracy of memory usage? #84

@etiennebacher

Hi, I just found this package, it looks cool and super useful!

One thing I noticed in the benchmarks is how low the memory usage is. I know that using Rust is more efficient in both speed and memory usage, but I also think the reported memory numbers might be inaccurate. From ?profmem:

[...] nearly all memory allocations done in R are logged. Neither memory deallocations nor garbage collection events are logged. Furthermore, allocations done by non-R native libraries or R packages that use native code Calloc() / Free() for internal objects are also not logged.

I suspect that a lot of memory allocations are done in Rust rather than in R, so the actual memory usage is higher than reported. I run into the same thing when I benchmark polars and tidypolars, so I'm interested if you find a workaround 😉

Just to give you an example: in polars, when I take the mean of a column with 100_000, 1_000_000, 10_000_000, or 100_000_000 rows, R reports the same (tiny) memory usage but I clearly see a peak in Windows task manager:

library(polars)

bench::press(
  rows = c(1e5, 1e6, 1e7, 1e8),
  {
    dat <- pl$DataFrame(
      a = rnorm(rows),
      b = rnorm(rows),
      c = rnorm(rows)
    )
    bench::mark(
      dat$with_columns(y = pl$col("a")$mean())
    )
  }
)
#> # A tibble: 4 × 7
#>   expression                  rows     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                 <dbl> <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 "dat$with_columns(y = pl$…   1e5 557.8µs  739.5µs   1218.      3.81KB        0
#> 2 "dat$with_columns(y = pl$…   1e6   3.4ms   3.95ms    123.      3.81KB        0
#> 3 "dat$with_columns(y = pl$…   1e7    38ms  42.06ms     14.7     3.81KB        0
#> 4 "dat$with_columns(y = pl$…   1e8   526ms 526.01ms      1.90    3.81KB        0

I was told there's a Linux tool for more accurate benchmarks when calling other languages from R, but I don't remember the name. I'll update this post if I find it.
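In the meantime, one possible cross-platform check is to ask the operating system rather than R's allocation log. The sketch below uses `bench::bench_process_memory()`, which reports the current and maximum memory of the running R process, so it should also see allocations made by native code. `peak_growth()` is a hypothetical helper written for this example, not part of bench or polars, and `rnorm(1e7)` is only a stand-in workload.

```r
library(bench)

# Hypothetical helper (not part of bench or polars): evaluate `expr` and
# report how much the OS-level peak memory of the R process grew.
peak_growth <- function(expr) {
  before <- unclass(bench_process_memory())[["max"]]
  force(expr)  # the expression is evaluated here
  after <- unclass(bench_process_memory())[["max"]]
  as_bench_bytes(after - before)
}

# Stand-in workload; with polars loaded, the call from the example above
# could be passed instead:
#   peak_growth(dat$with_columns(y = pl$col("a")$mean()))
peak_growth(x <- rnorm(1e7))
```

The process peak never decreases, so a result of 0 only means the expression stayed below an earlier peak. It is probably safest to measure each data size in a fresh R session, and to treat the number as a rough process-level figure rather than an exact per-expression allocation count.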
