Speed of s3 downloads #150

@bart1

I was excited to see that S3 support is implemented in the package. However, when I try it out, reading directly from S3 is roughly 10 times slower than downloading the file first and reading it locally (about 24 s versus 2 s elapsed in the reprex below). Admittedly, I have a reasonably fast internet connection, but I would have expected that reading a 500 × 360 matrix out of a 20 MB file would be faster without downloading the full file. Am I doing something wrong? Should the S3 option not be used to speed up code? Or might this be specific to this bucket?

require(rhdf5)
#> Loading required package: rhdf5
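url <- "https://s3-eu-west-1.amazonaws.com/fmi-opendata-radar-volume-hdf5/2024/03/03/filuo/202403030100_filuo_PVOL.h5"
# Download the whole file first, then read the dataset from the local copy.
# mode = "wb" keeps the binary HDF5 file intact on every platform.
system.time({
  download.file(url, local_file <- tempfile(fileext = ".h5"), mode = "wb")
  download <- h5read(file = local_file, name = "/dataset1/data1")
})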
#>    user  system elapsed 
#>   0.181   0.135   2.086
# Read the same dataset directly from the S3 bucket, without a local copy.
system.time(direct <- h5read(file = url, s3 = TRUE, name = "/dataset1/data1"))
#>    user  system elapsed 
#>   0.693   0.108  24.391
all.equal(direct, download)
#> [1] TRUE
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-11-06
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  cli            3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  digest         0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
#>  evaluate       0.23    2023-11-01 [1] CRAN (R 4.4.0)
#>  fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs             1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>  glue           1.8.0   2024-09-30 [1] CRAN (R 4.4.0)
#>  htmltools      0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  knitr          1.46    2024-04-06 [1] CRAN (R 4.4.0)
#>  lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  purrr          1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R.cache        0.16.0  2022-07-21 [1] CRAN (R 4.4.0)
#>  R.methodsS3    1.8.2   2022-06-13 [1] CRAN (R 4.4.0)
#>  R.oo           1.26.0  2024-01-24 [1] CRAN (R 4.4.0)
#>  R.utils        2.12.3  2023-11-18 [1] CRAN (R 4.4.0)
#>  reprex         2.1.0   2024-01-11 [1] CRAN (R 4.4.0)
#>  rhdf5        * 2.48.0  2024-04-30 [1] Bioconduc~
#>  rhdf5filters   1.16.0  2024-04-30 [1] Bioconduc~
#>  Rhdf5lib       1.26.0  2024-04-30 [1] Bioconduc~
#>  rlang          1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown      2.26    2024-03-05 [1] CRAN (R 4.4.0)
#>  rstudioapi     0.17.1  2024-10-22 [1] CRAN (R 4.4.0)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  styler         1.10.3  2024-04-07 [1] CRAN (R 4.4.0)
#>  vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr          3.0.2   2024-10-28 [1] CRAN (R 4.4.0)
#>  xfun           0.43    2024-03-25 [1] CRAN (R 4.4.0)
#>  yaml           2.3.9   2024-07-05 [1] CRAN (R 4.4.0)
#> 
#>  [1] /home/bart/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-11-06 with reprex v2.1.0
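
Since the reprex above times each approach only once and lumps the download together with the local read, a repeated measurement that separates the steps may make the comparison easier to interpret. The following is only a minimal sketch: `median_elapsed()` and `compare_reads()` are hypothetical helper names, not part of rhdf5, and it assumes the bucket is publicly readable so that no `s3credentials` are needed.

```r
# Hypothetical helper (not part of rhdf5): call f() n times and
# return the median wall-clock time in seconds.
median_elapsed <- function(f, n = 3L) {
  stopifnot(is.function(f), n >= 1L)
  elapsed <- vapply(
    seq_len(n),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  stats::median(elapsed)
}

# Hypothetical wrapper: nothing touches the network until it is called.
# Returns median timings for the download, the local read and the S3 read.
compare_reads <- function(url, name, n = 3L) {
  local_file <- tempfile(fileext = ".h5")
  on.exit(unlink(local_file), add = TRUE)
  c(
    # The download runs first, so the local read below has a file to open.
    download = median_elapsed(function() {
      download.file(url, local_file, mode = "wb", quiet = TRUE)
    }, n),
    local_read = median_elapsed(function() {
      rhdf5::h5read(file = local_file, name = name)
    }, n),
    s3_read = median_elapsed(function() {
      rhdf5::h5read(file = url, name = name, s3 = TRUE)
    }, n)
  )
}
```

Calling `compare_reads(url, "/dataset1/data1")` with the URL from the reprex would return a named vector of three median timings, which should show whether the gap comes from the S3 read itself rather than from run-to-run network variation.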
