diff --git a/docs/apps/r-env.html b/docs/apps/r-env.html new file mode 100644 index 0000000000..889e92f3dd --- /dev/null +++ b/docs/apps/r-env.html @@ -0,0 +1,938 @@ + + + + +
r-env provides R and RStudio Server, along with several other features to facilitate their use. It runs in an Apptainer container.
R is an open-source language and environment for statistical computing and graphics. More information on R can be found on the R Project website. Many useful R manuals are also hosted on CRAN.
RStudio Server is an integrated development environment (IDE) for R. More information on RStudio can be found on the RStudio website.
!!! info “News” 17.2.2026 R version 4.5.2 is now
+available in r-env in Puhti and Mahti and is set as the
+default version.
**22.7.2025** R version 4.5.1 is now available in `r-env` in Puhti and Mahti and is set as the default version.
+??? info “Older news (click to show)”
7.4.2025 r-env is now also available on Mahti, including RStudio in the Mahti web interface. The module generally works the same way as r-env on Puhti, but please note that the documentation below has not yet been updated for Mahti. The new small partition on Mahti is suitable for many types of R and RStudio work, excluding the most memory-intensive tasks. Users familiar with Puhti should note that on Mahti there is no separate memory reservation, and the only way to get more memory is to reserve more cores. If you have any questions about using R on Mahti, please contact CSC Service Desk.
r-env includes 1500+ pre-installed R packages, including
+support for geospatial analyses and
+parallel computing. For improved performance, r-env has
+been compiled using the Intel®
+oneAPI Math Kernel Library (oneMKL) (formerly Intel® MKL).
With a small number of exceptions, R package versions on
+r-env are date-locked (CRAN
+packages) or fixed to a specific Bioconductor version.
Current modules and versions supported on Puhti, Mahti and Roihu:
=== “Puhti”

    | Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
    |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
    | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |
    | r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 |
    | r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 |
    | r-env/440 | May 15 2024 | 3.19 | 2024.04.0-735 | 2024.1.0 | 2.35.0 |
    | r-env/432 | Jan 15 2024 | 3.18 | 2023.12.0-369 | 2024.0.0 | 2.34.1 |
    | r-env/430 | Jun 07 2023 | 3.17 | 2023.06.0-421 | 2023.1.0 | 2.32.2 |
    | r-env/422 | Mar 06 2023 | 3.16 | 2023.03.0-386 | 2023.1.0 | 2.32.1 |
    | r-env/421 | Jun 29 2022 | 3.15 | 2022.02.3-492 | 2022.1.0 | 2.30.1 |

=== “Mahti”

    | Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
    |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
    | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |
    | r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 |
    | r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 |

=== “Roihu”

    | Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
    |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
    | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |

+Other software and libraries:
+!!! info “New users”
+Add instructions here on how to get started, or link to a tutorial
Information on licenses that are in use for R and associated software (including packages) can be found on the R Project website. The exact license of a package can also be checked inside R: packageDescription("package", fields="License"). More information on citing R and different R packages is available at the bottom of the page.
The RStudio Server installation is based on the Open Source Edition (available under the AGPL v3 license). Please consult also the RStudio End User License Agreement.
Open MPI is distributed under the 3-clause BSD license (details on the Open MPI website).
Mellanox OFED™ is based on OFED™ (available under a dual license of BSD or GPL 2.0), as well as proprietary components (see the Mellanox OFED™ End-User Agreement).
Intel® MKL is distributed under the Intel Simplified Software License.
NVIDIA NCCL is distributed under the 3-clause BSD license.
NVIDIA cuDNN is distributed under the Software License Agreement for NVIDIA software development kits.
cget is available under the Boost Software License.
CmdStan is distributed under the 3-clause BSD license.
Licensing information within the r-env container is
+available in the file /usr/licensing.txt.
There are several ways to use R and the r-env module:
[CHANGE task -> recommended options?]
RStudio Server, which runs in interactive jobs on a compute node. Use this option for preparing your code and for smaller analyses. Interactive jobs may use limited resources.
R console in the command line in interactive jobs on a compute node. Use this option for preparing your code and for smaller analyses. Interactive jobs may use limited resources.
Non-interactive batch jobs without limits on the reserved computing resources (other than those that apply on the specific CSC supercomputer in general). Use this option for analyses that take a long time or require a lot of memory or cores.
On the login node, using the R console. Use this option only for moving data, checking package availability and installing packages. Puhti login nodes are not intended for heavy computing.
Using RStudio Server
The r-env module can be used to remotely launch RStudio Server in your web browser.
The recommended way to launch RStudio is to use the Puhti or Mahti web interface. For details, see the Puhti web interface documentation and documentation for the interactive RStudio app.
It is also possible to launch RStudio via SSH tunnelling. This option requires authentication using a Secure Shell (SSH) key. Detailed instructions for this are provided in a separate tutorial for using RStudio Server and our documentation on setting up SSH keys on Windows, macOS and Linux.
!!! note RStudio Server is meant for interactive work that consumes a modest amount of computational resources. Long, memory-intensive, or otherwise resource-heavy tasks are best carried out as non-interactive batch jobs.
Using R console in an interactive shell session
To use R interactively from the command line on a compute node, first start an interactive shell session:
Option 1. In the Puhti or Mahti web interface, using the shell application. Under Tools or on the front page, select Compute node shell. Select the resources, making sure to reserve local disk space for temporary files, and launch the session.
+Option 2. When connecting to the supercomputer with an SSH
+client on your own workstation, open a shell session on the
+interactive partition using the sinteractive
+command. As an example, the command below would launch a
+session with 4 GB of memory and 10 GB of local disk space for temporary
+files. Local disk space should always be reserved when using R
+interactively.
=== “Puhti”

    sinteractive --account <project> --mem 4000 --tmp 10

=== “Mahti”

    # Note that on Mahti, the available memory is determined by the number of cores (1.875 GiB each)
    sinteractive --account <project> --cores 2 --tmp 10

=== “Roihu”

    sinteractive --account <project> --mem 4000 --tmp 10

It is possible to specify other options including the running time
+(see the
+sinteractive documentation).
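
Because memory on Mahti follows from the number of reserved cores, the core count for a given memory need comes down to a ceiling division. The following is a minimal sketch, assuming the 1.875 GiB (1920 MiB) per core figure quoted above. The function name `mahti_cores_for_mem` is hypothetical and not part of any CSC tool.

```shell
# Hypothetical helper (not a CSC command): number of Mahti cores to reserve
# for a memory need given in MiB, assuming 1.875 GiB = 1920 MiB per core.
mahti_cores_for_mem() {
    mem_mib="$1"
    per_core_mib=1920
    # Integer ceiling division: round up to the next whole core
    echo $(( (mem_mib + per_core_mib - 1) / per_core_mib ))
}

cores="$(mahti_cores_for_mem 8000)"
echo "Reserve at least ${cores} cores, e.g. sinteractive --account <project> --cores ${cores} --tmp 10"
```

With these assumptions, a need of about 8000 MiB rounds up to five cores, since four cores would provide only 7680 MiB.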
Once you have opened an interactive shell session, you can launch a command line version of R as follows (note that the command needs to be run on a compute node):
+module load r-env
+start-r
+Interactive use on a login node
It is also possible to use the R console on the login node for light tasks. Use this option only for moving data, checking package availability and installing packages. Puhti login nodes are not intended for heavy computing.
To launch the R console on a login node, run the following commands:
+module load r-env
+apptainer_wrapper exec R --no-save
In addition to interactive jobs, R scripts can be run non-interactively using batch job files. Batch jobs are recommended in particular for long and resource-heavy tasks. In addition to the following examples, see the Puhti batch job documentation for more information. If you are new to batch jobs, check the materials of the CSC Computing Environment on batch jobs. Batch job files are submitted to the batch job system on a login node as follows:
+sbatch batch_job_file.sh
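
On submission, Slurm's `sbatch` prints a confirmation line of the form `Submitted batch job <id>`. When a submission is scripted, it can be handy to capture that ID for later use. The sketch below parses a sample confirmation line rather than calling `sbatch` itself; the helper name `extract_job_id` and the job ID shown are made up for illustration.

```shell
# Hypothetical helper: read sbatch's confirmation line from standard input
# and print the job ID (the last field on the line).
extract_job_id() {
    awk '/^Submitted batch job/ { print $NF }'
}

# Sample text standing in for real sbatch output (the ID is invented)
sample_output='Submitted batch job 1234567'
job_id="$(printf '%s\n' "$sample_output" | extract_job_id)"
echo "Job ID: ${job_id}"
```

On a login node, the same function could be fed from the real command, for example `sbatch batch_job_file.sh | extract_job_id`.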
+Below is an example for submitting a serial single-processor R batch
+job. Note that the test partition is used, which has a time
+limit of 15 minutes and is used for testing purposes only. Actual R
+batch jobs should in most cases be run in the small
+partition.
!!! note For batch jobs, make sure to define a project-specific
+temporary directory in /scratch/<project> or on the
+fast local disk.
We execute the R script using the apptainer_wrapper
+command, which makes sure project directories are visible in the
+Apptainer container that r-env runs in.
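=== “Puhti”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial   # Job name
    #SBATCH --account=<project>   # Billing project
    #SBATCH --partition=test      # Test partition (15 minute limit)
    #SBATCH --time=00:05:00       # Run time (hh:mm:ss)
    #SBATCH --ntasks=1            # Number of tasks
    #SBATCH --cpus-per-task=1     # CPU cores per task
    #SBATCH --mem-per-cpu=1000    # Memory per core (MB)

    # Load the r-env module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify a temporary directory path
    echo "TMPDIR=/scratch/<project>" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```
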
=== “Mahti”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial   # Job name
    #SBATCH --account=<project>

    # Load the r-env module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify a temporary directory path
    echo "TMPDIR=/scratch/<project>" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```

=== “Roihu”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial   # Job name
    #SBATCH --account=<project>

    # Load the r-env module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify a temporary directory path
    echo "TMPDIR=/scratch/<project>" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```

+In the above example, one task (--ntasks=1) is executed
+with 1 CPU core (--cpus-per-task=1), 1 GB of memory
+(--mem-per-cpu=1000) and a run time of five minutes
+(--time=00:05:00) reserved for the job.
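
The batch script first deletes old `TMPDIR` lines from `~/.Renviron` and then appends a new one, so that repeated jobs do not pile up conflicting entries. The sketch below shows the same remove-then-append pattern on a throwaway file, so it can be tried without touching your real `.Renviron`. The helper `set_renviron_var` and the path `/scratch/project_example` are hypothetical. Unlike the `sed` call in the batch script, which deletes every line containing `TMPDIR`, this version only removes lines that start with `TMPDIR=`.

```shell
# Hypothetical helper: set NAME=VALUE in an .Renviron-style file,
# replacing any earlier definition of NAME instead of appending a duplicate.
set_renviron_var() {
    file="$1"; name="$2"; value="$3"
    touch "$file"
    tmp="$(mktemp)"
    # Keep every line except earlier definitions of NAME
    grep -v "^${name}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a temporary file with a stale TMPDIR entry
demo_file="$(mktemp)"
printf 'TMPDIR=/old/path\nR_LIBS=/some/lib\n' > "$demo_file"

# Running the helper twice still leaves a single TMPDIR line
set_renviron_var "$demo_file" TMPDIR /scratch/project_example
set_renviron_var "$demo_file" TMPDIR /scratch/project_example
cat "$demo_file"
```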
The command module load r-env loads the latest
+r-env version available. To specify which module version is
+loaded, use module load r-env/<version>, for example
+module load r-env/440.
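
The module names listed earlier appear to be the R version with the dots removed (for example, R 4.5.2 corresponds to `r-env/452`). A small sketch of that mapping follows, assuming the pattern holds for the version you need; `r_env_module_name` is a hypothetical helper, and the result should be checked against the modules actually available.

```shell
# Hypothetical helper: build an r-env module name from an R version string
# by stripping the dots, e.g. 4.4.0 -> r-env/440.
r_env_module_name() {
    printf 'r-env/%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

module_name="$(r_env_module_name 4.4.0)"
echo "module load ${module_name}"   # prints: module load r-env/440
```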
!!! info By default, R uses one CPU core. When you are working with R scripts or packages that can take advantage of multiple processors and parallel processing, take a look at the examples for parallel R batch jobs.

It is possible to check if a particular package is already installed as follows.
+# One way is to try loading the package:
+library(packagename)
+
+# If you don't want to load the package, it is also
+# possible to search through a list:
+installed_packages <- library()$results[,1]
+"packagename" %in% installed_packages
+
+# Note: both ways are sensitive to upper- and lower-case letters
Additional R package installations can be arranged via two routes:
+Project-specific installations can be used by creating a separate
+package directory in the /projappl/<project>
+directory (instructions below; also see here for information
+on ProjAppl)
Requests for general installations (provided to all users as part of the module): please contact CSC Service Desk
To make use of a project-specific package library, follow these
+instructions. First create a new folder inside your project directory.
+Note that the folder should be specific to the R version you are using
+(R packages installed using different r-env modules are not
+cross-compatible).
# On the command prompt:
+# First navigate to /projappl/<project>, then
+mkdir project_rpackages_<rversion>
+You can then add the folder to your library trees in R:
+# Add this to your R code:
+.libPaths(c("/projappl/<project>/project_rpackages_<rversion>", .libPaths()))
+libpath <- .libPaths()[1]
+
+# This command can be used to check that the folder is now visible:
+.libPaths() # It should be first on the list
+
+# Package installations should now be directed to the project
+# folder by default. You can also specify the path, e.g. install.packages("package", lib = libpath)
+
+# Note that it's also possible to fetch the R version automatically using getRversion(). For example:
+.libPaths(paste0("/projappl/<project>/project_rpackages_", gsub("\\.", "", getRversion())))
+To use R packages installed in /projappl, add the
+following to the beginning of your R script. This modifies your library
+trees within a given R session only. In other words, you will need to
+run this each time when launching R:
.libPaths(c("/projappl/<project>/project_rpackages_<rversion>", .libPaths()))
+Alternatively, you can add the desired changes to an
+.Renviron file (only when not using RStudio):
echo "R_LIBS=/projappl/<project>/project_rpackages_<rversion>" >> ~/.Renviron
+!!! note When using r-env, user-defined changes to R
+library paths must be specified inside an R session or in relation to an
+.Renviron file. Other changes (e.g. using
+export to modify environment variables) will not work due
+to the R installation running inside an Apptainer container. If your
+analysis would require changes that cannot be achieved through the above
+means, please contact us for a module-wide package installation.
For jobs that read and write large numbers of files (I/O-intensive analyses), fast local storage can be used in non-interactive batch jobs with minor changes to the batch job file. Interactive R jobs use fast local storage by default.
+An example of a serial batch job using 10 GB of fast local storage
+(--gres=nvme:10) is given below. Here a temporary directory
+is specified using the environment variable TMPDIR, in
+contrast to the prior examples where it was set as
+/scratch/<project>.
=== “Puhti”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial_fastlocal
    #SBATCH --account=<project>
    #SBATCH --gres=nvme:10   # 10 GB of fast local storage

    # Load the module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify NVMe temp folder path
    echo "TMPDIR=$TMPDIR" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```

=== “Mahti”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial_fastlocal
    #SBATCH --account=<project>

    # Load the module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify NVMe temp folder path
    echo "TMPDIR=$TMPDIR" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```

=== “Roihu”

    ``` bash
    #!/bin/bash -l
    #SBATCH --job-name=r_serial_fastlocal
    #SBATCH --account=<project>

    # Load the module
    module load r-env

    # Clean up .Renviron file in home directory
    if test -f ~/.Renviron; then
        sed -i '/TMPDIR/d' ~/.Renviron
    fi

    # Specify NVMe temp folder path
    echo "TMPDIR=$TMPDIR" >> ~/.Renviron

    # Run the R script
    srun apptainer_wrapper exec Rscript --no-save myscript.R
    ```

+Further to temporary file storage, data sets for analysis can be
+stored on a fast local drive in the location specified by the variable
+LOCAL_SCRATCH. To enable R to find your data, you will need
+to indicate this location in your R script. After launching R, you can
+print out the location using the following command:
Sys.getenv("LOCAL_SCRATCH")
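
To have data available in that location, input files need to be copied there inside the job before R starts. The following is a rough sketch of such a staging step, not a documented CSC procedure: the helper `stage_to_local_scratch` is hypothetical, and the fallback to a temporary directory exists only so the snippet can be tried outside a job where `LOCAL_SCRATCH` is unset.

```shell
# Hypothetical helper: copy an input file to the fast local disk
# and print the path of the staged copy.
stage_to_local_scratch() {
    src="$1"
    dest="${LOCAL_SCRATCH}/$(basename "$src")"
    cp "$src" "$dest"
    echo "$dest"
}

# Demo only: outside a job with local disk reserved, LOCAL_SCRATCH is not set,
# so fall back to a throwaway directory.
: "${LOCAL_SCRATCH:=$(mktemp -d)}"

# Small stand-in for a real data set
demo_input="$(mktemp)"
echo "x,y" > "$demo_input"

staged="$(stage_to_local_scratch "$demo_input")"
echo "Staged copy: ${staged}"
```

In the R script, the staged file could then be located by combining `Sys.getenv("LOCAL_SCRATCH")` with the file name.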
r-env with Stan
The r-env module includes several packages that make use of Stan for statistical modelling.
!!! note The thread affinity variable
+APPTAINERENV_OMP_PLACES=cores has been found to interfere
+with parallel jobs using the rstan package. We currently
+recommend that this variable should not be used for parallel R jobs with
+Stan.
Using R with the CmdStan backend
+The r-env module comes with a separate CmdStan installation that
+is specific to each module version. To use it, one must set the correct
+path to CmdStan using cmdstanr. For example, for
+r-env/452 this would be done as follows:
cmdstanr::set_cmdstan_path("/appl/soft/math/r-env/452-stan/cmdstan-2.38.0")
If you are using CmdStan in an interactive session, the above command will work directly. For non-interactive batch jobs, the path to CmdStan needs to be separately set in the batch job file. This is done by including the following commands in addition to your other batch job file contents:
+# Set R version
+export RVER=452
+
+# Launch R after binding CmdStan
+SING_FLAGS="$SING_FLAGS -B /appl/soft/math/r-env/${RVER}-stan:/appl/soft/math/r-env/${RVER}-stan"
+srun apptainer_wrapper exec Rscript --no-save script.R
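
The directory bound into the container here is the parent of the path passed to `cmdstanr::set_cmdstan_path()` earlier. The sketch below builds both strings from the module version and the CmdStan version, using the `r-env/452` values given in this document. The variable names `CMDSTAN_VERSION`, `STAN_DIR` and `CMDSTAN_PATH` are illustrative, and it is an assumption that other module versions follow the same directory layout.

```shell
# Values for r-env/452 as given in this document
RVER=452
CMDSTAN_VERSION=2.38.0

# Directory to bind into the container, and the CmdStan path inside it
# (assumed layout: <RVER>-stan/cmdstan-<version>)
STAN_DIR="/appl/soft/math/r-env/${RVER}-stan"
CMDSTAN_PATH="${STAN_DIR}/cmdstan-${CMDSTAN_VERSION}"

# Append the bind mount to any flags already set
SING_FLAGS="${SING_FLAGS:-} -B ${STAN_DIR}:${STAN_DIR}"

echo "Bind flags:   ${SING_FLAGS}"
echo "CmdStan path: ${CMDSTAN_PATH}"
```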
+Other details on using the CmdStan backend are package-specific. As
+one example, one could use it with the brms
+package:
library(brms)
+
+fit_serial <- brm(
+ count ~ zAge + zBase * Trt + (1|patient),
+ data = epilepsy, family = poisson(),
+ chains = 4, cores = 4, backend = "cmdstanr"
+)
+The most common profiling tools in R are Rprof and profvis.
+old links, find newer ones?: https://support.posit.co/hc/en-us/articles/218221837-Profiling-R-code-with-the-RStudio-IDE +https://www.r-bloggers.com/2013/09/profiling-r-code/
When trying to speed up an R job, use these tools to see which parts of your script are the slowest, and look for ways to make those parts faster. Functions from different packages may also take different amounts of time for a similar computational task. In addition:

- Watch out for ‘for loops’ that grow an object step by step, and try to find alternatives.
- Make the script run in parallel. See separate page.
If PDF rendering of an R Markdown or a Quarto document fails, run the following in R:
+tinytex::install_tinytex()
+When prompted about an existing LaTeX distribution, answer
+yes to continue the installation anyway.
The r-env module comes with the aws.s3
+package for working with S3 storage, which makes it possible to use the
+Allas storage system directly from an R script. See here
+for a practical example involving raster data.
Accessing Allas via the r-env module can be done as
+follows. First configure Allas by running these commands before
+launching an interactive shell session:
module load allas
+allas-conf --mode s3cmd
+After starting an
+interactive session and launching R / RStudio Server, you can now
+access your bucket list as follows. Note that, for this to work, you
+will need to have the allas module loaded and the argument
+region='' added to the bucketlist()
+function:
library(aws.s3)
+bucketlist(region='')
To find the correct citations for R and different R packages, you can type:
+citation() # for citing R
+citation("package") # for citing R packages
+Parallel R guide
r-env container recipes (link to public GitHub repository)
R FAQs (hosted by CRAN)
Related Projects (list of R-related projects on R Project website)
R package cheatsheets (hosted on RStudio website)
tidyverse (pre-installed
+on the r-env module)