diff --git a/docs/apps/r-env.html b/docs/apps/r-env.html new file mode 100644 index 0000000000..889e92f3dd --- /dev/null +++ b/docs/apps/r-env.html @@ -0,0 +1,938 @@ + + + + + + + + + + + + + +r-env + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +
+

r-env

+

r-env provides R and RStudio Server, and several other +features to facilitate their use. It runs in an Apptainer +container.

+ +

!!! info “News” 17.2.2026 R version 4.5.2 is now +available in r-env in Puhti and Mahti and is set as the +default version.

+
**22.7.2025** R version 4.5.1 is now available in `r-env` in Puhti and Mahti and is set as the default version.  
+

??? info “Older news (click to show)”
+7.4.2025 r-env is now also available on +Mahti, including RStudio in the Mahti web interface. The +module works largely the same way as r-env on Puhti, but +please note that the documentation below has not yet been updated for +Mahti. The new +small partition on Mahti is suitable for many types of R and RStudio +work, excluding the most memory-intensive tasks. Users familiar with +Puhti should note that on Mahti there is no separate memory reservation, +and the only way to get more memory is to reserve more cores. If you +have any questions about using R on Mahti, please contact CSC Service Desk.

+
+

Available

+

r-env includes 1500+ pre-installed R packages, including +support for geospatial analyses and +parallel computing. For improved performance, r-env has +been compiled using the Intel® +oneAPI Math Kernel Library (oneMKL) (formerly Intel® MKL).

+

With a small number of exceptions, R package versions on +r-env are date-locked (CRAN +packages) or fixed to a specific Bioconductor version.

+

Current modules and versions supported on Puhti, Mahti and Roihu:

+

=== “Puhti”
+| Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
+|:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
+| r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |
+| r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 |
+| r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 |
+| r-env/440 | May 15 2024 | 3.19 | 2024.04.0-735 | 2024.1.0 | 2.35.0 |
+| r-env/432 | Jan 15 2024 | 3.18 | 2023.12.0-369 | 2024.0.0 | 2.34.1 |
+| r-env/430 | Jun 07 2023 | 3.17 | 2023.06.0-421 | 2023.1.0 | 2.32.2 |
+| r-env/422 | Mar 06 2023 | 3.16 | 2023.03.0-386 | 2023.1.0 | 2.32.1 |
+| r-env/421 | Jun 29 2022 | 3.15 | 2022.02.3-492 | 2022.1.0 | 2.30.1 |

+

=== “Mahti”
+| Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
+|:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
+| r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |
+| r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 |
+| r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 |

+

=== “Roihu”
+| Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version |
+|:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:|
+| r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 |

+

Other software and libraries:

+
    +
  • Open MPI (with Mellanox OFED™ software) 4.1.7 (r-env/451, r-env/452), +4.1.2 (from r-env/421 to r-env/442)
  • +
  • TensorFlow 2.20.0 (r-env/452), 2.19.0 (r-env/451), 2.18.0 +(r-env/442), 2.9.1 (from r-env/421 to r-env/440)
  • +
  • cget 0.2.0
  • +
+

!!! info “New users”
+Add instructions here on how to get started, or link to a tutorial

+
+
+

Licenses

+ +

Licensing information within the r-env container is +available in the file /usr/licensing.txt.

+
+
+

Usage

+

There are several ways to use R and the r-env module: +[CHANGE task -> recommended options?]

+
    +
  • RStudio Server, which runs in interactive jobs on a +compute node. Use this option for preparing your code and for +smaller analyses. Interactive jobs may use limited resources.

  • +
  • R console in the command line in interactive jobs on a +compute node. Use this option for preparing your code and for +smaller analyses. Interactive jobs may use limited resources.

  • +
  • Non-interactive batch jobs without limits on the reserved +computing resources (other than those applying to the specific CSC +supercomputer in general). Use this option for analyses that take a long time +or require a lot of memory or cores.

  • +
  • On the login node, using the R console. Use this option only for +moving data, checking package availability and installing packages. +Puhti login nodes are not intended for heavy +computing.

  • +
+
+

Interactive use on a compute node

+

Using RStudio Server

+

The r-env module can be used to remotely launch RStudio +Server in your web browser.

+

The recommended way to launch RStudio is to use the +Puhti or Mahti web interface. For details, +see the Puhti web interface +documentation and documentation for the interactive RStudio +app.

+

It is also possible to launch RStudio via SSH tunnelling. This option +requires authentication using a Secure Shell (SSH) key. Detailed +instructions for this are provided in a separate +tutorial for using RStudio Server and our documentation on setting up +SSH keys on Windows, macOS and Linux.

+

!!! note RStudio Server is meant for interactive work that consumes a +modest amount of computational resources. Long, memory-intensive, or +otherwise resource-heavy tasks are best carried out as non-interactive +batch jobs.

+

Using R console in an interactive shell +session

+

To use R interactively from the command line on a compute node, first +start an interactive +shell session:

+

Option 1. In the Puhti or Mahti web interface, using the shell +application. Under Tools or on the front page, select +Compute node shell. Select the resources, making sure to +reserve local disk space for temporary files, and launch the +session.

+

Option 2. When connecting to the supercomputer with an SSH +client on your own workstation, open a shell session on the +interactive partition using the sinteractive +command. As an example, the command below would launch a +session with 4 GB of memory and 10 GB of local disk space for temporary +files. Local disk space should always be reserved when using R +interactively.

+

=== “Puhti”
+sinteractive --account <project> --mem 4000 --tmp 10

+

=== “Mahti”
+# note that on Mahti, the available memory is determined by the number of cores (1.875 GiB each)
+sinteractive --account <project> --cores 2 --tmp 10

+

=== “Roihu”
+sinteractive --account <project> --mem 4000 --tmp 10

+

It is possible to specify other options including the running time +(see the +sinteractive documentation).
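On Mahti, memory follows from the number of reserved cores, so it can help to work out how many cores a given memory need translates to. The sketch below assumes the 1.875 GiB (1920 MiB) per core figure quoted in the Mahti example above. The function name `mahti_cores_for_mem` is hypothetical and is not part of `r-env` or Slurm.

```bash
# Hypothetical helper: number of Mahti cores needed for a memory request.
# Assumes 1.875 GiB = 1920 MiB of memory per core (figure from the Mahti example).
mahti_cores_for_mem() {
    mem_mib="$1"
    per_core_mib=1920
    # Integer ceiling division: round up to whole cores
    echo $(( (mem_mib + per_core_mib - 1) / per_core_mib ))
}

# Example: cores needed for 4 GiB (4096 MiB) of memory
mahti_cores_for_mem 4096
```

The printed number could then be passed to `sinteractive --cores`. Rounding up means the session gets at least the requested memory, at the cost of reserving one more core than a rounded-down estimate.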

+

Once you have opened an interactive shell session, you can +launch a command line version of R as follows (note +that the command needs to be run on a compute node):

+
module load r-env
+start-r
+

Interactive use on a login node

+

It is also possible to use the R console on the login node for light +tasks. Use this option only for moving data, checking package +availability and installing packages. Puhti login nodes are not intended for heavy +computing.

+

To launch the R console on a login node, run the following +commands:

+
module load r-env
+apptainer_wrapper exec R --no-save
+
+
+

Non-interactive batch jobs

+

Further to interactive jobs, R scripts can be run non-interactively +using batch job files. Batch jobs are recommended in particular for long +and resource-heavy tasks. In addition to the following examples, see the Puhti +batch job documentation for more information. If you are new to +batch jobs, check the materials of the CSC +Computing Environment on batch jobs. Batch job files are submitted +to the batch job system on a login node as follows:

+
sbatch batch_job_file.sh
+
+
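When a job is accepted, `sbatch` prints a line of the form `Submitted batch job <id>`, and that id is what replaces `%j` in the output file names used by the example scripts on this page. A minimal sketch for recovering the file name from that message follows. The function name `output_file_for` is made up for illustration, and the `output_%j.txt` pattern is taken from the example scripts.

```bash
# Hypothetical helper: derive the output file name from sbatch's message.
output_file_for() {
    # The job id is the fourth word of "Submitted batch job <id>"
    job_id="$(echo "$1" | awk '{print $4}')"
    echo "output_${job_id}.txt"
}

# Example with a made-up job id
output_file_for "Submitted batch job 12345"
```

On the supercomputer the argument would typically come from the submission itself, for example `output_file_for "$(sbatch batch_job_file.sh)"`.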

Basic R batch job script

+

Below is an example for submitting a serial single-processor R batch +job. Note that the test partition is used, which has a time +limit of 15 minutes and is used for testing purposes only. Actual R +batch jobs should in most cases be run in the small +partition.

+

!!! note For batch jobs, make sure to define a project-specific +temporary directory in /scratch/<project> or on the +fast local disk.

+

We execute the R script using the apptainer_wrapper +command, which makes sure project directories are visible in the +Apptainer container that r-env runs in.

+

=== “Puhti”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial       # Job name
+#SBATCH --account=                # Billing project, has to be defined!
+#SBATCH --output=output_%j.txt    # File for storing output (%j replaced by job id)
+#SBATCH --error=errors_%j.txt     # File for storing errors (%j replaced by job id)
+#SBATCH --partition=test          # Job queue (partition), in general use 'small'
+#SBATCH --time=00:05:00           # Max. duration of the job
+#SBATCH --cpus-per-task=1         # Number of cores
+#SBATCH --ntasks=1                # Number of tasks (only change this for multinode/MPI jobs)
+#SBATCH --nodes=1                 # Number of nodes (only change this for multinode/MPI jobs)
+#SBATCH --mem-per-cpu=1000        # Memory to reserve per core

+
# Load the r-env module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify a temporary directory path
+echo "TMPDIR=/scratch/<project>" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

=== “Mahti”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial       # Job name
+#SBATCH --account=                # Billing project, has to be defined!
+#SBATCH --output=output_%j.txt    # File for storing output (%j replaced by job id)
+#SBATCH --error=errors_%j.txt     # File for storing errors (%j replaced by job id)
+#SBATCH --partition=test          # Job queue (partition), in general use 'small'
+#SBATCH --time=00:05:00           # Max. duration of the job
+#SBATCH --cpus-per-task=1         # Number of cores (1.8 GB of memory each)
+#SBATCH --ntasks=1                # Number of tasks (only change this for multinode/MPI jobs)
+#SBATCH --nodes=1                 # Number of nodes (only change this for multinode/MPI jobs)

+
# Load the r-env module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify a temporary directory path
+echo "TMPDIR=/scratch/<project>" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

=== “Roihu”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial       # Job name
+#SBATCH --account=                # Billing project, has to be defined!
+#SBATCH --output=output_%j.txt    # File for storing output (%j replaced by job id)
+#SBATCH --error=errors_%j.txt     # File for storing errors (%j replaced by job id)
+#SBATCH --partition=test          # Job queue (partition), in general use 'small'
+#SBATCH --time=00:05:00           # Max. duration of the job
+#SBATCH --cpus-per-task=1         # Number of cores
+#SBATCH --ntasks=1                # Number of tasks (only change this for multinode/MPI jobs)
+#SBATCH --nodes=1                 # Number of nodes (only change this for multinode/MPI jobs)

+
# Load the r-env module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify a temporary directory path
+echo "TMPDIR=/scratch/<project>" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

In the above example, one task (--ntasks=1) is executed +with 1 CPU core (--cpus-per-task=1), 1 GB of memory +(--mem-per-cpu=1000) and a run time of five minutes +(--time=00:05:00) reserved for the job.
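The batch scripts above first delete any existing `TMPDIR` line from `~/.Renviron` and then append a new one, so repeated submissions do not pile up conflicting entries. As a rough sketch, the same logic can be wrapped in a function and tried out on a scratch file. The function name `set_renviron_tmpdir` and the path `/scratch/project_demo` are placeholders invented for this illustration, and `grep -v` is used here in place of the `sed -i` call in the scripts.

```bash
# Hypothetical helper: make sure a file has exactly one TMPDIR entry.
set_renviron_tmpdir() {
    renviron="$1"   # file to edit, e.g. ~/.Renviron
    tmpdir="$2"     # temporary directory R should use
    if [ -f "$renviron" ]; then
        # Drop lines mentioning TMPDIR (grep exits non-zero if nothing is left)
        grep -v 'TMPDIR' "$renviron" > "$renviron.new" || true
        mv "$renviron.new" "$renviron"
    fi
    echo "TMPDIR=$tmpdir" >> "$renviron"
}

# Try it twice on a throwaway file, then count the TMPDIR lines
demo_file="$(mktemp)"
set_renviron_tmpdir "$demo_file" /scratch/project_demo
set_renviron_tmpdir "$demo_file" /scratch/project_demo
grep -c 'TMPDIR' "$demo_file"
```

The count stays at one however many times the function runs, which is the property the clean-up step in the batch scripts is there to provide.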

+

The command module load r-env loads the latest +r-env version available. To specify which module version is +loaded, use module load r-env/<version>, for example +module load r-env/440.
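Since not every module version is installed on every system, a script that pins a version may want to check it first. The sketch below hard-codes the versions from the tables in the Available section as a snapshot. The function name `r_env_available` is hypothetical, and on the supercomputer itself `module avail r-env` remains the authoritative source.

```bash
# Hypothetical helper: is a given r-env version listed for a system?
# The version lists are a snapshot of the tables on this page.
r_env_available() {
    system="$1"
    version="$2"
    case "$system" in
        puhti) supported="452 451 442 440 432 430 422 421" ;;
        mahti) supported="452 451 442" ;;
        roihu) supported="452" ;;
        *) return 2 ;;   # unknown system
    esac
    for v in $supported; do
        [ "$v" = "$version" ] && return 0
    done
    return 1
}

# Example: r-env/440 is listed for Puhti only
if r_env_available mahti 440; then
    echo "module load r-env/440"
else
    echo "r-env/440 is not listed for mahti"
fi
```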

+

!!! info By default, R uses one CPU core. When you are working with R +scripts or packages that can take advantage of multiple processors and +parallel processing, take a look at the examples for parallel R batch jobs.

+
+
+
+

R package installations

+

It is possible to check if a particular package is already installed +as follows.

+
# One way is to try loading the package:
+library(packagename)
+
+# If you don't want to load the package, it is also
+# possible to search through a list:
+installed_packages <- library()$results[,1]
+"packagename" %in% installed_packages
+
+# Note: both ways are sensitive to upper- and lower-case letters
+

Additional R package installations can be arranged via two +routes:

+
    +
  • Project-specific installations can be used by creating a separate +package directory in the /projappl/<project> +directory (instructions below; also see here for information +on ProjAppl)

  • +
  • Requests for general installations (provided to all users as part +of the module): please contact CSC +Service Desk

  • +
+

To make use of a project-specific package library, follow these +instructions. First create a new folder inside your project directory. +Note that the folder should be specific to the R version you are using +(R packages installed using different r-env modules are not +cross-compatible).

+
# On the command prompt:
+# First navigate to /projappl/<project>, then
+mkdir project_rpackages_<rversion>
+
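Because the folder name has to track the R version, it may be convenient to build it from the version string rather than typing it by hand. The sketch below assumes the suffix is the R version with the dots removed (for example `452` for R 4.5.2), which is an illustrative naming choice. The function name `make_rlib_dir` is hypothetical, and a temporary directory stands in for `/projappl/<project>`.

```bash
# Hypothetical helper: create a version-specific package folder.
make_rlib_dir() {
    base="$1"       # on the supercomputer: /projappl/<project>
    rversion="$2"   # e.g. 4.5.2
    # Remove the dots from the version: 4.5.2 -> 452
    suffix="$(printf '%s' "$rversion" | tr -d '.')"
    dir="$base/project_rpackages_$suffix"
    mkdir -p "$dir"
    echo "$dir"
}

# Example run against a temporary directory
demo_base="$(mktemp -d)"
make_rlib_dir "$demo_base" 4.5.2
```

Using `mkdir -p` keeps the function safe to call again when the folder already exists.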

You can then add the folder to your library trees in R:

+
# Add this to your R code:
+.libPaths(c("/projappl/<project>/project_rpackages_<rversion>", .libPaths()))
+libpath <- .libPaths()[1]
+
+# This command can be used to check that the folder is now visible:
+.libPaths() # It should be first on the list
+
+# Package installations should now be directed to the project
+# folder by default. You can also specify the path, e.g. install.packages("package", lib = libpath)
+
+# Note that it's also possible to fetch the R version automatically using getRversion(). For example:
+.libPaths(paste0("/projappl/<project>/project_rpackages_", gsub("\\.", "", getRversion()))) 
+

To use R packages installed in /projappl, add the +following to the beginning of your R script. This modifies your library +trees within a given R session only. In other words, you will need to +run this each time when launching R:

+
.libPaths(c("/projappl/<project>/project_rpackages_<rversion>", .libPaths()))
+

Alternatively, you can add the desired changes to an +.Renviron file (only when not using RStudio):

+
echo "R_LIBS=/projappl/<project>/project_rpackages_<rversion>" >> ~/.Renviron
+

!!! note When using r-env, user-defined changes to R +library paths must be specified inside an R session or in relation to an +.Renviron file. Other changes (e.g. using +export to modify environment variables) will not work due +to the R installation running inside an Apptainer container. If your +analysis would require changes that cannot be achieved through the above +means, please contact us for a module-wide package installation.

+
+
+

Using fast local storage

+

For jobs that read and write large numbers of files (I/O-intensive +analyses), fast +local storage can be used in non-interactive batch jobs with minor +changes to the batch job file. Interactive R jobs use fast local storage +by default.

+

An example of a serial batch job using 10 GB of fast local storage +(--gres=nvme:10) is given below. Here a temporary directory +is specified using the environment variable TMPDIR, in +contrast to the prior examples where it was set as +/scratch/<project>.

+

=== “Puhti”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial_fastlocal
+#SBATCH --account=
+#SBATCH --output=output_%j.txt
+#SBATCH --error=errors_%j.txt
+#SBATCH --partition=test
+#SBATCH --time=00:05:00
+#SBATCH --ntasks=1
+#SBATCH --nodes=1
+#SBATCH --mem-per-cpu=1000
+#SBATCH --gres=nvme:10

+
# Load the module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify NVMe temp folder path
+echo "TMPDIR=$TMPDIR" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

=== “Mahti”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial_fastlocal
+#SBATCH --account=
+#SBATCH --output=output_%j.txt
+#SBATCH --error=errors_%j.txt
+#SBATCH --partition=test
+#SBATCH --time=00:05:00
+#SBATCH --ntasks=1
+#SBATCH --nodes=1
+#SBATCH --cpus-per-task=1
+#SBATCH --gres=nvme:10

+
# Load the module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify NVMe temp folder path
+echo "TMPDIR=$TMPDIR" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

=== “Roihu”
+``` bash
+#!/bin/bash -l
+#SBATCH --job-name=r_serial_fastlocal
+#SBATCH --account=
+#SBATCH --output=output_%j.txt
+#SBATCH --error=errors_%j.txt
+#SBATCH --partition=test
+#SBATCH --time=00:05:00
+#SBATCH --ntasks=1
+#SBATCH --nodes=1
+#SBATCH --cpus-per-task=1
+#SBATCH --gres=nvme:10

+
# Load the module
+module load r-env
+
+# Clean up .Renviron file in home directory
+if test -f ~/.Renviron; then
+    sed -i '/TMPDIR/d' ~/.Renviron
+fi
+
+# Specify NVMe temp folder path
+echo "TMPDIR=$TMPDIR" >> ~/.Renviron
+
+# Run the R script
+srun apptainer_wrapper exec Rscript --no-save myscript.R
+```
+

Further to temporary file storage, data sets for analysis can be +stored on a fast local drive in the location specified by the variable +LOCAL_SCRATCH. To enable R to find your data, you will need +to indicate this location in your R script. After launching R, you can +print out the location using the following command:

+
Sys.getenv("LOCAL_SCRATCH")
+
+
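One way to place a data set on the fast local drive is to copy it there in the batch script before R starts. The sketch below is an assumption-laden illustration rather than a prescribed workflow: the function name `stage_to_local` is hypothetical, and it falls back to a caller-supplied directory when `LOCAL_SCRATCH` is not set, for example when no local disk was reserved.

```bash
# Hypothetical helper: copy an input file to fast local storage.
stage_to_local() {
    src="$1"        # file to copy
    fallback="$2"   # directory to use if LOCAL_SCRATCH is not set
    target="${LOCAL_SCRATCH:-$fallback}"
    mkdir -p "$target"
    cp "$src" "$target/"
    # Print the new location so it can be passed on to the R script
    echo "$target/$(basename "$src")"
}

# Example with throwaway files standing in for real data
demo_src="$(mktemp)"
demo_fallback="$(mktemp -d)"
stage_to_local "$demo_src" "$demo_fallback"
```

The printed path could be handed to the R script as a command line argument, or the script can rebuild it from `Sys.getenv("LOCAL_SCRATCH")`.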

Using r-env with Stan

+

The r-env module includes several packages that make use +of Stan for statistical +modelling.

+

!!! note The thread affinity variable +APPTAINERENV_OMP_PLACES=cores has been found to interfere +with parallel jobs using the rstan package. We currently +recommend that this variable should not be used for parallel R jobs with +Stan.

+

Using R with the CmdStan backend

+

The r-env module comes with a separate CmdStan installation that +is specific to each module version. To use it, one must set the correct +path to CmdStan using cmdstanr. For example, for +r-env/452 this would be done as follows:

+
cmdstanr::set_cmdstan_path("/appl/soft/math/r-env/452-stan/cmdstan-2.38.0")
+

If you are using CmdStan in an interactive session, the above command +will work directly. For non-interactive batch jobs, the path to CmdStan +needs to be separately set in the batch job file. This is done by +including the following commands further to your other batch job file +contents:

+
# Set R version
+export RVER=452
+
+# Launch R after binding CmdStan
+SING_FLAGS="$SING_FLAGS -B /appl/soft/math/r-env/${RVER}-stan:/appl/soft/math/r-env/${RVER}-stan"
+srun apptainer_wrapper exec Rscript --no-save script.R
+
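The bind flag in the snippet above maps the CmdStan directory to the same path inside the container, so the path given to `cmdstanr` resolves in both places. As a small sketch, the flag can be built from the module version alone. The function name `cmdstan_bind_flag` is hypothetical, and the directory layout is simply the one shown in the examples above.

```bash
# Hypothetical helper: build the bind flag for a given r-env version.
cmdstan_bind_flag() {
    rver="$1"   # module version, e.g. 452
    stan_dir="/appl/soft/math/r-env/${rver}-stan"
    # Same path on the host and inside the container
    echo "-B ${stan_dir}:${stan_dir}"
}

# Append the flag to any flags that are already set
RVER=452
SING_FLAGS="${SING_FLAGS:-} $(cmdstan_bind_flag "$RVER")"
echo "$SING_FLAGS"
```

Keeping the version in one variable means only `RVER` needs to change when moving to another module version.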

Other details on using the CmdStan backend are package-specific. As +one example, one could use it with the brms +package:

+
library(brms)
+
+fit_serial <- brm(
+  count ~ zAge + zBase * Trt + (1|patient),
+  data = epilepsy, family = poisson(),
+  chains = 4, cores = 4, backend = "cmdstanr"
+)
+
+
+
+

Profiling tools in R

+

The most common profiling tools in R are Rprof and profvis.

+

old links, find newer ones?: https://support.posit.co/hc/en-us/articles/218221837-Profiling-R-code-with-the-RStudio-IDE +https://www.r-bloggers.com/2013/09/profiling-r-code/

+

When trying to speed up an R job, use these tools to see which parts +of your script are the slowest, and look for ways to make those parts faster. +Note also that functions from different packages may take +different amounts of time for a similar computational task. In addition:
+- Watch out for ‘for loops’ that grow an object step by step, and try to find alternatives.
+- Make the script run in parallel. See separate page.

+
+
+

PDF rendering

+

If PDF rendering of an R Markdown or a Quarto document fails, run the +following in R:

+
tinytex::install_tinytex()
+

When prompted about an existing LaTeX distribution, answer +yes to continue the installation anyway.

+
+
+

Working with Allas

+

The r-env module comes with the aws.s3 +package for working with S3 storage, which makes it possible to use the +Allas storage system directly from an R script. See here +for a practical example involving raster data.

+

Accessing Allas via the r-env module can be done as +follows. First configure Allas by running these commands before +launching an interactive shell session:

+
module load allas
+allas-conf --mode s3cmd
+

After starting an +interactive session and launching R / RStudio Server, you can now +access your bucket list as follows. Note that, for this to work, you +will need to have the allas module loaded and the argument +region='' added to the bucketlist() +function:

+
library(aws.s3)
+bucketlist(region='')
+
+
+
+

Citation

+

To find the correct citations for R and individual R packages, +type:

+
citation() # for citing R
+citation("package") # for citing R packages
+
+
+

Further information

+ +
+
+ + + + +
+ + + + + + + + + + + + + + + diff --git a/docs/apps/r-env.md b/docs/apps/r-env.md index 29d4f69922..23ddd42fee 100644 --- a/docs/apps/r-env.md +++ b/docs/apps/r-env.md @@ -3,7 +3,7 @@ tags: - Other catalog: name: r-env - description: R, RStudio Server, SAGA and TensorFlow + description: R and RStudio Server license_type: Other disciplines: - Mathematics and Statistics @@ -14,17 +14,18 @@ catalog: # r-env -`r-env` is an [Apptainer container](../computing/containers/overview.md#running-containers) including R and RStudio Server, and several other features to facilitate their use. +`r-env` provides R and RStudio server, and several other features to facilitate their use. It runs in an [Apptainer container](../computing/containers/overview.md#running-containers). -- R is an open-source language and environment for statistical computing and graphics. More information on R can be found on [the R Project website](https://www.r-project.org/about.html). Many useful [R manuals are also hosted on CRAN](https://cran.r-project.org/manuals.html). +- R is an open-source language and environment for statistical computing and graphics. More information on R can be found on [the R Project website](https://www.r-project.org/about.html). Many useful [R manuals are also hosted on CRAN](https://cran.r-project.org/manuals.html). -- RStudio Server is an integrated development environment (IDE) for R. More information on RStudio can be found on the [RStudio website](https://rstudio.com/). +- RStudio Server is an integrated development environment (IDE) for R. More information on RStudio can be found on the [RStudio website](https://rstudio.com/). !!! info "News" - **17.2.2026** R version 4.5.2 is now available in `r-env` in Puhti and Mahti and is set as the default version. + **17.2.2026** R version 4.5.2 is now available in `r-env` in Puhti and Mahti and is set as the default version. 
- **22.7.2025** R version 4.5.1 is now available in `r-env` in Puhti and Mahti and is set as the default version. + **22.7.2025** R version 4.5.1 is now available in `r-env` in Puhti and Mahti and is set as the default version. +??? info "Older news (click to show)" **7.4.2025** `r-env` is now also available on Mahti, including RStudio in the [Mahti web interface](../computing/webinterface/index.md). The module works in general similarly as `r-env` on Puhti, but please note that the documentation below has not yet been updated for Mahti. The [new small partition on Mahti](../computing/running/batch-job-partitions.md#mahti-cpu-partitions-with-core-based-allocation) is suitable for many types of R and RStudio work, excluding the most memory intensive tasks. Users familiar with Puhti should note that on Mahti there is no separate memory reservation, and the only way to get more memory is to reserve more cores. If you have any questions on using R on Mahti, please contact [CSC Service Desk](../support/contact.md). ## Available @@ -33,19 +34,32 @@ catalog: With a small number of exceptions, R package versions on `r-env` are date-locked ([CRAN packages](https://cran.r-project.org/web/packages/index.html)) or fixed to a specific [Bioconductor](https://www.bioconductor.org/) version. 
-Current modules and versions supported on Puhti and Mahti: - -| Module name (R version) | Puhti / Mahti | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | CmdStan version | -| ----------------------- | ------------- | ------------------- | -------------------- | ---------------------- | ----------------| --------------- | -| r-env/452 | X / X | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 | -| r-env/451 | X / X | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 | -| r-env/442 | X / X | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 | -| r-env/440 | X / - | May 15 2024 | 3.19 | 2024.04.0-735 | 2024.1.0 | 2.35.0 | -| r-env/432 | X / - | Jan 15 2024 | 3.18 | 2023.12.0-369 | 2024.0.0 | 2.34.1 | -| r-env/430 | X / - | Jun 07 2023 | 3.17 | 2023.06.0-421 | 2023.1.0 | 2.32.2 | -| r-env/422 | X / - | Mar 06 2023 | 3.16 | 2023.03.0-386 | 2023.1.0 | 2.32.1 | -| r-env/421 | X / - | Jun 29 2022 | 3.15 | 2022.02.3-492 | 2022.1.0 | 2.30.1 | - +Current modules and versions supported on Puhti, Mahti and Roihu: + +=== "Puhti" + | Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | Cmdstan version | + |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:| + | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 | + | r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 | + | r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 | + | r-env/440 | May 15 2024 | 3.19 | 2024.04.0-735 | 2024.1.0 | 2.35.0 | + | r-env/432 | Jan 15 2024 | 3.18 | 2023.12.0-369 | 2024.0.0 | 2.34.1 | + | r-env/430 | Jun 07 2023 | 3.17 | 2023.06.0-421 | 2023.1.0 | 2.32.2 | + | r-env/422 | Mar 06 2023 | 3.16 | 2023.03.0-386 | 2023.1.0 | 2.32.1 | + | r-env/421 | Jun 29 2022 | 3.15 | 2022.02.3-492 | 2022.1.0 | 2.30.1 | + +=== "Mahti" + | Module name (R version) | CRAN 
package dating | Bioconductor version | RStudio Server version | oneMKL version | Cmdstan version | + |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:| + | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 | + | r-env/451 | July 7 2025 | 3.21 | 2025.05.1-513 | 2025.2.0 | 2.36.0 | + | r-env/442 | Feb 12 2025 | 3.20 | 2024.12.0-467 | 2025.0.1 | 2.36.0 | + +=== "Roihu" + | Module name (R version) | CRAN package dating | Bioconductor version | RStudio Server version | oneMKL version | Cmdstan version | + |:-----------------------:|:--------------------|:--------------------:|:----------------------:|:--------------:|:---------------:| + | r-env/452 (default) | Jan 7 2026 | 3.22 | 2026.01.0-392 | 2025.3.0 | 2.38.0 | + Other software and libraries: @@ -53,628 +67,385 @@ Other software and libraries: - TensorFlow 2.20.0 (r-env(452), 2.19.0 (r-env/451), 2.18.0 (r-env/442), 2.9.1 (from r-env/421 to r-env/440) - cget 0.2.0 -## Licenses - -- Information on licenses that are in use for R and associated software (including packages) can be found on the [R Project website](https://www.r-project.org/Licenses/). The exact license of a package can also be checked inside R: `packageDescription("package", fields="License")`. More information on [citing R and different R packages](#citation) (at the bottom of the page). - -- The RStudio Server installation is based on the [Open Source Edition](https://rstudio.com/products/rstudio/#rstudio-desktop) (available under the [AGPL v3 license)](https://github.com/rstudio/rstudio/blob/master/COPYING). Please consult also the [RStudio End User License Agreement](https://rstudio.com/about/eula/). - -- Open MPI is distributed under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause) (details on the [Open MPI website](https://www.open-mpi.org/community/license.php)). 
- -- Mellanox OFED™ is based on OFED™ (available under a dual license of BSD or GPL 2.0), as well as proprietary components (see the [Mellanox OFED™ End-User Agreement](https://www.mellanox.com/page/mlnx_ofed_eula)). - -- Intel® MKL is distributed under the [Intel Simplified Software License](https://software.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf). - -- NVIDIA NCCL is distributed under the [3-clause BSD license](https://docs.nvidia.com/deeplearning/nccl/bsd/index.html). - -- NVIDIA cuDNN is distributed under the [Software License Agreement for NVIDIA software development kits](https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html). - -- cget is available under the [Boost Software License](https://github.com/pfultz2/cget/blob/master/LICENSE). - -- CmdStan is distributed under the [3-clause BSD license](https://github.com/stan-dev/cmdstan/blob/develop/LICENSE). - -Licensing information within the `r-env` container is available in the file `/usr/licensing.txt`. - -## Usage - -There are several ways to use the `r-env` module on Puhti: -* Non-interactive batch jobs without limits on the reserved computing resources (other than those applying to Puhti in general). Use this option for analyses that take longer or require a lot of memory. -* [Interactive jobs on a compute node](../computing/running/interactive-usage.md), using either the R console or RStudio Server. Use this option for preparing your code and for smaller analyses. Interactive jobs may use limited resources. -* Interactively on the login node, using the R console. Use this option only for moving data, checking package availability and installing packages. Puhti login nodes are [not intended for heavy computing](../computing/usage-policy.md#login-nodes). +!!! 
info "New users" + Add instructions here on how to get started, or link to a tutorial -#### Interactive use on a compute node -***Starting a shell session on the interactive partition*** - -To use R interactively on Puhti compute nodes, open a shell session on the `interactive` partition using the `sinteractive` command. As an example, the command below would launch a session with 4 GB of memory and 10 GB of local scratch space. - -```bash -sinteractive --account --mem 4000 --tmp 10 -``` - -It is also possible to specify other options including the running time ([see the `sinteractive` documentation](../computing/running/interactive-usage.md)). - -***Launching the R console*** - -Once you have opened an interactive shell session, you can start a command line version of R as follows (note that the command needs to be run on a compute node): - -```bash -module load r-env -start-r -``` - -***Using RStudio Server*** - -The`r-env` module can be used to remotely launch RStudio Server on your web browser. For this, you have two options. - -**Option 1. Using the Puhti web interface**. This is by far the easiest way to launch RStudio on Puhti. For details, [see the Puhti web interface documentation](../computing/webinterface/index.md). - -**Option 2. Using SSH tunneling**. This option requires authentication using a Secure Shell (SSH) key. Detailed instructions for this are provided in a [separate tutorial for using RStudio Server](../support/tutorials/rstudio-or-jupyter-notebooks.md) and our [documentation on setting up SSH keys on Windows, macOS and Linux](../computing/connecting/ssh-keys.md). +## Licenses -#### Interactive use on a login node +- Information on licenses that are in use for R and associated software (including packages) can be found on the [R Project website](https://www.r-project.org/Licenses/). The exact license of a package can also be checked inside R: `packageDescription("package", fields="License")`. 
More information on [citing R and different R packages](#citation) (at the bottom of the page). -To launch the R console on a login node, run the following commands: +- The RStudio Server installation is based on the [Open Source Edition](https://posit.co/products/open-source/rstudio/) (available under the [AGPL v3 license)](https://github.com/rstudio/rstudio/blob/master/COPYING). Please consult also the [RStudio End User License Agreement](https://rstudio.com/about/eula/). -```bash -module load r-env -apptainer_wrapper exec R --no-save +- Open MPI is distributed under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause) (details on the [Open MPI website](https://www.open-mpi.org/community/license.php)). -# Note: this issues a warning mentioning that apptainer_wrapper -# is meant for use on a compute node. However, R will still launch -# as intended. -``` +- Mellanox OFED™ is based on OFED™ (available under a dual license of BSD or GPL 2.0), as well as proprietary components (see the [Mellanox OFED™ End-User Agreement](https://www.mellanox.com/page/mlnx_ofed_eula)). -#### Non-interactive use +- Intel® MKL is distributed under the [Intel Simplified Software License](https://software.intel.com/content/dam/develop/external/us/en/documents/pdf/intel-simplified-software-license.pdf). -Further to interactive jobs, R scripts can be run non-interactively using batch job files. In addition to the following examples, [see this link](../computing/running/creating-job-scripts-puhti.md) for more information. Batch job files can be submitted to the batch job system as follows: +- NVIDIA NCCL is distributed under the [3-clause BSD license](https://docs.nvidia.com/deeplearning/nccl/bsd/index.html). -```bash -sbatch batch_job_file.sh -``` +- NVIDIA cuDNN is distributed under the [Software License Agreement for NVIDIA software development kits](https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html). 
-#### Serial batch jobs +- cget is available under the [Boost Software License](https://github.com/pfultz2/cget/blob/master/LICENSE). -Below is an example for submitting a single-processor R batch job on Puhti. Note that the `test` partition is used, which has a time limit of 15 minutes and is used for testing purposes only. Most R jobs are best run in the `small` partition. For memory-intensive non-interactive jobs, we should also list a project-specific temporary directory in `/scratch/`. We also execute the job using the `apptainer_wrapper` command. +- CmdStan is distributed under the [3-clause BSD license](https://github.com/stan-dev/cmdstan/blob/develop/LICENSE). -```bash -#!/bin/bash -l -#SBATCH --job-name=r_serial -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --ntasks=1 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 +Licensing information within the `r-env` container is available in the file `/usr/licensing.txt`. -# Load r-env -module load r-env -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi +## Usage -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron +There are several ways to use R and the `r-env` module: -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R -``` +***Interactive use*** -In the above example, one task (`--ntasks=1`) is executed with 1 GB of memory (`--mem-per-cpu=1000`) and a run time of five minutes (`--time=00:05:00`) reserved for the job. +!!! info "" + **Interactive jobs** are meant for preparing your code and smaller analyses and may use limited resources. -#### Parallel batch jobs +- RStudio Server, which runs in [interactive jobs on a compute node](../computing/running/interactive-usage.md). -The `r-env` module can be used for parallel computing in several ways. 
These include multi-core and array submissions, as well as MPI (Message Passing Interface)-based jobs. The module comes with several packages that support multi-node communication via MPI: `doMPI` (used with `foreach`), `future`, `pbdMPI` and `snow`. +- R console in the command line in [interactive jobs on a compute node](../computing/running/interactive-usage.md). -Further to the following examples, please see our separate [tutorial for parallel R jobs](../support/tutorials/parallel-r.md). There is also [separate documentation on MPI jobs](../computing/running/creating-job-scripts-puhti.md#mpi-based-batch-jobs). You may also wish to check the relevant R package manuals and [this page](https://github.com/csc-training/geocomputing/tree/master/R/puhti/02_parallel_future) for examples of parallel computing using the `raster` package. +- On the login node, using the R console. Use this option only for moving data, checking package availability and installing packages. Puhti login nodes are [not intended for heavy computing](../computing/usage-policy.md#login-nodes). -!!! note - For jobs employing the Rmpi package, please use snow (which is built on top of Rmpi). Jobs using Rmpi alone are unavailable due to compatibility issues. - -*Multi-core jobs* - -To submit a job employing multiple cores on a single node, one could use the following batch job file. The job reserves a single task (`--ntasks=1`), eight cores (`--cpus-per-task=8`) and a total of 8 GB of memory (`--mem-per-cpu=1000)`. The run time is limited to five minutes. 
- -```bash -#!/bin/bash -l -#SBATCH --job-name=r_multicore -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --ntasks=1 -#SBATCH --cpus-per-task=8 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 - -# Load r-env -module load r-env +***Non-interactive use*** -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi +!!! info "" + **Non-interactive batch jobs** should be used for long or resource-intensive tasks. -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron +- Non-interactive batch jobs without limits on the reserved computing resources (other than those that apply to the CSC supercomputer in general). -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R -``` -*Array jobs* +### Interactive use on a compute node -Array jobs can be used to handle [*embarrassingly parallel*](../computing/running/array-jobs.md) tasks. The script below would submit a job involving ten subtasks on the `small` partition, with each requiring less than five minutes of computing time and less than 1 GB of memory. +***Using RStudio Server*** -```bash -#!/bin/bash -l -#SBATCH --job-name=r_array -#SBATCH --account= -#SBATCH --output=output_%j_%a.txt -#SBATCH --error=errors_%j_%a.txt -#SBATCH --partition=small -#SBATCH --time=00:05:00 -#SBATCH --array=1-10 -#SBATCH --ntasks=1 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 +The `r-env` module can be used to remotely launch RStudio Server in your web browser. -# Load r-env -module load r-env +**The recommended way to launch RStudio** is to use the [Puhti or Mahti web interface](../computing/webinterface/index.md). See also the documentation for the [interactive RStudio app](../computing/webinterface/rstudio.md). 
-# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi +It is also possible to launch RStudio via SSH tunnelling. This option requires authentication using a Secure Shell (SSH) key. Detailed instructions for this are provided in a [separate tutorial for using RStudio Server](../support/tutorials/rstudio-or-jupyter-notebooks.md) and our [documentation on setting up SSH keys on Windows, macOS and Linux](../computing/connecting/ssh-keys.md). -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron +!!! note "" + RStudio Server is meant for **interactive work that consumes a modest amount of computational resources**. Long, memory-intensive, or otherwise resource-heavy tasks are best carried out as non-interactive batch jobs. -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R $SLURM_ARRAY_TASK_ID -``` -For larger-scale array jobs involving [many small independent runs](../support/tutorials/many.md), we could consider the following example. Let's assume that we have a total of 1500 runs that we would like to complete. We also have a list (`mylist.txt`) with unique identifiers for each run that we wish to use as part of an R script to retrieve the correct data set for analysis. The list is arranged row-by-row like this: +***Using the R console in an interactive shell session*** -```bash -set1 -set2 -set3 -(...) -set1500 -``` +To use R interactively from the command line on a compute node, first start an [interactive shell session](https://csc-training.github.io/csc-env-eff/hands-on/batch_jobs/interactive.html): -To perform our analysis efficiently, we could take advantage of a module including [GNU parallel](https://www.gnu.org/software/parallel/) to "schedule" how the runs are completed within the array job. There are a couple of details we should notice about the batch job script below: +**Option 1. 
In the [supercomputer web interfaces](../computing/webinterface/index.md), using the shell application**. *Compute node shell*. When selecting the resources, make sure to reserve local disk space for temporary files. -- The way in which the runs are split into arrays is case-specific and requires manual calculation. In the current example, since `mylist.txt` contains 1500 identifiers and we are using 10 arrays, a decision has been made to allocate 150 runs per array. +**Option 2. When connecting to the supercomputer with an SSH client on your own workstation, open a shell session on the `interactive` partition using the [`sinteractive` command](../computing/running/interactive-usage.md)**. As an example, the command below would launch a session with 4 GB of memory and 8 GB of local disk. Local disk space should always be reserved for temporary files when using R interactively. -- We use `-j $SLURM_CPUS_PER_TASK -k` to tell GNU parallel to keep running 4 applications in parallel, while ensuring that the job output order matches the input order. The number of simultaneous parallel applications is defined using `--cpus-per-task`. +=== "Puhti" + ``` bash + sinteractive --account --mem 4000 --tmp 8 + ``` + +=== "Mahti" + ``` bash + # note that on Mahti, the available memory is determined by the number of cores (1.875 GiB each) + sinteractive --account --cores 2 --tmp 8 + ``` + +=== "Roihu" + ``` bash + sinteractive --account --mem 4000 --tmp 8 + ``` -- For a real-life analysis, we would likely need much more time and memory (determined by what we do within our R script). +It is possible to specify other options including the running time ([see the `sinteractive` documentation](../computing/running/interactive-usage.md)). 
-```bash -#!/bin/bash -l -#SBATCH --job-name=r_array_gnupara -#SBATCH --account= -#SBATCH --output=output_%j_%a.txt -#SBATCH --error=errors_%j_%a.txt -#SBATCH --partition=small -#SBATCH --time=00:05:00 -#SBATCH --array=0-9 -#SBATCH --ntasks=1 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 -#SBATCH --cpus-per-task=4 +Once you have opened an interactive shell session, you can **launch a command line version of R** as follows (note that the command needs to be run on a compute node): -# Load parallel and r-env -module load parallel/20200122 +``` bash module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron - -# Split runs into arrays and run the R script -(( from_run = SLURM_ARRAY_TASK_ID * 150 + 1 )) -(( to_run = SLURM_ARRAY_TASK_ID * 150 + 150 )) - -sed -n "${from_run},${to_run}p" mylist.txt | \ - parallel -j $SLURM_CPUS_PER_TASK -k \ - apptainer_wrapper exec Rscript --no-save myscript.R \ - $SLURM_ARRAY_TASK_ID -``` - -If we wanted to access the unique run identifier as well as the array number within our R script, we could use the `commandArgs` function. - -```r -# For example: -arrays <- commandArgs(trailingOnly = TRUE) -``` - -*Jobs using `doMPI` (with `foreach`)* - -The `foreach` package implements a for-loop that uses iterators and allows for parallel execution using the `%dopar%` operator. It is possible to execute parallel `foreach` loops on Puhti using the `doMPI` package. While otherwise the batch job file looks similar to that used for a multi-processor job, we replace `--cpus-per-task=8` with `--ntasks=8`. In addition, we could modify the `srun` command at the end of the batch job file: - -```bash -srun apptainer_wrapper exec Rscript --no-save --slave myscript.R -``` - -The `--slave` argument is optional and will prevent different processes from printing out a welcome message etc. 
- -Unlike when using `snow`, jobs using `doMPI` launch a number of R sessions equal to the number of reserved cores that all begin to execute the given R script. It is important to include the `startMPIcluster()` call near the beginning of the R script as anything before it will be executed by all available processes (while only the master process continues after it). Upon completion, the cluster is closed using `closeCluster()`. The `mpi.quit()` function can then be used to terminate the MPI execution environment and to quit R: - -```r -library(doMPI, quietly = TRUE) -cl <- startMPIcluster() -registerDoMPI(cl) - -system.time(a <- foreach(i = 1:7) %dopar% system.time(sort(runif(1e7)))) -a - -closeCluster(cl) -mpi.quit() +start-r ``` -*Jobs using `snow`* +**Interactive use on a login node** -Whereas most parallel R jobs employing the `r-env` module can be submitted using `srun apptainer_wrapper exec Rscript`, those involving the package `snow` need to be executed using a separate command (`RMPISNOW`). `snow` relies on a communication model where a master process is used to control other processes (workers). Because of this, the batch job file must specify one more task than the planned number of `snow` workers, as the master needs its own task. For example, for a job requiring seven workers, we could submit a job as follows: +It is also possible to use the R console on the login node for light tasks. Use this option only for moving data, checking package availability and installing packages. Puhti login nodes are [not intended for heavy computing](../computing/usage-policy.md#login-nodes). 
-```bash -#!/bin/bash -l -#SBATCH --job-name=r_snow -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --ntasks=8 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 +To launch the R console on a login node, run the following commands: -# Load r-env +``` bash module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron - -# Run the R script -srun apptainer_wrapper exec RMPISNOW --no-save --slave -f myscript.R -``` - -Unlike when using `foreach` and `doMPI`, here only the master process runs the R script. The R script must contain the call `getMPIcluster()` that is used to produce a reference to the cluster which can then be passed onto other functions. Upon completion of the analysis, the cluster is stopped using `stopCluster()`. For example: - -```r -cl <- getMPIcluster() - -funtorun <- function(k) { - system.time(sort(runif(1e7))) -} - -system.time(a <- clusterApply(cl, 1:7, funtorun)) -a - -stopCluster(cl) +apptainer_wrapper exec R --no-save ``` -*Jobs using `future`* - -The `future` package provides an API for R jobs using futures (see the [future CRAN website](https://cran.r-project.org/web/packages/future/index.html) for details). Whether futures are resolved sequentially or in parallel is specified using the function `plan()`. - -For analyses requiring a single node, `plan(multisession)` and `plan(multicore)` are suitable. The former spawns multiple independent R processes and the latter forks an existing R process. Using `plan(cluster)` is suitable for work using multiple nodes. - -To submit a job involving multisession or multicore futures, one should specify a single node (`--nodes=1`), a single task (`--ntasks=1`), and the number of cores (`--cpus-per-task=x`; 40 is the maximum on a single node). 
By default, the number of workers is the number of cores given by `availableCores()`. For guidelines on designing batch job files, see other examples on this page. - -The R script below could be used to compare analysis times using sequential, multisession and multicore strategies. - -```r -library(future) -library(tictoc) -library(furrr) - -# Different future plans (choose one) -# (Note: three cores and thus three workers were used in this example) +### Non-interactive batch jobs -# plan(sequential) -# plan(multisession) -# plan(multicore) +Further to interactive jobs, R scripts can be run non-interactively using batch job files. Batch jobs are recommended in particular for long and resource-heavy tasks. In addition to the following examples, [see the Puhti batch job documentation](../computing/running/creating-job-scripts-puhti.md) for more information. If you are new to batch jobs, see the [CSC Computing Environment materials on batch jobs](https://csc-training.github.io/csc-env-eff/part-1/batch-jobs/). Batch job files are submitted to the batch job system on a login node as follows: -# Analysis timing - -tic() -nothingness <- future_map(c(2, 2, 2), ~Sys.sleep(.x)) -toc() - -# sequential: 6.157 sec -# multisession: 2.463 sec -# multicore: 2.212 sec +``` bash +sbatch batch_job_file.sh ``` -For multi-node analyses using `plan(cluster)`, the job can be submitted using the package `snow`. As we are using `snow`, R must be launched using `RMPISNOW` and we should specify enough tasks for both the master and worker processes (see 'Jobs using `snow`'). To use `future` with `snow`, the following lines would also need to be included in the R script: +#### Basic R batch job script + +Below is an example for submitting a serial single-processor R batch job. 
Note that the `test` partition is used, which has a time limit of 15 minutes and is used for testing purposes only. Actual R batch jobs should in most cases be run in the `small` partition. + +!!! warning "" + In batch jobs, make sure to define a project-specific temporary directory in `/scratch/` or on [the fast local disk](../computing/running/creating-job-scripts-puhti.md#local-storage). + +We execute the R script using the `apptainer_wrapper` command, which makes sure project directories are visible in the Apptainer container that `r-env` runs in. + +=== "Puhti" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial # Job name + #SBATCH --account= # Billing project, has to be defined! + #SBATCH --output=output_%j.txt # File for storing output (%j replaced by job id) + #SBATCH --error=errors_%j.txt # File for storing errors (%j replaced by job id) + #SBATCH --partition=test # Job queue (partition), in general use 'small' + #SBATCH --time=00:05:00 # Max. duration of the job + #SBATCH --cpus-per-task=1 # Number of cores + #SBATCH --ntasks=1 # Number of tasks (only change this for multinode/MPI jobs) + #SBATCH --nodes=1 # Number of nodes (only change this for multinode/MPI jobs) + #SBATCH --mem-per-cpu=1000 # Memory to reserve per core + + # Load the r-env module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify a temporary directory path + echo "TMPDIR=/scratch/" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` + +=== "Mahti" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial # Job name + #SBATCH --account= # Billing project, has to be defined! 
+ #SBATCH --output=output_%j.txt # File for storing output (%j replaced by job id) + #SBATCH --error=errors_%j.txt # File for storing errors (%j replaced by job id) + #SBATCH --partition=test # Job queue (partition), in general use 'small' + #SBATCH --time=00:05:00 # Max. duration of the job + #SBATCH --cpus-per-task=1 # Number of cores (1.8 GB of memory each) + #SBATCH --ntasks=1 # Number of tasks (only change this for multinode/MPI jobs) + #SBATCH --nodes=1 # Number of nodes (only change this for multinode/MPI jobs) + + # Load the r-env module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify a temporary directory path + echo "TMPDIR=/scratch/" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` + +=== "Roihu" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial # Job name + #SBATCH --account= # Billing project, has to be defined! + #SBATCH --output=output_%j.txt # File for storing output (%j replaced by job id) + #SBATCH --error=errors_%j.txt # File for storing errors (%j replaced by job id) + #SBATCH --partition=test # Job queue (partition), in general use 'small' + #SBATCH --time=00:05:00 # Max. 
duration of the job + #SBATCH --cpus-per-task=1 # Number of cores + #SBATCH --ntasks=1 # Number of tasks (only change this for multinode/MPI jobs) + #SBATCH --nodes=1 # Number of nodes (only change this for multinode/MPI jobs) + + # Load the r-env module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify a temporary directory path + echo "TMPDIR=/scratch/" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` + +In the above example, one task (`--ntasks=1`) is executed with 1 CPU core (`--cpus-per-task=1`), 1 GB of memory (`--mem-per-cpu=1000`) and a run time of five minutes (`--time=00:05:00`) reserved for the job. + +The command `module load r-env` loads the latest `r-env` version available. To specify which module version is loaded, use `module load r-env/`, for example `module load r-env/440`. + +!!! info "More than one CPU core?" + By default, R uses one CPU core. When you are working with an R script or packages that can take advantage of multiple processors and parallel processing, take a look at the examples for [parallel R batch jobs](../support/tutorials/parallel-r.md). + +### R package installations -```r -library(future) +It is possible to check if a particular package is already installed as follows. -cl <- getMPIcluster() -plan(cluster, workers = cl) +``` r +# One way is to try loading the package: +library(packagename) -# Analysis here +# If you don't want to load the package, it is also +# possible to search through a list: +installed_packages <- library()$results[,1] +"packagename" %in% installed_packages -stopCluster(cl) +# Note: both ways are sensitive to upper- and lower-case letters ``` -For practical examples of jobs using `plan(cluster)` and `plan(multicore)` with raster data, [see this page](https://github.com/csc-training/geocomputing/tree/master/R/puhti/02_parallel_future). 
- -*Jobs using `pbdMPI`* - -In analyses using the `pbdMPI` package, each process runs the same copy of the program as every other process while operating on its own data. In other words, there is no separate master process as in `snow` or `doMPI`. Executing batch jobs using `pbdMPI` can be done using the `srun apptainer_wrapper exec Rscript` command. For example, we could submit a job with four tasks divided between two nodes (with two tasks allocated to each node): - -```bash -#!/bin/bash -l -#SBATCH --job-name=r_pbdmpi -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --ntasks-per-node=2 -#SBATCH --nodes=2 -#SBATCH --mem-per-cpu=1000 +Additional R package installations can be arranged via two routes: -# Load r-env -module load r-env +- Project-specific installations can be used by creating a separate package directory in the `/projappl/` directory (instructions below; also see [here](../computing/disk.md#projappl-directory) for information on ProjAppl) -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi +- Requests for general installations (provided to all users as part of the module): please contact [CSC Service Desk](../support/contact.md) -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron +To make use of a project-specific package library, follow these instructions. First create a new folder inside your project directory. Note that the folder should be specific to the R version you are using (R packages installed using different `r-env` modules are not cross-compatible). 
-# Run the R script -srun apptainer_wrapper exec Rscript --no-save --slave myscript.R +``` bash +# On the command prompt: +# First navigate to /projappl/, then +mkdir project_rpackages_ ``` -As an example, this batch job file could be used to execute the following "hello world" script (original version available via the `pbdMPI` [GitHub repository](https://github.com/snoweye/pbdMPI)). The `init()` function initializes the MPI communicators while `finalize()` is used to shut them down and to exit R. +You can then add the folder to your library trees in R: -```r -library(pbdMPI, quietly = TRUE) +``` r +# Add this to your R code: +.libPaths(c("/projappl//project_rpackages_", .libPaths())) +libpath <- .libPaths()[1] -init() +# This command can be used to check that the folder is now visible: +.libPaths() # It should be first on the list -message <- paste("Hello from rank", comm.rank(), "of", comm.size()) -comm.print(message, all.rank = TRUE, quiet = TRUE) +# Package installations should now be directed to the project +# folder by default. You can also specify the path, e.g. install.packages("package", lib = libpath) -finalize() +# Note that it's also possible to fetch the R version automatically using getRversion(). For example: +.libPaths(paste0("/projappl//project_rpackages_", gsub("\\.", "", getRversion()))) ``` -#### Improving performance using threading - -`r-env` has been compiled using the Intel® Math Kernel Library (MKL), enabling the execution of data analysis tasks using multiple threads. For more information on threading, [see the Intel® website](https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading.html). - -By default, `r-env` is single-threaded. While users may set a desired number of threads for a job, the benefits of this in terms of computation times depend on the analysis. 
Because of this, we encourage experimenting with different thread numbers and benchmarking your code using a small example data set and, for example, the R package [`microbenchmark`](https://cran.r-project.org/web/packages/microbenchmark/index.html). - -!!! note - Note that simply adding more resources does not necessarily guarantee faster computation! - -The module uses OpenMP threading technology and the number of threads can be controlled using the environment variable `OMP_NUM_THREADS`. In practice, the number of threads is set to match the number of cores used for the job. Because `r-env` is based on an Apptainer container, when specifying the number of OpenMP threads we need to use the environment variable `APPTAINERENV_OMP_NUM_THREADS`. - -An example batch job script can be found below. Here we submit a job using eight cores (and therefore eight threads) on a single node. Notice how we match the number of threads and cores using `APPTAINERENV_OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK`. By using `APPTAINERENV_OMP_PLACES=cores`, we bind each thread to a single core. We also use `APPTAINERENV_OMP_PROC_BIND=close` to ensure that threads are placed as closely as possible (to allow faster communication between threads). Note that [other options](https://theartofhpc.com/pcse/omp-affinity.html) for controlling thread affinity are also available, depending on your analysis. 
- -```bash -#!/bin/bash -l -#SBATCH --job-name=r_multithread -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=small -#SBATCH --time=00:05:00 -#SBATCH --ntasks=1 -#SBATCH --cpus-per-task=8 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=2000 - -# Load r-env -module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron - -# Match thread and core numbers -export APPTAINERENV_OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK - -# Thread affinity control -export APPTAINERENV_OMP_PLACES=cores -export APPTAINERENV_OMP_PROC_BIND=close +To use R packages installed in `/projappl`, add the following to the beginning of your R script. This modifies your library trees within a given R session only. In other words, you will need to run this each time when launching R: -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R +``` r +.libPaths(c("/projappl//project_rpackages_", .libPaths())) ``` -In a multi-core interactive job, the number of threads can be automatically matched with the number of cores by running a multi-threaded version of the `start-r` or `start-rstudio-server` commands: +Alternatively, you can add the desired changes to an `.Renviron` file (only when not using RStudio): -```bash -start-r-multithread # or -start-rstudio-server-multithread +``` bash +echo "R_LIBS=/projappl//project_rpackages_" >> ~/.Renviron ``` -#### OpenMP / MPI hybrid jobs - -Further to [executing multi-threaded R jobs on a single node](#improving-performance-using-threading), these can also be run on multiple nodes. 
In such cases, one must specify the number of: - -- Nodes (`--nodes`) - -- MPI processes per node (`--ntasks-per-node`) - -- OpenMP threads used for each MPI process (`--cpus-per-task`) - -When listing these in a batch job file, note that `--ntasks-per-node × --cpus-per-task` must be less than or equal to 40 (the maximum number of cores available on a single node on Puhti). For large multinode jobs, aim to use full nodes, i.e. use all 40 cores in each node. Further to selecting a suitable number of OpenMP threads, identifying the optimal number and division of MPI processes will require experimentation due to these being job-specific. - -As an example of an OpenMP / MPI hybrid job, the submission below would use a total of four MPI processes (two tasks per node with two nodes reserved), with each process employing eight OpenMP threads. Overall, the job would use 32 cores (`--cpus-per-task × --ntasks-per-node × --nodes`). As with multi-threaded jobs running on a single node, the number of threads and cores is matched using `APPTAINERENV_OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK`. We also use the same variables for thread affinity control. - -```bash -#!/bin/bash -l -#SBATCH --job-name=r_multithread_multinode -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --nodes=2 -#SBATCH --ntasks-per-node=2 -#SBATCH --cpus-per-task=8 -#SBATCH --mem-per-cpu=2000 - -# Load r-env -module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron - -# Match thread and core numbers -export APPTAINERENV_OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK - -# Thread affinity control -export APPTAINERENV_OMP_PLACES=cores -export APPTAINERENV_OMP_PROC_BIND=close - -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R -``` +!!! 
note "" + When using `r-env`, user-defined changes to R library paths must be specified inside an R session or in relation to an `.Renviron` file. Other changes (e.g. using `export` to modify environment variables) will not work due to the R installation running inside an Apptainer container. If your analysis would require changes that cannot be achieved through the above means, please contact us for a module-wide package installation. -#### Using fast local storage +### Using fast local storage -For I/O-intensive analyses, [fast local storage](../computing/running/creating-job-scripts-puhti.md#local-storage) can be used in non-interactive batch jobs with minor changes to the batch job file. Interactive R jobs use fast local storage by default. +For jobs that read and write large numbers of files (I/O-intensive analyses), [fast local storage](../computing/running/creating-job-scripts-puhti.md#local-storage) can be used in non-interactive batch jobs with minor changes to the batch job file. Interactive R jobs use fast local storage by default. An example of a serial batch job using 10 GB of fast local storage (`--gres=nvme:10`) is given below. Here a temporary directory is specified using the environment variable `TMPDIR`, in contrast to the prior examples where it was set as `/scratch/`. 
-```bash -#!/bin/bash -l -#SBATCH --job-name=r_serial_fastlocal -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=test -#SBATCH --time=00:05:00 -#SBATCH --ntasks=1 -#SBATCH --nodes=1 -#SBATCH --mem-per-cpu=1000 -#SBATCH --gres=nvme:10 - -# Load the module -module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify NVMe temp folder path -echo "TMPDIR=$TMPDIR" >> ~/.Renviron - -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R -``` +=== "Puhti" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial_fastlocal + #SBATCH --account= + #SBATCH --output=output_%j.txt + #SBATCH --error=errors_%j.txt + #SBATCH --partition=test + #SBATCH --time=00:05:00 + #SBATCH --ntasks=1 + #SBATCH --nodes=1 + #SBATCH --mem-per-cpu=1000 + #SBATCH --gres=nvme:10 + + # Load the module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify NVMe temp folder path + echo "TMPDIR=$TMPDIR" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` + +=== "Mahti" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial_fastlocal + #SBATCH --account= + #SBATCH --output=output_%j.txt + #SBATCH --error=errors_%j.txt + #SBATCH --partition=test + #SBATCH --time=00:05:00 + #SBATCH --ntasks=1 + #SBATCH --nodes=1 + #SBATCH --cpus-per-task=1 + #SBATCH --gres=nvme:10 + + # Load the module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify NVMe temp folder path + echo "TMPDIR=$TMPDIR" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` + +=== "Roihu" + ``` bash + #!/bin/bash -l + #SBATCH --job-name=r_serial_fastlocal + #SBATCH --account= + #SBATCH 
--output=output_%j.txt + #SBATCH --error=errors_%j.txt + #SBATCH --partition=test + #SBATCH --time=00:05:00 + #SBATCH --ntasks=1 + #SBATCH --nodes=1 + #SBATCH --cpus-per-task=1 + #SBATCH --gres=nvme:10 + + # Load the module + module load r-env + + # Clean up .Renviron file in home directory + if test -f ~/.Renviron; then + sed -i '/TMPDIR/d' ~/.Renviron + fi + + # Specify NVMe temp folder path + echo "TMPDIR=$TMPDIR" >> ~/.Renviron + + # Run the R script + srun apptainer_wrapper exec Rscript --no-save myscript.R + ``` Further to temporary file storage, data sets for analysis can be stored on a fast local drive in the location specified by the variable `LOCAL_SCRATCH`. To enable R to find your data, you will need to indicate this location in your R script. After launching R, you can print out the location using the following command: -``` +``` Sys.getenv("LOCAL_SCRATCH") ``` -#### R interface to TensorFlow - -The `r-env` module supports GPU-accelerated TensorFlow jobs using the [R interface to TensorFlow](https://tensorflow.rstudio.com/). If you only require TensorFlow without access to R, please use one of the available [TensorFlow modules on Puhti](tensorflow.md). For general information on submitting GPU jobs, [see this tutorial](../support/tutorials/gpu-ml.md). Note that `r-env` includes CUDA and cuDNN libraries, so there is no need to load CUDA and cuDNN modules separately. - -To submit a GPU job using the R interface to TensorFlow, you need to use the GPU partition and specify the type and number of GPUs using the `--gres` flag. The rest is handled by the R script (see [this page for examples](https://tensorflow.rstudio.com/examples/). 
In the script below, we would reserve a single GPU and 10 CPUs in a single node: - -```bash -#!/bin/bash -l -#SBATCH --job-name=r_tensorflow -#SBATCH --account= -#SBATCH --output=output_%j.txt -#SBATCH --error=errors_%j.txt -#SBATCH --partition=gpu -#SBATCH --time=01:00:00 -#SBATCH --ntasks=1 -#SBATCH --cpus-per-task=10 -#SBATCH --nodes=1 -#SBATCH --gres=gpu:v100:1 - -# Load the module -module load r-env - -# Clean up .Renviron file in home directory -if test -f ~/.Renviron; then - sed -i '/TMPDIR/d' ~/.Renviron -fi - -# Specify a temp folder path -echo "TMPDIR=/scratch/" >> ~/.Renviron - -# Run the R script -srun apptainer_wrapper exec Rscript --no-save myscript.R -``` - -Please note that interactive work using GPU acceleration (e.g. with RStudio) is not supported. - -#### GPU acceleration using NVBLAS - -It is possible to configure `r-env` to use NVIDIA NVBLAS, a drop-in BLAS replacement with GPU support for several BLAS3 routines (for details, see the [NVBLAS website](https://docs.nvidia.com/cuda/nvblas/index.html)). Routines not supported by NVBLAS are directed to a fallback BLAS library, i.e. oneMKL in the case of the `r-env` module. - -Compared to CPU jobs, using NVBLAS may offer speed improvements without changes to the underlying R code. However, the benefits afforded are strongly analysis-specific. Additionally, NVBLAS jobs make sub-optimal use of reservations on the GPU partition, with only certain operations being routed to the GPU. - -Prior to running a NVBLAS job, consider the [Puhti GPU node usage policy](../computing/usage-policy.md#gpu-nodes) and this checklist: - -- Are BLAS3 routines the main bottleneck in your workflow? -- Are speed-ups possible through other means (e.g. rewriting your code)? -- Can certain parts of your script be run on a CPU partition rather than the GPU partition? - -NVBLAS can be used by following these steps: - -Step 1. 
Create a file called `nvblas.conf` in `~/nvblas` with the following contents: - -``` -NVBLAS_LOGFILE nvblas.log -NVBLAS_GPU_LIST ALL -NVBLAS_TRACE_LOG_ENABLED -NVBLAS_CPU_BLAS_LIB /opt/intel/oneapi/mkl/2022.1.0/lib/intel64/libmkl_rt.so -``` -Note that the CPU BLAS library listed above is specific to `r-env/421`. -Adding `NVBLAS_TRACE_LOG_ENABLED` is optional and prompts NVBLAS to create a list of all intercepted BLAS calls for debugging. - -Step 2. Add the following lines to your GPU batch job file: - -``` -# Use NVBLAS -export APPTAINERENV_LD_PRELOAD=/usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so -export APPTAINERENV_NVBLAS_CONFIG_FILE=~/nvblas/nvblas.conf -``` - #### Using `r-env` with Stan The `r-env` module includes several packages that make use of [Stan](https://mc-stan.org/) for statistical modelling. @@ -714,123 +485,74 @@ fit_serial <- brm( ) ``` -Note that [within-chain parallelisation with `brms`](https://cran.r-project.org/web/packages/brms/vignettes/brms_threading.html) requires a project-specific installation of CmdStan. Please contact [CSC Service Desk](../support/contact.md) for instructions. +### Profiling tools in R -#### R package installations +The most common profiling tools in R are Rprof and profvis. -It is possible to check if a particular package is already installed as follows. +old links, find newer ones?: -```r -# One way is to try loading the package: -library(packagename) +When trying to speed up an R job, use these tools to see which parts of your script are the slowest. Look for possibilities to make the slowest parts faster. Also functions from different packages might use different amounts of time for a similar computational task. In addition: +- Watch out for 'for loops' which grow an object step by step and try to find alternative ways. +- Make the script run in parallel. See separate page. 
-# If you don't want to load the package, it is also -# possible to search through a list: -installed_packages <- library()$results[,1] -"packagename" %in% installed_packages +### Pdf rendering -# Note: both ways are sensitive to upper- and lower-case letters -``` - -Additional R package installations can be arranged via two routes: - -- Project-specific installations can be used by creating a separate package directory in the `/projappl/` directory (instructions below; also see [here](../computing/disk.md#projappl-directory) for information on ProjAppl) - -- Requests for general installations (provided to all users as part of the module): please contact [CSC Service Desk](../support/contact.md) - -To make use of a project-specific package library, follow these instructions. First create a new folder inside your project directory. Note that the folder should be specific to the R version you are using (R packages installed using different `r-env` modules are not cross-compatible). +If pdf rendering of an R Markdown or a Quarto document fails, run the following in R: -```r -# On the command prompt: -# First navigate to /projappl/, then -mkdir project_rpackages_ +``` r +tinytex::install_tinytex() ``` -You can then add the folder to your library trees in R: - -```r -# Add this to your R code: -.libPaths(c("/projappl//project_rpackages_", .libPaths())) -libpath <- .libPaths()[1] - -# This command can be used to check that the folder is now visible: -.libPaths() # It should be first on the list - -# Package installations should now be directed to the project -# folder by default. You can also specify the path, e.g. install.packages("package", lib = libpath) - -# Note that it's also possible to fetch the R version automatically using getRversion(). For example: -.libPaths(paste0("/projappl//project_rpackages_", gsub("\\.", "", getRversion()))) - -``` +When prompted about an existing LaTeX distribution, answer `yes` to continue the installation anyway. 
-To use R packages installed in `/projappl`, add the following to the beginning of your R script. This modifies your library trees within a given R session only. In other words, you will need to run this each time when launching R: +### Working with Allas -```r -.libPaths(c("/projappl//project_rpackages_", .libPaths())) -``` +The `r-env` module comes with the [`aws.s3`](https://cran.r-project.org/web/packages/aws.s3/) package for working with S3 storage, which makes it possible to use the Allas storage system directly from an R script. See [here](https://github.com/csc-training/geocomputing/blob/master/R/allas/working_with_allas_from_R_S3.R) for a practical example involving raster data. -Alternatively, you can add the desired changes to an `.Renviron` file: +Accessing Allas via the `r-env` module can be done as follows. First configure Allas by running these commands before launching an interactive shell session: -```bash -echo "R_LIBS=/projappl//project_rpackages_" >> ~/.Renviron +``` bash +module load allas +allas-conf --mode s3cmd ``` -!!! note - When using `r-env`, user-defined changes to R library paths must be specified inside an R session or in relation to an `.Renviron` file. Other changes (e.g. using `export` to modify environment variables) will not work due to the R installation running inside an Apptainer container. If your analysis would require changes that cannot be achieved through the above means, please contact us for a module-wide package installation. - -#### Pdf rendering +After [starting an interactive session and launching R / RStudio Server](#interactive-use-on-a-compute-node), you can now access your bucket list as follows. 
Note that, for this to work, you will need to have the `allas` module loaded and the argument `region=''` added to the `bucketlist()` function: -If pdf rendering of an R Markdown or a Quarto document fails, run the following in R: - -```r -tinytex::install_tinytex() +``` r +library(aws.s3) +bucketlist(region='') ``` -When prompted about an existing LaTeX distribution, answer `yes` to continue the installation anyway. - - -## Working with Allas +## Serial batch jobs -The `r-env` module comes with the [`aws.s3`](https://cran.r-project.org/web/packages/aws.s3/) package for working with S3 storage, which makes it possible to use the Allas storage system directly from an R script. See [here](https://github.com/csc-training/geocomputing/blob/master/R/allas/working_with_allas_from_R_S3.R) for a practical example involving raster data. +## Parallel batch jobs -Accessing Allas via the `r-env` module can be done as follows. First configure [Allas connection for S3](../data/Allas/using_allas/allas-conf.md#s3-connection): +## Improving performance using threading -```bash -module load allas -allas-conf --mode S3 -``` +## OpenMP / MPI hybrid jobs -To get the list of your buckets: +## Non-interactive use -```r -library(aws.s3) -options("cloudyr.aws.default_region" = "") -bucketlist() -``` -## Citation +## Citation {#citation} For finding out the correct citations for R and different R packages, you can type: -```r +``` r citation() # for citing R citation("package") # for citing R packages ``` ## Further information -- [r-env container recipes](https://github.com/CSCfi/singularity-recipes/tree/main/r-env-singularity) (link to public GitHub repository) - -- [Tutorial on parallel R](../support/tutorials/parallel-r.md) - -- [R FAQs](https://cran.r-project.org/faqs.html) (hosted by CRAN) +- [Parallel R guide](../support/tutorials/parallel-r.md) -- [Related Projects](https://www.r-project.org/other-projects.html) (list of R-related projects on R Project website) +- [r-env container 
recipes](https://github.com/CSCfi/singularity-recipes/tree/main/r-env-singularity) (link to public GitHub repository) -- [R package cheatsheets](https://rstudio.com/resources/cheatsheets/) (hosted on RStudio website) +- [R FAQs](https://cran.r-project.org/faqs.html) (hosted by CRAN) -- [tidyverse](https://www.tidyverse.org/) (pre-installed on the `r-env` module) +- [Related Projects](https://www.r-project.org/other-projects.html) (list of R-related projects on R Project website) -- [doMPI](https://cran.r-project.org/web/packages/doMPI/index.html), [future](https://cran.r-project.org/web/packages/future/index.html), [furrr](https://cran.r-project.org/web/packages/furrr/index.html), [lidR](https://cran.r-project.org/web/packages/lidR/index.html), [pbdMPI](https://cran.r-project.org/web/packages/pbdMPI/index.html), [snow](https://cran.r-project.org/web/packages/snow/index.html) (CRAN pages for parallel R packages) +- [R package cheatsheets](https://rstudio.com/resources/cheatsheets/) (hosted on RStudio website) +- [tidyverse](https://www.tidyverse.org/) (pre-installed on the `r-env` module)