Farm Tips and Tricks
The maximum number of arguments to any command is 131072. If you are getting "argument list too long" errors from bash, try telling xargs to pass at most 131072 arguments per invocation. For example, to remove a bunch of files:
$ find . -name "[name_of_file].*" | xargs -n 131072 rm
This command will remove all files that match the pattern
[name_of_file].*, where * is a wildcard character. Note that -name is
a flag to find, not part of the filename: substitute only [name_of_file].
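If your filenames might contain spaces or newlines, a null-delimited variant is safer. A minimal sketch, assuming GNU find and xargs:
$ find . -name "[name_of_file].*" -print0 | xargs -0 -n 131072 rm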
We suggest writing your output files locally on the machine running
your job; a good place is /scratch/<username>/<jobid>. Below is why it helps.
Each node should have a /scratch that you can access and write to. Have
your code write locally to /scratch. Then when your script is done, have it
move the final file to your home directory (or wherever), so that you are
using the network only once. Make sure you do indeed move/delete the file,
so you don't fill up /scratch (which has <1 TB and is for everyone's use).
This can help quite a bit. But /scratch is a 1 TB disk made of spinning rust,
so the maximum is around 100 seeks/second. It's not shared over the network
(unlike your /home), but I/O-intensive loads with many random accesses can
still be faster on a laptop.
SSDs in general manage around 50,000 random I/O accesses per second; a disk of spinning rust manages around 100, so a laptop can be roughly 500 times faster. In most cases, however, you can get better performance from a compute node by asking for more RAM, so the data can be cached in memory. Unfortunately that depends on how your application is written: if it asks for the data to be synced to disk, it can't be cached.
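If your working set fits in memory, requesting more RAM lets the page cache absorb repeated reads. Assuming the cluster uses SLURM (myjob.sh is a hypothetical script name), the request might look like:
$ sbatch --mem=64G myjob.sh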
In any case, the first step is to try /scratch; don't forget to have your job
clean up after itself. We suggest /scratch/<username>/<jobid>. You can use the -p option with mkdir to make yourself a directory on /scratch/, and your script won't break if that directory already exists (see the sketch below).
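Here is a minimal sketch of the whole pattern, assuming a SLURM scheduler (which sets SLURM_JOB_ID) and a hypothetical my_analysis command; substitute your own program and output name:

#!/bin/bash
# Write locally to /scratch, then touch the network only once at the end.
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"                      # -p: no error if it already exists

my_analysis --out "$SCRATCHDIR/result.txt"  # hypothetical analysis command

mv "$SCRATCHDIR/result.txt" "$HOME/"        # move the final file home
rm -rf "$SCRATCHDIR"                        # clean up so /scratch doesn't fill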
You can check on the files you're making in scratch by ssh-ing into the node the job is running on. Do not run jobs while ssh-ed into a node. The safest way to check on files is to log into farm and use a command like this (change bigmem9 to the node your job is running on and myfile.txt to the file you're making):
$ ssh bigmem9 'cd /scratch; ls; wc -l myfile.txt'
Some datasets will be used by more than one member of the lab. To avoid
wasting storage space on duplicate copies of the same datasets, we have a
shared directory, /group/jrigrp/Share/, where we create directories for
the files we want to share (e.g. /group/jrigrp/Share/MaizeHapMapV3.2.1/).
It can be useful to include some information about the files such as:
- description and/or links to description
- source (where the files were downloaded from, e.g. an iplant path or web URL)
- publication associated with the data
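For example, a new shared dataset might be set up like this (MyDataset and the README fields are hypothetical placeholders):
$ mkdir /group/jrigrp/Share/MyDataset
$ cat > /group/jrigrp/Share/MyDataset/README.txt <<'EOF'
Description: short description of the dataset, or a link to one
Source:      where the files were downloaded from (iplant path, web URL)
Publication: citation or DOI for the associated paper
EOF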