
Long term storage

Johan Nylander edited this page Sep 27, 2024 · 16 revisions

Linux working

  • Last modified: Fri Sep 27, 2024 11:02
  • Sign: JN
  • Tested on: Xubuntu 22.04
  • Solved: Yes

NRM provides a server, nrmdna01.nrm.se, where users can store data for "a longer term". The server is primarily intended for raw sequencing data (such as fastq files). Exactly what "longer-term storage" means has probably not been decided, so if you need to plan ahead, ask the head of research (FA) at NRM.

To gain access to this server, send an email to NRM-IT (mailto:[email protected]) saying "I need access to the backup server".

Standard SSH access uses SSH keys, and you need to provide your public ed25519 key to NRM-IT (see also SSH). Direct access to the servers is, however, not currently allowed unless you either use an NRM MS Windows computer with special permissions, or connect directly from rackham.uppmax.uu.se or dardel.pdc.kth.se. Rumour has it that you may also get drag-and-drop access from NRM MS Windows if you are behind the NRM firewall, provided you ask for it. Again, ask NRM-IT for details.

Create your user folder on nrmdna01

Once logged in to nrmdna01.nrm.se, you can now create your own personal folder in /projects/NRMDEPARTMENT-projects/NRMUSER. For example, for NRMUSER johanyla working at the BIO department:

$ mkdir /projects/BIO-projects/johanyla

Transfer files from compute clusters (Rackham, Dardel) to nrmdna01

When transferring data between computers, you can, from a command-line perspective, either "push" or "pull". That is, you can copy to ("push") or copy from ("pull"). If you want to copy a file from a cluster to nrmdna01.nrm.se, you either start the process on the cluster (submitting it to the SLURM queue system for long-running tasks), or you start the process on the nrmdna01.nrm.se server.

In general, I would recommend starting the copying process from the nrmdna01 server, and "pull" the data from any of the clusters.

Given that you have SSH-key access to the remote server (the cluster) from nrmdna01.nrm.se, here is an example using screen and rsync:

nrmuser@nrmdna01:~$ screen -S name_for_the_session
nrmuser@nrmdna01:~$ rsync -avhP [email protected]:/path/to/folder/on/server /path/to/folder/on/nrmdna01

Then detach from the screen session (Ctrl+A, Ctrl+D). Later, you can reattach to the session:

nrmuser@nrmdna01:~$ screen -R name_for_the_session

And if all is good, exit the screen session:

nrmuser@nrmdna01:~$ exit

Note: To check thoroughly that the transfer succeeded, you may want to verify file integrity. This can be done directly in rsync by adding the option -c or --checksum (which may add considerable extra time), or by comparing MD5 sums computed before and after the transfer (see for example https://github.com/nylander/Check_MD5SUMS).
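The before-and-after MD5 approach can be sketched as below. This is a local illustration only, not the method used by Check_MD5SUMS: the temporary directories and the file names (reads_1.fastq, reads_2.fastq, MD5SUMS) are made up, and cp stands in for the real rsync transfer between the cluster and nrmdna01.

```shell
# Sketch: verify a transfer with MD5 checksums.
# All paths are temporary stand-ins (assumptions): "src" plays the folder on
# the cluster, "dst" plays the copy on nrmdna01.
work=$(mktemp -d)
src="$work/src/data123"
dst="$work/dst/data123"
mkdir -p "$src" "$work/dst"
printf 'ACGT\n' > "$src/reads_1.fastq"
printf 'TGCA\n' > "$src/reads_2.fastq"

# 1. Before the transfer: record checksums, with paths relative to the folder.
(cd "$src" && find . -type f ! -name MD5SUMS -exec md5sum {} + > MD5SUMS)

# 2. Transfer. In this local sketch, cp stands in for rsync.
cp -r "$src" "$work/dst/"

# 3. After the transfer: recompute the checksums on the receiving side and
#    compare them against the recorded list.
if (cd "$dst" && md5sum --quiet -c MD5SUMS); then
    status="OK"
else
    status="FAILED"
fi
echo "Checksum verification: $status"
```

Because the MD5SUMS file travels with the data, the check in step 3 can be repeated later, for example before deleting the original on the cluster.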

Note 2: rsync has a lot of options. One important detail to keep in mind is the trailing forward slash (/) on the source: if you add it after a source folder, only the folder content is transferred. If you leave it out, the folder itself is transferred along with its content!

Note 3: If your transferred files are meant as a backup, it's a good idea to take one extra step to prevent them from being accidentally removed (by you or by someone else). Assuming you transferred the folder "data123" to your personal folder on nrmdna01, apply this command:

$ chmod -R -w data123

This will remove write (and delete) permissions on the folder and all files in it. To restore permissions, use chmod -R +w data123.
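To confirm that nothing is left writable, the permissions can be checked with find. The sketch below works on a temporary stand-in for "data123" (the sub folder and reads.fastq are made up) and assumes GNU find for the -perm /222 test. It uses a-w rather than plain -w: without a who letter, chmod leaves alone any bits that are set in your umask, so on an account with an unusual umask plain -w may leave group or other write bits in place.

```shell
# Sketch: write-protect a folder and check the result.
# Temporary stand-in (assumption) for the transferred folder "data123".
work=$(mktemp -d)
mkdir -p "$work/data123/sub"
touch "$work/data123/sub/reads.fastq"

# Remove write permission for user, group, and others, regardless of umask.
chmod -R a-w "$work/data123"

# List anything that is still writable by anyone; no output means none.
writable=$(find "$work/data123" -perm /222)
if [ -z "$writable" ]; then
    echo "No writable files or folders left"
else
    echo "Still writable:"
    echo "$writable"
fi

# Restore write permission for the owner so the temporary folder can be removed.
chmod -R u+w "$work/data123"
rm -r "$work"
```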

Using a SLURM-script to "push" from server to nrmdna01

If, on the other hand, you want to "push" data from the cluster to nrmdna01.nrm.se, you basically reverse the rsync command above and wrap it in a Slurm script.

Here is an example for transferring the folder "folder" from dardel.pdc.kth.se to the user project folder on nrmdna01.nrm.se (e.g. /projects/BIO-projects/NRMUSER). Note that the folder NRMUSER (e.g. johanyla) needs to be present.

#!/bin/bash -l
# File: rsync-to-nrm.slurm.sh
# Slurm script example for rsync from dardel to nrmdna01.
# Test by using
#     sbatch --test-only rsync-to-nrm.slurm.sh
# Start by using
#     sbatch rsync-to-nrm.slurm.sh
# Stop by using
#     scancel 1234
#     scancel -i -u $USER
#     scancel --state=pending -u $USER
# Monitor by using
#    squeue -u $USER

#SBATCH -J rsync-to-nrm
#SBATCH -A snic1234-5-678
#SBATCH -t 01:00:00
#SBATCH -p shared
#SBATCH -c 1
#SBATCH --output=rsync-to-nrm.log

rsync -avhP /cfs/klemming/path/to/folder [email protected]:/projects/NRMDEPARTMENT-projects/NRMUSER