Resolve conflicts
alexhernandezgarcia committed Jan 23, 2024
2 parents 3f86923 + ce3d5df commit 28b54f0
Showing 1 changed file with 37 additions and 31 deletions.
68 changes: 37 additions & 31 deletions teaching/mlprojects24/slides/20240118-cluster.md
@@ -57,24 +57,6 @@ I am not an expert.

---

## Why does all this matter?

Because you will have access to computational resources from Calcul Québec for the development of your project, very kindly set up by .highlight1[Maxime Boissonneault], responsable des services à la recherche.

--

Please take a few minutes to create an account to use the cluster from Calcul Québec.

.center[[https://mokey.ift6759.calculquebec.cloud/auth/signup](https://mokey.ift6759.calculquebec.cloud/auth/signup)]

.highlight1[Important]: Please use your **first name**, **last name**, and **UdeM email** as it appears on StudiUM.

--

Besides this cluster, computational resources are also available via Colab: [colab.research.google.com](https://colab.research.google.com/)

---

## What is HPC?

.highlight1[High-performance computing (HPC)] typically refers to supercomputers, that is, clusters of computers that provide access to computational resources.
@@ -94,12 +76,12 @@ A .highlight1[cluster] is a set connected computers that work together such that

## The Alliance and Calcul Québec

.highlight1[[The Digital Research Alliance of Canada](https://alliancecan.ca)], or _Alliance de recherche numérique du Canada_, or just the _Alliance_, and formerly known as _Compute Canada_, provides HPC to researchers in Canadian institutions and works closely with regional counterparts such as .highlight1[[Calcul Québec](https://www.calculquebec.ca/)].
.highlight1[[The Digital Research Alliance of Canada](https://alliancecan.ca)], _Alliance de recherche numérique du Canada_, or simply _the Alliance_, formerly known as _Compute Canada_, provides HPC to researchers in Canadian institutions and works closely with regional counterparts such as .highlight1[[Calcul Québec](https://www.calculquebec.ca/)].

Resources:

* Compute Canada Wiki: [docs.alliancecan.ca](https://docs.alliancecan.ca)
* Support email: `support@tech.alliancecan.ca`
* The Alliance Wiki: [docs.alliancecan.ca](https://docs.alliancecan.ca)
* Support email: `support@calculquebec.ca`

---

@@ -109,9 +91,9 @@ Resources:
* Remote connection to the nodes (via SSH, see later).
* Collectively shared.
* Gateway into the cluster.
* Important: not meant for heavy weight computing tasks.
* Typical use: managing source code, environments and data.
* Typical use: submitting jobs to the compute nodes.
* .highlight1[Important]: not meant for heavy weight computing tasks.

---

@@ -167,7 +149,7 @@ $ salloc --time=1:00:00 --mem=2G
```

.references[
Source: [Compute Canada Wiki](https://docs.computecanada.ca/wiki/Running_jobs#Interactive_jobs)
Source: [the Alliance Wiki](https://docs.alliancecan.ca/wiki/Running_jobs#Interactive_jobs)
]

---
@@ -206,7 +188,7 @@ echo "Bye!"
]

.references[
Source: [Compute Canada Wiki](https://docs.computecanada.ca/wiki/Running_jobs#Use_sbatch_to_submit_jobs)
Source: [the Alliance Wiki](https://docs.alliancecan.ca/wiki/Running_jobs#Use_sbatch_to_submit_jobs)
]

---
@@ -254,18 +236,18 @@ Other useful Slurm tools are the following:
### Python: best practices

* Load Python with `module load python/3.x[.y]`
* Always use `venv` or `virtualenv`
* Always use `virtualenv` or `venv`
* Consider building the virtual environment in the compute node:
* It improves I/O performance
* Use pre-downloaded packages with `module`
* Avoid Anaconda:
* It handles library management that should be left to Compute Canada admins
* It handles library management that should be left to the Alliance admins
* It installs in `$HOME` by default and makes tons of files.
* It is slower to install packages
* It modifies `$HOME/.bashrc`

.references[
[Why avoiding Anaconda in the cluster and how to transition to virtualenv](https://docs.alliancecan.ca/wiki/Anaconda/en)
Reference: [Why to avoid Anaconda (at least) in the cluster](https://docs.alliancecan.ca/wiki/Anaconda/en)
]
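
The best practices above could be combined roughly as follows. This is only a sketch, assuming it is called from inside a Slurm job on an Alliance-style cluster; the function name `build_env`, the Python version `3.10` and the package `numpy` are placeholders that do not come from the slides.

```bash
#!/bin/bash
# build_env: create a virtual environment on node-local storage.
# Hypothetical helper; meant to be called from within a Slurm job,
# where the `module` command and $SLURM_TMPDIR are available.
build_env() {
    module load python/3.10                        # placeholder version
    # Build the environment on the compute node's local disk
    virtualenv --no-download "$SLURM_TMPDIR/env"
    source "$SLURM_TMPDIR/env/bin/activate"
    # --no-index makes pip use locally available packages only
    pip install --no-index --upgrade pip
    pip install --no-index numpy                   # placeholder package
}
```

Calling `build_env` near the top of a job script would rebuild the environment for every job, which trades a short setup time for faster file access during the run.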

---
@@ -275,7 +257,7 @@ Other useful Slurm tools are the following:

* Do not run code in the login nodes.
* Try not to request more resources or more time than you need.
* Use archives (`.tar`, `.hdf5`, etc.): do not unpack gazillion files because it makes things slow.
* Use archives (`.tar`, `.hdf5`, etc.): do not unpack a gazillion files because it makes things slow.
* Profile your code.
* Use `$SLURM_TMPDIR`:
* To build your virtual environment whenever possible.
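
As a rough illustration of the advice on archives and `$SLURM_TMPDIR`, the sketch below keeps a dataset as a single `.tar` file and unpacks it only on node-local storage. The helper name `stage_dataset` is hypothetical, and the fallback to `/tmp` is an assumption so that the function can also be tried outside a Slurm job.

```bash
#!/bin/bash
# stage_dataset ARCHIVE [DEST]: unpack one archive onto node-local storage.
# DEST defaults to $SLURM_TMPDIR, or /tmp when not running inside a Slurm job.
stage_dataset() {
    local archive="$1"
    local dest="${2:-${SLURM_TMPDIR:-/tmp}}"
    mkdir -p "$dest"
    # A single archive is read from shared storage; the many small
    # files it contains only ever exist on the local disk.
    tar -xf "$archive" -C "$dest"
    # Print the destination so callers can capture it
    echo "$dest"
}
```

In a job script this might be used as `DATA_DIR="$(stage_dataset "$HOME/data.tar")"`, where the archive path is again a placeholder.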
@@ -325,7 +307,7 @@ Alex Hernández-García (he/il/él)
## Connect to the cluster
### SSH

The Secure Shell Protocol (SSH) is based on a client–server architecture, connecting an SSH client instance with an SSH server. We can use it to connect to our Calcul Québec cluster:
The Secure Shell Protocol ([SSH](https://docs.alliancecan.ca/wiki/SSH)) is based on a client–server architecture, connecting an SSH client instance with an SSH server. We can use it to connect to our Calcul Québec cluster:

``` bash
$ ssh <username>@ift6759.calculquebec.cloud
@@ -337,9 +319,33 @@ Lmod is automatically replacing "intel/2020.1.217" with "gcc/9.3.0".
[<username>@login1 ~]$
```

Configuring SSH keys for GitHub can take a while. These resources may help:
* [Generating a new SSH key and adding it to the ssh-agent](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)
--

It is convenient to [set up SSH keys](https://docs.alliancecan.ca/wiki/Using_SSH_keys_in_Linux) to log in password-less. [Here](https://mokey.ift6759.calculquebec.cloud/sshpubkey) you can add your SSH Public Keys for _our_ cluster: [mokey.ift6759.calculquebec.cloud/sshpubkey](https://mokey.ift6759.calculquebec.cloud/sshpubkey)

.references[
* [Configuring SSH keys for GitHub](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)
* [About `Error: Permission denied (publickey)`](https://docs.github.com/en/authentication/troubleshooting-ssh/error-permission-denied-publickey)
]

---

## Connect to the cluster
### SSH config

It is also convenient to set up our SSH configuration. SSH configuration in Linux systems is usually stored in `~/.ssh/config`. The following is an example for our cluster:

```
# ML projects class
Host mlprojects
User alexhg
Hostname ift6759.calculquebec.cloud
# Common defaults
Match all
ServerAliveInterval 60
ServerAliveCountMax 5
```
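
With a configuration like the one above saved in `~/.ssh/config`, the alias `mlprojects` should be enough to connect. The sketch below defines a small helper, `show_ssh_alias` (a hypothetical name), that prints what `ssh` resolves for an alias without opening a connection. It relies on the `-G` option of the OpenSSH client.

```bash
#!/bin/bash
# show_ssh_alias ALIAS: print the user, host name and keep-alive settings
# that ssh resolves for ALIAS. `ssh -G` only prints the resulting
# configuration and exits; it does not connect to the server.
show_ssh_alias() {
    ssh -G "$1" | grep -E '^(user|hostname|serveraliveinterval|serveralivecountmax) '
}
```

Running `show_ssh_alias mlprojects` is a quick way to check the file before connecting with `ssh mlprojects`.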

---
