1 parent a4e7b16, commit f88dca3
Showing 9 changed files with 79 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Support for GPU jobscripts | ||
|
||
If your application requires an NVIDIA GPU, you can request one by | ||
passing the `--gpu` option to the `justin create-stage` or | ||
`justin simple-workflow` commands, as described in the | ||
[justin man page](justin_command.man_page.md). | ||
|
||
**Currently only a limited number of sites offer GPUs to DUNE, so you | ||
may need to wait significantly longer than usual (possibly hours) for jobs in the workflow | ||
to start running.** | ||
|
||
The CUDA libraries, drivers, `/dev/nvidiaX` devices, and tools like `nvidia-smi` are | ||
made available to your jobscript in the usual way. `$CUDA_VISIBLE_DEVICES` is | ||
set to the UUID of the GPU allocated to your job by the site, in the newer | ||
`GPU-uuid` form, *not* as an index such as 0, 1, or 2. Please do not try to use any other GPUs you | ||
might be able to access: CUDA should respect `$CUDA_VISIBLE_DEVICES` as | ||
given and use only the GPU the site has allocated. | ||
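A jobscript may want to confirm that it really received a UUID-style value before launching GPU work. The following is a minimal sketch, not part of justIN: the helper name `is_gpu_uuid_list` is hypothetical, and it only checks that each comma-separated entry starts with the `GPU-` prefix described above.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if every comma-separated entry looks like
# a GPU UUID (GPU-...), and 1 if the value is empty or uses index-style
# entries such as 0,1.
is_gpu_uuid_list() {
  local value="$1" entry
  [ -n "$value" ] || return 1
  local IFS=','
  for entry in $value; do
    case "$entry" in
      GPU-?*) ;;          # looks like the GPU-uuid form
      *) return 1 ;;      # anything else (e.g. a bare index)
    esac
  done
  return 0
}

if is_gpu_uuid_list "${CUDA_VISIBLE_DEVICES:-}"; then
  echo "Allocated GPU(s): $CUDA_VISIBLE_DEVICES"
else
  echo "CUDA_VISIBLE_DEVICES is unset or not in GPU-uuid form" >&2
fi
```

The check is deliberately read-only: it never rewrites `$CUDA_VISIBLE_DEVICES`, so the allocation made by the site is left untouched.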
|
||
Once the job starts, it reports information about the GPU it has discovered | ||
to justIN, including the GPU model name, the driver version, the compute | ||
capability, the VBIOS version, and the nonreserved memory in MiB. | ||
This information is shown on the job's own page in the dashboard. | ||
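The source does not say how justIN collects these values. As a rough illustration only, a jobscript could print similar details for its own log with `nvidia-smi --query-gpu`. The helper name `format_gpu_record` is hypothetical, `memory.total` is assumed here and may not match the "nonreserved" figure shown in the dashboard, and compute capability is left out because its query field is, as far as I know, only available with newer drivers.

```bash
#!/bin/bash
# Hypothetical helper: print one labelled line per field from a CSV record
# of the form "name, driver_version, vbios_version, memory.total".
format_gpu_record() {
  local name driver vbios mem
  IFS=',' read -r name driver vbios mem <<< "$1"
  # Strip the single space that follows each comma in the CSV output
  printf 'model=%s\ndriver=%s\nvbios=%s\nmemory_mib=%s\n' \
    "${name# }" "${driver# }" "${vbios# }" "${mem# }"
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,vbios_version,memory.total \
             --format=csv,noheader,nounits |
  while IFS= read -r record; do
    format_gpu_record "$record"
  done
else
  echo "nvidia-smi not found; skipping GPU report" >&2
fi
```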
|
||
|
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
#!/bin/bash | ||
: <<'EOF' | ||
GPU Hello World jobscript for justIN | ||
Submit a workflow like this to run 10 jobs on workers with GPUs: | ||
justin simple-workflow --monte-carlo 10 --jobscript hello-world.jobscript --gpu | ||
Or like this to run jobs and put the output file into Rucio-managed storage: | ||
justin simple-workflow \ | ||
--monte-carlo 10 \ | ||
--jobscript hello-world.jobscript \ | ||
--gpu \ | ||
--description 'Hello GPU!!!' \ | ||
--scope usertests \ | ||
--output-pattern 'hello-world-*.txt:output-test-01' | ||
EOF | ||
|
||
# Check the GPU environment | ||
printenv | grep -i cuda | ||
nvidia-smi | ||
|
||
# Try to get an unprocessed file from this stage | ||
did_pfn_rse=$("$JUSTIN_PATH/justin-get-file") | ||
|
||
if [ -n "$did_pfn_rse" ] ; then | ||
did=$(echo "$did_pfn_rse" | cut -f1 -d' ') | ||
pfn=$(echo "$did_pfn_rse" | cut -f2 -d' ') | ||
rse=$(echo "$did_pfn_rse" | cut -f3 -d' ') | ||
|
||
# Hello world to a txt file | ||
echo "Hello world $pfn" > "hello-world-$(date +%s.%N).txt" | ||
|
||
# Hello world to the jobscript log | ||
echo "Hello world $pfn" | ||
if [ $? = 0 ] ; then | ||
# If echo returns 0, then say we processed the file successfully | ||
echo "$pfn" > justin-processed-pfns.txt | ||
fi | ||
fi | ||
exit 0 |
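The jobscript above splits the space-separated output of `justin-get-file` with three separate `cut` calls. A possible alternative, sketched here on the assumption that the output is always three fields with no embedded spaces (as the script implies), is a single `read`. The function name `split_did_pfn_rse` and the sample line are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: split a "DID PFN RSE" triple into the variables
# did, pfn and rse with a single read instead of three cut pipelines.
split_did_pfn_rse() {
  read -r did pfn rse <<< "$1"
}

# Made-up sample line, for illustration only
split_did_pfn_rse "scope:file.root root://host.example//path/file.root EXAMPLE_RSE"
echo "did=$did pfn=$pfn rse=$rse"
```

This avoids spawning six extra processes per file. Note that if a line had more than three fields, `read` would put everything after the second field into `rse`, whereas `cut -f3` would return only the third.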