diff --git a/advanced-containers.md b/advanced-containers.md index e4ed9d663..58ace8986 100644 --- a/advanced-containers.md +++ b/advanced-containers.md @@ -13,7 +13,11 @@ exercises: 30 :::::::::::::::::::::::::::::::::::::::: questions -- How can I make more complex container images? +- How can I add local files (e.g. data files) into container + images at build time? + +- How can I access files stored on the host system from within a running Docker + container? :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -46,8 +50,39 @@ container image. ## Running containers -What command would we use to run Python from the `alpine-python` container? +Question: What command would we use to run Python from the `alpine-python` container? + + +::::::::::::::: solution + +## Solution + +We can run a container from the alpine-python container image using: + +```bash +$ docker container run alice/alpine-python +``` + +What happens? Since the `Dockerfile` that we built this container image from +had a `CMD` entry that specified `["python3", "--version"]`, running the above +command simply starts a container from the image, runs the `python3 --version` +command and exits. You should have seen the installed version of Python printed +to the terminal. + +Instead, if we want to run an interactive Python terminal, we can use `docker +container run` to override the default run command embedded within the +container image. So we could run: +```bash +$ docker container run -it alice/alpine-python python3 +``` + +The `-it` tells Docker to set up an interactive terminal connection to the +running container, and then we're telling Docker to run the `python3` command +inside the container which gives us an interactive Python interpreter prompt. 
+_(type `exit()` to exit!)_ + +::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -65,9 +100,12 @@ python3: can't open file '//sum.py': [Errno 2] No such file or directory ## No such file or directory -What does the error message mean? Why might the Python inside the container +Question: What does the error message mean? Why might the Python inside the container not be able to find or open our script? +This question is here for you to think about - we explore the answer to this +question in the content below. + :::::::::::::::::::::::::::::::::::::::::::::::::: The problem here is that the container and its filesystem is separate from our diff --git a/config.yaml b/config.yaml index 022ae0c0e..7b7b14f69 100644 --- a/config.yaml +++ b/config.yaml @@ -27,7 +27,7 @@ life_cycle: 'beta' license: 'CC-BY 4.0' # Link to the source repository for this lesson -source: 'https://github.com/fishtree-attempt/docker-introduction/' +source: 'https://github.com/carpentries-incubator/docker-introduction/' # Default branch of your lesson branch: 'main' diff --git a/creating-container-images.md b/creating-container-images.md index a59e3f6f6..6d05350c2 100644 --- a/creating-container-images.md +++ b/creating-container-images.md @@ -272,7 +272,10 @@ There are a lot of choices when it comes to installing software -- sometimes too Here are some things to consider when creating your own container image: - **Start smart**, or, don't install everything from scratch! If you're using Python - as your main tool, start with a [Python container image](https://hub.docker.com/_/python). Same with [R](https://hub.docker.com/r/rocker/r-ver/). We've used Alpine Linux as an example + as your main tool, start with a [Python container + image](https://hub.docker.com/_/python). Same with the + [R programming language](https://hub.docker.com/r/rocker/r-ver/). 
We've used Alpine Linux as an + example in this lesson, but it's generally not a good container image to start with for initial development and experimentation because it is a less common distribution of Linux; using [Ubuntu](https://hub.docker.com/_/ubuntu), [Debian](https://hub.docker.com/_/debian) and [CentOS](https://hub.docker.com/_/centos) are all good options for scientific software installations. The program you're using might diff --git a/docker-image-examples.md b/docker-image-examples.md index 273133f7c..96183768d 100644 --- a/docker-image-examples.md +++ b/docker-image-examples.md @@ -1,7 +1,7 @@ --- title: Examples of Using Container Images in Practice -teaching: 20 -exercises: 0 +teaching: 10 +exercises: 15 --- ::::::::::::::::::::::::::::::::::::::: objectives @@ -21,13 +21,6 @@ let's apply what we learned to an example workflow. You may choose one or more of the following examples to practice using containers. -## Jekyll Website Example - -In this [Jekyll Website example](../instructors/e02-jekyll-lesson-example.md), you can practice -rendering this lesson website on your computer using the Jekyll static website generator in a Docker container. -Rendering the website in a container avoids a complicated software installation; instead of installing Jekyll and all the other tools needed to create the final website, all the work can be done in the container. -Additionally, when you no longer need to render the website, you can easily and cleanly remove the software from your computer. 
- ## GitHub Actions Example In this [GitHub Actions example](../instructors/e01-github-actions.md), you can learn more about diff --git a/e02-jekyll-lesson-example.md b/e02-jekyll-lesson-example.md deleted file mode 100644 index 1e1029215..000000000 --- a/e02-jekyll-lesson-example.md +++ /dev/null @@ -1,133 +0,0 @@ ---- -title: Using Docker with Jekyll - Containers Used in Generating this Lesson -teaching: 20 -exercises: 0 -questions: -- What is an example of how I might use Docker instead of installing software? -- How can containers be useful to me for building websites? -objectives: -- Use an existing container image and Docker in place of complicated software installation - work. -- Demonstrate how to construct a website using containers to transform a specification - into a fully-presented website. -keypoints: -- You can use existing container images and Docker instead of installing additional - software. -- The generation of this lesson website can be effected using a container. ---- - -As previously mentioned earlier in the lesson, containers can be helpful for -using software that can be difficult to install. An example is the software -that generates this lesson website. The website for this lesson is generated mechanically, -based on a set of files that specify the configuration of the site, its presentation template, -and the content to go on this page. When working on updates to this lesson, -you might want to preview those changes using a local copy of the website. -This requires installing Jekyll and dependencies such as Ruby and Gemfiles to your local computer -which can be difficult to achieve given complexities such as needing to match specific versions of the software components. Instead you could use Docker and a pre-built Jekyll container image. - -First we need to get a copy of the website source to work with on your computer. -In your shell window, in your `docker-intro` create a new directory `build-website` and `cd` into it. 
We will be expanding a ZIP file into this directory later. - -Now open a web browser window and: - -1. Navigate to the [GitHub repository][docker-introduction repository] that contains the files for this session; -2. Click the green "Clone or download" button on the right-hand side of the page; -3. Click "Download ZIP". -4. The downloaded ZIP file should contain one directory named `docker-introduction-gh-pages`. -5. Move the `docker-introduction-gh-pages` folder into the `build-website` folder you created above. - -::::::::::::::::::::::::::::::::::::::::: callout - -## There are many ways to work with ZIP files - -Note that the last two steps can be achieved using a Mac or Windows graphical user interface. There are also ways to effect expanding the ZIP archive on the command line, for example, on my Mac I can achieve the effect of those last two steps through running the command `unzip ~/Downloads/docker-introduction-gh-pages.zip`. - - -:::::::::::::::::::::::::::::::::::::::::::::::::: - -In your shell window, if you `cd` into the `docker-introduction-gh-pages` folder and list the files, you should see something similar to what I see: - -```bash -$ cd docker-introduction-gh-pages -$ ls -``` - -```output -AUTHORS _episodes code -CITATION _episodes_rmd data -CODE_OF_CONDUCT.md _extras fig -CONTRIBUTING.md _includes files -LICENSE.md _layouts index.md -Makefile aio.md reference.md -README.md assets setup.md -_config.yml bin -``` - -You can now request that a container is created that will compile the files in this set into the lesson website, and will run a simple webserver to allow you to view your version of the website locally. Note that this command will be long and fiddly to type, so you probably want to copy-and-paste it into your shell window. This command will continue to (re-)generate and serve up your version of the lesson website, so you will not get your shell prompt back until you type control\+c. 
This will stop the webserver, since it cleans away the container. - -For macOS, Linux and PowerShell: - -```bash -$ docker container run --rm -it --mount type=bind,source=${PWD},target=/srv/jekyll -p 127.0.0.1:4000:4000 jekyll/jekyll:3 jekyll serve -``` - -When I ran the macOS command, the output was as follows: - -```output -Unable to find image 'jekyll/jekyll:3' locally -3: Pulling from jekyll/jekyll -9d48c3bd43c5: Pull complete -9ce9598067e7: Pull complete -278f4c997324: Pull complete -bfca09e5fd9a: Pull complete -2612f15b9d22: Pull complete -322c093d5418: Pull complete -Digest: sha256:9521c8aae4739fcbc7137ead19f91841b833d671542f13e91ca40280e88d6e34 -Status: Downloaded newer image for jekyll/jekyll:3 - -...output trimmed... - -ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux-musl] -Configuration file: /srv/jekyll/_config.yml -To use retry middleware with Faraday v2.0+, install `faraday-retry` gem - Source: /srv/jekyll - Destination: /srv/jekyll/_site - Incremental build: disabled. Enable with --incremental - Generating... - Remote Theme: Using theme carpentries/carpentries-theme - done in 7.007 seconds. - Auto-regeneration: enabled for '/srv/jekyll' - Server address: http://0.0.0.0:4000 - Server running... press ctrl-c to stop. -``` - -In the preceding output, you see Docker downloading the container image for Jekyll, which is a tool for building websites from specification files such as those used for this lesson. The line `jekyll serve` indicates a command that runs within the Docker container instance. The output below that is from the Jekyll tool itself, highlighting that the website has been built, and indicating that there is a server running. - -Open a web browser window and visit the address [http://localhost:4000/](https://localhost:4000/). You should see a site that looks very similar to that at [https://carpentries-incubator.github.io/docker-introduction/](https://carpentries-incubator.github.io/docker-introduction/). 
- -Using a new shell window, or using your laptop's GUI, locate the file `index.md` within the `docker-introduction-gh-pages` directory, and open it in your preferred editor program. - -Near the top of this file you should see the description starting "This session aims to introduce the use of Docker containers with the goal of using them to effect reproducible computational environments." Make a change to this message, and save the file. - -If you reload your web browser, the change that you just made should be visible. This is because the Jekyll container saw that you changed the `index.md` file, and regenerated the website. - -You can stop the Jekyll container by clicking in its terminal window and typing control\+c. - -You have now achieved using a reproducible computational environment to reproduce a lesson about reproducible computing environments. - - - - - - - - - - - - diff --git a/fig/containers-cookie-cutter.png b/fig/containers-cookie-cutter.png new file mode 100644 index 000000000..80016e910 Binary files /dev/null and b/fig/containers-cookie-cutter.png differ diff --git a/index.md b/index.md index ad0edd07e..b11b92eb3 100644 --- a/index.md +++ b/index.md @@ -48,6 +48,24 @@ If you are looking for a lesson on using Singularity containers (instead of Dock - The lessons will sometimes request that you use a text editor to create or edit files in particular directories. It is assumed that you either have an editor that you know how to use that runs within the working directory of your shell window (e.g. `nano`), or that if you use a graphical editor, that you can use it to read and write files into the working directory of your shell. +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Target audience + +This lesson on the use of Docker is intended to be relevant to a wide range of +researchers, as well as existing and prospective technical professionals. 
It is +intended as a beginner level course that is suitable for people who have no +experience of containers. + +We are aiming to help people who want to develop their knowledge of container +tooling to help improve reproducibility and support their research work, or +that of individuals or teams they are working with. + +We provide more detail on specific roles that might benefit from this course on +the [Learner Profiles](/profiles.html) page. + :::::::::::::::::::::::::::::::::::::::::::::::::: ::::::::::::::::::::::::::::::::::::::::: callout diff --git a/introduction.md b/introduction.md index 7f863471b..01c975991 100644 --- a/introduction.md +++ b/introduction.md @@ -1,7 +1,7 @@ --- title: Introducing Containers teaching: 20 -exercises: 0 +exercises: 5 --- ::::::::::::::::::::::::::::::::::::::: objectives @@ -64,7 +64,7 @@ to use likely depends on many, many, different other programs (including the operating system!), creating a very complex, and often fragile system. One change or missing piece may stop the whole thing from working or break something that was already running. It's no surprise that this situation is sometimes -informally termed "dependency hell". +informally termed **dependency hell**. ::::::::::::::::::::::::::::::::::::::: challenge @@ -92,7 +92,7 @@ and access to resources such as files and communications networks in a uniform m ## What is a Container? What is Docker? -[Docker][Docker] is a tool that allows you to build what are called "containers." It's +[Docker][Docker] is a tool that allows you to build what are called **containers**. It's not the only tool that can create containers, but is the one we've chosen for this workshop. But what *is* a container? @@ -103,7 +103,7 @@ called the hardware. One of these pieces is the CPU or processor; another is the amount of memory or RAM that your computer can use to store information temporarily while running programs; another is the hard drive, which can store information over the long-term. 
All these pieces work together to do the -"computing" of a computer, but we don't see them because they're hidden from view (usually). +computing of a computer, but we don't see them because they're hidden from view (usually). Instead, what we see is our desktop, program windows, different folders, and files. These all live in what's called the filesystem. Everything on your computer -- programs, @@ -115,23 +115,23 @@ of making a mess of your existing system by installing a bunch of additional stu You don't want to buy a whole new computer because it's too expensive. What if, instead, you could have another independent filesystem and running operating system that you could access from your main computer, and that is actually stored within this existing computer? -Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: `PurrLOLing`, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species and `WhiskerSpot`, the only tool available for identifying cat species from images. You want to send cat pictures to `WhiskerSpot`, and then send the species output to `PurrLOLing`. But there's a problem: `PurrLOLing` only works on Ubuntu and `WhiskerSpot` is only supported for OpenSUSE so you can't have them on the same system! Again, we really want another filesystem (or two) on our computer that we could use to chain together `WhiskerSpot` and `PurrLOLing` in a "pipeline"... +Or, imagine you have two tools you want to use in your groundbreaking research on cat memes: `PurrLOLing`, a tool that does AMAZINGLY well at predicting the best text for a meme based on the cat species and `WhiskerSpot`, the only tool available for identifying cat species from images. You want to send cat pictures to `WhiskerSpot`, and then send the species output to `PurrLOLing`. But there's a problem: `PurrLOLing` only works on Ubuntu and `WhiskerSpot` is only supported for OpenSUSE so you can't have them on the same system! 
Again, we really want another filesystem (or two) on our computer that we could use to chain together `WhiskerSpot` and `PurrLOLing` in a **computational pipeline**... Container systems, like Docker, are special programs on your computer that make it possible! -The term "container" can be usefully considered with reference to shipping +The term container can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming and error prone, with high potential for different clients' goods to become mixed up. Just like shipping containers keep things together that should stay together, software containers standardize the description and creation of a complete software system: you can drop a container into any computer with -the container software installed (the 'container host'), and it should "just work". +the container software installed (the 'container host'), and it should *just work*. ::::::::::::::::::::::::::::::::::::::::: callout ## Virtualization Containers are an example of what's called **virtualization** -- having a -second "virtual" computer running and accessible from a main or **host** +second virtual computer running and accessible from a main or **host** computer. Another example of virtualization are **virtual machines** or VMs. A virtual machine typically contains a whole copy of an operating system in addition to its own filesystem and has to get booted up in the same way @@ -153,6 +153,8 @@ can be used to create multiple copies of the same shape (or container) and is relatively unchanging, where cookies come and go. If you want a different type of container (cookie) you need a different container image (cookie cutter). +![](fig/containers-cookie-cutter.png){alt='An image comparing using a cookie cutter to the container workflow'} + ## Putting the Pieces Together Think back to some of the challenges we described at the beginning. 
The many layers diff --git a/learner-profiles.md b/learner-profiles.md index 434e335aa..201aaac9f 100644 --- a/learner-profiles.md +++ b/learner-profiles.md @@ -1,5 +1,99 @@ --- -title: FIXME +title: Learner profiles --- -This is a placeholder file. Please add content here. +Here we provide some example profiles of people who represent the target +audience for this lesson. These example scenarios are designed to give you an +idea of the different reasons people might want to learn Docker and the types +of roles that they might hold. + +The profiles provided here and the individuals described are fictional but they +represent the lesson developers' experiences of teaching members of the +research community about Docker and other container technologies over a period +of several years. They also incorporate feedback from instructors involved in +pilot runs of this course. + +Note that containers are applicable across a wide range of use cases within the +research and High Performance Computing communities. These profiles are not +intended to cover all areas but rather to offer some examples of the types of +roles people learning this material might hold and their reasons for learning +about containers. + +## Individual learner profiles + +***Nelson is a graduate student in microbiology.*** They have experience in running Unix shell +commands and using libraries in R for the bioinformatics workflows they have developed. +They are expanding their analysis to run on 3000 genomes in 200 samples and they have +started to use the local cluster to run their workflows. The local research computing +facilitator has advised them that Docker could be useful for running their workflows on +the cluster. They'd like to make use of existing containers that other bioinformaticians +have made so they want to learn how to use Docker. They would also be interested in +creating their own Docker images for other lab members and collaborators to re-use their +workflows. 
+ +***Caitlin is a second year undergraduate in computer science examining Docker for the first +time.*** She has heard about Docker but does not really know what it achieves or why it is +useful. She is reasonably confident in using the Unix shell, having used it briefly in +her first year modules. She is keen to find jump-off points to learn more about technical +details and alternative technologies that are also popular, having heard that container +technologies are widely used within industry. + +***Xu, a materials science researcher, wants to package her software for release with +a paper to help ensure reproducibility.*** She has written some code that makes use of a +series of Python libraries to undertake analysis of a compound. She wants to (or is +required to) make her software available as part of the paper submission. She +understands why Docker is important in helping to ensure reproducibility but not the +process and low-level detail of preparing a container and archiving it to obtain a DOI +for inclusion with the paper submission. + +***Bronwyn is a PhD student running Python/R scripts on her local laptop/workstation.*** +She is having difficulty getting all the tools she needs to work because of conflicting +dependencies and little experience with package managers. She is also keen to reduce +the overhead of managing software so she can get on with her thesis research. She has +heard that Docker might be able to help out but is not confident to start exploring +this on her own and does not have access to any expertise in this within her local +research group. She currently wants to know how to use preexisting Docker containers +but may need to create her own containers in the future. 
+ +***Virat is a grad student who is running an obscure bioinformatics tool (from a GitHub +repo) that depends on a number of other tools that need to be pre-installed.*** He wants to be able to +run on multiple resources and have his undergrad assistant use the same tools. Virat +has command line experience and has struggled his way through complex installations +but he has no formal CS background - he only knows to use containers because a departmental +IT person suggested it. He is usually working from a Windows computer. He needs to +understand how to create his own container, use it locally, and train his student +to use it as well. + +## Group profiles + +In addition to our individual learner profiles above, we also look at three +more general groups who may want to learn about containers. This is intended to +help you get a perspective of the different types of skills and expertise that +learners engaging with this material may have: + +- **Researchers:** For researchers, even those based in non-computational domains, software + is an increasingly important element of their day-to-day work. Whether they are writing + code or installing, configuring and/or running software to support their research, they + will eventually need to deal with the complexities of running software on different + platforms, handling complex software dependencies and potentially submitting their code and data to + repositories to support the reproduction of research outputs by other researchers, or to + meet the requirements of publishers or funders. Software container technologies are valuable + to help researchers address these challenges. + +- **RSEs:** RSEs -- Research Software Engineers -- provide software development, training + and technical guidance to support the development of reliable, maintainable, sustainable + research software. They will generally have extensive technical skills but they may not + have experience of working with or managing software containers. 
In addition to working with + researchers to help build and package software, they are likely to be interested in how + containers can help to support best practices for the development of research software + and aspects such as software deployment. + +- **Systems professionals:** Systems professionals represent the more technical end of + our spectrum of learners. They may be based within a central IT services environment + within a research institution or within individual departments or research groups. + Their work is likely to encompass supporting researchers with effective use of + infrastructure and they are likely to need to know about managing and orchestrating + multiple containers in more complex environments. For example, they may need to provide + database servers, web application servers and other services that can be deployed + in containerized environments to support more straightforward management, maintenance + and upgradeability. diff --git a/md5sum.txt b/md5sum.txt index 8d389668a..a6d5da2f4 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -1,26 +1,25 @@ "file" "checksum" "built" "date" -"_includes/links.md" "00995287cb95631827a4f30cbe5a7722" "site/built/links.md" "2024-06-18" "CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-06-18" "LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-06-18" "aio.md" "bbb0f59db3ef6dccf60fb4a7a86d3020" "site/built/aio.md" "2024-06-18" -"config.yaml" "28f89e9f394f5402f63d265a525fe80d" "site/built/config.yaml" "2024-06-18" -"index.md" "a0429c3e89b940ea353978606add384b" "site/built/index.md" "2024-06-18" -"episodes/introduction.md" "65acbc9eee4951ed8a2bf60b8e5614c5" "site/built/introduction.md" "2024-06-18" -"episodes/meet-docker.md" "785f6d0573883fc559577e1b4a1ad910" "site/built/meet-docker.md" "2024-06-18" +"config.yaml" "54be1fabc599404a592c83552a49916f" "site/built/config.yaml" "2024-09-12" +"index.md" "16a0cc69e6e31090b65bec6484cdf513" 
"site/built/index.md" "2024-09-12" +"links.md" "00995287cb95631827a4f30cbe5a7722" "site/built/links.md" "2024-09-12" +"episodes/introduction.md" "fbd6c719d897bfa342d976928b942d56" "site/built/introduction.md" "2024-09-12" +"episodes/meet-docker.md" "36a6daa2e4727a8ce88db8a4a1a0fa88" "site/built/meet-docker.md" "2024-09-12" "episodes/running-containers.md" "4bd40434e9fee516256b848e2a423f5a" "site/built/running-containers.md" "2024-06-18" "episodes/managing-containers.md" "cd974b695f6fa04b3042765a827df552" "site/built/managing-containers.md" "2024-06-18" "episodes/docker-hub.md" "430220bbc73531857a09eddfc6247b4c" "site/built/docker-hub.md" "2024-06-18" -"episodes/creating-container-images.md" "0731ed5b7e57d0608ed77bc4bdc825a0" "site/built/creating-container-images.md" "2024-06-18" -"episodes/advanced-containers.md" "41a647e7e273a1eac95ee665683dd6cf" "site/built/advanced-containers.md" "2024-06-18" -"episodes/docker-image-examples.md" "91c853fa861b1f01c05088081d4679ba" "site/built/docker-image-examples.md" "2024-06-18" -"episodes/reproduciblity.md" "2e50d4da932a7934c1ed2a4180c118ef" "site/built/reproduciblity.md" "2024-06-18" +"episodes/creating-container-images.md" "1c4f5343cd4e6e32f49c7105b879cd46" "site/built/creating-container-images.md" "2024-09-12" +"episodes/advanced-containers.md" "a7bce20bf3222a7ac60363800526990d" "site/built/advanced-containers.md" "2024-09-12" +"episodes/docker-image-examples.md" "caddfa3f2785fee60367ae05d100920a" "site/built/docker-image-examples.md" "2024-09-12" +"episodes/reproduciblity.md" "55087b4f3997a95e2a5c5d6f9fd8cb7a" "site/built/reproduciblity.md" "2024-09-12" "instructors/06-containers-on-the-cloud.md" "6838e441f1869570ec5313bc72e85eb4" "site/built/06-containers-on-the-cloud.md" "2024-06-18" "instructors/08-orchestration.md" "6f69af23a2cd48c8382e2573ec2907ad" "site/built/08-orchestration.md" "2024-06-18" "instructors/about.md" "1df29c85850c4e3a718d5fc3a361e846" "site/built/about.md" "2024-06-18" 
"instructors/e01-github-actions.md" "ae95c2390c400410b5708a9e5f4c29c1" "site/built/e01-github-actions.md" "2024-06-18" -"instructors/e02-jekyll-lesson-example.md" "48bdfb3a1c6ce3fcb275d3027d0ceb38" "site/built/e02-jekyll-lesson-example.md" "2024-06-18" "instructors/instructor-notes.md" "6ccb557863cff40a02727a9b8729add7" "site/built/instructor-notes.md" "2024-06-18" "learners/discuss.md" "2758e2e5abd231d82d25c6453d8abbc6" "site/built/discuss.md" "2024-06-18" -"learners/reference.md" "2fed58c99f3f041d5971d3dfaa8454da" "site/built/reference.md" "2024-06-18" +"learners/reference.md" "bbb68ff9187bcebed81d18156df503cc" "site/built/reference.md" "2024-09-12" "learners/setup.md" "fd74bc2dd9538bf486391304cb6f6f7f" "site/built/setup.md" "2024-06-18" -"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-06-18" +"profiles/learner-profiles.md" "6fcb80ab2baf4f2762193ae4a6f1294a" "site/built/learner-profiles.md" "2024-09-12" diff --git a/meet-docker.md b/meet-docker.md index df5419ac1..a63829b78 100644 --- a/meet-docker.md +++ b/meet-docker.md @@ -1,7 +1,7 @@ --- title: Introducing the Docker Command Line teaching: 10 -exercises: 0 +exercises: 5 --- ::::::::::::::::::::::::::::::::::::::: objectives diff --git a/reference.md b/reference.md index 925f6631d..34f383f12 100644 --- a/reference.md +++ b/reference.md @@ -9,6 +9,8 @@ title: 'Glossary'
See the Carpentries Glossario entry
Command-line interface (CLI)
See the Carpentries Glossario entry
+
Computational pipeline
+
A combination of different software tools in a particular order that is used to perform a defined set of repeatable operations on different input data.
Container
A particular instance of a lightweight virtual machine derived from a container image. Containers are typically transient, unlike container images which persist.
Container image
diff --git a/reproduciblity.md b/reproduciblity.md index 0426b204e..8a4a90afe 100644 --- a/reproduciblity.md +++ b/reproduciblity.md @@ -1,7 +1,7 @@ --- title: 'Containers in Research Workflows: Reproducibility and Granularity' teaching: 20 -exercises: 0 +exercises: 5 --- ::::::::::::::::::::::::::::::::::::::: objectives @@ -35,23 +35,33 @@ Note that reproducibility aspects of software and containers are an active area By *reproducibility* here we mean the ability of someone else (or your future self) being able to reproduce what you did computationally at a particular time (be this in research, analysis or something else) -as closely as possible even if they do not have access to exactly the same hardware resources +as closely as possible, even if they do not have access to exactly the same hardware resources that you had when you did the original work. +What makes this especially important? With research being increasingly digital +in nature, more and more of our research outputs are a result of the use of +software and data processing or analysis. With complex software stacks or +groups of dependencies often being required to run research software, we need +approaches to ensure that we can make it as easy as possible to recreate an +environment in which a given research process was undertaken. There are many +reasons why this matters, one example being someone wanting to reproduce +the results of a publication in order to verify them and then build on that +research. + Some examples of why containers are an attractive technology to help with reproducibility include: -- The same computational work can be run across multiple different technologies seamlessly (e.g. Windows, macOS, Linux). +- The same computational work can be run seamlessly on different operating systems (e.g. Windows, macOS, Linux). - You can save the exact process that you used for your computational work (rather than relying on potentially incomplete notes). 
- You can save the exact versions of software and their dependencies in the container image. -- You can access legacy versions of software and underlying dependencies which may not be generally available any more. +- You can provide access to legacy versions of software and underlying dependencies which may not be generally available any more. - Depending on their size, you can also potentially store a copy of key data within the container image. -- You can archive and share the container image as well as associating a persistent identifier with a container image to allow other researchers to reproduce and build on your work. +- You can archive and share a container image as well as associating a persistent identifier with it, to allow other researchers to reproduce and build on your work. ## Sharing images As we have already seen, the Docker Hub provides a platform for sharing container images publicly. Once you have uploaded a container image, you can point people to its public location and they can download and build upon it. -This is fine for working collaboratively with container images on a day-to-day basis but the Docker Hub is not a good option for long time archive of container images in support of research and publications as: +This is fine for working collaboratively with container images on a day-to-day basis but the Docker Hub is not a good option for long-term archiving of container images in support of research and publications as: - free accounts have a limit on how long a container image will be hosted if it is not updated - it does not support adding persistent identifiers to container images @@ -87,7 +97,24 @@ Note that Zenodo is not the only option for archiving and generating persistent - Make use of container images to capture the computational environment required for your work. - Decide on the appropriate granularity for the container images you will use for your computational work -- this will be different for each project/area. 
Take note of accepted practice from contemporary work in the same area. What are the right building blocks for individual container images in your work? - Document what you have done and why -- this can be put in comments in the `Dockerfile` and the use of the container image described in associated documentation and/or publications. Make sure that references are made in both directions so that the container image and the documentation are appropriately linked. -- When you publish work (in whatever way) use an archiving and DOI service such as Zenodo to make sure your container image is captured as it was used for the work and that is obtains a persistent DOI to allow it to be cited and referenced properly. +- When you publish work (in whatever way) use an archiving and DOI service such + as Zenodo to make sure your container image is captured as it was used for + the work and that it is assigned a persistent DOI to allow it to be cited and + referenced properly. +- Make use of tags when naming your container images; this ensures that if you + update the image in future, previous versions can be retained within a + container repository to be easily accessed, if this is required. +- A built and archived container image can ensure a persistently bundled set of + software and dependencies. However, a `Dockerfile` provides a lightweight + means of storing a container definition that can be used to re-create a + container image at a later time. If you're taking this approach, ensure that + you specify software package and dependency versions within your `Dockerfile` + rather than just specifying package names, which will generally install the + most up-to-date version of a package. This may be incompatible with other + elements of your software stack.
Also note that storing only a `Dockerfile` + presents reproducibility challenges because required versions of packages may + not be available indefinitely, potentially meaning that you're unable to + reproduce the required environment and, hence, the research results. ## Container Granularity