-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path02_python_environments.qmd
754 lines (528 loc) · 20.7 KB
/
02_python_environments.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
---
title: "Python Virtual Environments and Package Management"
---
# Introduction
The next useful topic that you will need to know is how to manage Python
virtual environments and manage package dependencies.
::: {.callout-important}
We also include an [exercise](#exercise) at the end of this document. Read
through and complete the exercise.
:::
**We suggest that you follow along in a command shell on your local machine.**
We're going to build upon our [Command Shells](01_command_shells.qmd) skills
to install Python and manage virtual environments.
We also assume that you have a basic understanding of Python. If you need a
tutorial, check out the [Python Tutorial](https://docs.python.org/3/tutorial/index.html).
The great thing about Python is that it is a very popular language for scientific
computing and so there are a lot of great libraries and tools that you can use.
This can also be a challenge because every python project will have different
package dependencies, and even dependencies on specific versions of packages.
The solution to this problem is to use virtual environments.
There are basically two categories of environments and package managers:
1. `conda`
2. `venv` and `pip`
## Conda
First a little clarification is in order since you'll hear the terms "conda",
"miniconda", and "anaconda" thrown around.
[Conda](https://docs.conda.io/en/latest/) is an open source package, dependency,
and environment management system that can manage packages for multiple languages
but is primarily used for Python.
_Anaconda_ is a distribution of conda that includes _a lot_ of packages bundled
together. It is maintained by [Anaconda, Inc](https://www.anaconda.com/). We
don't recommend using it because it is a large download and includes a lot of
packages that you may not need.
Miniconda is a minimal installation of conda, maintained by [Anaconda, Inc](https://www.anaconda.com/).
With miniconda you can install only the packages you need.
Anaconda maintians a [repository](https://anaconda.org/) of packages that you
can install with the `conda install` command. It sometimes happens that a package
is not available in the conda repository, but is available in the PyPI repository
which we discuss below.
To recap, conda is a tool for both package and environment management and we
recommend using miniconda to tailor the installation to your needs.
## venv and pip
The `venv` module is a standard library that is included with Python that supports
creating python virtual environments.
`pip` is a package manager that is usually included with a Python installation.
It is the recommended way by the
[Python Packaging Authority](https://www.pypa.io/en/latest/) to install Python
packages. It installs packages from the [Python Package Index](https://pypi.org/)
(PyPI) which tends to be a more comprehensive repository of Python packages than
the conda repository.
# Python Versioning and Upgrading
## `pip` vs `python3 -m pip`
A word of caution here when you have multiple versions of Python installed.
The commands `python`, `python3` and `pip` may be associated with different
versions of python.
A good habit is to check locations and versions of these commands with:
```bash
which python
python --version
which python3
python3 --version
which pip
pip --version
```
On MacOS, I get
```bash
% which python
python not found
% which python3
/Library/Frameworks/Python.framework/Versions/3.12/bin/python3
% python3 --version
Python 3.12.4
% which pip
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip
% pip --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
% which pip3
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip3
% pip3 --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
```
It used to be on previous versions of MacOS it would come with Python 2.x as
default as `python`, and `pip` would be the Python 2.x package manager. Now, you
see that in the system installation, there is no `python`, and `pip` is actually an
alias to `pip3`.
From the location and version shown above, we can see that `pip`, `pip3`, and `python3`
are all associated with the same version, in this case Python 3.12.4.
> Note that in both conda and venv, both `pip` and `python` are aliases to the
> `pip3` and `python3`.
`pip` is actually just a command line wrapper around the `pip` module`, so a
good precaution is to use `python3 -m pip` to install packages.
The command `python3 -m pip` is always associated with the version of Python that
you are currently using.
## Installing and Upgrading Python
It's very likely there is a newer version of Python available than is installed
by default or was previously installed. The way you install Python depends on
your operating system and command shell.
::: {.panel-tabset}
### MacOS
There are multiple ways to install.
#### Using python.org
Probably the easiest way is to download and install the MacOS install package from
[python.org downloads](https://www.python.org/downloads/). This method allows you to install
the latest version directly from the Python Software Foundation.
::: {.callout-caution}
The installation from python.org will likely leave your old installation of Python
and install another version in a different location.
:::
#### Using Homebrew
Alternatively, you can use [Homebrew](https://brew.sh/) to install and manage Python versions:
1. Install Homebrew if you haven't already:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
2. Install Python:
```bash
brew install python
```
3. To update Python using Homebrew:
```bash
brew update # Update Homebrew itself
brew upgrade python # Upgrade Python to the latest version
```
4. Verify the installed version:
```bash
python3 --version
```
::: {.callout-note}
Homebrew might not always have the absolute latest version available on python.org,
but it provides an easy way to manage and update Python along with other packages.
:::
### Windows Subystem for Linux (WSL)
On WSL Ubuntu, you can install Python from the package manager.
```bash
sudo apt update
sudo apt install python3
```
To see all available versions, you can use the `apt list` command.
```
apt list python3 -a
```
If there is a newer version available, you can upgrade Python by following these steps:
1. Update your package list:
```bash
sudo apt update
```
2. Check the latest version available through apt:
```bash
apt list python3 -a
```
3. If a newer version is available, upgrade Python:
```bash
sudo apt install --only-upgrade python3
```
4. Verify the new version:
```bash
python3 --version
```
5. If you need a specific newer version not available in the default repositories, you can use a PPA:
```bash
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.x # Replace x with the desired version number
```
Note: Using a PPA allows you to install versions of Python that may not be available in the standard Ubuntu repositories.
:::
There are cases where you might end up with multiple versions of python installed
simultaneously, which is fine, but just know that on both MacOS command shells and
WSL Ubuntu, the first python executable found in the search path will be the one
executed and the one shown from `which python3`.
## Conda Installation
As mentioned above, we generally prefer the miniconda distribution of `conda`.
You can find installation instructions at
[https://docs.anaconda.com/miniconda/](https://docs.anaconda.com/miniconda/).
It describes both the
[GUI](https://docs.anaconda.com/miniconda/#latest-miniconda-installer-links) and
the [command line](https://docs.anaconda.com/miniconda/#quick-command-line-install)
installation methods.
::: {.callout-tip}
Remember that for WSL Ubuntu, you'll need use the command line installation
method.
:::
## Conda Environment
Next step is to create a conda environment.
See [here](https://docs.anaconda.com/working-with-conda/environments/)
for complete instructions.
```bash
conda create -n my_test_env
conda activate my_test_env
```
To see what packages are installed in the environment, you can use the `conda list` command.
```bash
conda list
```
::: {.callout}
What do you see in the list?
:::
To deactivate the environment, you can use the `conda deactivate` command.
```bash
conda deactivate
```
To remove the environment, you can use:
```bash
conda remove -n my_test_env --all
```
Now let's create a new environment with python 3.12.4 and see what packages are installed.
```bash
conda create -n my_test_env python=3.12.4
```
Now you can activate the environment and see what packages are installed.
```bash
conda activate my_test_env
conda list
```
::: {.callout}
How many packages do you see in the list?
:::
## Installing Packages
To install a package such as `jupyter`, you can use the `conda install` command.
```bash
conda install jupyter
```
You can also install a specific version of a package or major version of a package.
```bash
conda install jupyter=1.0.0
# install the latest minor version of the major version 1
conda install jupyter=1
```
To list the available versions of packages to install from conda, you can use the `conda search` command. Here's how you can do it:
```bash
conda search package_name
```
For example, if you want to see all available versions of the `jupyter` package, you would run:
```bash
conda search jupyter
```
This will show you a list of all available versions of the package, along with information about which channel they're from and which platforms they support.
If you want to see versions for a specific channel, you can use the `--channel` or `-c` option:
```bash
conda search --channel conda-forge jupyter
```
To see more details about a package, including its dependencies, you can use the `--info` flag:
```bash
conda search --info jupyter
```
If you want to see all packages that match a certain pattern, you can use wildcards:
```bash
conda search 'python=3.8*'
```
This would show all versions of Python 3.8.x available.
Remember, the available versions may depend on your current conda configuration, including which channels you have enabled. You might want to update your conda first to ensure you're seeing the most recent information:
```bash
# make sure you are not in a conda environment
conda deactivate
conda update conda
```
These commands will help you find the specific versions of packages available for installation through conda.
## Updating Packages
To check for updates and then update packages in a specific conda environment, you can follow these steps:
1. First, activate the environment you want to check:
```bash
conda activate my_test_env
```
2. To check for updates without actually installing them:
```bash
conda update --all --dry-run
```
This command will show you what packages would be updated without actually performing the update.
3. If you're satisfied with the proposed updates, you can perform the actual update:
```bash
conda update --all
```
This will update all packages in the current environment to their latest compatible versions.
::: {.callout-tip}
You can also update specific packages by naming them:
```bash
conda update package1 package2
```
:::
::: {.callout-note}
Remember that conda will only update to versions that are compatible with other packages in your environment. Sometimes, major version updates might require manually specifying the new version or recreating the environment.
:::
By regularly checking for and applying updates, you can ensure your environment has the latest features and security patches.
## conda environment.yml
It's often a good idea to make it easy for others
to recreate your environment. You can do this by creating a `environment.yml` file.
```bash
conda env export > environment.yml
```
Then to create an environment from the `environment.yml` file, you can use the `conda env create` command.
```bash
conda env create -f environment.yml
```
Note that this will create a new environment with the same packages as the current environment and
also the same name of the environment.
You can see which environments are available with the `conda env list` command.
Here's a handy [conda cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf)
## venv and pip
### Creating a Virtual Environment
To create a virtual environment using venv, you can use the following command:
```bash
python3 -m venv my_test_env
```
This creates a new virtual environment named `my_test_env` in the current directory.
### Activating the Environment
To activate the environment:
::: {.panel-tabset}
#### MacOS/Linux
```bash
source my_test_env/bin/activate
```
#### Windows
```bash
my_test_env\Scripts\activate
```
:::
When activated, your command prompt should change to indicate the active environment.
### Deactivating the Environment
To deactivate the environment:
```bash
deactivate
```
### Installing Packages
To install a package such as `jupyter`, you can use the `pip install` command:
```bash
python -m pip install jupyter
```
You can also install a specific version of a package:
```bash
python -m pip install jupyter==1.0.0
# install the latest minor version of the major version 1
python -m pip install jupyter~=1.0.0
```
### Listing Installed Packages
To see what packages are installed in the environment:
```bash
pip list
```
### Checking for Package Updates
To check for updates to installed packages:
```bash
pip list --outdated
```
### Updating Packages
To update a specific package:
```bash
python -m pip install --upgrade package_name
```
To update all packages:
```bash
pip list --outdated | cut -d ' ' -f1 | xargs -n1 pip install -U
```
### Creating Requirements File
To create a `requirements.txt` file listing all installed packages:
```bash
pip freeze > requirements.txt
```
### Installing from Requirements File
To install packages from a `requirements.txt` file:
```bash
python -m pip install -r requirements.txt
```
### Removing the Environment
To remove the virtual environment, simply delete the environment folder:
::: {.panel-tabset}
#### MacOS/Linux
```bash
rm -rf my_test_env
```
#### Windows
```bash
rmdir /s /q my_test_env
```
:::
Remember to deactivate the environment before removing it.
::: {.callout-tip}
Always use `python -m pip` instead of just `pip` to ensure you're using the correct version of pip associated with your current Python environment.
:::
## A word on package caching
Note that `pip` and `conda` will cache packages, often in your home directory.
This could be a problem because, for exeample, many versions of the same
package can be cached and use up disk space. This can be especially problematic
on SCC where you're home directory is quite limited in size. We'll talk about
that when we cover SCC.
# Exercise
## Exercise Introduction
Let's put what we've learned into practice. For this exercise, you'll need to
create a new virtual environment and install pytorch and related packages into
the environment. You'll save a snapshot of your environmenat as well.
Then you'll need to copy the python script from below into a file in your working
directory. Run the script to train and evaluate the model. The script is a simple
CNN for classifying CIFAR-10 images. It also writes out the trained model to a file.
You'll then need to upload your trained model file to Gradescope. You'll also
need to upload your `environment.yml` or `requirements.txt` file.
## Exercise Instructions
1. Create a directory for your exercise and cd into it.
2. Create a new virtual environment with either `venv` or `conda` and then activate it.
3. Install pytorch and related packages into the environment as described [below](#install-pytorch).
4. Copy the python script from below into a file in your working directory. The
file should have extension `.py` such as `cifar10.py`.
5. Run the script to train and evaluate the model. You could for example run it
with `python cifar10.py`.
6. **Extra credit:** Improve the model and/or training regime and re-run the script.
7. Save your python environment into a `environment.yml` for conda or a `requirements.txt` for `pip`.
8. Upload the .pt model file and your environment file to Gradescope. Enter your name for the leaderboard.
::: {.callout}
For extra credit, try to improve the accuracy of the model. The student who
has the highest accuracy will receive 10 extra credit points and recognition
in class!
:::
## Install PyTorch
From the [pytorch website](https://pytorch.org/get-started/locally/), the
install instructions are:
:::: {.panel-tabset}
### MacOS
:::: {.panel-tabset}
##### Conda
```bash
conda install pytorch::pytorch torchvision -c pytorch
```
##### pip
```bash
pip install torch torchvision
```
::::
### Linux
:::: {.panel-tabset}
##### Conda
```bash
conda install pytorch torchvision cpuonly -c pytorch
```
##### pip
```bash
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```
::::
::::
## Python Script
```python
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
# Define transformations for the dataset
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Download and load the CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
# Classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Define the CNN model
class SmallCNN(nn.Module):
def __init__(self):
super(SmallCNN, self).__init__()
self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(32 * 8 * 8, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 32 * 8 * 8)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
# Initialize the model, loss function, and optimizer
model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Training the model
def train_model(model, trainloader, criterion, optimizer, epochs=5):
for epoch in range(epochs): # Loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
# Get the inputs; data is a list of [inputs, labels]
inputs, labels = data
# Zero the parameter gradients
optimizer.zero_grad()
# Forward pass
outputs = model(inputs)
loss = criterion(outputs, labels)
# Backward pass and optimize
loss.backward()
optimizer.step()
# Print statistics
running_loss += loss.item()
if i % 100 == 99: # Print every 100 mini-batches
print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
running_loss = 0.0
print('Finished Training')
# Using the TorchScript method for model saving
# Important! Do not change the following 2 lines of code except for the model name
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, 'cifar10-model.pt')
print('Model saved as cifar10-model.pt')
# Call the training function
train_model(model, trainloader, criterion, optimizer)
# Evaluation function to test the accuracy
def evaluate_model(model, testloader):
correct = 0
total = 0
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = model(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f'Accuracy of the network on the 10,000 test images: {accuracy:.2f}%')
return accuracy
# Call the evaluation function
evaluate_model(model, testloader)
```
# Recap
## Recap
In this module, we've covered:
- Creating and managing python environments with `conda` and `venv`.
- Installing packages with `conda` and `pip`.
- Creating and managing `environment.yml` and `requirements.txt` files.
- Training and evaluating a simple CNN on the CIFAR-10 dataset.