<!DOCTYPE html>
<html>
<head>
<title>Digital Summer School 2024: TUE02</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" href="../style.css"> </head>
<body>
<textarea id="source">
class: center, middle, titlepage
### TUE02: *Intro to Git, Python.*
---
class: contentpage
### **Agenda**
1. IDEs
1.1 Jupyter
2. Git
3. GitHub
4. Python
4.1 Why Python?
4.2 How Python?
4.3 Types of Python
4.4 Basics: Print
4.5 Basics: Strings
4.6 Basics: Variables
4.7 Basics: Types
4.8 Basics: Lists
4.9 Basics: For Loops
4.10 Basics: Libraries
4.11 Basics: If statements
4.12 Basics: Subprocess
4.13 Basics: Exceptions
5. Practical 🏋️
---
class: contentpage
### **1. IDEs**
Before we start writing any Python scripts, we need to set up a DEVELOPER ENVIRONMENT!!
---
class: contentpage
### **1. IDEs**
An IDE is an environment in which we write (and sometimes can also run!) code. We **could** write a script using a basic text editor (Notepad!), but it is not much fun, and we will make loads of mistakes.
IDEs generally come with tools to help us code better.
The most popular options for Python are Visual Studio Code, PyCharm, Sublime Text... there are heaps and heaps of these.
If you are looking for one to start with, I would recommend Visual Studio Code. It is easy to install and has a good balance of tools and extensibility.
https://code.visualstudio.com/
If you are concerned about telemetry, there is a fork called VS Codium which has removed this component.
https://vscodium.com/
---
class: contentpage
### **1.1 Jupyter**
When looking at coding in Python, you might come across mentions of Jupyter and Jupyter notebooks. A Jupyter notebook is a document that comprises various blocks of text and code that can be run in any order, or executed directly.
Jupyter is open source, so there are plenty of platforms that have "borrowed" the same concept: Google Colaboratory, Netflix Nteract, Microsoft Fabric notebooks. Visual Studio Code can also run Jupyter notebooks.
The ability to integrate explanatory text, graphics, code and even interactive widgets is a reason why it is quite popular in science and academic contexts.
Because our focus of this course is more on functional and operational processes, we won't really be looking at Jupyter any further, but keep it in mind if you want to do data analysis, fancy reports or just have a more interactive experience with Python.
---
class: contentpage
### **2. Git**
What is Git?
---
class: contentpage
### **2. Git**
Git is one of these things which is completely ubiquitous in its domain, like Photoshop or Google.
Git is a software management command line application written by Linus Torvalds, who was also responsible for creating Linux.
It is primarily a code version-management system, which can be used to keep track of our code, allow management of different variants as they are developed, and collaboration between multiple programmers.
Git itself is a command line application, but there are many options for graphical interfaces. It is also built-in to most modern IDEs (including Visual Studio Code).
---
class: contentpage
### **2. Git**
Turn your FOLDER into a REPOSITORY. WOW!
```sh
git init
```
There is now a hidden folder called `.git` that will handle version management.
---
class: contentpage
### **2. Git**
We can stage our file changes (mark them for inclusion in the next snapshot) using `git add`
```sh
git add *
```
Git commit lets us label this snapshot, a good opportunity to provide some context to what work was undertaken.
```sh
git commit -m 'starting here'
```
Do a few of these, and then we can see the logs!
```sh
git log
```
Knowing a bit about working with the Git CLI is not a bad idea, especially if you need to resolve a complex version/merge problem, but generally speaking you could mostly get by with a graphical IDE integration and GitHub.
```sh
git --help
```
---
class: contentpage
### **3. GitHub**
GitHub is to Git what Facebook is to Faces.
GitHub is a website that allows developers to host their software projects using Git, with a lot of other useful features to help with keeping track of projects.
In the previous example, we were just storing work on our own computers; GitHub lets us host it online, with benefits for collaboration between different coders.
It has been owned by Microsoft since it was bought in 2018. GitLab is also an equivalent platform that has some superior features, but GitHub still has much more traffic.
---
class: contentpage
### **3. GitHub**
Practical:
- Login to GitHub
- Make a "repository"
- Add a file
- Clone to your computer
---
class: tangentpage
### **Tangent: GitHub Authentication via SSH**
Once we start working with GitHub we will want a means of authenticating our computer so that we can read and write code, including private repositories.
A popular method of achieving this is via SSH. This means creating a secret key on our computer and sharing its "public" component with GitHub, which can then verify that our computer is authorised to read and write our repositories.
We can create a new SSH key (if none exist) with the following command, and step through the prompts
```sh
ssh-keygen
```
We then copy the pub (public) part of the key to GitHub under `Settings` -> `SSH and GPG keys` -> `New SSH key`
We should now have the ability to clone and commit to our GitHub repos.
---
class: tangentpage
### **Tangent: Set GitHub Authorship**
We can commit code, but we want our changes to be attributed to us. This can be set globally on your computer using the following commands.
```sh
git config --global user.email "[email protected]"
git config --global user.name "paulduchesne"
```
---
class: contentpage
### **3. GitHub**
Now you can "fork" the summer-school repository.
https://github.com/paulduchesne/summer-school-2024
Forking is when you copy all of my code into your own space (like "copy" "paste"). You are now free to modify it as much as you like.
If you do something cool and want to merge it back in to the original project, you can make a "pull request", which we will discuss a bit later.
---
class: tangentpage
### **Setup**
- Fork repository from https://github.com/paulduchesne/summer-school-2024.
- Clone it locally to your computer.
- Copy media folder from Desktop.
---
class: contentpage
### **4. Python**
Python is a general-purpose programming language, first released in 1991 (development began in the late 1980s), and is one of the most popular in the world. It can be used to create all sorts of interesting things across the disciplines of data science, web development, machine learning... and digital archiving.
---
class: contentpage
### **4.1 Why Python?**
While the command-line can be handy for simple processes, we will likely want to switch to a programming language to create more complex systems.
---
class: contentpage
### **4.2 How Python?**
The act of writing a Python script and executing the script are distinct. A Python script can be written in any text editor or IDE, but we will require the Python Interpreter for it to do anything.
In practice, this will look like:
Use a text editor/IDE to write your script.
```python
hello = "hi"
```
save as -> `myscript.py`
In the command line, use the Python Interpreter to execute the code
```sh
python myscript.py
```
Depending on your install, you may need to specifically call `python3`.
---
class: contentpage
### **4.3 Types of Python**
Like most software, there are many different versions of Python, with version 3.12 being the current version.
There was a significant change in code between Python2 and Python3, which required a lot of script rewriting.
Be careful if you are on an older version of macOS (before 12.3), as Python 2.7 was installed by default there, so just calling `python my_script.py` might have unexpected results.
Good idea to check your version of Python with
```sh
python --version
```
or
```sh
python3 --version
```
On Unix systems (macOS and Linux) you can also see exactly which Python interpreter is running with `which python`.
---
class: contentpage
### **4.3 Types of Python**
Anaconda is a distribution of Python which comes with its own package management system for external libraries.
It seems to be most popular with data science and machine learning projects, where managing different versions of different libraries can become tricky. I wouldn't really recommend using it unless this becomes a problem for you.
---
class: contentpage
### **4.4 Basics: Print**
The simplest Python action is to print something:
```python
print('greeting')
```
This is the Python equivalent of `echo` which we saw in the last session.
---
class: contentpage
### **4.5 Basics: Strings**
A "string" in Python is a line of text (emojis allowed!) which Python identifies by its quotation marks (double or single are fine at this stage).
```python
print('greeting')
```
Note that if we do not use quotation marks, Python does not know that this is a line of text, and does not know what to do with it.
```python
print(greeting)
```
!!! ERROR !!!
The code here is actually 'legal', but we have inadvertently referred to what is called a 'variable', which we have not defined, so Python raises a `NameError`.
Python doesn't know what `greeting` (without quotations) means.
---
class: contentpage
### **4.6 Basics: Variables**
Text-without-quotations is what is known as a "variable", and is a placeholder for some other value.
```python
greeting = 'hello'
print(greeting)
```
A variable name must be a combination of letters, numbers and underscores, with no spaces; it cannot start with a number, and cannot be a "reserved" Python keyword (eg "for" or "if").
This is very useful when we want to take a value and modify it.
```python
a_number = 1986
print(a_number)
a_number = a_number/10
print(a_number)
a_number = a_number+16
print(a_number)
a_number = a_number+a_number
print(a_number)
```
or when we are working with values we do not know in advance (eg, perform an action on all files in a folder).
---
class: contentpage
### **4.7 Basics: Types**
In Python, there is the concept of different data types, of which we have seen two: *strings* and *numbers*.
These can be identified by using the built-in `type()` function.
```python
print(type('hello'))
```
Variables inherit type. Note Python is a "dynamically typed" language: we don't need to explicitly declare a type, it figures it out for us.
This means we can write code a lot faster, but it can cause problems if we think a variable is a certain type, and we get it wrong. As an example, `100` (an integer) and `'100'` (a string) are different things.
```python
greeting = 'hello'
print(type(greeting))
```
Different types will behave differently, and throw an error if a logical result is not possible.
```python
a_number = 100
print(a_number+a_number, a_number / 2)
a_string = '100'
print(a_string+a_string, a_string / 2)
```
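When a number arrives as a string (for example, read from a filename), the built-in `int()` and `str()` functions can convert between the two types. A minimal sketch; the `frame_` label is just a made-up example.

```python
# '100' is a string, so we convert it before doing arithmetic
a_string = '100'
a_number = int(a_string)
print(a_number / 2, type(a_number))

# and back again: turn a number into a string to join it with other text
label = 'frame_' + str(a_number)
print(label, type(label))
```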
---
class: contentpage
### **4.8 Basics: Lists**
A very important concept in Python is a list or an "array".
```python
some_people = ['paul', 'david', 'bethany', 'shana']
```
Lists can contain a mix of types.
```python
mixed_types = ['hello', 27, 3.1415]
```
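A few common things we can do with a list, sketched below: pick out one element, count the elements, and add a new one. The extra name `'alex'` is only there for illustration.

```python
some_people = ['paul', 'david', 'bethany', 'shana']

# positions are counted from zero, so [0] is the first element
first_person = some_people[0]
print(first_person)

# len() tells us how many elements are in the list
print(len(some_people))

# .append() adds a new element to the end of the list
some_people.append('alex')
print(some_people)
```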
---
class: contentpage
### **4.9 Basics: For Loops**
Another key concept is the "for loop", which lets us work through a list, and apply an action to each list element. This is exactly the same concept we saw earlier during the command line session.
```python
for a_number in [17, 27, 37, 47]:
    print(a_number)
```
Worth noting at this point, spaces in Python matter. We need to indent the `print(a_number)` element with two or four spaces, so that the Python Interpreter knows what is going on!
```python
for a_number in [17, 27, 37, 47]:
    print(a_number + 4)
```
We can use this pattern to start to perform interesting file operations, but first we will need to find a library to help us work with file paths!
---
class: contentpage
### **4.10 Basics: Libraries**
Libraries are just chunks of Python code which other people wrote, and save us having to build everything from scratch.
Many useful libraries come pre-included with Python by default, but you can install others to do highly specific things, or write your own!
For now, we are going to use a very useful library called `pathlib`.
```python
import pathlib

for filename in ['0086400.dpx', 'not_a_file.dpx']:
    # at this point the filename is just a string
    print(filename, type(filename))
    # build a full path, starting from the current working directory
    pathlib_filename = pathlib.Path.cwd() / 'media' / 'film_scan' / filename
    print(pathlib_filename, type(pathlib_filename))
    print(pathlib_filename.exists())
```
Note the `.exists()` method at the end of the `pathlib_filename`?
This is a "method" to check if the file exists at that location or not, and returns True or False. Different types have different methods (ie things you can do with them). This is also where documentation becomes very useful, to know what methods are available to use.
https://docs.python.org/3/library/pathlib.html
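As a small taste of what else the documentation lists, here is a sketch of a few other `pathlib` attributes for pulling a path apart. It reuses the same example path as above.

```python
import pathlib

# the file does not need to exist for these attributes to work
pathlib_filename = pathlib.Path.cwd() / 'media' / 'film_scan' / '0086400.dpx'

print(pathlib_filename.name)    # filename with extension
print(pathlib_filename.stem)    # filename without extension
print(pathlib_filename.suffix)  # extension only
print(pathlib_filename.parent)  # the folder containing the file
```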
---
class: contentpage
### **4.11 Basics: If statements**
`If statements` let us route operations depending on an assessment on whether something is true or not.
```python
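a_number = 37
# checks run from top to bottom and stop at the first match,
# so we test for exactly 900 before testing for greater than 100
if a_number < 10:
    print('the number is less than ten')
elif a_number == 900:
    print('the number is nine hundred')
elif a_number > 100:
    print('the number is greater than one hundred')
else:
    print('what is this number?')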
```
Typical symbols here are '<' something is less than something else, '>' greater than, '==' equals, and '!=' not equals.
Worth paying attention to the difference between '=', which means "this thing is this thing" (defining a variable, an assertion), and '==', which means "does this thing equal this thing?" (assessing the truth of a statement, a question).
Also, you can use these checks outside of if statements.
```python
print(900 == 700)
```
---
class: contentpage
### **4.11 Basics: If statements**
We can also use "if statements" inside "for loops", or "for loops" inside "if statements", and on and on. This is where we can introduce complex logic (also known as "decision trees").
```python
import pathlib

for filename in ['0086400.dpx', 'not_a_file.dpx']:
    pathlib_filename = pathlib.Path.cwd() / 'media' / 'film_scan' / filename
    if pathlib_filename.exists():
        print(pathlib_filename, 'THE FILE EXISTS, HOORAY')
    else:
        print(pathlib_filename, 'WHERE IS THAT FILE?')

# not indented, so this runs once, after the loop has finished
print('done.')
```
---
class: contentpage
### **4.12 Basics: Subprocess**
I wanted to introduce the `subprocess` library. Subprocess is cool because we can call command line applications from within Python!
This can be useful if you want to orchestrate tools sequentially (extract video metadata with MediaInfo, transcode using FFmpeg, make a checksum).
```python
import subprocess
subprocess.call(['ffmpeg', '-version'])
```
A good part of this is that we can have processes defined which work inside or outside of Python (e.g. a specific FFmpeg command to convert video files).
Bad news is that this can make our scripts OS-dependent (e.g. the "copy a file" CLI command is different on macOS and Windows). One of the strengths of Python is that the same script can normally run on any OS.
You will also notice `subprocess` needs the command expressed as an array of strings.
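If we want to use what a command prints, rather than just run it, `subprocess.run` can capture the output for us. A minimal sketch: it calls the Python Interpreter itself (via `sys.executable`) purely so the example runs without FFmpeg installed. Swap in your own command list as needed.

```python
import subprocess
import sys

# sys.executable is the path to the Python interpreter running this script,
# used here only as a stand-in for a real command line tool
command = [sys.executable, '--version']

# capture_output collects what the command prints, text=True returns it as a string
result = subprocess.run(command, capture_output=True, text=True)

print(result.returncode)  # 0 normally means the command succeeded
print(result.stdout)
```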
---
class: contentpage
### **4.13 Basics: Exceptions**
A very important feature of your code is what is called "exception handling". This is where something unexpected or undesired happens, and you want to stop your script and say, "SOMETHING HAS GONE WRONG".
```python
import pathlib

for filename in ['0086400.dpx', 'not_a_file.dpx']:
    pathlib_filename = pathlib.Path.cwd() / 'media' / 'film_scan' / filename
    if pathlib_filename.exists():
        print(pathlib_filename, 'THE FILE EXISTS, HOORAY')
    else:
        # stop the script here and report the problem
        raise Exception('WHERE IS FILE', pathlib_filename)

print('done.')
```
Note that if we run this and the file is not found, it does not progress past the raised exception. We never get to the "done" print statement.
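Sometimes we would rather note the problem and keep going. One possible approach, sketched here, is to wrap the risky step in `try` / `except`. The `found` and `missing` lists are hypothetical names for this example, and `FileNotFoundError` is one of Python's built-in exception types.

```python
import pathlib

found = []
missing = []

for filename in ['0086400.dpx', 'not_a_file.dpx']:
    pathlib_filename = pathlib.Path.cwd() / 'media' / 'film_scan' / filename
    try:
        if not pathlib_filename.exists():
            raise FileNotFoundError(filename)
        found.append(filename)
    except FileNotFoundError as error:
        # instead of stopping, make a note of the problem and carry on
        print('could not find', error)
        missing.append(filename)

# this time we do reach the end of the script
print('done.', len(found), 'found,', len(missing), 'missing')
```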
---
class: contentpage
### **5. Practical**
We are going to:
- Log into GitHub.
- Fork the `summer-school-2024` repo, so we each have our own copy.
- Clone it to your computer to work on locally.
- Write a script in your favourite IDE to check filesizes on our scan files, and raise an exception if a certain condition is not met.
- Use Git to commit changes back to GitHub (CLI and IDE).
- Quick tour of GitHub issues, version control and branches.
---
class: contentpage, middle
This has been an introduction to Python and Git!
</textarea>
<script src="https://remarkjs.com/downloads/remark-latest.min.js" type="text/javascript"></script>
<script type="text/javascript">var slideshow = remark.create({ratio: "16:9"});</script>
</body>
</html>