-
Notifications
You must be signed in to change notification settings - Fork 270
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
7592c14
commit a9ea34b
Showing
1 changed file
with
198 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Chaining\n", | ||
"We can **chain** transformations and aaction to create a computation **pipeline**\n", | ||
"Suppose we want to compute the sum of the squares\n", | ||
"$$ \\sum_{i=1}^n x_i^2 $$\n", | ||
"where the elements $x_i$ are stored in an RDD." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Start the `SparkContext`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"from pyspark import SparkContext\n", | ||
"sc = SparkContext(master=\"local[4]\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"7, 6, 4, 7, 9, 0, 3, 7, 6, 2, 7, 7, 1, 5, 3, 0, 3, 9, 2, 4, 4, 9, 5, 8, 9, 8, 3, 9, 3, 5, 1, 1, 9, 9, 0, 0, 5, 9, 2, 1, 1, 6, 9, 6, 3, 0, 4, 1, 3, 4, 1, 6, 3, 9, 1, 3, 7, 7, 1, 3, 3, 8, 6, 5, 5, 8, 3, 0, 6, 2, 7, 7, 7, 2, 0, 3, 7, 4, 4, 4, 7, 1, 9, 2, 7, 8, 8, 4, 7, 1, 9, 9, 9, 6, 5, 2, 7, 7, 3, 3, 0, 0, 9, 9, 7, 3, 6, 5, 1, 5, 9, 4, 8, 3, 2, 6, 7, 4, 8, 6, 8, 7, 5, 2, 5, 5, 9, 0, 6, 4, 7, 8, 2, 6, 6, 0, 7, 7, 1, 7, 6, 0, 8, 8, 1, 8, 0, 3, 9, 1, 1, 8, 4, 6, 0, 1, 4, 8, 0, 0, 6, 4, 4, 4, 8, 4, 1, 0, 0, 1, 5, 1, 1, 6, 6, 4, 7, 8, 2, 1, 6, 6, 1, 7, 9, 0, 3, 9, 3, 6, 7, 1, 5, 2, 3, 9, 1, 0, 9, 6, 3, 7, 9, 4, 2, 0, 5, 0, 8, 2, 8, 5, 8, 8, 1, 0, 3, 9, 2, 3, 6, 0, 0, 7, 6, 6, 5, 4, 2, 4, 8, 0, 4, 2, 6, 7, 8, 5, 9, 7, 0, 2, 0, 6, 1, 0, 0, 6, 3, 9, 1, 8, 7, 9, 5, 9, 0, 2, 9, 3, 7, 3, 3, 4, 8, 3, 5, 6, 5, 4, 4, 3, 9, 0, 0, 4, 1, 9, 9, 7, 9, 7, 0, 9, 8, 2, 6, 1, 4, 3, 3, 7, 8, 1, 1, 9, 1, 8, 4, 1, 3, 0, 3, 3, 1, 2, 2, 5, 9, 9, 1, 9, 4, 1, 4, 7, 8, 5, 8, 3, 9, 4, 6, 0, 5, 3, 6, 1, 6, 3, 8, 7, 3, 9, 0, 5, 9, 8, 5, 0, 7, 2, 8, 7, 6, 5, 2, 9, 3, 5, 6, 3, 4, 9, 3, 4, 3, 9, 9, 0, 5, 0, 2, 5, 7, 6, 7, 6, 7, 7, 6, 6, 0, 3, 5, 3, 7, 5, 9, 6, 3, 9, 2, 5, 1, 5, 7, 0, 5, 8, 5, 9, 0, 8, 0, 2, 5, 5, 1, 2, 9, 3, 1, 7, 2, 2, 6, 2, 9, 4, 0, 7, 9, 8, 9, 4, 2, 0, 7, 5, 5, 4, 2, 4, 8, 0, 3, 8, 0, 2, 3, 8, 5, 1, 5, 9, 4, 6, 8, 5, 8, 4, 0, 0, 0, 6, 1, 1, 8, 8, 5, 9, 0, 5, 9, 2, 8, 9, 4, 2, 4, 6, 2, 2, 6, 4, 4, 8, 1, 9, 6, 0, 7, 0, 5, 9, 1, 0, 6, 1, 6, 8, 7, 1, 8, 4, 8, 7, 1, 0, 8, 4, 8, 1, 9, 9, 1, 5, 4, 6, 7, 4, 4, 0, 1, 3, 0, 0, 1, 8, 8, 4, 5, 5, 4, 4, 8, 2, 0, 5, 1, 8, 5, 2, 0, 9, 8, 8, 7, 1, 0, 8, 6, 8, 3, 3, 6, 7, 6, 6, 6, 6, 9, 8, 6, 8, 5, 8, 6, 9, 2, 1, 0, 6, 8, 7, 6, 5, 8, 3, 3, 4, 3, 7, 9, 3, 8, 7, 8, 7, 5, 0, 3, 4, 7, 6, 5, 8, 9, 9, 5, 4, 0, 8, 4, 7, 5, 8, 7, 1, 4, 2, 6, 5, 8, 5, 7, 9, 8, 6, 0, 0, 8, 6, 1, 0, 0, 6, 4, 6, 1, 7, 7, 9, 0, 8, 7, 8, 0, 8, 8, 6, 3, 3, 8, 7, 3, 2, 1, 7, 7, 5, 7, 8, 3, 1, 7, 2, 7, 8, 5, 6, 3, 7, 8, 0, 6, 8, 7, 7, 3, 7, 0, 7, 7, 8, 5, 6, 4, 2, 0, 8, 7, 6, 3, 2, 3, 9, 4, 2, 3, 1, 1, 0, 1, 9, 3, 2, 6, 4, 8, 7, 0, 4, 2, 1, 2, 5, 4, 8, 6, 2, 2, 7, 7, 8, 6, 8, 2, 8, 9, 4, 7, 2, 1, 5, 5, 1, 9, 2, 1, 1, 4, 2, 2, 1, 6, 8, 2, 4, 2, 3, 7, 8, 9, 8, 6, 1, 1, 1, 9, 3, 6, 8, 2, 1, 2, 4, 1, 9, 4, 9, 6, 1, 0, 0, 1, 3, 8, 1, 3, 9, 8, 9, 5, 4, 9, 1, 6, 6, 3, 1, 2, 7, 5, 4, 7, 7, 4, 7, 1, 0, 0, 3, 4, 5, 4, 4, 5, 4, 1, 5, 2, 8, 1, 8, 0, 0, 2, 5, 5, 2, 2, 0, 3, 4, 3, 6, 0, 2, 6, 7, 7, 6, 7, 9, 6, 3, 6, 2, 4, 5, 0, 6, 7, 6, 4, 6, 5, 1, 1, 6, 9, 7, 6, 7, 6, 5, 2, 2, 7, 3, 7, 7, 2, 7, 1, 1, 2, 1, 9, 3, 8, 4, 5, 1, 7, 0, 3, 6, 5, 1, 2, 1, 8, 8, 0, 7, 2, 4, 6, 6, 8, 0, 5, 4, 6, 4, 2, 3, 2, 2, 8, 5, 2, 8, 2, 8, 2, 2, 4, 0, 4, 2, 9, 7, 9, 1, 2, 0, 4, 4, 9, 6, 3, 5, 3, 7, 1, 2, 6, 0, 7, 8, 7, 8, 1, 6, 9, 4, 1, 5, 5, 9, 3, 6, 9, 5, 1, 7, 3, 0, 8, 7, 5, 5, 2, 4, 2, 0, 9, 6, 0, 1, 5, 9, 2, 7, 1, 8, 3, 2, 9, 8, 6, 9, 4, 5, 0, 0, 5, 7, 0, 7, 0, 9, 1, 4, 7, 1, 7, 8, 6, 3, 8, 1, 1, 1, 7, 1, 6, 4, 3, 7, 7, 4, 1, 0, 5, 2, 1, 8, 4, 7, 2, 8, 1, 4, 6, 8, 8, 5, 2, 6, 2, 9, 7, 1, 6, 2, " | ||
] | ||
} | ||
], | ||
"source": [ | ||
"B=sc.parallelize(np.random.randint(0,10,size=1000))\n", | ||
"lst = B.collect()\n", | ||
"for i in lst: \n", | ||
" print(i,end=', ')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Sequential syntax for chaining\n", | ||
"Perform assignment after each computation" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"CPU times: user 15.2 ms, sys: 11.9 ms, total: 27.1 ms\n", | ||
"Wall time: 1.01 s\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%%time\n", | ||
"Squares=B.map(lambda x:x*x)\n", | ||
"summation = Squares.reduce(lambda x,y:x+y)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"29395\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(summation)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Cascaded syntax for chaining\n", | ||
"Combine computations into a single cascaded command" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"CPU times: user 13.9 ms, sys: 13 ms, total: 26.9 ms\n", | ||
"Wall time: 304 ms\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"29395" | ||
] | ||
}, | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"%%time\n", | ||
"B.map(lambda x:x*x).reduce(lambda x,y:x+y)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Both syntaxes mean exactly the same thing\n", | ||
"The only difference:\n", | ||
"* In the sequential syntax the intermediate RDD has a name `Squares`\n", | ||
"* In the cascaded syntax the intermediate RDD is *anonymous*\n", | ||
"\n", | ||
"The execution is identical!\n", | ||
"\n", | ||
"### Sequential execution\n", | ||
"The standard way that the map and reduce are executed is\n", | ||
"* perform the map\n", | ||
"* store the resulting RDD in memory\n", | ||
"* perform the reduce\n", | ||
"\n", | ||
"### Disadvantages of Sequential execution\n", | ||
"\n", | ||
"1. Intermediate result (`Squares`) requires memory space.\n", | ||
"2. Two scans of memory (of `B`, then of `Squares`) - double the cache-misses.\n", | ||
"\n", | ||
"### Pipelined execution\n", | ||
"Perform the whole computation in a single pass. For each element of **`B`**\n", | ||
"1. Compute the square\n", | ||
"2. Enter the square as input to the `reduce` operation.\n", | ||
"\n", | ||
"### Advantages of Pipelined execution\n", | ||
"\n", | ||
"1. Less memory required - intermediate result is not stored.\n", | ||
"2. Faster - only one pass through the Input RDD." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sc.stop()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |