README
For internal use.
Powered by GROUP NINE
-
Open a terminal and ssh into the CV gate machine (the same address that follows -J in the tunnelling step). When prompted, enter the longer password.
-
When inside, access the VM of your choice (usually the master) with
    ssh -i private/gbordin.pem [email protected]
Use the shorter password this time. If you want to go straight to one of the slaves, just substitute its IP address after the @. The command prompt should then change to
    bordin@mapd-b-2023-gr04-1:~$
or show gr04-2 / gr04-3 in place of gr04-1 if you accessed the slaves.
-
Start the cluster by running the script start_cluster.
-
Enter the command sparkup: it should open a Jupyter Notebook session on port 9000 of the VM.
-
Open a new terminal and re-enter the cloud, this time jumping straight through to the VM and defining the tunnelling ports:
    ssh -J [email protected] \
        -L 8008:localhost:9000 \
        -L 1234:localhost:8080 \
        -L 4321:localhost:4040 \
        [email protected]
Substitute 8008, 1234 and 4321 with the local ports of your choice: 8008 is for the Jupyter notebook, 1234 for the Spark master page and 4321 for the Spark application UI (jobs, stages and executors of the running application).
-
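If you change the local ports often, the tunnelling command above can be assembled programmatically. The Python sketch below is only an illustration: the function name build_tunnel_command, the dictionary keys and the user@host values in the example are hypothetical, while the remote ports 9000, 8080 and 4040 are the ones used in this README. It also rejects local ports below 1024, which are privileged on most systems.

```python
from __future__ import annotations

import shlex

# Remote ports on the VM, as used in this README:
# 9000 = Jupyter (started by sparkup), 8080 = Spark master page,
# 4040 = Spark application UI.
REMOTE_PORTS = {"jupyter": 9000, "spark_master": 8080, "spark_app": 4040}


def build_tunnel_command(gate: str, vm: str, local_ports: dict[str, int]) -> list[str]:
    """Return the ssh argv that jumps through `gate` to `vm` and forwards
    each chosen local port to the matching remote port on the VM."""
    argv = ["ssh", "-J", gate]
    for name, remote_port in REMOTE_PORTS.items():
        local_port = local_ports[name]
        # Ports below 1024 are privileged on most systems; stick to unprivileged ones.
        if not 1024 <= local_port <= 65535:
            raise ValueError(f"local port {local_port} for {name!r} is out of range")
        argv += ["-L", f"{local_port}:localhost:{remote_port}"]
    argv.append(vm)
    return argv


if __name__ == "__main__":
    # Hypothetical addresses: replace them with the real gate and VM user@host.
    cmd = build_tunnel_command(
        "user@gate.example.org",
        "user@10.0.0.1",
        {"jupyter": 8008, "spark_master": 1234, "spark_app": 4321},
    )
    print(shlex.join(cmd))
```

Running it prints a one-line version of the tunnelling command with the example addresses; shlex.join quotes any argument that the shell would otherwise split or interpret.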
Open localhost:8008 (or localhost:1234 and localhost:4321, or whichever ports you chose) in your browser and you should be able to reach the Jupyter session and the Spark web UIs.
-
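To check which tunnels are up before opening the browser, a small Python sketch can try a TCP connection to each local port. The helper name port_is_open is hypothetical and the ports are the defaults from this README. Keep in mind that ssh itself listens on each forwarded local port, so a successful connection only shows that the tunnel is running, not that Jupyter or Spark is answering on the VM; the application UI on 4040, for instance, only exists while a Spark application is running.

```python
import socket


def port_is_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    # Default local ports from the tunnelling step; edit them if you picked others.
    tunnels = {"Jupyter": 8008, "Spark master": 1234, "Spark app UI": 4321}
    for name, port in tunnels.items():
        state = "listening" if port_is_open(port) else "not reachable"
        print(f"{name:<13} localhost:{port}  {state}")
```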
When you finish your work, remember to stop the cluster by running the stop_cluster script from the master VM.
Once inside, you can jump between the VMs with ssh master, ssh slave01 or ssh slave02. This is useful when you need to install new Python packages, because they have to be installed on all three machines.
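Installing a package on all three machines can be scripted from the master VM. This is a minimal sketch, assuming pip is available as python3 -m pip on every node and that a --user install is acceptable (both are assumptions, not something stated above); the function names are hypothetical, while the host aliases master, slave01 and slave02 come from the text. The remote command is quoted with shlex.join because ssh hands it to the remote shell, where a version spec such as pandas>=2 would otherwise be read as a redirection.

```python
from __future__ import annotations

import shlex
import subprocess

# Host aliases reachable from inside the cluster, as listed in this README.
NODES = ("master", "slave01", "slave02")


def install_commands(packages: list[str], nodes: tuple[str, ...] = NODES) -> list[list[str]]:
    """Build one ssh argv per node that pip-installs `packages` there."""
    # Assumption: pip is run as `python3 -m pip` and a --user install is fine.
    # shlex.join quotes each argument, since ssh passes the string to the remote shell.
    remote = shlex.join(["python3", "-m", "pip", "install", "--user", *packages])
    return [["ssh", node, remote] for node in nodes]


def install_everywhere(packages: list[str]) -> None:
    """Run the install on every node in turn, stopping at the first failure."""
    if not packages:
        raise ValueError("no packages given")
    for argv in install_commands(packages):
        print(f"--> {argv[1]}")
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

For example, calling install_everywhere(["numpy"]) from a Python shell on the master installs numpy on each node in turn and stops at the first node where the install fails.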