Setting up a Python Analytics Server

Suhas Somnath
Advanced Data and Workflows Group
National Center for Computational Sciences
Oak Ridge National Laboratory

10/9/2017

Introduction:

Support:

Best Practices and ethical use of the cloud:

A virtual machine is like a public-use desktop or laptop. It costs money to run VMs, and reserving resources for your VM precludes others from using those resources. Here are a few guidelines for using and managing VMs:

Other notes:

Configuration

Step 0: Getting a CADES Cloud account:

  1. Request access to the CADES Cloud by following the instructions here. You should receive an email regarding the approval of your request within a few minutes to 1-2 hours.
  2. OPTIONAL: By default, everyone has access to virtual machines that have up to 8 GB of memory and 16 CPU cores. If you need more, you can request a quota increase by emailing CADES with your three-character UCAMS ID, a justification, and the duration of the increase.
  3. OPTIONAL: Consider joining the #ornl_cloud channel on the CADES Slack group to communicate with other users of the CADES Cloud.

Step 1: Creating and Launching an instance:

You can follow the four steps in CADES’ documentation in the links below but pay attention to the following notes:

  1. Log in to Horizon, name your VM - follow the instructions on this page as is.
  2. Choose a flavor, image, and boot source - follow instructions here but pay attention to a few things:
    1. At the Source Tab:
      • Delete Volume on Instance Delete: Set to No if you want the drive to be kept even after the instance is deleted. This is generally a good idea - you can always delete the volume (after you delete the instance) if you don't need it.
      • Volume Size: This is the size of the storage drive that will contain the operating system, data, Python packages, etc. We recommend setting this to 16 GB or larger. If you intend to use your CADES Cloud account exclusively for this analytics server, you can use up your entire quota (e.g. 40 GB). As with any personal computer, you can always add volumes to your instance, but starting off with a large enough volume avoids additional work. Please see this document if you already created an instance but need to add a new storage volume.
    2. At the Flavor Tab: This mainly determines the number of processor cores and the amount of memory. You can change the flavor after creating the instance, so do not worry about this step very much. Pick the flavor that best suits your applications:
      • Pick any flavor beginning with m1. if you do a lot of statistical analysis that requires a large amount of RAM relative to the number of CPU cores.
      • Pick any flavor beginning with c1. if you tend to run a lot of small computations in parallel.
      • For additional flavors, request CADES to increase your quota. See Step 0.
      • You can always run multiple machines in parallel, so you could distribute your memory / CPUs among two machines that together fully utilize your quota.
  3. Set up a security group as it says in the document.
  4. Configure a key pair for accessing the VM as it says in the document.

Step 2: Accessing the Instance:

The instructions below are a simplification of the official CADES documentation:

1. Find the IP address of your machine

  1. In the Horizon interface you used to create the instance, click on the Compute tab, then the Instances sub-tab.
  2. Copy the IP address listed for your instance

2. Get the public SSH key

  1. Click on the Access and Security tab and then navigate to Key Pairs.
  2. Click on the key. In this case – CADESCloudKey
  3. Copy the contents of Public Key and paste into a text editor like TextEdit on MacOS or Notepad in Windows. Read the next step before saving:
  4. Before saving, make sure to change the format to plain text. This is especially important for TextEdit on Mac (in the menu bar, go to Format -> Make Plain Text) and WordPad on Windows (when saving, select Text Document (.txt) instead of the default Rich Text in the pull-down menu).
  5. Save the file as id_rsa.pub

From here on follow instructions specific to your operating system:

ORNL Mac / Linux computer:

Before you begin: These instructions are for ORNL computers only. Instructions for personal computers will follow. If you are outside the ORNL network but working on an ORNL computer, you will need to connect to the ORNL VPN using your PIN and RSA token to get back into the network.

1. Moving the keys:
  1. OPTIONAL but Recommended: If you are interested in accessing your instance from your personal computer, it is recommended to make a copy of your public and private keys and place the copies someplace on ORNLDATA (e.g. - My Documents).
  2. Open the Terminal application and navigate to the directory where you stored your private key.
  3. Rename your private key from the original name (for example - CADESCloudKey) by typing:
$ mv CADESCloudKey id_rsa
  4. Move the private and public keys to ~/.ssh/. For example, if you stored both the private and public keys in Documents:
$ cd Documents
$ mv id_rsa ~/.ssh/id_rsa
$ mv id_rsa.pub ~/.ssh/id_rsa.pub
2. OPTIONAL: Shortcuts!

Aliases:

You can set up aliases that make it easier to refer to your remote machine. Aliases can turn a command like ssh cades@172.22.3.50 into something far simpler, like ssh jupyterVM.
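
One common way to set up such an alias is a Host entry in the SSH client configuration file, ~/.ssh/config. The sketch below is a minimal example under stated assumptions, not an official CADES procedure: the function name add_ssh_alias is hypothetical, and the alias, IP address, user name, and key path are the example values used in this guide, so replace them with your own.

```shell
# add_ssh_alias ALIAS IP [CONFIG_FILE]   (hypothetical helper)
# Appends a Host entry to the SSH client configuration
# unless an entry for ALIAS already exists.
add_ssh_alias() {
    alias_name="$1"
    ip_address="$2"
    config_file="${3:-$HOME/.ssh/config}"

    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"

    # Do nothing if the alias is already defined
    if grep -q "^Host $alias_name\$" "$config_file"; then
        echo "Alias $alias_name already exists in $config_file"
        return 0
    fi

    # The user "cades" and the key path follow the examples in this guide
    cat >> "$config_file" <<EOF

Host $alias_name
    HostName $ip_address
    User cades
    IdentityFile ~/.ssh/id_rsa
EOF
    chmod 600 "$config_file"
}
```

After running add_ssh_alias jupyterVM 172.22.3.50 once, the connection command becomes ssh jupyterVM.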

Graphical interface for SSH:

The Mac Terminal application comes with utilities that simplify the ssh process with a graphical interface. If you are comfortable with the command line and do not mind typing ssh / sftp commands you can skip this step.

If you are interested in this quick setup, follow the instructions here. Please follow the instructions only through step 6: set up the entries, but stop before any step that asks you to click on the Connect button. We will get to this in Step 4 below.

3. Connecting to the instance
$ ssh cades@172.22.3.50
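
If ssh rejects the connection with a warning that the private key file is unprotected, the permissions on the key files are probably too open: ssh ignores private keys that other users can read. The sketch below tightens them, assuming the keys were moved to ~/.ssh under the file names used above. The function name secure_ssh_keys is hypothetical.

```shell
# secure_ssh_keys [SSH_DIR]   (hypothetical helper)
# Restricts access to the SSH directory and the key pair inside it.
secure_ssh_keys() {
    ssh_dir="${1:-$HOME/.ssh}"
    chmod 700 "$ssh_dir"              # only the owner may enter the directory
    chmod 600 "$ssh_dir/id_rsa"       # private key: owner read/write only
    chmod 644 "$ssh_dir/id_rsa.pub"   # public key: readable by everyone
}
```

Calling secure_ssh_keys with no argument applies the permissions to ~/.ssh.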

ORNL Windows computer:

Before you begin: These instructions are for ORNL computers only. Instructions for personal computers will follow. If you are outside the ORNL network but working on an ORNL computer, you will need to connect to the ORNL VPN using your PIN and RSA token to get back into the network.

  1. Install PuTTY: PuTTY should be preinstalled on all ORNL Windows computers. However, if you don’t have PuTTY installed, install it via the following links:
  2. OPTIONAL but Recommended: If you are interested in accessing your instance from your personal computer, it is recommended to make a copy of your public and private keys and place the copies someplace on ORNLDATA.
  3. Configure PuTTY to connect to your instance by following the instructions starting from the topic titled Connect to Your VM Instance Using PuTTY in CADES' instructions.
  4. Configure the tunneling to connect to the Jupyter notebook server by following the instructions here

From your personal computer:

  1. Log in via the Citrix page
  2. PuTTY setup:
    • Select the ORNL General Desktop application
    • Follow steps 2-4 in the instructions laid out for ORNL Windows computers above.
  3. You can access your VM through at least two routes:
    • Recommended: In the Citrix menu, select the PuTTY application and use it as you would use an ORNL Windows computer.
    • In Citrix, select the ORNL General Desktop application and use the PuTTY application to access your VM. This may be slow (bandwidth wasted on transporting the bits of the Windows virtual machine) and tedious (you cannot forward the Jupyter notebook server to your personal computer - it would stay within the Windows virtual machine). This option is preferable in the event that you want to upload data / code from your ORNLDATA to your VM.

Step 3: Installing analytics packages on the instance:

  1. Download Anaconda 5.2 (Python 3.6). You can download a different version if you wish.
$ mkdir temp
$ curl -L https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh -o temp/Anaconda3-5.2.0-Linux-x86_64.sh

  2. Make the installer executable:
$ chmod +x temp/Anaconda3-5.2.0-Linux-x86_64.sh
  3. Install Anaconda by starting the installer and following its prompts. Once it finishes, remove the temporary folder:
$ bash temp/Anaconda3-5.2.0-Linux-x86_64.sh
$ rm -r temp
  4. Switch to the Anaconda environment:
$ source ~/.bashrc
  5. Install missing packages for full Jupyter functionality:

    • Enable the ability to export to PDF in Jupyter: $ conda install -c anaconda-nb-extensions nbbrowserpdf
    • Enable JavaScript for interactive elements in Jupyter: $ jupyter nbextension enable --py --sys-prefix widgetsnbextension
  6. OPTIONAL: To simplify the command to start up the Jupyter notebook:

    1. First create the configuration file:
$ jupyter notebook --generate-config
    2. Open the configuration file in an editor:
$ nano ~/.jupyter/jupyter_notebook_config.py
    3. Use the key combination `Ctrl`+`W` to search for `.open_browser`
    4. Uncomment the line
    5. Set the flag to `False`
    6. Search for `NotebookApp.port = 8888` using `Ctrl`+`W`
    7. Uncomment the line
    8. Set the `port` number to `8889` (or any number > 1024 for that matter)
    9. Exit the editor with `Ctrl`+`X`
    10. When prompted, press `Y` and then `Enter` to save the file
  7. OPTIONAL: You can install any other Python packages from this point on. You could install deep-learning frameworks like Keras or TensorFlow, but we recommend using optimized Docker containers for this. Please refer to this separate tutorial for this.
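
The optional Jupyter configuration edits in this step can also be applied without opening an editor. The sketch below appends the two settings to the end of the configuration file instead of uncommenting the existing lines; it assumes the classic Jupyter Notebook that ships with Anaconda 5.2, which reads `c.NotebookApp` settings, and that the file was already created with `jupyter notebook --generate-config`. The function name set_jupyter_defaults is hypothetical.

```shell
# set_jupyter_defaults [PORT] [CONFIG_FILE]   (hypothetical helper)
# Appends settings so that "jupyter notebook" starts without a browser
# on a fixed port. Appended lines take effect because the generated
# defaults in the file are commented out.
set_jupyter_defaults() {
    port="${1:-8889}"
    config_file="${2:-$HOME/.jupyter/jupyter_notebook_config.py}"
    {
        echo ""
        echo "# Added for remote use: no local browser, fixed port"
        echo "c.NotebookApp.open_browser = False"
        echo "c.NotebookApp.port = $port"
    } >> "$config_file"
}
```

Running set_jupyter_defaults 8889 has the same effect as sub-steps 2 through 10 of the optional configuration above.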

Running

Step 1: Starting a Jupyter server:

  1. Ensure that you don’t leave room for accidental damage to the rest of the VM (such as the anaconda folder etc.) by starting the Jupyter notebook in a new / separate folder. Perhaps this folder contains data + notebooks, etc. For now, we will make an empty folder and start the notebook from there:
$ mkdir workspace
$ cd workspace
  2. OPTIONAL: Persistent Jupyter server: As it stands, if you close this SSH session, any running command or operation (for example, a running Jupyter server) will be aborted as well. To keep the Jupyter server easily accessible, use either the screen or the tmux command. We will use screen here. Note that this approach does not keep your SSH connection to the Jupyter server (discussed in the next step) alive if your local computer goes to sleep or is shut down. If you need your computation / analysis to run even after you shut down your local machine, we recommend running your analysis as a script on the remote machine instead of using Jupyter notebooks. If you decide to use screen, type the following command BEFORE you start the Jupyter server:
$ screen
  3. Starting the Jupyter server:
    • If you modified the configuration file in the optional part of the previous step:
$ jupyter notebook
    • Otherwise:
$ jupyter notebook --no-browser --port=8889
  4. OPTIONAL: If you ran the notebook with screen:
    1. You can now detach the screen using the key sequence: Ctrl+A, Ctrl+D.
    2. You can now close the SSH session to the remote machine. This will NOT close your Jupyter server.
$ exit
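
As an alternative to starting screen interactively and detaching by hand, the session can be given a name and started already detached, which makes it easier to find later. The sketch below assumes GNU screen and its -d -m -S options; the function name jupyter_in_screen, the session name jupyter, and the DRY_RUN switch are hypothetical conveniences, not part of the original procedure.

```shell
# jupyter_in_screen [SESSION_NAME] [PORT]   (hypothetical helper)
# Starts the Jupyter server inside a detached, named screen session.
# Set DRY_RUN=1 to print the command instead of running it.
jupyter_in_screen() {
    session="${1:-jupyter}"
    port="${2:-8889}"
    cmd="screen -dmS $session jupyter notebook --no-browser --port=$port"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

With a named session, screen -ls lists the running sessions and screen -r jupyter reattaches to this one.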

Step 2: Accessing the Jupyter server:

Mac / Linux:

Connection in the Mac Terminal app:

  1. Open the Terminal.
  2. Depending on which method you prefer (and have set up):
    • Command line interface:
$ ssh -N -L localhost:8889:localhost:8889 cades@172.22.3.50
    • Graphical Interface: see [this document](tunnelling-remote-server.md#mac-access).
  3. Open a browser (Chrome is recommended for interactive widgets) and go to: http://localhost:8889/.
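
In the ssh command above, the -L option has the form local_address:local_port:remote_host:remote_port, and -N tells ssh not to run a remote command, so the session only forwards the port. If port 8889 is already in use on your computer, only the local port needs to change. A minimal sketch, with a hypothetical function name jupyter_tunnel and a hypothetical DRY_RUN switch; the user name and IP address are the example values from this guide:

```shell
# jupyter_tunnel HOST [LOCAL_PORT] [REMOTE_PORT]   (hypothetical helper)
# Forwards LOCAL_PORT on this computer to REMOTE_PORT on the VM.
# Set DRY_RUN=1 to print the command instead of running it.
jupyter_tunnel() {
    host="$1"
    local_port="${2:-8889}"
    remote_port="${3:-8889}"
    cmd="ssh -N -L localhost:$local_port:localhost:$remote_port $host"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

For example, jupyter_tunnel cades@172.22.3.50 9000 forwards local port 9000, so the browser address becomes http://localhost:9000/.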

Windows:

  1. Close any open PuTTY connections to the VM.
  2. Open PuTTY, load the configuration for your machine, and connect. You will be presented with a new SSH connection to the VM. You can close this if you do not need it.
  3. Open a browser (Chrome is recommended for interactive widgets) and go to: http://localhost:8889/.

Personal computer:

  1. Log in via the ORNL Citrix page.
  2. Select the PuTTY application.
  3. Follow the same instructions as for Windows computers.

Step 3: Shutting down the Jupyter server:

Once you are done working on your Jupyter server, you will need to:

  • If you used screen and closed your SSH connection to the virtual machine where you started the Jupyter server, SSH into your virtual machine:
    1. Windows: use your saved PuTTY profile.
    2. Mac / Linux: use either the command line or the graphical interface described in Step 2. For the command line interface, open the terminal and run the following, replacing the IP address with your own:

$ ssh cades@172.22.3.50

At this point, you should either have access to an existing SSH connection to the remote machine or you should have created a new connection in the preceding step.

$ screen -r

You should be seeing the print logs of the Jupyter server on the remote machine now.