Install Jupyter Notebook and RStudio Server on an AWS EC2 Instance

Leverage cloud computing for data analysis

Wang Xiaoyuan
Towards Data Science


Image by Markus Spiske on Unsplash

Introduction

Python and R are two of the most popular programming languages for data analysis, and Jupyter Notebook and RStudio are their respective IDEs. Anyone developing data science skills will likely have at least one of these tools on their local machine. Compared with a local machine, cloud computing lets people access these tools anywhere, at any time. Cloud versions of these IDEs exist (Google Colab and RStudio Cloud), but they have limitations. For example, every time you start a new Colab VM you need to install your packages again, unless you configure them to install to Google Drive, which adds some technical overhead. RStudio Cloud has a limit of 15 hours for free accounts. In this article, I will share the steps to install Jupyter Notebook and RStudio Server on AWS Elastic Compute Cloud (EC2), so that the server is ready whenever the user logs in.

Amazon Web Services

First, you need to create and activate an AWS account. The steps can be found here. Follow AWS’s least-privilege principle and create an administrator account. Then log in to the AWS console to launch an EC2 instance. AWS offers one year of free EC2 usage on the t2.micro instance type. All the remaining settings can be left at their defaults.

Image by author

Follow the steps here to connect to EC2 from Windows using PuTTY. Connecting from macOS or Linux is more straightforward. You can also use EC2 Instance Connect in the browser.

Image by author

You should see this after a successful connection.

Image by author

Install Anaconda

Execute the four commands below to update the system and download the Anaconda shell script.

sudo yum update -y
sudo yum -y groupinstall "Development tools"
sudo yum install openssl-devel bzip2-devel expat-devel gdbm-devel readline-devel sqlite-devel
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
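Once the download has finished, it can be worth checking that the installer arrived intact before running it. The following is a minimal sketch, assuming you copy the expected SHA-256 value yourself from the hash list that Anaconda publishes; the function names `sha256_of_file` and `installer_matches` are hypothetical helpers, not part of any Anaconda tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a large installer is never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def installer_matches(path, expected_hex):
    """Compare the file's digest with a published value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `installer_matches("Anaconda3-2021.05-Linux-x86_64.sh", expected)` returns True only if `expected` (the value you copied) equals the digest of the downloaded file.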

Wait patiently for the commands to complete, then execute this command to install Anaconda. You may need to press Enter or type yes to proceed during the process.

bash Anaconda3-2021.05-Linux-x86_64.sh

Image by author

Once the installation is completed, you need to activate the configuration.

source ~/.bashrc

To confirm the installation succeeded, type python in the terminal. You should see the base environment and Python version 3.8.8.

Image by author
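The same check can be done from inside Python. This is a small sketch that reports which interpreter is running and whether a few packages can be imported; the package list (`notebook`, `numpy`, `pandas`) is only an illustrative assumption, and `environment_report` is a hypothetical helper name.

```python
import importlib.util
import sys


def environment_report(packages=("notebook", "numpy", "pandas")):
    """Summarise the running interpreter and which packages it can import."""
    return {
        # Path of the interpreter; for Anaconda it should sit under the install directory.
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # find_spec returns None when a top-level package is not installed.
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


report = environment_report()
print(report["executable"])
print(report["version"])
for name, available in report["packages"].items():
    print(name, "available" if available else "missing")
```

If the executable path does not point into the Anaconda directory, the shell is probably still using the system Python and `source ~/.bashrc` may not have taken effect.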

Run Jupyter notebook remotely

Now that Anaconda has been installed on the EC2 instance, we will run Jupyter Notebook remotely. First, execute the commands below to generate the configuration file and set a password.

jupyter notebook --generate-config

Run this in an IPython session and type the password twice. Save the generated hash string for later use (important!).

from notebook.auth import passwd
passwd()
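The string returned by `passwd()` is a salted hash rather than the password itself, which is why it is safe to paste into a config file. The sketch below illustrates the general idea with the standard library only. It is not the notebook package's actual implementation: the `algorithm:salt:digest` layout, the default algorithm and the helper names are assumptions chosen for illustration, and the real function may produce a different format.

```python
import hashlib
import hmac
import secrets


def make_salted_hash(passphrase, algorithm="sha256", salt_len=12):
    """Return 'algorithm:salt:hexdigest' for the given passphrase."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hexadecimal characters
    digest = hashlib.new(algorithm)
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, digest.hexdigest()))


def check_passphrase(stored, passphrase):
    """Recompute the digest with the stored salt and compare in constant time."""
    try:
        algorithm, salt, expected = stored.split(":", 2)
        digest = hashlib.new(algorithm)
    except ValueError:
        # Malformed string or unknown algorithm name.
        return False
    digest.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return hmac.compare_digest(digest.hexdigest(), expected)
```

Because a fresh random salt is drawn on every call, hashing the same password twice gives two different strings, and both still verify against the original password.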

Exit Python and create a certificate for HTTPS.

mkdir certs

cd certs

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Configure jupyter_notebook_config.py

sudo vim /home/ec2-user/.jupyter/jupyter_notebook_config.py

Modify the five settings below (use / in vim to search, and remove the leading #):

c.NotebookApp.password=''  # paste the hashed password here
c.NotebookApp.ip='0.0.0.0'
c.NotebookApp.open_browser=False
c.NotebookApp.port=8888
c.NotebookApp.certfile=''  # full path to mycert.pem
Image by author
Image by author
Image by author
Image by author
Image by author
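As an alternative to editing each line by hand, the settings can be generated as text and appended to the end of the config file. The sketch below is one way to do that, under a few assumptions: the certificate paths shown are placeholders for wherever you created the certs directory, `build_notebook_config` is a hypothetical helper, and the extra `c.NotebookApp.keyfile` line is included because the openssl command above writes the private key to a separate file (mykey.key), which the server will most likely need to be told about.

```python
def build_notebook_config(password_hash, certfile, keyfile, port=8888):
    """Return the config lines as text, ready to append to the config file."""
    settings = [
        ("c.NotebookApp.password", password_hash),
        ("c.NotebookApp.ip", "0.0.0.0"),  # listen on all interfaces
        ("c.NotebookApp.open_browser", False),
        ("c.NotebookApp.port", port),
        ("c.NotebookApp.certfile", certfile),
        ("c.NotebookApp.keyfile", keyfile),  # assumption: key stored separately
    ]
    # repr() quotes strings and leaves booleans and integers as Python literals.
    return "\n".join(f"{name} = {value!r}" for name, value in settings) + "\n"


print(build_notebook_config(
    password_hash="<hash string from passwd()>",
    certfile="/home/ec2-user/certs/mycert.pem",  # placeholder path
    keyfile="/home/ec2-user/certs/mykey.key",    # placeholder path
))
```

Appending works because the generated file consists of commented-out defaults, so later uncommented assignments are the only ones that take effect.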

Execute the command below to run the server

jupyter notebook

Edit the security group for this EC2 instance to allow inbound traffic on port 8888

Image by author

Access the server by going to

https://(your AWS public dns):8888/
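If the page does not load, a quick way to tell a security group problem from a server problem is to test whether the port accepts TCP connections at all. This is a minimal sketch to run from your local machine; `port_is_open` is a hypothetical helper, and the host you pass in is your own instance's public DNS name.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

Calling `port_is_open("<your AWS public dns>", 8888)` and getting False after a timeout usually points to the security group rule, whereas an immediate False suggests the port is reachable but nothing is listening on it.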

The server will stop running if we close the terminal. We can associate an Elastic IP with the instance so that its public address stays fixed, and execute the command below so that the server keeps running after we log out:

nohup jupyter notebook &

Image by author

Install RStudio Server

Refer to the RStudio documentation to install R.

sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Image by author

sudo yum install yum-utils

sudo yum-config-manager --enable "rhel-*-optional-rpms"

Specify R version

export R_VERSION=4.0.5

curl -O https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm

sudo yum install R-${R_VERSION}-1-1.x86_64.rpm
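The R_VERSION variable is expanded into both the download URL and the package file name, so the two must agree character for character. The sketch below spells out that expansion; the base URL and naming pattern are taken from the curl command above, while the helper names `r_rpm_name` and `r_rpm_url` are hypothetical.

```python
def r_rpm_name(r_version):
    """File name of the R package for a given version string."""
    return f"R-{r_version}-1-1.x86_64.rpm"


def r_rpm_url(r_version, base="https://cdn.rstudio.com/r/centos-7/pkgs"):
    """Download URL built from the same version string."""
    return f"{base}/{r_rpm_name(r_version)}"


print(r_rpm_name("4.0.5"))  # R-4.0.5-1-1.x86_64.rpm
print(r_rpm_url("4.0.5"))   # https://cdn.rstudio.com/r/centos-7/pkgs/R-4.0.5-1-1.x86_64.rpm
```

Whether a package exists for any other version string is not something this sketch can tell you; it only shows how the name is assembled.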

Create symlink

sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R

sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

Download and install RStudio Server

wget https://download2.rstudio.org/server/centos7/x86_64/rstudio-server-rhel-2021.09.1-372-x86_64.rpm
sudo yum install rstudio-server-rhel-2021.09.1-372-x86_64.rpm
Image by author

Create an account to log in to RStudio

sudo useradd [your account]

sudo passwd [your account]

As before, edit the security group to add port 8787, and then access the server by going to the address below. Note that the open-source RStudio Server serves plain HTTP by default.

http://(your AWS public dns):8787/

Image by author
