Install Jupyter Notebook and RStudio Server on an AWS EC2 instance
Leverage cloud computing for data analysis
Introduction
Python and R are two of the most popular programming languages for data analysis, and Jupyter Notebook and RStudio are their respective IDEs. Anyone building data science skills will probably have at least one of these tools on their local machine. Compared with a local machine, cloud computing lets people access these tools anywhere, at any time. Cloud versions of these IDEs exist (Google Colab and RStudio Cloud), but they have limitations. For example, every time you start a new Colab VM you need to reinstall your packages, unless you configure them to install to Google Drive, which adds some technical overhead. RStudio Cloud limits free accounts to 15 hours. In this article, I will share the steps to install Jupyter Notebook and RStudio Server on AWS Elastic Compute Cloud (EC2), so that the server is ready whenever the user logs in.
Amazon Web Services
First, create and activate an AWS account. The steps can be found here. Follow AWS’s principle of least privilege when creating an administrator account. Log in to the AWS console and launch an EC2 instance. AWS offers one year of free EC2 use (t2.micro). All other settings can be left at their defaults.
On Windows, follow the steps here to connect to the EC2 instance using PuTTY. Connecting from macOS or Linux is more straightforward. You can also use EC2 Instance Connect in the browser.
You should see this after a successful connection.
Install Anaconda
Execute the four commands below to update the system and download the Anaconda installer script.
sudo yum update -y
sudo yum -y groupinstall "Development tools"
sudo yum install openssl-devel bzip2-devel expat-devel gdbm-devel readline-devel sqlite-devel
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
Wait for the commands to finish, then execute this command to install Anaconda. You may need to press Enter or type yes to proceed during the process.
bash Anaconda3-2021.05-Linux-x86_64.sh
Once the installation is completed, you need to activate the configuration.
source ~/.bashrc
You should see the base environment in the prompt and, after typing python in the terminal, Python version 3.8.8, which confirms that the installation succeeded.
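The same check can be scripted. The sketch below is a minimal example, assuming the Anaconda base environment has been activated; the helper name describe_environment is hypothetical. It reports the interpreter version and the active conda environment, read from the CONDA_DEFAULT_ENV variable that conda sets on activation.

```python
import os
import sys


def describe_environment(environ=None):
    """Return the interpreter version and the active conda environment name."""
    environ = os.environ if environ is None else environ
    version = ".".join(str(part) for part in sys.version_info[:3])
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # it is absent if ~/.bashrc has not been sourced yet.
    conda_env = environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)")
    return {"python": version, "conda_env": conda_env}


if __name__ == "__main__":
    info = describe_environment()
    print(f"Python {info['python']} in environment: {info['conda_env']}")
```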
Run Jupyter notebook remotely
Now that Anaconda is installed on the EC2 instance, we will run Jupyter Notebook remotely. First, we generate the configuration file and then set a password.
jupyter notebook --generate-config
Run the following in an IPython session and type the password twice. Save the generated hash string for later use (important!).
from notebook.auth import passwd
passwd()
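To see why the output is a long hash string rather than the password itself, here is a minimal sketch of salted password hashing using only the standard library. It illustrates the general idea and is not the notebook package's implementation; the function names and the algorithm:salt:digest layout are assumptions made for this example.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None, algorithm="sha256"):
    """Return 'algorithm:salt:digest' for the given password."""
    # A random salt makes two hashes of the same password differ.
    salt = secrets.token_hex(6) if salt is None else salt
    digest = hashlib.new(algorithm, (password + salt).encode("utf-8")).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def verify_password(stored, candidate):
    """Recompute the hash with the stored salt and compare in constant time."""
    algorithm, salt, _ = stored.split(":", 2)
    recomputed = hash_password(candidate, salt=salt, algorithm=algorithm)
    return hmac.compare_digest(stored, recomputed)
```

Only the hash string goes into the configuration file, so the plain-text password is never stored on the server.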
Exit Python and create a certificate for HTTPS.
mkdir certs
cd certs
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Configure jupyter_notebook_config.py
sudo vim /home/ec2-user/.jupyter/jupyter_notebook_config.py
Modify the six settings below (use / in vim to search, and remove the leading #):
c.NotebookApp.password = ''  # the hashed password saved earlier
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
c.NotebookApp.certfile = ''  # full path to mycert.pem
c.NotebookApp.keyfile = ''  # full path to mykey.key
Execute the command below to run the server
jupyter notebook
Edit the security group for this EC2 instance to allow inbound traffic on port 8888.
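The inbound rule for port 8888 can also be added programmatically instead of through the console. The sketch below assumes the boto3 package is installed and AWS credentials are configured; the helper names are hypothetical, and the security group ID and CIDR range you pass in are your own. It builds one TCP ingress rule and passes it to the EC2 authorize_security_group_ingress call.

```python
def build_ingress_rule(port, cidr, description=""):
    """Build one inbound TCP rule in the shape EC2 expects for IpPermissions."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    ip_range = {"CidrIp": cidr}
    if description:
        ip_range["Description"] = description
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [ip_range],
    }


def open_port(group_id, port, cidr, description=""):
    """Add the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[build_ingress_rule(port, cidr, description)],
    )
```

Restricting the CIDR range to your own address (for example 203.0.113.10/32) is safer than opening the port to 0.0.0.0/0.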
Access the server by going to
https://(your AWS public dns):8888/
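If the page does not load, it helps to separate a network problem (security group, wrong DNS name) from a Jupyter problem. The sketch below is a minimal check meant to run on your local machine; the function names are hypothetical. It builds the URL and tests whether a TCP connection to the port succeeds.

```python
import socket


def server_url(host, port=8888, scheme="https"):
    """Build the address to open in the browser."""
    return f"{scheme}://{host}:{port}/"


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If port_is_open returns False for your public DNS name and port 8888, check the security group rule first. Because the certificate is self-signed, the browser will show a warning the first time you open the URL.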
The server will stop running if we close the terminal. We can associate an Elastic IP to keep the public address fixed, and execute the command below so that the server keeps running after the terminal hangs up:
nohup jupyter notebook &
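As an alternative to nohup, the server can be launched from a small Python script. This is a sketch under assumptions, not an equivalent of nohup: it relies on the POSIX-only start_new_session option of subprocess.Popen, and the helper name and log file path are hypothetical.

```python
import subprocess


def start_detached(command, log_path):
    """Start command in its own session, appending its output to log_path."""
    with open(log_path, "ab") as log:
        # start_new_session=True (POSIX only) moves the child out of the
        # terminal's session, so it is not sent the terminal's hangup signal.
        process = subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    return process
```

For example, start_detached(["jupyter", "notebook"], "jupyter.log") starts the server and returns a process object whose pid attribute can be used to stop it later.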
Install RStudio Server
Refer to the RStudio documentation to install R.
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install yum-utils
sudo yum-config-manager --enable "rhel-*-optional-rpms"
Specify R version
export R_VERSION=4.0.5
curl -O https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm
sudo yum install R-${R_VERSION}-1-1.x86_64.rpm
Create symlinks
sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript
Download and install RStudio Server
wget https://download2.rstudio.org/server/centos7/x86_64/rstudio-server-rhel-2021.09.1-372-x86_64.rpm
sudo yum install rstudio-server-rhel-2021.09.1-372-x86_64.rpm
Create an account to log in to RStudio
sudo useradd [your account]
sudo passwd [your account]
As before, edit the security group to add port 8787, then access the server by going to
http://(your AWS public dns):8787/