I’ve been (very happily) using RStudio Connect for about a year.
Like many data scientists who don’t come from a computer science background, I’m just capable enough at engineering and DevOps to be dangerous. Over the years I’ve picked up a bunch of random knowledge about Linux, web development, and networking, so I am now pretty self-sufficient, although I usually need far more trial and error than my more legit engineering peers to get something working.
In the case of RStudio Connect, which is software that you install on your own server, the process was pretty straightforward. I created a medium-sized EC2 instance on AWS, ran the commands in the installation guide to get the necessary software dependencies installed, changed the port number to 80, and created a CNAME record in AWS Route 53 to point my domain name to the EC2 instance. No hiccups. My inner DevOps engineer swelled with pride, but I should have known it’s never that easy.
Later, a guy who does security work for our company told me that my service was sending passwords in plaintext. Yikes. I had made a note to secure the server with HTTPS, but had let it slip. My past experiences with HTTPS had always been confusing enough that it’s a rabbit hole I don’t like going down.
For those unfamiliar with HTTPS, there are two main ideas behind it. One is encryption. With regular HTTP traffic, your requests are sent across the internet in plaintext. This can be really bad. If you’re working at a cafe with public wifi, a hacker on the wifi network could pretty easily record all your web activity. Sending traffic through HTTPS solves that, using public key cryptography (a favorite topic of mine, but I won’t get into the details here). The second idea packaged into HTTPS is security certificates. If you’re on Google Chrome, you’ll notice my site (as well as most sites you visit these days) has this little green padlock next to the URL.
That lock is there to tell you that some third party has verified that I am who I say I am, and that when you connect to my site, you’re actually sending data to me and no one else. This prevents something called a man-in-the-middle attack. Rather than go deep into this complex topic, I’ll just post some links at the bottom for curious readers who want to know more about HTTPS.
Here, I will walk you through the steps you need to take to secure your RStudio Connect server. We’ll be using a service called Let’s Encrypt. Let’s Encrypt is a free, automated, and open certificate authority brought to you by the non-profit Internet Security Research Group (ISRG).
certbot is the official command line client for Let’s Encrypt. It’s developed by the Electronic Frontier Foundation, a very cool and legit foundation.
SSH into your server first, and then, as any user, run this to download the certbot client and make it executable:
wget https://dl.eff.org/certbot-auto
chmod a+x certbot-auto
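Run this to temporarily stop your RStudio Connect server. certbot’s standalone mode starts its own temporary web server to prove you control the domain, so nothing else can be listening on the port it uses: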
sudo systemctl stop rstudio-connect
HTTPS traffic is sent through port 443, unlike HTTP which is sent through port 80. If you are running RStudio Connect on an EC2 instance, you’ll want to edit the security group for that instance and open up port 443 for HTTPS traffic.
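If you prefer the command line to the AWS console, the same change can be sketched with the AWS CLI. This assumes the CLI is installed and configured with credentials that may edit the security group. The function name `open_https_port` is my own, and the group ID in the usage note is a placeholder, not a real value.

```shell
# Hypothetical helper: allow inbound HTTPS (TCP 443) from anywhere
# on the given security group.
# $1: the security group ID attached to your EC2 instance
open_https_port() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 443 \
    --cidr 0.0.0.0/0
}
```

You would call it as `open_https_port sg-0123456789abcdef0`, swapping in your own group ID. The `0.0.0.0/0` range opens the port to everyone, which is usually what you want for a public web server, but narrow it if your server should only be reachable from certain networks.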
From the same directory you were in when you downloaded certbot in step 1, run:
./certbot-auto certonly --standalone -d mydomain.com
Replace mydomain.com with your RStudio Connect server’s domain. certbot will ask you a few questions before it begins generating the certificates. At the end of the process, it will display the paths to the certificate file and the key file. They should look something like this:
# Certificate File
/etc/letsencrypt/live/mydomain.com/fullchain.pem
# Key File
/etc/letsencrypt/live/mydomain.com/privkey.pem
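Before moving on, it can be reassuring to confirm that both files really exist and are not empty. Here is a minimal sketch. The helper name `check_cert_files` is made up for this post, and the directory you pass in is whatever path certbot printed for you.

```shell
# Hypothetical helper: verify that a Let's Encrypt "live" directory
# contains a non-empty certificate chain and private key.
# $1: directory such as /etc/letsencrypt/live/mydomain.com
check_cert_files() {
  for f in fullchain.pem privkey.pem; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f" >&2
      return 1
    fi
  done
  echo "ok: $1"
}
```

The live directory is typically readable only by root, so you will probably need to run `check_cert_files /etc/letsencrypt/live/mydomain.com` from a root shell.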
We now have our certificates and just need to tell RStudio Connect where they are and to start using HTTPS instead of HTTP. You (probably) need to use sudo to edit the config file, and unless you installed emacs or vim, you’ll probably just want to go with nano.
sudo nano /etc/rstudio-connect/rstudio-connect.gcfg
Here’s what I edited mine to look like. I commented out everything HTTP related, uncommented the HTTPS section and added my key and certificate paths. Then I added an HttpRedirect so that HTTP traffic through port 80 would be redirected to 443.
;[Http]
; RStudio Connect will listen on this network address for HTTP connections.
;Listen = :80

[Https]
; Rstudio Connect will listen on this network address for HTTPS connections
Listen = :443
Certificate = /etc/letsencrypt/live/<mydomain.com>/fullchain.pem
Key = /etc/letsencrypt/live/<mydomain.com>/privkey.pem

[HttpRedirect]
Listen = :80
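A typo in one of those paths is an easy mistake to make, so here is a rough way to print back what the config file actually says. This is only a sketch: `https_paths` is a name I made up, and it assumes the simple `Name = value` layout shown above.

```shell
# Hypothetical helper: print the Certificate and Key values found in
# the [Https] section of an RStudio Connect config file.
# $1: path to the config file
https_paths() {
  awk '
    # Track whether the current section header is [Https].
    /^\[/ { in_https = ($0 ~ /^\[Https\]/) }
    # Inside [Https], strip "Name = " and print the remaining path.
    in_https && /^(Certificate|Key)[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
    }
  ' "$1"
}
```

Running `https_paths /etc/rstudio-connect/rstudio-connect.gcfg` should print two lines that match the paths certbot gave you. Lines commented out with `;` are ignored, because they match neither a section header nor a setting name.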
Then restart RStudio Connect so it picks up the new configuration:
sudo systemctl restart rstudio-connect
You should now be able to navigate in a browser to https://<mydomain.com> and see the green lock, indicating your site is now coming through on HTTPS and your server is authenticated by Let’s Encrypt.
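If you would rather check from a terminal than a browser, something like the following works as a quick smoke test. It assumes `curl` is installed, and `check_https` is just a name I picked for the helper.

```shell
# Hypothetical helper: probe a domain over HTTP and HTTPS with curl.
# $1: your RStudio Connect domain
check_https() {
  # Plain HTTP: print the status code and where the server redirects to.
  curl -sS -o /dev/null -w 'http:  %{http_code} -> %{redirect_url}\n' "http://$1/"
  # HTTPS: curl exits non-zero if the certificate cannot be verified.
  curl -sS -o /dev/null -w 'https: %{http_code}\n' "https://$1/"
}
```

Call it as `check_https mydomain.com`. If the redirect is working, the first line should show a 3xx status with an https:// target, and the second line should print a status code without any certificate error.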