Back online: lessons learned from recovering my website after a data center fire

About a week ago, one of OVH’s data centers was affected by a major fire (see the press release in French and English). It turns out that my website was hosted there and had therefore been unavailable since March 10th. I’m glad to announce that I’ve put it back up online again! As a bonus, it’s now accessible via both IPv4 and IPv6.

Fortunately, I didn’t lose any data, as I only host static content and had various backups at home. However, backups are not everything and I still had to re-configure a new server to bring my website up again. This ended up being a good opportunity to refresh a few things in my configuration, but most importantly to make the whole installation process more automated, and therefore be prepared if another incident happens in the future.

In this post, I’ll explain what I learned from re-installing my server, and describe my configuration to have an HTTPS website running on it.

Ordering a new server
Automating the installation process
Basic server configuration
Website hosting
- HTTPS certificates
- HTTP server
Conclusion

Ordering a new server

When I first set up my website about 5 years ago, I decided to host it on a Virtual Private Server (VPS). The main advantages I saw in this setup were having access to a server with full control of the OS, as well as dedicated fixed IPv4 and IPv6 addresses. The “virtual” part is about the server being hosted on a virtual machine in the data center, but it doesn’t matter much in terms of features (for my use case at least).

Full control over the server may be overkill to host a static website, but I liked the flexibility of not being restricted to any website framework, and the ability to tinker a bit if I wanted to do something else besides hosting the website. This of course comes with additional responsibilities which I’ll discuss in the following sections, namely securing the server, setting up automatic updates, etc. But I saw it as a learning opportunity given that I had the time and willingness to do it.

As for the dedicated IP addresses, they make it possible to configure the DNS quite easily and independently from the server.

To come back to this incident, once OVH confirmed that my VPS was affected by the fire and that it wouldn’t be recovered, my first step was to order a new VPS. From the customer interface this was quite straightforward, and given that the pricing got cheaper over 5 years, I could “upgrade” my setup to have 2x more disk space while reducing the bill by 25%. I had already thought about this upgrade in the past months, but hadn’t really had the willingness to do it given the extra cost of re-installing everything. Well, at least this incident didn’t give me a choice!

Overall, including ordering, payment and setup, my new VPS was available within about 15 minutes. As soon as I had the new IP addresses, I configured my DNS records for gendignoux.com to point to them – as the configuration panel mentioned that previous DNS records may take up to 24 hours to expire in various caches around the world. DNS is also required to obtain HTTPS certificates via Let’s Encrypt, as we will discuss later – although in practice about 5 minutes seemed to be enough to have a new DNS record visible to Let’s Encrypt.

Automating the installation process

Before describing my configuration, which is rather specific to my use case (serving a static website), I want to mention the most important thing I learned from this recovery experience, and which is quite universal: make the (re-)installation process as automated as possible.

It turns out that when I started my website 5 years ago, I took care to document the installation steps in various scripts, but I basically had a separate script for each piece of software and configuration. This means that I had all the information to re-install my website, but following the steps scattered across many scripts and folders would be quite a tedious and error-prone manual process. What if I forget to run one script? What if I run some scripts in the wrong order?

Back to today, I invested some time to productionize this into a simple setup involving at most 5 steps including copying files to the server and running the installation scripts. Whatever your setup is, making it simple to install is really my recommendation to be prepared for incidents like total loss of a server – apart from having a robust backup process of course.

So what I did is the following.

Have a single setup/ folder containing all the installation scripts and files. I can copy this folder to the server via a simple scp command, namely scp -r setup username@$IP:~.
Have a single setup.sh script containing all the installation steps. I use Bash with a set -eux line at the beginning so that it will print a verbose output and abort if any step fails. Having the script abort upon failure is convenient to easily notice any problem in the future, for example if some packages become outdated.
To edit configuration files, rather than copying files directly, I decided to use git to apply patches on the default configuration. This will be convenient if I install my server again in N years, as the diff will allow me to notice if a default configuration file has changed, and to manually check if I should update my custom configuration as well. More specifically, I did 2 things.
- Preparing the patches. I created 2 folders for the original and custom configuration, placing the original and edited files in there, and creating a patch with git diff --no-index original/ custom/ > patch.
- Applying the patches. I installed git on the server and used sudo git apply -p2 --unsafe-paths --directory=/ --verbose patch to apply the patch at the root filesystem. It may be useful to use git apply --check as a preliminary step to check if the patch still applies. It may seem quite heavy to install git only to apply a few patches, but the key here for me was to keep it simple.
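The patch workflow described above can be sketched end to end in a scratch directory. This is a minimal illustration under stated assumptions, not the actual setup script: the demo.conf file and the folder layout are made up, and the patch is applied relative to a scratch root/ folder (by running git apply from inside it) instead of using --unsafe-paths --directory=/, so that it runs without sudo.

```shell
#!/usr/bin/env bash
set -eu

# Scratch layout, made up for this example.
work="$(mktemp -d)"
mkdir -p "$work/original/etc/demo" "$work/custom/etc/demo" "$work/root/etc/demo"
printf 'Port 22\nPermitRootLogin yes\n' > "$work/original/etc/demo/demo.conf"
printf 'Port 22\nPermitRootLogin no\n' > "$work/custom/etc/demo/demo.conf"

# Stand-in for the freshly installed server, which has the default file.
cp "$work/original/etc/demo/demo.conf" "$work/root/etc/demo/demo.conf"

# git diff --no-index exits with status 1 when the folders differ,
# so guard it to keep `set -e` from aborting the script.
(cd "$work" && git diff --no-index original/ custom/ > patch) || true

# Dry run first, then apply. -p2 strips the leading "a/original/" and
# "b/custom/" components, so paths become relative to the target root.
(cd "$work/root" && git apply -p2 --check ../patch)
(cd "$work/root" && git apply -p2 ../patch)

patched="$(cat "$work/root/etc/demo/demo.conf")"
echo "$patched"
```

One detail worth noting when combining this with set -e: since git diff --no-index reports differences through its exit status, generating the patch inside a strict script needs the guard shown above.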

There are some steps as well to restart/reload services and reboot the server at the end, and I also have a separate script to push all the pages on my website afterwards, but that’s really all there is now in my installation process!

The last important thing was to actually test that this process works properly. From the configuration panel in OVH, I can re-image my VPS pretty much instantly – let’s say that it takes 10 seconds on top of that for the server to boot and be available via SSH. So there really is no excuse not to try the whole process a few times. The only caveat I encountered was hitting rate limits when trying to obtain HTTPS certificates several times in a row, as I’ll discuss later.

Overall, I probably spent half a day to refresh my installation process, but this makes sure that I’ll be able to re-install my website in a matter of minutes next time such a major incident happens, or next time I want to upgrade to a different hosting plan.

Basic server configuration

The VPS I ordered comes with a pre-installed OS; I chose Debian as I’m already quite familiar with it. My first step before even installing any website-specific software was to apply some basic security measures on top of the stock OS. Even if I eventually only host static website content, the server is directly accessible on the public Internet, so this is quite important.

The default server comes with SSH access via username and password, provided in a confirmation email.

The first step was to configure public key authentication and disable various SSH features that I don’t use. You can find many more tips about OpenSSH configuration on the Arch Linux wiki. I updated the following parameters in the /etc/ssh/sshd_config file.

Disabling root login (PermitRootLogin no), password authentication (PasswordAuthentication no), TCP forwarding (AllowTcpForwarding no) and X11 forwarding (X11Forwarding no).
Changing the SSH port from the default of 22 to something custom. While this doesn’t add much security in itself, this removes a bit of spam from failed login attempts done by various scanners on the Internet, which would appear in the systemd logs (visible when running systemctl status --full ssh or journalctl -u ssh).
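After patching the SSH configuration, a quick sanity check can confirm that the options listed above are really set. The following is a minimal sketch: the has_sshd_option helper and the sample file are hypothetical, and the check only reads a single file, so it ignores Include directives and Match blocks. On a real server, sudo sshd -T prints the effective configuration and is the more reliable check.

```shell
#!/usr/bin/env bash
set -eu

# Succeeds if the first uncommented occurrence of KEY in FILE is set to VALUE
# (sshd keeps the first value it reads for a keyword).
has_sshd_option() {
  local file="$1" key="$2" value="$3"
  awk -v k="$key" -v v="$value" '
    tolower($1) == tolower(k) { found = 1; ok = ($2 == v); exit }
    END { exit !(found && ok) }
  ' "$file"
}

# Made-up sample file; on the server this would be /etc/ssh/sshd_config.
sample="$(mktemp)"
printf '%s\n' '#PermitRootLogin prohibit-password' 'PermitRootLogin no' \
  'PasswordAuthentication no' 'X11Forwarding yes' > "$sample"

for option in PermitRootLogin PasswordAuthentication AllowTcpForwarding X11Forwarding; do
  if has_sshd_option "$sample" "$option" no; then
    echo "ok:      $option no"
  else
    echo "MISSING: $option no"
  fi
done
```

With this sample file, the first two options are reported as ok and the last two as MISSING, since AllowTcpForwarding is absent and X11Forwarding is still set to yes.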

Regarding public key authentication, it was already enabled but I had to add my public key to $HOME/.ssh/authorized_keys. When re-imaging my server, there was actually an option to upload a public key to my OVH account and pre-install it – quite convenient to avoid waiting for the email containing the password.

The pre-installed password that I received by email contained 12 alphanumeric characters (uppercase/lowercase letters and digits). As far as I can tell, this was randomly generated every time I asked to re-image the VPS from the configuration panel. Assuming that the random generation process was implemented properly, this would amount to $12 \cdot \log_2(62) \approx 71$ bits of entropy.
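The entropy figure can be reproduced with a one-liner, assuming each of the 12 characters is drawn independently and uniformly from the 62 possible symbols (26 lowercase letters, 26 uppercase letters and 10 digits).

```shell
# 62 symbols per character, 12 independent characters.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
entropy_bits="$(LC_ALL=C awk 'BEGIN { printf "%.2f", 12 * log(62) / log(2) }')"
echo "$entropy_bits bits"   # prints "71.45 bits"
```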

One could argue that it’s quite unlikely to ever be brute-forced via SSH – after all this VPS is not the most powerful computer in the world and can only handle so many login requests per second. Still, the cryptographic keys used for SSH authentication have more entropy, are more convenient to use, and have not been transmitted via email.

Another thing related to login was how to run commands as root via sudo. By default, the VPS was configured in “no password” sudo mode, making it easy to run commands as root directly from the un-privileged default account. I prefer sudo to require a password, not so much in terms of attack surface – access to the default account should be protected by a strong SSH configuration – than to require manual confirmation every time a privileged command is run and avoid shooting oneself in the foot by mistake.

The “no password” mode was enabled via some configuration files present in the /etc/sudoers.d/ folder and containing lines like debian ALL=(ALL) NOPASSWD:ALL. Simply removing these files made sudo require a password.
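Before removing anything, it can help to list which drop-in files grant password-less sudo. The helper below is a hypothetical sketch (the function name and the demo file name are made up), demonstrated on a scratch directory. On a real server the files in /etc/sudoers.d/ are only readable by root, so the function would be run via sudo.

```shell
#!/usr/bin/env bash
set -eu

# List sudoers drop-in lines that grant password-less sudo.
# The directory defaults to /etc/sudoers.d and can be overridden.
list_nopasswd_rules() {
  local dir="${1:-/etc/sudoers.d}"
  grep -rn 'NOPASSWD' "$dir" || true
}

# Demo on a scratch directory with a made-up file name.
demo_dir="$(mktemp -d)"
echo 'debian ALL=(ALL) NOPASSWD:ALL' > "$demo_dir/90-demo-user"
list_nopasswd_rules "$demo_dir"
```

Since the account needs a working password once these files are gone, it seems prudent to keep a second privileged session open while testing the change, and to validate the remaining sudoers files with sudo visudo -c.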

Automatic updates

The most important piece of configuration is probably to run up-to-date software, in particular applying security updates. The first thing to do is therefore to apply any updates left since the server’s default image was created.

apt update
apt upgrade -y

The second thing is to have updates happening automatically without any manual intervention – you don’t want to log into your website’s server every day to install updates. On Debian, this can be done relatively simply via the unattended-upgrades package, mostly following this guide.

Apart from installing the package, I also implemented the following steps.

Installed further packages to keep track of what’s been updated: apt-listchanges, exim4, mailutils. This will send “mails” to the root user whenever updates are checked and/or applied.
Set up forwarding of these mails from root to the default (debian) user, by adding a root: debian line in /etc/aliases. You can then test that this forwarding works by deliberately running sudo with a wrong password.
Edited the /etc/apt/apt.conf.d/50unattended-upgrades configuration to:
- mail reports to root,
- remove unused dependencies,
- automatically reboot when required (typically following a kernel upgrade).
Used the extended configuration found here, which in particular records verbose reports in the mails sent to root, and enables some automatic cleanup.
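For reference, the three settings listed above correspond to the following unattended-upgrades options. This sketch writes them to a scratch file so that it can be run anywhere; the APT_CONF_OUT variable is a made-up name for this example, and on a real server the lines would live in a file under /etc/apt/apt.conf.d/ (such as the 50unattended-upgrades file mentioned above).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical output location; defaults to a scratch file.
out="${APT_CONF_OUT:-$(mktemp)}"

cat > "$out" <<'EOF'
// Mail reports to root (forwarded to the default user via /etc/aliases).
Unattended-Upgrade::Mail "root";
// Remove dependencies that are no longer needed.
Unattended-Upgrade::Remove-Unused-Dependencies "true";
// Reboot automatically when an update requires it, e.g. a new kernel.
Unattended-Upgrade::Automatic-Reboot "true";
EOF

cat "$out"
```

On the server, running sudo unattended-upgrade --dry-run --debug should show whether the configuration is picked up, without installing anything.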

Network configuration

By default, there is no firewall configured on the server, i.e. all ports for TCP, UDP, ICMP (ping), etc. are open and accessible from the Internet. Even though all of these open ports shouldn’t be useful to attackers as long as no service is running behind them, by the principle of least privilege it’s still a good practice to only open ports that have been explicitly allowed.

I’ve therefore configured some firewall rules with iptables. The base configuration is to flush any existing rules, and block incoming packets, as well as forwarding (I’m not running a router).

# Flush all rules
iptables -F
# Delete all user-defined chains
iptables -X

# Setting default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT # Accept outbound traffic

On top of that, one has to allow packets related to an existing (outbound) connection, as well as on the “loopback” interface. Then, you can choose which ports to open, typically at least SSH and (in the case of a website) HTTP(S). Allowing ICMP can also be useful to allow troubleshooting connections via ping.

# Existing connection
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Loopback
iptables -A INPUT -i lo -j ACCEPT
# Ping
iptables -A INPUT -p icmp -j ACCEPT
# SSH (replace by custom port)
iptables -A INPUT -p tcp --dport  22 -j ACCEPT
# HTTP(S)
iptables -A INPUT -p tcp --dport  80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

There is a more recent replacement for iptables, called nftables. I should migrate to it at some point, but my rules are simple enough that it wasn’t worth learning the new syntax at this point – iptables is still maintained anyway.

If you have IPv6 enabled, you need to duplicate these rules with ip6tables as well, otherwise the firewall is quite useless as it can be circumvented via IPv6. Note that you’ll have to replace icmp with icmpv6.
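One way to keep the IPv4 and IPv6 rules in sync is to generate both from a single template. The sketch below only prints the commands rather than running them, so the output can be reviewed first; the print_rules function is a made-up helper, and port 22 stands for whatever SSH port is actually configured.

```shell
#!/usr/bin/env bash
set -eu

# Print the rule set for one address family. $1 is the command name
# (iptables or ip6tables), $2 the ICMP protocol name for that family.
print_rules() {
  local ipt="$1" icmp="$2"
  cat <<EOF
$ipt -F
$ipt -X
$ipt -P INPUT DROP
$ipt -P FORWARD DROP
$ipt -P OUTPUT ACCEPT
$ipt -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$ipt -A INPUT -i lo -j ACCEPT
$ipt -A INPUT -p $icmp -j ACCEPT
$ipt -A INPUT -p tcp --dport 22 -j ACCEPT
$ipt -A INPUT -p tcp --dport 80 -j ACCEPT
$ipt -A INPUT -p tcp --dport 443 -j ACCEPT
EOF
}

print_rules iptables icmp
print_rules ip6tables icmpv6
```

Note that accepting ICMPv6 matters for more than ping: IPv6 relies on it for neighbor discovery, so dropping it entirely would break connectivity.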

Once the iptables and ip6tables rules are set up, it’s important to make them persist, otherwise they are flushed upon reboot by default! For this, simply install the iptables-persistent package. This should ask you to save the current rules, which you can then confirm by checking the /etc/iptables/rules.v4 and /etc/iptables/rules.v6 files.

One last piece of network configuration was actually to enable IPv6. Indeed, my server was allocated a static address by OVH, but this wasn’t actually configured on the server (as shown by running ip addr), so it couldn’t be reached via IPv6. To enable it, I created a file named /etc/network/interfaces.d/99-static-ipv6 and simply used the manual interface configuration described in Debian’s wiki, using the address and gateway provided by OVH.

Misc

Here are other pieces of generic configuration that I’ve done – or not.

Disabling ptrace. I’ve mentioned this in a previous blog post, and once again following the principle of least privilege I disabled ptrace from the kernel.
Custom bash profile. The $HOME/.bash_profile script is run every time upon (SSH) login, and it can be useful to display an overview of the system at this time. Some commands you may want to put in there:
- last -i to see the last logins/reboots,
- uptime to see how long the server was up and running,
- df -h to see how much disk space is still available.
- free to see how much RAM is currently in use.
- I also like to put a custom message or ASCII art logo in /etc/motd, which will be printed upon login.
Not fail2ban. In my previous setup, I had configured fail2ban. This service scans various logs and bans IP addresses that show malicious signs like too many password failures, probing for exploits, etc. I decided not to install it this time as I’m not convinced of the value it brings, at least for my use case (static website). Indeed, this adds quite a bit of complexity with many configuration files, more rules in iptables to ban the “attackers”, etc. On the other hand, most detected “attacks” (e.g. failed password attempts on SSH) are irrelevant on an up-to-date system with well configured SSH and firewall. More sophisticated attackers could anyway use a different IP address for each connection. Last, this may ban legitimate visitors of my website who would share an IP address with an “attacker”.
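As an illustration of the custom bash profile item above, here is a minimal sketch of what such a login overview could look like. The login_overview function name is made up, the -n 5 limit on last and the -h flag on free are additions for readability, and each command is skipped if it isn't installed.

```shell
# Sketch of a login overview for $HOME/.bash_profile.
login_overview() {
  local cmd
  for cmd in "last -i -n 5" "uptime" "df -h /" "free -h"; do
    # ${cmd%% *} is the command name without its arguments.
    if command -v "${cmd%% *}" > /dev/null 2>&1; then
      echo "== $cmd =="
      # Intentionally unquoted so that the arguments are split.
      $cmd || true
    fi
  done
}

login_overview
```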

Website hosting

Now that my server had its system configured, it was time to actually implement the main feature: hosting the website itself.

HTTPS certificates

Before even installing an HTTP server, I configured some HTTPS certificates, so that I could directly refer to them later. To do that, I simply used the Let’s Encrypt certificate authority, available via the certbot package. This relies on the ACME protocol (RFC 8555), which essentially allows anyone in control of a domain (via DNS) to obtain a certificate for it in an automated manner.

The commands to request a new certificate were simply the following.

certbot register --agree-tos --email <you@email>
certbot certonly --standalone -d <domain>

What has become a no-brainer today – obtaining certificates for free via a few commands in the terminal – was actually not when I first set up my website 5 years ago. Back in 2016, Let’s Encrypt had only existed for a few months, so I had to install the package via backports, and create some manual cron jobs to trigger certificate renewal.

It was much simpler today: the package is fully supported in Debian stable, including the following systemd timer for the renewal.

$ cat /lib/systemd/system/certbot.timer
[Unit]
Description=Run certbot twice daily

[Timer]
OnCalendar=*-*-* 00,12:00:00
RandomizedDelaySec=43200
Persistent=true

[Install]
WantedBy=timers.target

You can check the status of the service and its timer with systemctl status certbot.service and systemctl status certbot.timer.

However, while testing my installation, I ended up reaching the current rate limits of the service, which allow only 5 certificates to be issued per week for the exact same set of domains! This was easy to reach after several runs to check my installation scripts as I recommended above.

Obtaining a new certificate
An unexpected error occurred:
There were too many requests of a given type :: Error creating new order :: too many certificates already issued for exact set of domains: gendignoux.com: see https://letsencrypt.org/docs/rate-limits/
Please see the logfiles in /var/log/letsencrypt for more details.

To avoid such a situation, you can review the list of certificates already issued for a given domain on crt.sh (this list is not refreshed instantaneously though).
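Another way to stay clear of the rate limits while testing an installation script is certbot's --dry-run option, which talks to the Let's Encrypt staging environment and doesn't save any certificate. The wrapper below is a hypothetical sketch: the request_cert function and the CERTBOT override variable are made-up names, and a real non-interactive script would likely need further options (such as the registration step shown earlier).

```shell
# Run a certbot request. By default this is only a dry run against the
# staging environment; pass "production" as second argument to request
# a real certificate. CERTBOT can be overridden, e.g. for testing.
request_cert() {
  local domain="$1" mode="${2:-dry-run}"
  local args=(certonly --standalone)
  if [ "$mode" != "production" ]; then
    args+=(--dry-run)
  fi
  args+=(-d "$domain")
  "${CERTBOT:-certbot}" "${args[@]}"
}

# Usage on the server (not run here):
#   request_cert example.org             # test run, no certificate saved
#   request_cert example.org production  # real request, counts towards limits
```

Keeping the dry run as the default means that repeated test installations don't count against the production limits; only the final run needs the production mode.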

HTTP server

The last step of my setup was to install the core business logic for my website: an HTTP server. For this, I used Nginx, which seemed quite popular when I first created my website. Quite frankly, I didn’t try to compare features and performance across web servers given that I basically just need to (1) serve static content and (2) support HTTPS.

On top of the base configuration, I essentially added some TLS configuration to point to the Let’s Encrypt certificates, and created a 301 redirect from HTTP to HTTPS so that my website is only available over HTTPS.
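The TLS and redirect configuration described above could look roughly like the output of the following sketch. It is an assumption-laden example rather than the actual configuration of this website: render_site is a made-up helper, example.org and /var/www/html are placeholders, and the certificate paths follow certbot's usual /etc/letsencrypt/live/ layout.

```shell
#!/usr/bin/env bash
set -eu

# Print an nginx site configuration for a static HTTPS website.
# $1 is the domain, $2 the folder containing the static files.
render_site() {
  local domain="$1" webroot="${2:-/var/www/html}"
  cat <<EOF
server {
    listen 80;
    listen [::]:80;
    server_name ${domain};
    # Permanent redirect from HTTP to HTTPS.
    return 301 https://\$host\$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name ${domain};

    ssl_certificate     /etc/letsencrypt/live/${domain}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${domain}/privkey.pem;

    root ${webroot};
    index index.html;
}
EOF
}

render_site example.org
```

On the server, the output would be written to nginx's configuration folder and validated with sudo nginx -t before reloading the service.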

To publish new content on the website, I simply use rsync in SSH mode to directly update the files that nginx will automatically pick up – I don’t even need to restart or reload the nginx service.

One thing I changed was the certbot script for certificate renewal. To avoid it colliding with nginx, I updated the ExecStart line in /lib/systemd/system/certbot.service to stop nginx during the certificate renewal process.

ExecStart=/usr/bin/certbot -q renew --standalone --pre-hook "service nginx stop" --post-hook "service nginx start"

This means that my website isn’t available for a few seconds when certificate renewal is checked twice a day. In the future, I could probably try the --nginx option of certbot to avoid that.

Conclusion

Fortunately, despite this major incident on the data center that hosted my website, I didn’t lose any data given that I was only hosting static web content, with backups at home for both my website and my server installation scripts. Still, it was a good time to refresh, automate and test this installation process, so that I don’t have to spend half a day to collect scripts next time I need to migrate to another server!

And even if you have not been affected by a major incident like this, now is a good time to ask yourself the same questions. Do you have good backups for your data? Can you recover and reinstall your setup quickly?

In the end, the odds were quite small that among all the data centers in the world my website would be hosted in the one that caught fire, and yet it happened to me, and I was glad to have good backups. Even if the probability of a fire affecting you is quite small, there are many other and probably more likely things that could go wrong. Maybe your laptop will get stolen or just break by falling on the floor. Maybe you’ll lose your password and be unable to access some online service anymore. Maybe a hard drive will crash and all its data will be lost.

For all these cases, be prepared today and you’ll avoid worrying tomorrow!
