Web server
A web server is a program or a computer with a program that processes requests and returns responses over the HTTP network protocol. Any web application needs a web server. When a web application is placed on a web server, the web server hosts that application.
When any web server receives an HTTP request, depending on how it is configured, one of several scenarios may occur:
- the web server will return a ready-made file stored on the computer where the web server program is running;
- the web server will itself make a request to another web server, receive a response from it, possibly modify it, and return it in response to the received request;
- the web server will respond with a redirect to another address or an error if the request is incorrect.
In this topic, we will look at these scenarios using a simple web application as an example. It will be a web page with a clock image and a button. When you press the button, the current time is displayed on the page. The application consists of an HTML page, an image, and a Python API that determines the current time. We will also make this application accessible on your local network.
Installing Nginx
To host a web application on a server and make it accessible on the network, we need to install a web server program on the virtual machine and configure the web server itself. Nginx can be installed using the package manager:
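Assuming a Debian/Ubuntu-based virtual machine, as used throughout this topic, the installation looks like this:

```shell
sudo apt update
sudo apt install nginx
```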
The Nginx package contains the Nginx service configuration. Let's check its status and also make sure the service starts automatically on every computer boot:
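With systemd, both checks look like this:

```shell
# Check that the service is running
systemctl status nginx
# Make the service start automatically on boot
sudo systemctl enable nginx
```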
Right after installation, Nginx is configured to return a default page in response to HTTP requests. We can already view this page in a browser by entering the IP address of the virtual machine in the address bar — the same one you use when connecting to it via SSH.
At this same address, this page should open on any device connected to the same network. For example, if your computer and phone are currently connected to the same Wi-Fi, you can open this website from your phone. But this is not the website we need to launch, so let's go back to the terminal.
How exactly Nginx handles HTTP requests is determined by configuration files. The main configuration file is /etc/nginx/nginx.conf.
It contains settings such as TLS protocol versions, the path to files where the Nginx program writes its logs, and references to other additional configuration files. The most interesting files for us are located in the sites-enabled directory. A single Nginx service installed on one computer allows us to host as many separate sites as we want, and these sites are configured using files located in the /etc/nginx/sites-enabled directory.
Take a look at the default site configuration file, which was created automatically after installation: /etc/nginx/sites-enabled/default. As in Bash, the hash symbol marks commented lines here. Each site hosted by Nginx corresponds to one server block, within whose curly braces the configuration commands are described. Each command must end with a semicolon.
The first command in this block — listen — specifies the IP address and TCP port on which the site will listen for HTTP requests. The address may not be specified; in that case, if the server has multiple IP addresses, the site will be accessible at all of them.
Two colons in square brackets, specified in place of the address in the second listen command, indicate that this site is also accessible at any IPv6 address.
The port specified in both listen commands (80) is the port used by all programs by default for HTTP. By the way, port 443 is used by default for HTTPS.
Both listen commands here end with the default_server parameter. This means that this particular site will be used by default: when an HTTP request arriving on the addresses and ports specified in listen does not match any other site, this one handles it.
Hosting static content
Once the web server is installed, we can start configuring it. First, we need to clone the repository with the website files to the local computer:
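The exact repository address depends on where the course materials are hosted; with a placeholder URL, the command looks like this:

```shell
# <repository-url> is a placeholder; substitute the actual address of the course repository
git clone <repository-url> devops_timenow-app
```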
Let's transfer our website files to the virtual machine — this can be done using scp:
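For example, assuming the VM user is called user and its IP address is 192.168.1.10 (both placeholders, substitute your own):

```shell
# -r copies the directory recursively; -i points at a non-default SSH key if you have one
scp -r -i ~/.ssh/my_key devops_timenow-app user@192.168.1.10:~
```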
The command needs a source (the directory with the website files), a destination (the home directory on the virtual machine), and the -r parameter since we are copying a directory. If the SSH key has a non-standard name, you also need to specify the path to it.
Now let's move on to server configuration — for this, we need to connect to our virtual machine:
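Assuming the VM user is user and its IP address is 192.168.1.10 (both placeholders):

```shell
ssh -i ~/.ssh/my_key user@192.168.1.10
```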
The first part of the website is static content (an HTML document and an image) located in the frontend directory. Let's set up its hosting on Nginx. First, we need to prepare the files that our web server will serve. They can be stored in any directory on the server, but typically such files are kept in a separate directory under /var/www:
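A sketch of the preparation, assuming the site files were copied to the home directory on the virtual machine in the previous step:

```shell
# Create a directory for the site and copy the static content into it
sudo mkdir -p /var/www/timenow
sudo cp -r ~/devops_timenow-app/frontend/* /var/www/timenow/
```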
The one working with these files is one of the Nginx processes called the worker process. These are the processes that handle incoming HTTP requests, while the master process only creates additional worker processes automatically as needed. You can view the nginx processes using ps:
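For example:

```shell
# Lists the nginx master process and its worker processes
ps aux | grep nginx
```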
Next, we need to check that the permissions are correctly set on the files that make up our static content:
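For example:

```shell
ls -l /var/www/timenow
```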
Since we created the timenow directory as the root user, the directory belongs to root. But the current permissions allow all users to read these files, which is fine for us.
Now that the file directory is ready, let's prepare the site configuration. It is convenient to store each site's config in a separate file, since Nginx reads all files from the sites-enabled directory:
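For example, with nano (the file name timenow is arbitrary, since Nginx reads every file in this directory):

```shell
sudo nano /etc/nginx/sites-enabled/timenow
```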
And here is the config itself:
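A minimal sketch of such a config, matching the directives described next (the domain timenow.local and the directory /var/www/timenow come from this topic's setup):

```nginx
server {
    listen 80;
    listen [::]:80;

    root /var/www/timenow;
    server_name timenow.local;
}
```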
Here you can see the listen commands for all IPv4 and IPv6 addresses on port 80. The root directive in Nginx specifies the directory from which the site serves static content (here you need to specify the directory we prepared in the previous step).
The server_name directive specifies the host values by which requests to the port and address will be filtered. This is also the domain name at which the website will be accessible. You can specify multiple domain names separated by spaces, but we will specify only one — timenow.local.
Now we can reload the changes in Nginx. This can be done by restarting (i.e., stopping and starting) its service. If the config contains no errors, the service will start:
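With systemd:

```shell
sudo systemctl restart nginx
systemctl status nginx
```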
If the service did not restart, systemctl status will return the file name and the line with the error.
After restarting, let's make an HTTP request to our website:
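For example, assuming the virtual machine's IP address is 192.168.1.10 (substitute your own):

```shell
curl -H "Host: timenow.local" http://192.168.1.10/
```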
In response, we will receive the HTML page. We made a request to the web server by IP address and specified the Host header with the domain we used in the config file. This is exactly the header that would be added automatically if we tried to make an HTTP request using the DNS name instead of the IP address:
But such a domain name does not exist yet. To be able to open the website without specifying this additional host header, for example in a browser, we need to understand a bit about how DNS works.
DNS for the site
DNS (Domain Name System) is a large, distributed, hierarchical database that stores domain names and the IP addresses that correspond to them. When you open a website in a browser and the website address uses a domain name, your computer — specifically, its operating system — translates that domain name into an IP address. This IP address is then used to establish a connection with the server hosting the website. The name-to-IP translation occurs in several stages:
- The operating system checks its DNS cache. This cache is a local database of DNS records. When the operating system translates a name to an IP address, it saves these values in the cache so it doesn't have to query DNS servers the next time it needs to establish a network connection to that address.
- The operating system checks whether the needed record exists in the hosts file. In this file, users can manually specify domain names and the IP addresses that correspond to them. By default, this file contains only one such record — 127.0.0.1 localhost, where 127.0.0.1 is a special address that computers reserve for themselves, and localhost is the standard domain name for such an IP address.
- The operating system checks which DNS server is specified in its configuration. DNS server addresses are usually obtained by the computer via DHCP when connecting to the network, although they can also be specified manually.
- The operating system makes a request to the DNS server and receives a response. The received response is cached, i.e., saved for future use, and used to establish a network connection to the service.
💡 Strictly speaking, this step works a little differently across systems: Windows maintains a system-wide DNS cache out of the box, macOS caches resolved names in its mDNSResponder service, and on Linux an OS-level cache exists only if a caching service such as systemd-resolved or nscd is running. In all other respects, domain name to IP address translation works the same across operating systems.
To be able to connect to our website, we will add the corresponding entry to the hosts file, since we don't have our own DNS server. This change will need to be made on any computer from which you want to open the website. There are a few more nuances to keep in mind:
- Only computers on the same network as our web server will be able to connect to the website. If your laptop with the virtual machine is connected to Wi-Fi, then to open the website from another laptop, it must also be connected to the same Wi-Fi.
- Currently, our virtual machine gets its IP address via DHCP. After you reboot it, this IP address may change, and accordingly, it will need to be updated in all hosts files where you added the entry.
DNS is simply essential for hosting web applications. Thanks to the fact that we access sites by domain name rather than by IP address, we can easily host many websites with different domain names on a single web server, and they will all work simultaneously!
Configuring DNS on macOS and Linux
On Linux and macOS, the hosts file is located in the same place: /etc/hosts.
The principles of this file are the same across all operating systems: # denotes a comment, and entries should be added in the format IP address followed by a space and the domain names that correspond to it.
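For example, if the virtual machine's IP address is 192.168.1.10 (substitute your own), the entry looks like this:

```
192.168.1.10 timenow.local
```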
After adding the entry, you can open the website in a browser: http://timenow.local/.
💡 Editing the hosts file requires sudo, since regular users only have read access to it.
Configuring DNS on Windows
On Windows, the hosts file also cannot be edited by regular users. So first you need to open the text editor Notepad with administrator privileges. Running a program with administrator privileges on Windows is a kind of analog to sudo. Then, using the text editor launched with administrator privileges, open the hosts file at its standard Windows path: C:\Windows\System32\drivers\etc\hosts. This file looks very similar to the one we saw on macOS.
Add an entry at the end and save the file. Next, you need to clear the DNS cache. The DNS cache only needs to be cleared if the operating system previously translated the needed domain name to some other address that differs from the one we specified in hosts. But to practice, let's do it anyway. Launch a PowerShell or cmd terminal with administrator privileges and run the following command:
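The cache is cleared with the standard Windows command:

```shell
ipconfig /flushdns
```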
Now you can open your website from your own and from any other computer connected to your network!
Troubleshooting Nginx
A web server is a complex program with many parameters and capabilities. Making mistakes when configuring it is perfectly normal — the main thing is knowing how to find the cause of the error or incorrect behavior.
While configuring the behavior of the Nginx web server, we were modifying existing or creating new configuration files. For the Nginx service to pick up the changes we made in the configuration, we restarted the service using systemctl. Let's try making some change to the configuration file and restarting Nginx:
When we check the service status after the restart, we will see that it is stopped. There is also an error indicating the configuration file and the line with the error. Let's fix it and restart the web server:
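To catch such errors before restarting, Nginx can also check its configuration files without touching the running service:

```shell
sudo nginx -t
```

If everything is fine, the command reports that the syntax is ok and the test is successful; otherwise it prints the file and line with the error.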
If any of the Nginx configuration files contain an error, the service simply won't start, and the message in its status information will help determine what exactly the problem is.
If the web server is running, information about what is happening with it is recorded by Nginx in special files called log files, or simply logs. These logs are located in the /var/log/nginx/ directory. As the files grow, older entries are compressed and archived through log rotation.
Let's try to download some non-existent file from our website and see what information is recorded in the logs. For convenience, we'll make the HTTP request using curl from the virtual machine where our web server is deployed:
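For example, requesting a file name that doesn't exist on the site (missing.html here is an arbitrary name):

```shell
curl -i -H "Host: timenow.local" http://localhost/missing.html
```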
In response, we receive HTTP response code 404, meaning the requested resource was not found. Let's look at what information was recorded in the logs. Nginx has two categories of logs:
- The access log contains information about all HTTP requests that arrive at the web server: /var/log/nginx/access.log. It records the time, the IP address from which the request was made, the method, URI, HTTP response code, response body size in bytes, and information about the type of device from which the request was made. The last entry in the access log corresponds to the incorrect request we made.
- The error log contains information about errors that occurred in Nginx while processing requests. It is stored in the file /var/log/nginx/error.log. This log also contains an error corresponding to the request we made earlier with curl.
If you encounter problems with Nginx, the service status and log files are exactly the places to look for more detailed information about the problem and hints on how to solve it. First, you should check the status of the service itself, and then, if it is running, check the access and error logs.
Launching the API
The second part of the site is a Python API (devops_timenow-app/backend/api.py). It is called by JavaScript embedded in the HTML page whose hosting we have already configured. The API performs server-side computations — specifically, it checks the current time and returns a response. Our API is not running yet, which is why the button on the site doesn't work yet.
The API contains only one endpoint (time). It is written using the Flask library, which allows running web applications by specifying the port and IP address at which the application should be accessible.
The port on which the API will run is 8080. We cannot use port 80 because only one program can listen on a given port at a time, and port 80 is already taken by Nginx for the static content web page.
The host 0.0.0.0 means the API will accept requests on all IPs at which the web server is accessible. Such an address here is the same as an unspecified IP address in the listen directive in the Nginx website configuration.
Let's start the process in the terminal:
cd devops_timenow-app/backend
sudo apt-get install python3-pip
pip3 install -r requirements.txt
python3 api.py
Now it is accessible in the browser: http://timenow.local:8080/time. The API is currently a separate program running on a separate port 8080 over the HTTP protocol. It does not currently check the host header — it simply responds to requests using Python code.
At this stage, the link will work both by IP and by the domain name we use for our web server.
Let's connect the API to our button. For this, we just need to fix the link in the already written JavaScript so that when we press the button, an HTTP request is made to this API:
Now when we open our website in a browser and try to press the button, the browser should send an HTTP request to the API, receive the time in response, and display it on the page. But the button still doesn't work.
If we open Developer Tools, we will see that the culprit is a mechanism called cross-origin resource sharing, or CORS for short — one of the protection mechanisms against cross-site request forgery attacks, implemented in all modern browsers. The purpose of this mechanism is to make sure that requests to the API can only be made from websites (origins) that the API itself allows. It is implemented on the browser side and applies to API calls made from JavaScript. Here is how CORS works:
- When we make a request from JavaScript to the API, before executing this request the browser makes another HTTP request with the special method OPTIONS. Such a request is called a pre-flight request.
- In response, the browser expects the API to provide information about the list of sites from which this API can be accessed. This information is passed in the response header Access-Control-Allow-Origin.
- The browser compares the address of the origin website whose JavaScript code is trying to call the API with what it receives in the Access-Control-Allow-Origin header, and executes the actual HTTP request only if the origin is in the list received from the API server.
- The list of addresses returned by the API in the Access-Control-Allow-Origin header is called the CORS policy. Configuring the CORS policy is one of the most common tasks when setting up API hosting, since by default it is empty and all origins are blocked.
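The browser-side decision described above can be sketched as a small function. This is a deliberate simplification: real CORS also distinguishes simple from pre-flighted requests and handles additional headers, but the core check looks like this:

```python
def cors_allows(origin, allow_origin_header):
    """Simplified model of the browser-side CORS check for a cross-origin API call."""
    if allow_origin_header is None:
        # The response carries no Access-Control-Allow-Origin header:
        # the policy is empty, so the browser blocks the request.
        return False
    if allow_origin_header == "*":
        # Wildcard: the API allows calls from any origin.
        return True
    # Otherwise the header must name the calling origin exactly.
    return allow_origin_header == origin

# The empty default policy blocks our page...
print(cors_allows("http://timenow.local", None))                    # False
# ...while a policy naming our page allows it.
print(cors_allows("http://timenow.local", "http://timenow.local"))  # True
```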
To allow the button on the site to make a request to the API and display the time, we need to configure our API so that the response header Access-Control-Allow-Origin is added to HTTP responses with the OPTIONS method (or any method), containing the address of our page with the button. If the API is small, runs on only one server, and is called by only one page, it's easier to rewrite it to add this header, but usually this problem is solved in a different way, which we will now look at.
Configuring a reverse proxy
One of the web server's operating modes is when it receives an incoming request, sends it to some other server, receives a response, and sends it back to the client. In this case, the web server can modify the response before returning it to the client, for example, by adding an HTTP response header. This web server operating mode is called a reverse proxy, and we will use it to configure the CORS policy.
Open a new SSH session to the virtual machine and create a new Nginx site config in it:
And here is the file itself:
server {
    listen 80;
    listen [::]:80;

    server_name timenow-api.local;

    location / {
        proxy_pass http://localhost:8080/;
    }
}
Let's start with the server block, to which we add listen commands for port 80 and a separate server_name timenow-api.local, so that our API is accessible at a separate host on the same port 80. Next, we add the reverse proxy configuration — the location block. For this block, you specify the path on the site to which the configuration applies, and inside it you add the proxy_pass command, whose parameter is the API address. Now let's restart Nginx and check that the API responds through the reverse proxy:
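A sketch of the check from the virtual machine itself (the Host header stands in for the DNS record we haven't added yet):

```shell
sudo systemctl restart nginx
curl -H "Host: timenow-api.local" http://localhost/time
```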
Let's add the domain we used for the API in the hosts file on our computer to make it easier to make HTTP requests to it from the browser:
The reverse proxy is configured! Now we have an address at which the API is accessible through Nginx. The last remaining detail is to add the CORS header to the reverse proxy configuration:
This can be done by adding the add_header command inside the location block. Its first parameter is the name of the header to add, and the second is the value:
Here is what the configuration file should look like after this change:
server {
    listen 80;
    listen [::]:80;

    server_name timenow-api.local;

    location / {
        proxy_pass http://localhost:8080/;
        add_header Access-Control-Allow-Origin http://timenow.local;
    }
}
Let's restart Nginx to apply the config file changes, and see if this header comes back in the response when we access the API through the reverse proxy:
sudo systemctl restart nginx
systemctl status nginx
curl -v http://timenow-api.local/time
curl -v http://timenow-api.local:8080/time
As you can see, if we access the API through the reverse proxy, the header comes in the response, but if we make a request directly, the needed header is not in the response.
Let's fix the API address used by the button on our site so it accesses the API through the reverse proxy:
Now after reloading the page, the button works!
But there is still room for improvement. Let's imagine we can only edit one DNS record — timenow.local. This is a fairly common situation, since to register a domain name on publicly accessible DNS servers, you need to purchase it. If we can only use one domain name, both parts of the web application need to be hosted using one server_name. This can be done by moving the location block with the reverse proxy configuration into the server block that we used to set up static content hosting.
The only thing to change for the location block in this case is its path — for example, to /api/. With this configuration, Nginx will filter all requests arriving on port 80 for this server_name. If the request address starts with /api/, the request will be redirected and handled by our reverse proxy, i.e., forwarded to the API. The rest of the requests for this host will be processed as static content requests. Here is what the resulting website configuration file should look like:
server {
    listen 80;
    listen [::]:80;

    root /var/www/timenow;
    server_name timenow.local;

    location /api/ {
        proxy_pass http://localhost:8080/;
        add_header Access-Control-Allow-Origin http://timenow.local;
    }
}
Let's restart Nginx and check if the API is accessible at the new endpoint:
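From the virtual machine, the check might look like this:

```shell
sudo systemctl restart nginx
curl -H "Host: timenow.local" http://localhost/api/time
```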
After this change, both the API and the static content are accessible at different addresses on the same domain. All that's left is to update the API address in the HTML page so the button works again:
A reverse proxy is often used to implement website behavior that is already built into web servers: filtering by the Host header, adding response headers, placing a separate application at a different path on the domain of an existing website, and more. With a reverse proxy, you can even put authorization in front of any website.
TLS encryption
Up to this point, communication with our website was unprotected. HTTP traffic is not encrypted by itself, so anyone on our network has the ability to see what data we send to the web server and what we receive in response. For a website that shows the time, this is not critical, but if it were a bank's website or some other resource — that's a different matter entirely.
Before moving to practice, let's first understand the theory — encryption protocols for web traffic:
- HTTPS is a version of HTTP that uses the SSL/TLS protocol to encrypt traffic, i.e., requests to the web server and responses from it.
- SSL is one of the earliest traffic encryption protocols, released back in 1995 by Netscape along with their browser. Netscape released new versions of the protocol as vulnerabilities were found in older ones, but later transferred control of the protocol to the international organization IETF (Internet Engineering Task Force). The IETF handles the standardization of protocols used on the Internet — for example, it is responsible for how the TCP and IP protocols are described. The IETF refined SSL and in 1999 released it under a new name — TLS. That is why, when talking about web traffic encryption, sometimes SSL is mentioned, sometimes TLS, and sometimes SSL/TLS.
- TLS works on a similar principle to SSH. It uses asymmetric encryption to establish a connection between client and server, and for it to work, a pair of public and private keys is needed; the public key is distributed as part of a certificate.

For the browser to consider a certificate valid, it must be signed by an organization whose own certificate is in the Trusted Root Certification Authorities list.

A certificate also has a set of properties, such as the issue date, the expiration date, the name of the organization that issued it, and, most importantly, the subject: the DNS name for which the certificate was generated. A valid certificate can be obtained either by purchasing one from an organization that sells certificates or by using the free certificate authority Let's Encrypt. In either case, the issuance procedure involves the issuing organization verifying that the DNS name actually belongs to you.
But enough theory — let's move to practice. At this stage, we don't have a domain name whose ownership we can prove to a certificate-issuing organization, but we can generate a TLS certificate ourselves. It will be signed not by a trusted root certificate authority but by us, which is why it's called self-signed. On Linux, a certificate can be generated using the OpenSSL program:
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/private/nginx-selfsigned.key -out /etc/ssl/certs/nginx-selfsigned.crt
- req indicates that we want to generate a certificate;
- -x509 indicates that we want to generate a self-signed certificate;
- -nodes indicates that we don't want to protect the generated private key with a password;
- -days 365 specifies the certificate's validity period in days;
- -newkey rsa:2048 indicates that for our certificate we will also generate a new private key using the RSA algorithm with a length of 2048 bits;
- the last two parameters specify the files where the certificate's private key (-keyout) and the certificate itself with the public key (-out) will be saved.

When we execute this command, OpenSSL will also ask us to enter certificate properties, including its subject, i.e., the DNS name.
Configuring TLS
Once the private and public keys of the certificate are generated, we can easily use them to configure our website. For Nginx, several changes need to be made:
First, we need to change the TCP port in the listen commands from 80 to 443, the default port for HTTPS. Additionally, the ssl parameter needs to be specified for the listeners:
If you use TLS encryption, server_name must be specified, and it must match the certificate subject; otherwise, the browser will consider the certificate invalid for the website.
All that remains is to add the paths to the private and public key files of the certificate we generated. This can be done with the ssl_certificate command for the public key file and ssl_certificate_key for the private key:
ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;
After all the changes, your configuration file should look like this:
server {
    listen 443 ssl;
    listen [::]:443 ssl;

    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;

    root /var/www/timenow;
    server_name timenow.local;

    location /api/ {
        proxy_pass http://localhost:8080/;
        add_header Access-Control-Allow-Origin https://timenow.local;
    }
}
Now restart the web server service and check if the site works over HTTPS:
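Assuming the hosts entry from earlier is in place, the check might look like this:

```shell
sudo systemctl restart nginx
# -k tells curl to accept our self-signed certificate
curl -k https://timenow.local/
```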
To make the script access the API over HTTPS, let's fix our HTML:
Now if you visit the site over HTTPS, you will get an error. It says that the certificate is invalid because it is not signed by a known certificate authority. This is entirely expected, since the certificate is self-signed and we are simply practicing working with TLS on our own computer. In Chrome, you can ignore this error, and the site will load and work over HTTPS with TLS encryption. Furthermore, if you look at the certificate information in the browser, you will see that it is exactly the certificate we generated earlier.
Let's imagine that our website has users who are used to visiting the site over HTTP. After we enabled SSL on our listeners and changed the port, the web server doesn't respond to HTTP requests correctly. Since we didn't remove the default site, our regular HTTP requests end up there, and if that site's configuration didn't exist, we wouldn't be able to get a response to an HTTP request at all.
To fix this, we need to add a redirect configuration — set up the web server so that when a user visits a link over HTTP, the web server redirects them to HTTPS. In Nginx, this requires creating another server block:
And here is the block:
server {
    listen 80;
    listen [::]:80;

    server_name timenow.local;

    return 301 https://timenow.local$request_uri;
}
Here you can see a new command — return. Its parameters describe the redirect:
- the HTTP status code for the redirect (301, Moved Permanently);
- the address to redirect to: the https scheme and our domain, timenow.local;
- $request_uri — a variable containing the URI of the original request, so the user is redirected to the same page over HTTPS.
Let's restart the web server and test the redirect:
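From a machine with the hosts entry, the redirect can also be checked with curl:

```shell
# -i prints the response headers; expect a 301 status and a Location: https://... header
curl -i http://timenow.local/
```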
Since browsers can cache redirect information, let's open a new incognito window in the browser along with Developer Tools to be able to view HTTP requests and responses. Now when we try to go to the old HTTP address of our website, we receive a response with HTTP status code 301 — this is our redirect. If we again ignore the certificate error, our website will load over HTTPS using TLS encryption.