HTTP to HTTPS using Google Cloud Load Balancer
What is Google Cloud Load Balancer?
Google Cloud Load Balancer (GCLB) is a software-defined network load balancer available to all projects on Google Cloud Platform (GCP). The technology is also used internally by services such as Google Search and Google Mail. It is the first point of entry for the majority of HTTP traffic ingressing to Google's infrastructure.
GCLB is able to scale quickly and effortlessly. It was demonstrated a few years ago that a newly deployed load balancer could handle over 1 million requests per second without warmup!
GCLB is also region aware. Traffic coming from the US will be directed to US-based servers, assuming there is capacity. Likewise, if traffic originates from the EU and there are resources available to handle that traffic in the same region, it will be redirected there, without any configuration on the user's side! There is some decent documentation on GCLB available in the GCP docs, so I won't go into too much description of the constituent parts.
The problem
For modern web applications TLS is the standard, and it is typical for web frontend instances to handle the redirect of inbound HTTP requests to HTTPS. This means that traffic for both ports 80 and 443 is directed to the same set of instances. Proxies/load balancers such as Nginx, HAProxy, and Traefik are all capable of performing this redirect before any traffic hits the application servers.
At time of writing, it is not possible to configure GCLB to send traffic from different ports/protocols (TCP, HTTP, etc.) to the same set of load balanced instances. At first this may seem like an overzealous restriction, but simplicity is the key to scaling any infrastructure in a resilient and automated way. The task of redirecting HTTP to HTTPS is a different responsibility from serving the web application and requires different scaling and health semantics.
I could have dropped the GCLB requirement and built something myself, or used a Kubernetes Nginx Ingress Controller, but that is another piece of infrastructure that I would have to specifically maintain. I want to focus on building my application, so having Google manage as much low-level infrastructure as possible is a fantastic idea.
So I took it upon myself to build the simplest, most robust HTTP to HTTPS service that could be re-used in the future.
Architecture
Traffic received by the GCLB on port 80 will be forwarded to an auto-scaled instance group running on GCP. It is the responsibility of that service to redirect all HTTP requests to the HTTPS equivalent.
http://www.example.com/foo?bar=baz
becomes
https://www.example.com/foo?bar=baz
For the purposes of this article, we are going to focus on the traffic hitting port 80. The traffic that comes in on port 443 (HTTPS) has no impact on the design/implementation. I am using Kubernetes as an example/reference architecture.
Note that we could have used a Kubernetes ingress controller here to handle the HTTP traffic. GCLB integrates very well with instance groups: you are able to direct a maximum rate of traffic to each instance, helping to reduce the possibility of overwhelming it. This in turn helps the autoscaler make quicker, better-informed decisions for the underlying infrastructure. With k8s you would have to handle the auto-scaling yourself (which can be done, but for the purposes of this tutorial wasn't "simple" enough).
Building the container
Once the traffic hits our service, we need something to perform the redirect. I chose Nginx for this task because it is simple to configure, battle-tested, and performant.
Nginx configuration
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    return 307 https://$host$request_uri;
}
This Nginx configuration listens on port 80 on all available IPv4 and IPv6 interfaces and returns a 307 HTTP response that redirects to the HTTPS version of the request. Notice that the HTTP redirect code is 307, not 301/302. When the request is redirected with a 307, the browser (or device) repeats the request with the same HTTP method and body, whereas with 301/302 most clients will convert the method to GET.
It would be more correct to use 308 (Permanent Redirect) for this service, but some research suggested that browser/device support for 308 is weaker than for 307.
For this system we will be using a Docker container (If you are new to Docker you can follow the getting started guide here). All of the code for building the Docker container for the HTTP-to-HTTPS service is available at https://github.com/RealKinetic/http-to-https. We’re using Alpine Linux to keep the size of the container image to a minimum. The image at time of writing weighs in at 6.73MB uncompressed. All of the container’s logs are output to stdout/stderr, with nothing written to the filesystem. The container is available on Docker Hub. You can pull it locally:
docker pull realkinetic/http-to-https:1.2
You can run the container locally:
docker run -it --rm -p 80:80 realkinetic/http-to-https:1.2
And then hit http://localhost/ with your browser.
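As a quick local sanity check (a minimal sketch, assuming the container started above is still running), you can also confirm that a non-GET request is redirected with a 307 and that the path and query string are preserved:
curl -s -o /dev/null -w "%{http_code} %{redirect_url}\n" \
    -X POST -d "example=1" "http://localhost/foo?bar=baz"
Output should be of the format:
307 https://localhost/foo?bar=baz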
Deployment
GCP Compute Engine has a neat feature where you can deploy a container image to run on a VM without having to SSH to the instance or run any post-create steps. By default, the VM image is provided by Google and is optimised for running containers. Using this image, all stdout/stderr output from the container is pushed up to Stackdriver by default, which is exactly what is desired in this case.
This tutorial assumes that you have the gcloud SDK installed on your system (if not, you can follow the steps here to install and get set up) and that you have a project/zone configured. At time of writing, some of the gcloud commands used are in beta. To access these, install the beta component:
gcloud components install beta
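If you don't yet have a default project and region/zone configured, something along these lines will do it (substitute your own project ID; the us-central1 values match what this tutorial uses):
gcloud config set project YOUR_PROJECT_ID
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-c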
Create the instance template
An instance template is required to create an instance group running the container we created above. It contains all the metadata needed to allow GCP to run the service for you.
gcloud beta compute instance-templates create-with-container "http-to-https" \
--container-image "realkinetic/http-to-https:1.2" \
--machine-type "f1-micro" \
--container-restart-policy "always" \
--tags "http-server" \
--image-family "cos-stable" \
--image-project "cos-cloud"
You should see output along the lines of:
Created [https://www.googleapis.com/compute/beta/projects/GCP_PROJECT/global/instanceTemplates/http-to-https].
NAME           MACHINE_TYPE  PREEMPTIBLE  CREATION_TIMESTAMP
http-to-https  f1-micro                   2018-01-31T07:25:32.805-08:00
It should also be visible in the Google Cloud Console (you may need to select the same project as is configured for gcloud).
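You can also inspect the template from the command line:
gcloud compute instance-templates describe http-to-https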
Health checks
In order for the service to be reliable, we need to create a health check to ensure availability from the load balanced instances.
gcloud compute health-checks create tcp http-to-https \
--check-interval=10 \
--timeout=10 \
--unhealthy-threshold=3 \
--healthy-threshold=3 \
--port-name=http \
--port=80
This will create a health check that periodically connects to port 80 on each instance and determines whether GCLB will route traffic to that instance. Note that we are not using HTTP health checks here. GCP HTTP health checks look for a 2xx HTTP code, and since we're always going to return a 30x, the health check would never succeed. Generally speaking, if we are able to connect to the HTTP server, we should be able to get a response from it.
The values supplied here are GCP’s recommended minimums for instance groups. Since the requests per second (RPS) per instance should scale linearly with CPU (the response is extremely simple), low values for the health check are perfectly reasonable.
You will be able to see the newly created health check by running the command:
gcloud compute health-checks describe http-to-https
Create the instance group
With the health check in place, we can create the auto-scaling instance group from the “http-to-https” instance template.
gcloud beta compute instance-groups managed create "http-to-https" \
--region "us-central1" \
--base-instance-name "http-to-https" \
--template "http-to-https" \
--size "1" \
--health-check "http-to-https"
Note that we're using the "us-central1" region. The instances created to handle the traffic will be spread across multiple (minimum 3) availability zones, allowing individual zones to fail while our service continues operating.
Enable auto-scaling
gcloud compute instance-groups managed set-autoscaling \
"http-to-https" \
--region "us-central1" \
--cool-down-period "60" \
--max-num-replicas "10" \
--min-num-replicas "1" \
--target-load-balancing-utilization "0.8"
Note: The command will come back with status: ERROR. That is okay because GCP is expecting the instance group to be attached to a GCLB backend service (which we’re about to configure).
We have configured our instance group to run between 1 and 10 instances, scaling when the average load balancing utilisation hits 80% (this applies to both scaling up and scaling down). Note that we have not yet defined what the load balancing looks like; that comes later.
The "cool-down-period" ensures that the autoscaler does not add and remove instances too aggressively, giving newly created instances time to start handling traffic before further scaling decisions are made.
You can see the instances running for the instance group by running:
gcloud compute instance-groups list-instances http-to-https \
--region=us-central1
You should see output similar to:
NAME ZONE STATUS
http-to-https-b3nf us-central1-c RUNNING
Named ports
On GCP, load balancers send traffic to instances based on named ports. If you are unfamiliar with named ports, here is their description from the GCP reference page:
Named ports are key:value pair metadata representing the service name and the port that it's running on. Named ports can be assigned to an instance group, which indicates that the service is available on all instances in the group. This information is used by the HTTP Load Balancing service.
Setting named ports will be important when we come to configure the load balancer.
gcloud compute instance-groups managed set-named-ports \
http-to-https \
--region="us-central1" \
--named-ports="http:80"
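You can confirm the mapping by asking the instance group for its named ports, which should list http:80:
gcloud compute instance-groups get-named-ports http-to-https \
    --region="us-central1"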
Configure Firewall
We now need to open up port 80 in our firewall to allow external access. This step may be unnecessary if you have already opened up port 80 for your GCP project, but new projects start with it closed:
gcloud compute firewall-rules create allow-http \
--allow=tcp:80 \
--target-tags=http-server
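You can verify the rule (and the target tag it applies to) with:
gcloud compute firewall-rules describe allow-http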
Configure GCLB
By now, you should have at least 1 instance up and running that is able to handle requests for the service. Let’s hook up GCLB and show it working!
We can check to see if an individual instance is behaving as we would expect by sending a request directly to it. We can do this because each instance has an ephemeral IP address attached. Get the IP address of an existing instance by running this monster command:
export IP_ADDRESS=$(gcloud compute instances describe $(gcloud compute instance-groups managed list-instances http-to-https --region="us-central1" --format json | jq -r ".[0].instance") --format json | jq -r ".networkInterfaces[0].accessConfigs[0].natIP")
This command queries the instance group for a list of instances, chooses the first in the list, gets its external IP address, and assigns it to a shell variable called "IP_ADDRESS". You can see its value by running:
echo $IP_ADDRESS
Output should be of the format:
35.184.18.224
Check to see if the http-to-https service is running correctly on that VM:
curl -s -I http://$IP_ADDRESS/
The correct HTTP response should be received:
HTTP/1.1 307 Temporary Redirect
Server: nginx
Date: Wed, 14 Feb 2018 19:18:37 GMT
Content-Type: text/html
Content-Length: 180
Connection: keep-alive
Location: https://35.184.18.224/
Create a backend service
Backend services are used by GCLB as region-aware buckets for the same types of traffic.
gcloud compute backend-services create http-to-https \
--global \
--health-checks=http-to-https \
--protocol=HTTP \
--port-name=http \
--connection-draining-timeout=60
Add the instance group to the backend service
gcloud compute backend-services add-backend http-to-https \
--instance-group=http-to-https \
--balancing-mode=rate \
--max-rate-per-instance=100 \
--capacity-scaler=0.8 \
--global \
--instance-group-region="us-central1"
If we wanted to scale this infrastructure to a global level, we would create instance groups in other regions around the world and link them to the backend service with something similar to the above command.
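For example, assuming we had created a second managed instance group (hypothetically named "http-to-https-eu") in the "europe-west1" region, attaching it would look like this:
gcloud compute backend-services add-backend http-to-https \
    --instance-group=http-to-https-eu \
    --balancing-mode=rate \
    --max-rate-per-instance=100 \
    --capacity-scaler=0.8 \
    --global \
    --instance-group-region="europe-west1"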
Create the URL map
GCLB is able to understand different types of traffic: TCP, UDP, HTTP, and HTTPS (at time of writing). We are going to create an HTTP proxy, which gives GCLB a greater understanding of how many requests are going through the load balancer and allows it to inform the autoscaler to act accordingly.
gcloud compute url-maps create http-to-https \
--default-service=http-to-https
We don't need to specify an actual URL map since all requests are going to the same backend service.
Create the HTTP proxy
gcloud compute target-http-proxies create http-to-https \
--url-map=http-to-https
Create the global forwarding rule
gcloud compute forwarding-rules create http-to-https \
--global \
--ports 80 \
--target-http-proxy http-to-https
This command exposes an IPv4 address that you should use in DNS configuration for your HTTP applications.
IPv6
GCP has full support for IPv6. As a good internet citizen, we should support it as well:
gcloud compute forwarding-rules create http-to-https6 \
--global \
--ports 80 \
--ip-version IPV6 \
--target-http-proxy http-to-https
You can get the exposed IP address(es) for the service by running:
gcloud compute forwarding-rules describe http-to-https --global --format json | jq -r ".IPAddress"
I would suggest assigning this to a shell variable for future use:
export IP_ADDRESS=$(gcloud compute forwarding-rules describe http-to-https --global --format json | jq -r ".IPAddress")
echo $IP_ADDRESS
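If your domain happens to be managed by Cloud DNS, pointing an A record at this address could look like the following sketch (assuming a hypothetical managed zone named "example-zone" and the hostname "www.example.com"):
gcloud dns record-sets transaction start --zone=example-zone
gcloud dns record-sets transaction add "$IP_ADDRESS" \
    --zone=example-zone \
    --name="www.example.com." \
    --type=A \
    --ttl=300
gcloud dns record-sets transaction execute --zone=example-zone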
Confirming the infrastructure is ready
Before sending traffic to the load balancer, we need to confirm that the instances are up and healthy. We can do that by running:
gcloud compute backend-services get-health http-to-https --global
If the instance(s) are healthy you should see output of the form:
backend: https://www.googleapis.com/compute/v1/projects/GCP_PROJECT/regions/us-central1/instanceGroups/http-to-https
status:
healthStatus:
- healthState: HEALTHY
instance: https://www.googleapis.com/compute/v1/projects/GCP_PROJECT/zones/us-central1-c/instances/http-to-https-b3nf
ipAddress: 10.128.0.3
port: 80
kind: compute#backendServiceGroupHealth
Let’s test it out!
curl -s -I http://$IP_ADDRESS/
This should output a response of the form:
HTTP/1.1 307 Temporary Redirect
Server: nginx
Date: Wed, 14 Feb 2018 16:50:37 GMT
Content-Type: text/html
Content-Length: 180
Location: https://35.201.87.75/
Via: 1.1 google
Note that the Location header is what we would expect. We can also test with an HTTP Host header:
curl -s -I -H "Host: www.example.com" "http://$IP_ADDRESS/foo?bar=baz"
HTTP/1.1 307 Temporary Redirect
Server: nginx
Date: Wed, 14 Feb 2018 16:52:40 GMT
Content-Type: text/html
Content-Length: 180
Location: https://www.example.com/foo?bar=baz
Via: 1.1 google
Job done.
Load testing
Now that we have functioning infrastructure up and available, it is time to put it through its paces. Remember that we set the max rate at which any one instance can receive requests to 100. If more traffic hits this load balancer, it should automatically bring up new infrastructure to handle the traffic. Conversely, if the traffic drops below that threshold, infrastructure should be removed safely (i.e., with no impact on production traffic).
There are some excellent load testing tools available (indeed Real Kinetic's Beau Lyddon has an excellent 2-part series on running Locust). I only needed to send a steady stream of traffic at a specified rate to the GCLB endpoint. Tools like siege and ab (Apache Bench) allow for a set number of requests and a concurrency level, but the actual rate of traffic depends very much on the time it takes to complete each request; they do not have the ability to cap requests per second. I found a neat library/CLI that runs on Node: https://github.com/alexfernandez/loadtest. For this simplistic case, it was exactly what I needed.
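If you have Node.js and npm available, the loadtest CLI can be installed globally with:
npm install -g loadtest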
loadtest --rps 200 http://$IP_ADDRESS/
I ran 2 copies of this command from different geographical locations (one in the US, one from Europe).
This graph is available from the Google Cloud console. You can see that I started sending 400 RPS at around 15:40. At about 16:00 I killed one of the processes so that only 200 RPS was being sent. At around 16:25 I killed the load test and the RPS for the service dropped to zero.
This graph is available from the instance group page on Google Console.
Interesting points of note:
- At 15:30 I created the infrastructure and GCP scaled up to 1 instance.
- At 15:40, when the load test began, GCP immediately reacted and spun up instances to handle the incoming wave of traffic.
- At 15:50 GCP decides that the load has finished spiking and actually scales down the number of instances running even though the load is static.
- At 16:00 the load balancing capacity drops as it notices the RPS has halved. It takes GCP a while (around 10 minutes) to scale the instance group down to 5 instances accordingly.
- At 16:25 the load test is killed and the load balancer notices immediately.
- At 16:40 (ish) GCP scales the number of instances down to the minimum (1).
Wrapping up
To the newcomer to GCP (or indeed any other major cloud provider), that may seem like a lot of work to get a simple HTTP to HTTPS service up and running. They, of course, would be correct! However, the power of the cloud is not having to manage the infrastructure yourself.
We created a service running on infrastructure that Google has spent many billions building. We created a load balancer that has proven itself to be reliable and scalable. We created a set of auto-scaling instances that we don’t need to manage. It just works.
And the cost? ~$22 per month!
https://cloud.google.com/products/calculator/#id=8f945374-bed4-4df9-b4e1-67d16bbca9eb
Cleaning up
These must be run in order as there are dependencies between the different infrastructure ‘objects’:
gcloud compute forwarding-rules delete http-to-https --global --quiet
gcloud compute forwarding-rules delete http-to-https6 --global --quiet
gcloud compute target-http-proxies delete http-to-https --quiet
gcloud compute url-maps delete http-to-https --quiet
gcloud compute backend-services delete http-to-https --global --quiet
gcloud compute instance-groups managed stop-autoscaling http-to-https --region=us-central1 --quiet
gcloud compute instance-groups managed delete http-to-https --region=us-central1 --quiet
gcloud compute health-checks delete http-to-https --quiet
gcloud compute instance-templates delete http-to-https --quiet
Real Kinetic is committed to helping clients develop great engineering organizations. Learn more about working with us.