While working on a cool project at Cloudreach, I stumbled upon Varnish, and fell in love with it instantly. The first thing I tried to do was to combine Varnish with the awesomeness provided by AWS Elastic Load Balancer (ELB), in a combination which looks like:

While the frontend ELB works out of the box with Varnish (no surprises here), the backend ELB doesn’t work as expected. The problem lies in the fact that Varnish resolves the name assigned to the ELB and caches the resulting IP addresses until the VCL gets reloaded. Because of the dynamic nature of the ELB, the IPs behind its CNAME can change at any time, so Varnish can end up routing traffic to an IP which is no longer linked to the correct ELB.

The problem is discussed here and here but after Googling around I couldn't find any solution which didn’t involve doing:

ELB -> VARNISH -> NGINX (or HAProxy) -> ELB -> AUTOSCALING GROUP

Going through so many layers seemed too much, considering that Varnish can load balance requests and perform health checks on the backend nodes itself, without the need for an internal ELB. The more I thought about it, the more I realised how simple it would be to implement a solution... so I did it. Using Varnish to perform the load balancing removes the overhead of going through an internal ELB, and the backend nodes only need to be reloaded when an autoscaling activity takes place.

The solution I've implemented uses the varnishadm command line tool, boto, and some bash scripting to glue it all together.

First of all, we need to get the backend nodes currently configured in Varnish and store them in a file:

varnishadm -T $HOSTPORT -S $SECRET backend.list > varnish_ips
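The file written by this command can be turned into a list of IPs with a few lines of Python. The sketch below is not the original script. It assumes Varnish 3-style backend.list output, where each backend appears as name(ipv4,ipv6,port); the function name parse_backend_list and the sample text are hypothetical and only there for illustration.

```python
import re

# Assumed Varnish 3 layout of a backend.list row: "name(ipv4,ipv6,port) ...".
BACKEND_RE = re.compile(r"^\s*(\S+?)\(([^,()]*),([^,()]*),(\d+)\)")


def parse_backend_list(text):
    """Return the sorted, de-duplicated IPv4 addresses found in backend.list output."""
    ips = set()
    for line in text.splitlines():
        match = BACKEND_RE.match(line)
        # Header lines and rows without an IPv4 address are skipped.
        if match and match.group(2):
            ips.add(match.group(2))
    return sorted(ips)


# Illustrative sample only, not captured from a real server.
SAMPLE = """\
Backend name                   Refs   Admin      Probe
node0(10.0.1.15,,80)           1      probe      Healthy 8/8
node1(10.0.2.27,,80)           1      probe      Healthy 8/8
"""

if __name__ == "__main__":
    print(parse_backend_list(SAMPLE))
```

Later Varnish releases print backend.list in a different layout, so the regular expression would need adjusting there.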

Then, we will have to query the autoscaling group, and update the backends if any instance has been added/terminated. The following Python code does most of the job:

Let’s break it down:

get_autoscaling_ips gets the IPs associated with instances added to a specific autoscaling group.

get_varnish_ips loads the backend IPs into a Python list.

update_vlc_file compares the two lists of IPs. If there is any difference (you might want to reconsider this aspect) between them, it creates a new VCL file containing the IPs retrieved from the autoscaling group.
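As a rough illustration of the first and last of these steps, here is a minimal sketch. It is not the original code: it assumes boto 2 with credentials already configured, that the backends should be addressed by their private IPs, and that the region is passed in by the caller. The helper name backends_changed is hypothetical.

```python
def get_autoscaling_ips(group_name, region):
    """Return the private IPs of the instances in an autoscaling group (boto 2)."""
    # Imported here so the comparison helper below works without boto installed.
    import boto.ec2
    import boto.ec2.autoscale

    as_conn = boto.ec2.autoscale.connect_to_region(region)
    ec2_conn = boto.ec2.connect_to_region(region)

    groups = as_conn.get_all_groups(names=[group_name])
    if not groups:
        return []
    instance_ids = [inst.instance_id for inst in (groups[0].instances or [])]
    if not instance_ids:
        return []

    ips = []
    # get_all_instances returns reservations, each holding one or more instances.
    for reservation in ec2_conn.get_all_instances(instance_ids=instance_ids):
        for instance in reservation.instances:
            if instance.private_ip_address:
                ips.append(instance.private_ip_address)
    return sorted(ips)


def backends_changed(varnish_ips, autoscaling_ips):
    """True if the two collections differ, ignoring order and duplicates."""
    return set(varnish_ips) != set(autoscaling_ips)
```

Comparing sets means that a mere reordering of the same IPs does not trigger a reload, while any added or removed instance does.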

In order to decouple the VCL section which defines request handling and document caching policies (unlikely to change when the autoscaling group changes) from the section which configures the backends, the Python script outputs the new VCL in the following format:

include "/etc/varnish/healthcheck.vcl";

node definitions

director definitions

include "/etc/varnish/use.vcl";

The node definitions and the director definition are dynamically generated by the script, while healthcheck.vcl is a static file where the health check conditions are defined (what a surprise :) and use.vcl is another static Varnish config file, which makes use of the director definition.
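To make the generated part more concrete, here is a minimal sketch of how such a file could be rendered from a list of IPs. It assumes Varnish 3 syntax, a probe named healthcheck defined in healthcheck.vcl, a round-robin director named autoscaling, and backends listening on port 80. All of these names and defaults are assumptions, as is the function name render_backends_vcl.

```python
def render_backends_vcl(ips, port=80, probe="healthcheck", director="autoscaling"):
    """Render the include / nodes / director / include layout as VCL text."""
    lines = ['include "/etc/varnish/healthcheck.vcl";', ""]

    # One backend ("node") definition per IP, named node0, node1, ...
    names = []
    for index, ip in enumerate(sorted(ips)):
        name = "node%d" % index
        names.append(name)
        lines += [
            "backend %s {" % name,
            '    .host = "%s";' % ip,
            '    .port = "%d";' % port,
            "    .probe = %s;" % probe,
            "}",
            "",
        ]

    # A single director that spreads requests across all the nodes.
    lines.append("director %s round-robin {" % director)
    for name in names:
        lines.append("    { .backend = %s; }" % name)
    lines += ["}", "", 'include "/etc/varnish/use.vcl";', ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_backends_vcl(["10.0.1.15", "10.0.2.27"]))
```

With this layout, use.vcl would only need to refer to the director by name, so it never has to change when instances come and go.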

Once the new VCL is generated, it’s just a matter of loading and activating it by running:

varnishadm -T $HOSTPORT -S $SECRET vcl.load $NAME $FILE
varnishadm -T $HOSTPORT -S $SECRET vcl.use $NAME
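If the reload is driven from Python rather than from the shell, the same two commands could be wrapped as in the sketch below. The function names and the timestamp-based config name are assumptions; only the varnishadm invocations themselves come from the commands above.

```python
import subprocess
import time


def reload_commands(hostport, secret, vcl_file, name=None):
    """Build the vcl.load and vcl.use command lines for a new config name."""
    # Each loaded VCL needs its own name, so default to a timestamped one.
    name = name or "reload_%d" % int(time.time())
    base = ["varnishadm", "-T", hostport, "-S", secret]
    return [
        base + ["vcl.load", name, vcl_file],
        base + ["vcl.use", name],
    ]


def reload_vcl(hostport, secret, vcl_file, name=None):
    """Run vcl.load, then vcl.use; stop at the first command that fails."""
    for command in reload_commands(hostport, secret, vcl_file, name):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so vcl.use is not attempted when vcl.load is rejected.
        subprocess.check_call(command)
```

Keeping the command construction separate from the execution makes it easy to log or dry-run the commands before letting a cron job run them.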

Something I noticed when creating the script is that backend.list returns the list of all configured backends, regardless of whether the VCL which defines them is in use or not. This behaviour makes the whole exercise of comparing VCL backends with autoscaling IPs useless, so we need to remove all the previous VCL configs by running:

varnishadm -T $HOSTPORT -S $SECRET vcl.discard $OLD_VCL
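One way to find the values for $OLD_VCL is to parse the output of varnishadm vcl.list. The sketch below assumes the Varnish 3 layout of three whitespace-separated columns (status, reference count, name) and treats every config marked available as discardable. The function names and the sample text are hypothetical.

```python
def discardable_vcls(vcl_list_output):
    """Return the names of loaded VCL configs that are not the active one."""
    names = []
    for line in vcl_list_output.splitlines():
        fields = line.split()
        # Assumed row layout: "<status> <refcount> <name>".
        if len(fields) == 3 and fields[0] == "available":
            names.append(fields[2])
    return names


def discard_commands(hostport, secret, vcl_list_output):
    """Build one vcl.discard command line per inactive config."""
    base = ["varnishadm", "-T", hostport, "-S", secret]
    return [base + ["vcl.discard", name] for name in discardable_vcls(vcl_list_output)]


# Illustrative sample only, not captured from a real server.
SAMPLE = """\
active          2 reload_1357580000
available       0 boot
available       0 reload_1357570000
"""

if __name__ == "__main__":
    for command in discard_commands("127.0.0.1:6082", "/etc/varnish/secret", SAMPLE):
        print(" ".join(command))
```

Running the discard step right after vcl.use keeps backend.list limited to the backends of the active config, which is what the comparison relies on.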

The three scripts can be glued together in a bash script which runs as a cron job on each Varnish server. The code above has not been used in production yet, so please do test thoroughly before usage. I’m always curious to hear any feedback, so get in touch if you have any comments on this.

As usual, please reach out to us if you need any help or advice using AWS!

Nicola Salvo

System Developer

4 comments:

Ian McDonald said...: Great stuff! I love Varnish for the ability to override the nocache directives. Used with great effort on older Drupal and other web servers to massively take the load off the back end.; 7 January 2013 at 19:43
Keith said...: Great article. Another viable option would be to have auto scaling post to an SNS topic with listeners that rewrite and reload vcl on message received.; 7 May 2013 at 03:19
Unknown said...: which version of varnish did you write the script for?; 22 August 2013 at 23:53
Unknown said...: Seems like this solution works only for Varnish 3.0.3+ because of the bug https://www.varnish-cache.org/trac/ticket/1141 in 3.0.2 and below.; 30 August 2013 at 11:20

Monday 7 January 2013

Varnish and Autoscaling... a love story
