Jenkins on AWS: The journey

TL;DR: Creating a Jenkins infrastructure based on AWS Spot Fleet instances

Jan 28, 2021

Hey, it’s Ricardo again 👋.

Introduction

The main goal of this article is to share my journey creating a Jenkins instance based on AWS technologies, together with some of the headaches I ran into along the way. Most of the decisions I made were driven by technology and time constraints. Bear in mind that this was an experimental project: chances are that by the time you’re reading this, the approach is outdated. There are also probably a few things that could be done differently, as well as security details that still need to be addressed.

Overview

Too many concepts
Architecture overview
CloudFormation template
AMI
Persistence of data
Disk space
Instance dependencies
Security

Too many concepts

Are you familiar with CloudFormation? What about Docker? What about ECS? And Jenkins? And load balancers? VPCs? Wow! This journey was quite challenging and overwhelming, both because of the many technologies on the market that could make it happen and because of the number of new concepts I had to learn. I had to decide on an approach within a fixed time box. In the end, I chose AWS for its simplicity, comfort and ease of use. It’s all about trade-offs.

Architecture Overview

When it comes to architecture, I tried not to reinvent the wheel. Instead, I drew inspiration from how other people had tackled this kind of problem. With that in mind, I based my setup on a talk given by AWS, which explains the whole process extremely well. You can find the original CloudFormation template here.

Since the talk only covers how Spot Fleet instances helped companies reduce the overall cost of CI/CD, it does not cover how to secure Jenkins over the internet or how to integrate it with other platforms.

In my particular case, I had to make several adjustments to the CloudFormation template regarding (1) the AMI IDs used, (2) data persistence, (3) the disk space available to each Jenkins slave instance and (4) the installation of dependencies at instance launch. I will try to cover each of these as simply as possible.

CloudFormation Template

To get the template up and running, I tried to work through all the topics the original template covers: VPC, LoadBalancers, ECS, EC2 SpotRequests, SecurityGroups and UserData. It took some time to understand the base architecture, but in the long run it turned out to be super concise and simple. Although CloudFormation gives a good view of the building blocks used to deploy a whole infrastructure, I could have done it differently. For instance, I could have used the AWS CDK, which gives a simpler view of the building blocks of the overall solution. Besides, reading CDK code is way better than reading a JSON or YAML file.
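Even with plain CloudFormation, a quick overview of the building blocks in a template can be scripted. The snippet below is a small Python sketch that groups a JSON template’s logical IDs by resource type. The resource names in the example are hypothetical and heavily trimmed (no properties), so it illustrates the idea rather than reproducing the original AWS template. YAML templates that use short-form tags such as !Ref would need a custom loader, which is not covered here.

```python
import json
from collections import defaultdict


def summarize_resources(template: dict) -> dict[str, list[str]]:
    """Group a CloudFormation template's logical IDs by resource type."""
    by_type: dict[str, list[str]] = defaultdict(list)
    for logical_id, resource in template.get("Resources", {}).items():
        by_type[resource["Type"]].append(logical_id)
    # Sort types and IDs so the overview is stable and easy to scan.
    return {rtype: sorted(ids) for rtype, ids in sorted(by_type.items())}


if __name__ == "__main__":
    # Hypothetical, trimmed template: properties are omitted because only the
    # resource types matter for this overview. Use json.load() on a real file.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Vpc": {"Type": "AWS::EC2::VPC"},
            "JenkinsLoadBalancer": {"Type": "AWS::ElasticLoadBalancingV2::LoadBalancer"},
            "EcsCluster": {"Type": "AWS::ECS::Cluster"},
            "JenkinsTaskDefinition": {"Type": "AWS::ECS::TaskDefinition"},
            "AgentSpotFleet": {"Type": "AWS::EC2::SpotFleet"},
            "JenkinsSecurityGroup": {"Type": "AWS::EC2::SecurityGroup"},
            "AgentSecurityGroup": {"Type": "AWS::EC2::SecurityGroup"},
        },
    }
    print(json.dumps(summarize_resources(template), indent=2))
```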

AMI

As soon as I searched for a suitable AMI ID for my case, it turned out that AWS was deprecating the original Amazon Linux AMI. So, I had to decide which Amazon Linux 2 AMI suited my needs. As I was planning to use Docker to run Jenkins, the choice was particularly easy to make: I took the official ones from the AWS Marketplace. I made the same decision for the instances that would run the Jenkins slaves. Once I had the right AMIs for the job, I uploaded the template to CloudFormation using the AWS CloudFormation Designer dashboard. The same operation would be possible using the AWS CLI.
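For reference, here is a minimal Python sketch of the AWS CLI route. It only assembles the `aws cloudformation create-stack` call. The stack name and template file are hypothetical, and `CAPABILITY_IAM` is assumed to be needed because templates like this one usually create IAM roles (use `CAPABILITY_NAMED_IAM` instead if those roles have custom names).

```python
import shlex


def create_stack_command(stack_name: str, template_path: str,
                         needs_iam: bool = True) -> list[str]:
    """Build the AWS CLI call that uploads a local template as a new stack."""
    cmd = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        # file:// tells the CLI to read the template body from a local file.
        "--template-body", f"file://{template_path}",
    ]
    if needs_iam:
        # Acknowledge that the template creates IAM resources.
        cmd += ["--capabilities", "CAPABILITY_IAM"]
    return cmd


if __name__ == "__main__":
    # Hypothetical stack and file names.
    argv = create_stack_command("jenkins-spot-fleet", "jenkins.json")
    # Print a copy-pasteable shell command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Returning an argument list (rather than a single string) keeps the command safe to hand to `subprocess.run` without going through a shell.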

It was pretty nice to see all the resources being created. 🎉

Persistence of data

The very first big problem I had to face, and the one that gave me tons of headaches, was “How do I persist all the data from the Jenkins master?”. How could I keep the Jenkins plugins, authentication methods and cloud configuration if some of the spot instances go offline? Even though AWS put the chance of an instance going offline at around 5%, that 5% seemed too big a risk if it meant reconfiguring Jenkins every single time. Fortunately, Docker has a proper way to mount custom volumes using volume drivers. It took me some time to understand the syntax and the template configuration needed to run the Docker container with specific volume configurations.

Basically, all you need to do is add MountPoints (inside ContainerDefinitions) that refer to the volume you plan to use, declared under Volumes. Both properties are part of the AWS::ECS::TaskDefinition schema. You can check the gist for more info. Out of the box, Docker ships with a local volume driver that accepts options similar to those of the Linux mount command, which is what I used to mount an NFS share. Bear in mind that you might need to configure the VPC security groups to allow NFS traffic through several ports (TCP 2049 for NFSv4; older NFS versions need additional ones).
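The snippet below sketches that configuration as a Python dict that serializes to a CloudFormation resource. It assumes an NFS server at a hypothetical address and the official `jenkins/jenkins:lts` image, whose home directory is `/var/jenkins_home`; the volume name, memory and ports are illustrative values, not the ones from the original gist. The `DriverOpts` mirror Docker’s documented `docker volume create --driver local --opt type=nfs ...` options.

```python
import json


def jenkins_task_definition(nfs_host: str, export_path: str) -> dict:
    """CloudFormation fragment for a Jenkins master whose home lives on NFS."""
    volume_name = "jenkins-home"  # hypothetical volume name
    return {
        "Type": "AWS::ECS::TaskDefinition",
        "Properties": {
            "Volumes": [{
                "Name": volume_name,
                "DockerVolumeConfiguration": {
                    "Scope": "shared",      # outlive individual tasks
                    "Autoprovision": True,  # create the volume if it is missing
                    "Driver": "local",      # Docker's built-in driver
                    # Same options as `docker volume create --opt ...`.
                    "DriverOpts": {
                        "type": "nfs",
                        "o": f"addr={nfs_host},rw",
                        "device": f":{export_path}",
                    },
                },
            }],
            "ContainerDefinitions": [{
                "Name": "jenkins",
                "Image": "jenkins/jenkins:lts",
                "Memory": 2048,  # MiB, illustrative
                # 8080 serves the web UI, 50000 is the default inbound agent port.
                "PortMappings": [{"ContainerPort": 8080}, {"ContainerPort": 50000}],
                # MountPoints points at the volume above by its name.
                "MountPoints": [{
                    "SourceVolume": volume_name,
                    "ContainerPath": "/var/jenkins_home",
                    "ReadOnly": False,
                }],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical NFS server address and export path.
    resource = jenkins_task_definition("10.0.0.5", "/exports/jenkins")
    print(json.dumps({"JenkinsTaskDefinition": resource}, indent=2))
```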

Disk Space

As soon as I configured the plugins to run the project build, I noticed after a couple of builds that disk space was a limitation. By default, Amazon Linux instances are launched with only 8 GB of disk! As the project I was working on had a huge number of NPM dependencies and was built inside a Docker image, this turned out to be a major problem. The node_modules folder and the Docker image folder alone were enough to fill the whole disk (even with a prune 😅). So, I had to figure out a way to attach additional disk space to my Spot Fleet instances. Again, AWS to the rescue. 💪 CloudFormation provides a way to map devices to EBS volumes, known as BlockDeviceMappings. The AWS::EC2::SpotFleet resource takes a set of launch specifications used to launch the instances, and each specification can include its own BlockDeviceMappings. You can check the gist for more info.
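A minimal sketch of such a launch specification, again built as a Python dict for a CloudFormation template. The AMI ID, instance type, fleet role ARN and the 50 GiB size are placeholders, and `/dev/xvda` is assumed to be the root device of an Amazon Linux 2 AMI (check the AMI’s `RootDeviceName` to be sure). Mapping the root device name with a larger `VolumeSize` enlarges the root volume instead of attaching a second disk.

```python
import json


def agent_launch_specification(image_id: str, instance_type: str,
                               root_size_gib: int = 50) -> dict:
    """One Spot Fleet launch specification with an enlarged root volume."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "BlockDeviceMappings": [{
            # Assumed Amazon Linux 2 root device name; verify for your AMI.
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "VolumeSize": root_size_gib,  # size in GiB
                "VolumeType": "gp2",
                # Delete the volume with the instance so reclaimed spot
                # instances do not leave orphaned disks behind.
                "DeleteOnTermination": True,
            },
        }],
    }


def spot_fleet(fleet_role_arn: str, specs: list[dict], capacity: int) -> dict:
    """Wrap launch specifications in an AWS::EC2::SpotFleet resource."""
    return {
        "Type": "AWS::EC2::SpotFleet",
        "Properties": {
            "SpotFleetRequestConfigData": {
                "IamFleetRole": fleet_role_arn,
                "TargetCapacity": capacity,
                "LaunchSpecifications": specs,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder AMI ID and role ARN.
    spec = agent_launch_specification("ami-0123456789abcdef0", "m5.large")
    fleet = spot_fleet("arn:aws:iam::123456789012:role/spot-fleet-role", [spec], 2)
    print(json.dumps({"AgentSpotFleet": fleet}, indent=2))
```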

Instance Dependencies

Each Spot Fleet instance created to run a Jenkins slave needed several dependencies to run specific commands. For instance, I installed Docker so that our Jenkins jobs could run docker commands. I faced some issues because the Spot Fleet instances start running pending jobs as soon as they are “available”. Here, “available” means that the instance is up and running, not that its dependencies have been installed at launch. Although each spot instance can run its own startup script, that script is not synchronized with the Jenkins plugin.

Fortunately, the ec2-fleet-plugin has a very nice configuration property called prefix start agent command, which is prepended to the agent launch command that Jenkins runs when it connects to the EC2 machine to verify that the machine is “available”. This solution was enough to get the whole infrastructure up and running. As the plugin documentation states, the prefix should end with && so that the rest of the command runs after it (mind the whitespace).

If for some reason your Jenkins slaves are not “alive”, try to access the instance (through SSH, for example) and check the startup script logs (cloud-init.log inside the /var/log folder if the instance is Linux-based).
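The sketch below shows the two pieces side by side, under a few assumptions: an Amazon Linux 2 agent that installs Docker via `amazon-linux-extras` in its user data, and a prefix command that simply waits until the Docker daemon responds. Both strings are hypothetical examples, not the plugin’s defaults. In a real template you would typically wrap the script with `Fn::Base64`; the Python helper base64-encodes it only to show what the launch specification’s `UserData` ends up containing.

```python
import base64

# Hypothetical user data for an Amazon Linux 2 agent: install and start Docker.
USER_DATA = """#!/bin/bash
amazon-linux-extras install -y docker
systemctl enable --now docker
usermod -aG docker ec2-user
"""

# Hypothetical "Prefix Start Agent Command": block until the Docker daemon
# answers, then let the plugin's own launch command run after the "&& ".
PREFIX_START_AGENT_COMMAND = (
    "until sudo docker info > /dev/null 2>&1; do sleep 5; done && "
)


def encoded_user_data(script: str) -> str:
    """Spot Fleet launch specifications expect UserData as base64 text."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def full_launch_command(agent_command: str) -> str:
    """Mimic the plugin prepending the prefix to its agent launch command."""
    return PREFIX_START_AGENT_COMMAND + agent_command


if __name__ == "__main__":
    print(encoded_user_data(USER_DATA))
    # "java -jar agent.jar" stands in for whatever command the plugin runs.
    print(full_launch_command("java -jar agent.jar"))
```

The `until` loop has no upper bound, so a broken Docker install shows up as an agent stuck while connecting; the cloud-init log on the instance is the place to look in that case.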

Security

This was an extremely important piece of the puzzle, as we were using several AWS services to make it happen. Since a Jenkins plugin launches EC2 instances on our behalf, a security flaw could let attackers use those instances to harm the whole system. This topic turned out to be a pain in the ass for several reasons. First of all, ignorance: I had to learn about security in general, but also about Jenkins security specifically. Then came the questions. How do I limit access to the service? How do I prevent unauthorized users from using the infrastructure? How can I allow third parties (such as GitHub or Bitbucket) to trigger jobs? There are plenty of solutions to these problems. As the team I work on consists of just two people, for the time being we implemented IP-based access, Google authentication and HTTPS communication. This allows us to (1) authorize who can use the Jenkins server, (2) control access based on the user’s IP and (3) secure communication over the network. I would be totally open to discussing better approaches with everyone.
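For the IP-based part, one way to express it in the template is a security group in front of Jenkins that only opens HTTPS to an allow-list of networks. The sketch below builds that resource in Python; the VPC ID and CIDRs are placeholders, and attaching it to the load balancer (rather than the instances) is an assumption about where the HTTPS endpoint lives. Each entry is validated with the standard `ipaddress` module, so a typo such as a host address with a /24 suffix fails early.

```python
import ipaddress


def https_ingress_rules(allowed_cidrs: list[str]) -> list[dict]:
    """SecurityGroupIngress entries that open HTTPS only to the given networks."""
    rules = []
    for cidr in allowed_cidrs:
        # Validates and normalises the entry: a bare IP becomes a /32 (or /128),
        # and a network with host bits set raises ValueError.
        network = ipaddress.ip_network(cidr)
        key = "CidrIp" if network.version == 4 else "CidrIpv6"
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            key: str(network),
            "Description": f"HTTPS from {network}",
        })
    return rules


def load_balancer_security_group(vpc_id: str, allowed_cidrs: list[str]) -> dict:
    """AWS::EC2::SecurityGroup that only admits HTTPS from allow-listed IPs."""
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "Jenkins load balancer: HTTPS from allow-listed IPs only",
            "VpcId": vpc_id,
            "SecurityGroupIngress": https_ingress_rules(allowed_cidrs),
        },
    }


if __name__ == "__main__":
    # Placeholder VPC ID and documentation-range addresses.
    print(load_balancer_security_group("vpc-0123456789abcdef0",
                                       ["203.0.113.7", "198.51.100.0/24"]))
```

If GitHub or Bitbucket webhooks need to reach Jenkins, their published IP ranges would have to be part of the allow-list as well.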

Conclusion

That’s a wrap! I have a stateful Jenkins up and running and ready to execute lots of jobs 🎉. It was pretty satisfying to see the final result and to look back at all the headaches the whole process gave me. Of course, this is an iterative process. This infrastructure fits my needs efficiently for the moment, but it may well require changes in the future.