Jenkins on AWS: The journey
TL;DR: Building a Jenkins infrastructure based on AWS Spot Fleet instances
The main goal of this article is to give a glimpse of the steps taken to create a Jenkins instance based on AWS technologies, and also to share some of the headaches encountered along the way. Most of the decisions were made taking technology and time constraints into account.
- Too many concepts
- Architecture overview
- Persistence of data
- Disk space
- Instance dependencies
- Security
Too many concepts
Are you familiar with CloudFormation? What about Docker? ECS? Jenkins? Load balancers? VPCs? Wow! This journey was quite challenging and overwhelming, both because of all the technologies available on the market to make it happen and because of the number of new concepts to learn. I had to decide which approach to take in a time-boxed manner, so I stuck with AWS for simplicity, comfort and ease of use. Of course, the whole process is debatable and open to changes.
Architecture overview
When it comes to architecture, I tried not to reinvent the wheel as much as possible. I looked for inspiration and took into consideration problems that other people had faced with this kind of setup. With that in mind, I based my approach on a YouTube talk given by AWS. It explains the whole process extremely well. You can find the original CloudFormation template here.
As this talk only covers how spot fleet instances helped companies reduce their overall CI/CD costs, it does not cover how Jenkins can be secured over the internet, nor how it can be integrated with other platforms.
In my particular case I had to make several adjustments to the CloudFormation template regarding (1) the AMI ids used, (2) persistence of data, (3) the disk space used for each Jenkins slave instance and (4) the installation of dependencies during instance launch. I will try to cover each section as simply as possible.
In order to get the template up and running, I tried to cover all the topics that the original template covers: VPC, load balancers, ECS, EC2 spot requests, security groups and user data. It took some time to understand the base architecture, but in the long run it turned out to be super concise and simple.
As soon as I searched for a suitable AMI id for my particular case, it turned out that AWS was deprecating the Amazon Linux AMI image. So, I had to decide which Amazon Linux 2 AMI was suitable for my needs. As I was planning to use Docker to run Jenkins, the choice was particularly easy to make: I took the official ones from the AWS Marketplace. I made the same decision for the instances that would run the Jenkins slaves. Once I had the right AMIs for the job, I uploaded the template to CloudFormation. I used the Amazon CloudFormation Designer dashboard to upload my template; the same operation would be possible using the AWS CLI.
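One way to avoid chasing AMI ids every time AWS rotates its images is to resolve them through a public SSM parameter. This is a sketch under the assumption that the template accepts a parameter for the image (the parameter name ECSOptimizedAMI is mine, not from the original template):

```yaml
Parameters:
  # Resolved by CloudFormation at stack create/update time to the latest
  # ECS-optimized Amazon Linux 2 AMI for the current region, so the template
  # never hard-codes an AMI id.
  ECSOptimizedAMI:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id

# Elsewhere in the template, the instances would then reference it as:
#   ImageId: !Ref ECSOptimizedAMI
```

The nice side effect is that a plain stack update picks up the newest image without editing the template.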
It was pretty nice to see all the resources being created. 🎉
Persistence of data
The first big problem I had to face, and the one that gave me tons of headaches, was: “how do I persist all the data from the Jenkins master?”. How could I make the Jenkins plugins, authentication methods and cloud configuration persist if one of the spot instances goes offline? Although AWS states there is only around a 5% chance of an instance going offline, those 5% seemed too big a risk to take if it meant having to reconfigure Jenkins every single time. Fortunately, Docker has a proper way to mount custom volumes using drivers. It took me some time to understand the syntax and template configuration needed to run the Docker container with specific volume configurations.
Basically, all you need is to add MountPoints (inside ContainerDefinitions) and refer to the volume that you are planning to use via the Volumes property. Those two properties are part of the schema of AWS::ECS::TaskDefinition. You can check the gist for more info. By default, Docker ships with a local driver, which accepts the same mount options you would use on the command line (in my particular case, NFS). Bear in mind that you might need to configure the VPC security groups to allow traffic through several ports.
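As a rough sketch of how the two properties fit together (resource names, the NFS endpoint and the memory value are illustrative, not taken from the original template or gist):

```yaml
JenkinsTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: jenkins-master
    Volumes:
      # Task-level volume backed by Docker's local driver mounting NFS,
      # so Jenkins data survives the container (and the instance) dying.
      - Name: jenkins-home
        DockerVolumeConfiguration:
          Driver: local
          Scope: task
          DriverOpts:
            type: nfs
            device: ":/"
            # Placeholder NFS/EFS endpoint — replace with your own.
            o: "addr=fs-12345678.efs.eu-west-1.amazonaws.com,nfsvers=4.1"
    ContainerDefinitions:
      - Name: jenkins
        Image: jenkins/jenkins:lts
        Memory: 2048
        # Container-level mount point referring back to the volume above.
        MountPoints:
          - SourceVolume: jenkins-home
            ContainerPath: /var/jenkins_home
```

The “ports” remark above matters here: NFS traffic (port 2049 for NFSv4) has to be allowed between the instances and the file system in the relevant security groups.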
Disk space
As soon as I configured the plugins to run the project build, I noticed after running the build a couple of times that disk space was a limitation. By default, Amazon Linux instances are launched with only 8 GB! As the project I am working on has too many NPM dependencies and was being built inside a Docker image, this turned out to be a major problem during builds. Just the node_modules folder and the Docker image folder were enough to fill the whole disk (even with a prune 😅). So, I had to figure out a way to attach additional disk space to my spot fleet instances. Again, AWS to the rescue. 💪 CloudFormation provides a way to map specific devices to volumes, also known as BlockDeviceMappings. The AWS::EC2::SpotFleet resource takes a set of launch specifications that will be used to launch the instances. You can check the gist for more info.
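A minimal sketch of what that looks like in the fleet request (the AMI id, role reference, instance type and 50 GB size are placeholders, not the values from the original gist):

```yaml
JenkinsSlaveSpotFleet:
  Type: AWS::EC2::SpotFleet
  Properties:
    SpotFleetRequestConfigData:
      # Hypothetical IAM role defined elsewhere in the template.
      IamFleetRole: !GetAtt SpotFleetRole.Arn
      TargetCapacity: 2
      LaunchSpecifications:
        - ImageId: ami-0123456789abcdef0   # placeholder AMI id
          InstanceType: m5.large
          BlockDeviceMappings:
            # Grow the root device beyond the 8 GB default so Docker
            # images and node_modules have room to breathe.
            - DeviceName: /dev/xvda
              Ebs:
                VolumeSize: 50
                VolumeType: gp2
                DeleteOnTermination: true
```

Note that the device name must match the AMI's root device (commonly /dev/xvda on Amazon Linux), otherwise the mapping creates a second, unformatted disk instead of enlarging the root one.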
Instance dependencies
Each spot fleet instance created for a Jenkins slave needed several dependencies in order to run specific commands. For instance, I installed Docker so it would be possible to run docker commands inside our Jenkins jobs. I faced some issues because the spot fleet instances start running pending jobs as soon as they are “available”. By “available”, I mean that the instance is up and running, which does not imply that the dependencies were installed at launch. Although each spot instance can have its own startup script, its execution is not synchronized with the Jenkins plugin. Fortunately, the ec2-fleet-plugin has a very nice configuration property called prefix start agent command, which will be prepended to the launch command when Jenkins accesses the EC2 machine to verify that it is “available”. This solution was enough to get the whole infrastructure up and running. As the plugin documentation states, it is recommended that the command is followed by && so that the rest of the command gets executed (mind the white spaces 😅). If for some reason you see that your Jenkins slaves are not “alive”, try to access the instance (through SSH, for instance) and check the startup script logs (cloud-init.log inside the /var/log folder if the instance is Linux-based).
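For completeness, this is a hypothetical launch-specification excerpt showing the startup-script route (the AMI id and package list are placeholders); the comment spells out why it is not enough on its own:

```yaml
LaunchSpecifications:
  - ImageId: ami-0123456789abcdef0   # placeholder AMI id
    InstanceType: m5.large
    UserData:
      # Runs once at boot via cloud-init — but the ec2-fleet-plugin does
      # NOT wait for it, which is why the same kind of commands (joined
      # with "&& ") belong in the plugin's prefix start agent command.
      Fn::Base64: |
        #!/bin/bash
        yum install -y docker
        service docker start
        usermod -aG docker ec2-user
```

If a job lands on the instance before this script finishes, the docker commands fail, which is exactly the race the prefix command avoids.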
Security
This was an extremely important piece of the puzzle, as we were using several AWS services to make it happen. Since we were using a plugin to instantiate instances on Jenkins' behalf, a security flaw would mean attackers could use those instances to harm the whole system. This topic turned out to be a pain in the ass for several reasons. First of all, how do I limit access to the service? How do I prevent unauthorized users from using the infrastructure? How can I allow third parties (such as GitHub or Bitbucket) to trigger jobs? There are plenty of solutions to this problem. As the team I am working on is composed of just two people, for the time being we implemented IP-based access, Google authentication and HTTPS communication. This allows us to (1) authorize users to use the Jenkins server, (2) control access based on the user's IP, and (3) secure the communication over the network. I would be totally open to discussing a better approach with everyone.
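As an illustrative sketch of the IP-based part only (the CIDR block and resource names are made up, and Google authentication and HTTPS termination are configured elsewhere), the load balancer's security group could restrict ingress like this:

```yaml
LoadBalancerSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Restrict Jenkins access to known IPs over HTTPS
    VpcId: !Ref VPC   # hypothetical reference to the stack's VPC
    SecurityGroupIngress:
      # Only allow HTTPS from a trusted CIDR range
      # (203.0.113.0/24 is a documentation-only range per RFC 5737).
      - IpProtocol: tcp
        FromPort: 443
        ToPort: 443
        CidrIp: 203.0.113.0/24
```

One caveat with this approach: webhooks from GitHub or Bitbucket originate from their published IP ranges, so those ranges would need their own ingress rules for job triggers to get through.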
That’s a wrap! I have a “stateful” Jenkins up and running and ready to execute lots of jobs 🎉. It was pretty satisfying to see the final result and to look back at all the headaches the whole process gave me. Of course, this is an iterative process: this infrastructure may fit my needs efficiently for the time being, but it might well require modifications in the future.