In part one of this series, we started defining the problem we are solving. Essentially, we are trying to leverage Docker and DevOps tools to ensure our decentralized team can release faster and with less centralized synchronisation. Before we dive into software choices, we have to talk about the elephant in the room.
DevOps is a dirty word to some people. It shouldn’t be. Some see it as developers taking over responsibilities better handled by sysadmins or IT staff. Developers sometimes see it as more unspecified work that takes time from their real duties. If you can afford a full-time sysadmin crew (a crew is more than one person, by the way), that’s great! Use their expertise to solve these issues.
In my experience this is pretty rare today. In fact, these issues are often handled by someone who has them as an extra thing on top of the other work they are expected to do. If you can point to someone who is “The Person Who Usually Fixes It When It Breaks,” then your team is in this camp. Chances are that you also have a “Release Person” and many other choke points in your flow as well.
DevOps means taking this extra, often unplanned work and properly defining it as a separate role. In general, it will contain some development and some sysadmin tasks, but all in all it is aimed at the environment where your code executes: releases, server setup, application dependencies, that kind of stuff. At times DBA work as well.
In my experience, the role is often shaped around the abilities and experience of the individuals as well as the needs of the organization.
Making a separate DevOps role also makes it clear that specific knowledge is needed. It will show you that The Person holds a lot of knowledge about your software and platform. Not only will The Person get some credit for their work, it will also become clear what the extra work they’re doing actually entails.
It will also help you see if this is a risk to your organization. I’ve been in organizations where the whole operation was at stake if The Person got sick, was traveling or quit. With everything laid out, it becomes possible to recruit for this role. You will have to be a bit choosy though: not every developer will cut it, and neither will just any sysadmin. If you’re understaffed at this position, you run the risk of having a potentially business-ending single point of failure.
The DevOps knowledge of The Person will often be codified in a personal library of SQL-queries, one-off scripts and other semi-automated solutions (there’s a dev in DevOps, after all). The problem of documenting, standardizing and spreading this knowledge is something DevOps software is there to solve.
A good software suite also makes routine tasks repeatable, executable and debuggable. It should make it possible to handle changes and updates gracefully. We want to be able to use the same process we use for software development here if we can! This is where the DevOps mantra “infrastructure as code” comes from.
Many of the popular tools boast human-readable configuration (documentation), while promoting organization and execution (code) recognizable to software developers. Here are a few things to keep in mind when choosing DevOps software:
- Go with Open Source solutions. This is a new field full of invention and things are moving fast
- Listen to your team and let them decide. If even just one of them has a professional track record with a specific tool, see if you can leverage that and get a head start
- Go agent-less. Don’t go with a solution that requires you to maintain (and monitor!) an extra service on your servers. (Buy me a beer and ask about the war stories)
- Choose a tool that can adapt to your current setup. Some tools can’t describe an existing infrastructure and need to be used from the start to be of use later
- Human readable configuration. Observe that this doesn’t mean readable for all humans, it means readable to the DevOps team
- Vendor agnostic. You should be able to reuse most of your infrastructure definitions even if you switch cloud providers or move from bare metal to the cloud
- How is state handled? Some tools look before they leap: they perform checks to see whether a task needs to run at all. Others need a shared state file, which can quickly turn into a hassle
- Solid documentation
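To make the “look before they leap” point from the list above concrete, here is a minimal Python sketch of a task that checks the current state before changing anything. The function name `ensure_line`, the file name and the host entry are made up for illustration; this is not taken from any particular tool, it only shows the general idea of a task that is safe to run repeatedly.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed, False if it was
    already in the desired state.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Look: the desired state is already in place, so do nothing.
        return False
    # Leap: only now do we actually modify the file.
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hosts = Path(tmp) / "hosts"  # hypothetical file, created in a temp dir
        print(ensure_line(hosts, "10.0.0.5 db.internal"))  # first run changes the file
        print(ensure_line(hosts, "10.0.0.5 db.internal"))  # second run is a no-op
```

Because the check happens against the real system every time, no separate state file has to be shared between team members.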
What’s the holy grail? When you find a tool that can present a unified interface to your whole IT infrastructure while still being understandable by humans.
When I started to look at this problem I wanted to use Ansible for everything. In my experience, Ansible is simply the most complete DevOps tool out there. Ansible is special to me for a specific reason: It strikes a great balance between the two parts of dev and ops. It allows me to do super detailed nuts-and-bolts things like operating system updates and adding users on servers, all the way up to gluing together PaaS and IaaS components high up in the clouds.
At the same time, the configurations are fairly readable (YAML). It also has an established set of best practices for how to organize your configuration into components (“roles” in Ansible parlance) you can then reuse when describing the purpose and desired state of a server, virtual or physical.
Countless modules for countless services, a huge community and active development round out the benefits.
This is not the place for a full introduction to Ansible, so here’s a good video introduction.
Unfortunately (for me), even if Ansible is extremely flexible, it still has a basic paradigm that doesn’t always fit: Tasks are mapped onto servers (or groups of servers) from an inventory to reach desired states. This poses a catch-22 when you are building AWS infrastructure.
What if I need to start a virtual server first, and then run tasks on it? What if I want to add a routing rule to a network in a virtual private cloud? There’s no notion of a target host there.
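The paradigm can be sketched in a few lines of Python. This is only a rough model and not how Ansible is implemented; the inventory, host names and task functions are all hypothetical. What it shows is that every task takes a host as its input, so work that has no host to run on does not fit naturally.

```python
from typing import Callable

# Hypothetical inventory: group name -> host names.
INVENTORY: dict[str, list[str]] = {
    "web": ["web1.example.com", "web2.example.com"],
    "db": ["db1.example.com"],
}

# In this model a task is simply something that is applied to one host.
Task = Callable[[str], str]


def install_nginx(host: str) -> str:
    return f"{host}: ensure nginx is installed"


def create_deploy_user(host: str) -> str:
    return f"{host}: ensure user 'deploy' exists"


def run_play(group: str, tasks: list[Task]) -> list[str]:
    """Apply each task, in order, to every host in the given group."""
    results = []
    for task in tasks:
        for host in INVENTORY[group]:
            results.append(task(host))
    return results


if __name__ == "__main__":
    for line in run_play("web", [install_nginx, create_deploy_user]):
        print(line)
```

A task such as “add a routing rule to a VPC” has no sensible `host` argument in this model, and a server that has not been started yet cannot appear in the inventory.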
Ansible has a suggested solution to this: connect to localhost and make API calls. There are modules that work like this for most AWS products, so this is of course doable, but it becomes kludgy pretty fast. On top of that, there were plenty of inactive open tickets on GitHub for the AWS modules when I had to make my decision. Instead of forcing my problem to fit Ansible, I decided to split off the actual infrastructure part and find a separate tool to handle the AWS setup.
A coworker quickly steered me away from Amazon CloudFormation and pointed me toward Terraform. Many of you are probably already familiar with other tools from HashiCorp. This is their solution for orchestrating cloud-based infrastructure.
Terraform is a fairly new piece of software, still under heavy development. It’s not a perfect fit according to the bullet list above and I’ve stubbed my toes against many weird things, but Terraform really shines in one area: Providing a common interface to all the services we need.
It also resolves dependencies automatically, which has been extremely useful to someone like me who had never set up AWS infrastructure from scratch before. I could simply start at one end and resolve dependencies as I went along.
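Conceptually, resolving dependencies means ordering resources so that each one is created after everything it depends on. The Python sketch below illustrates that idea with the standard library's `graphlib` (Python 3.9 or later). It is not Terraform's implementation, and the resource names and their relationships are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it depends on.
DEPENDS_ON = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "instance": {"subnet", "security_group"},
    "elastic_ip": {"instance"},
}


def creation_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every resource comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for resource in creation_order(DEPENDS_ON):
        print(f"create {resource}")
```

The order of independent resources (here `subnet` and `security_group`) may vary. Only the constraint that dependencies come first is guaranteed.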
Terraform has a custom configuration language which took a bit of time to understand. It has some weird omissions and inconsistencies. For instance, you can’t create a list literal, but there are functions that operate on lists. Variable substitutions will work in some places but not others.
I’ve been able to find answers to my questions pretty quickly (developers seem reasonably good about replying to GitHub issues), but it has taken some time to get used to things.
Terraform configs can be broken up into modules and reused easily. There are also a lot of modules from the community available to look at to get an idea of how to set up your own environment. You can even reuse these community modules directly by just telling Terraform where to find them, and it will take care of the rest.
In the next edition of this article series, I’ll dive deeper into Terraform and Ansible. We will also look closer at deploying Docker images.