What is infrastructure as code (IaC) and why should I care?
Introduction
I have a feeling this term has been used for so long that it’s lost its meaning along the way. It’s no longer the buzzword it was not so long ago. Even though the topic is going out of fashion, I’ll do my best to describe it, because it’s as relevant as ever.
Let’s start from the very beginning. Imagine you’re running your first serious project in the cloud. It’s been running stably for weeks, and it’s doing great. At some point, however, you notice that traffic to your app has grown in a very short time: suddenly, many more people are trying to use it. What do you do? Go to the AWS console and add another virtual machine to your load balancer, of course. This situation keeps repeating itself because your app is very successful, and people around the world are sharing news about it.
At some point, you realize this isn’t efficient at all. You spend a lot of time managing your infrastructure in the UI, clicking all sorts of buttons and moving between pages. As a developer, you wish you could build infrastructure just like you build software. You could, for instance, write a description of a VM and define its properties, like CPU type, amount of memory, disk type and size, etc. In essence, you want to deploy infrastructure declaratively.
Well, it turns out people have had that idea already. However, you may ask...
Why bother?
In my opinion, the biggest gain is consistency. Once described, the infrastructure should stay untouched. Any drift that does occur (thanks, entropy!) can be brought back in line with the specification written in a relatively simple text file. In fact, it doesn’t matter whether a change was triggered by a user, some kind of failure, or an accident caused by your teammate: with IaC, it should be easy to recover. Think of it as a lightweight snapshot you can revert to. It’s even easier if you use GitOps.
Well-written code carries yet another advantage - reusability. Everything that’s described in the code can be repurposed for future use. Need another environment? Or maybe just a clone of the one that you just created? No problem! If you plan in advance, you can write code for deployments of container clusters, databases, caches, and image registries as modules. Next time you want to build another environment from code, you just reference these modules and pass the necessary parameters. I promise this will save you a lot of time.
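To make the module idea more concrete, here’s a minimal sketch of a reusable module for an image registry. The module name (image-registry), its directory layout, the variable names, and the repository name are all hypothetical, chosen for illustration. The aws_ecr_repository resource and its repository_url attribute come from the AWS provider. The files are shown together, separated by comments. The calling environment is assumed to configure the aws provider itself.

```hcl
# --- modules/image-registry/variables.tf (hypothetical module) ---
variable "name" {
  description = "Name of the ECR repository to create"
  type        = string
}

variable "scan_on_push" {
  description = "Whether images are scanned for vulnerabilities on push"
  type        = bool
  default     = true
}

# --- modules/image-registry/main.tf ---
resource "aws_ecr_repository" "this" {
  name = var.name

  image_scanning_configuration {
    scan_on_push = var.scan_on_push
  }
}

# --- modules/image-registry/outputs.tf ---
output "repository_url" {
  description = "URL used to push and pull images"
  value       = aws_ecr_repository.this.repository_url
}

# --- environments/staging/main.tf: reusing the module in a new environment ---
module "registry" {
  source = "../../modules/image-registry"
  name   = "my-app-staging" # hypothetical repository name
  # scan_on_push is omitted, so the module's default (true) applies
}
```

A production registry would then be just another module block with a different name. The caller can read the result as module.registry.repository_url.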
How to choose a tool?
There’s an array of tools that implement infrastructure as code. I don’t want to delve too deep into the details, as that’s beyond the scope of this article. These days, the main differences lie in three areas:

- The configuration and templating languages the tool uses (e.g., Jinja in Ansible, Embedded Ruby in Puppet and Chef, HCL in Terraform).
- How the configuration is applied: pulled from a controlling server, as in Chef, or pushed to the controlled infrastructure, as in Ansible and Terraform.
- The language the tool itself is written in.

You can also expect differences in how the tools handle secrets, state files, and the grouping of managed resources.
It’s important to note that some of these tools are better suited for configuration management than for infrastructure provisioning, and vice versa. In other words, a tool that’s good at creating the necessary resources isn’t necessarily as good at configuring them. In my opinion, Terraform is better at provisioning infrastructure (cloud-based in particular), and Ansible is better at configuring virtual machines.
Nowadays, most of the tools mentioned above support all kinds of environments, including bare metal servers and public cloud resources. Bear in mind, though, that some of these tools were created with a particular use case in mind and were extended to support others over time. Chef’s support for AWS is one example. Tools created as part of a public cloud offering are a separate category: CloudFormation and AWS CDK (both AWS) or Bicep (Azure). These are tied to their own cloud and can’t be extended to work elsewhere.
On the other hand, if you’re absolutely certain you’ll never leave your current cloud, you may consider using them. The cloud provider maintains these tools itself, so they’re usually the first to be updated when the cloud API changes. There’s no need to wait for the community to come up with an update.
That said, whatever your cloud plans are, I believe Terraform is a safe bet. According to Stack Overflow’s annual survey, Terraform was the IaC tool most often mentioned by respondents. That makes it a good choice if you’re planning to deploy infrastructure to a popular cloud like AWS or Azure. If you’re deploying to GCP, Terraform is actually Google’s recommended tool.
Terraform?
Let me talk a little bit about Terraform and explain why it's so popular these days. It's an infrastructure management tool that uses a declarative approach and HCL (HashiCorp Configuration Language) for writing infrastructure definitions.
Its flexibility comes from the large number of providers: plugins that let Terraform communicate with external systems like public clouds, APIs, etc. Providers are published in the Terraform Registry. Some are maintained by HashiCorp, others by technology partners or the vibrant Terraform community. That way, you can use a unified syntax to create resources in very different systems. The provider abstracts away the implementation details, which reduces code complexity. Of course, there will be minor differences depending on the particular environment, but this still brings huge engineering value.
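In practice, a configuration declares the providers it needs in a required_providers block, and terraform init downloads them from the Terraform Registry. Here’s a minimal sketch. The version constraints are assumptions you would adjust to your own setup.

```hcl
terraform {
  # Minimum Terraform CLI version (an assumed constraint)
  required_version = ">= 1.3"

  required_providers {
    # Source address in the form <namespace>/<type> on the Terraform Registry
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release
    }
  }
}

# Provider-specific settings are configured separately
provider "aws" {
  region = "eu-west-1"
}
```

Pinning a major version like this keeps a new provider release with breaking changes from being picked up unexpectedly the next time someone runs terraform init.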
On top of that, Terraform keeps track of everything it manages in a state file. This brings a handful of benefits. Most importantly, Terraform treats the state file as the source of truth for the infrastructure it manages. It only changes or deletes resources recorded there, which makes accidental deletion of anything else less likely. Additionally, if we store the state file in object storage (such as an S3 bucket), we can easily share it with our coworkers and manage the infrastructure together.
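Sharing state usually comes down to configuring a remote backend. Below is a minimal sketch using the S3 backend. The bucket name and key are hypothetical, and the bucket has to exist before you run terraform init.

```hcl
terraform {
  backend "s3" {
    bucket  = "my-team-terraform-state"   # hypothetical, pre-existing bucket
    key     = "staging/terraform.tfstate" # path of the state object in the bucket
    region  = "eu-west-1"
    encrypt = true # encrypt the state object at rest
  }
}
```

Backend blocks can’t reference variables, so values like these are either written out directly or passed with terraform init -backend-config. For teams, it’s also worth enabling state locking so two people can’t apply changes at the same time. The available locking options depend on the backend and the Terraform version.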
The most basic Terraform code could look like this:
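provider "aws" {
  # All resources below are created in the eu-west-1 (Ireland) region
  region = "eu-west-1"
}

resource "aws_instance" "instance" {
  # AMI IDs are region-specific: this image must exist in eu-west-1,
  # otherwise substitute an AMI ID that is available in your region
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro" # small, cheap instance type for testing

  # Key-value labels attached to the instance in AWS
  tags = {
    Name = "SoftwareMillTest"
  }
}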
Above, we’re declaring that we want to use the aws provider, which provisions resources on AWS, and that we want to deploy our EC2 instance in the eu-west-1 region. Then, we’re telling Terraform to use a specific AMI (Amazon Machine Image) ID. We’d like the small t2.micro instance type because we’re only testing and don’t need anything fancy. Lastly, we’re telling Terraform to set a Name tag with the value SoftwareMillTest. After we’ve passed AWS credentials to Terraform, we can run terraform init to download the required provider. Finally, we’re ready to execute terraform plan, which shows us what Terraform wants to do before anything is actually created.
iac ➤ terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
Terraform will perform the following actions:
# aws_instance.instance will be created
+ resource "aws_instance" "instance" {
+ ami = "ami-0c55b159cbfafe1f0"
+ arn = (known after apply)
+ associate_public_ip_address = (known after apply)
+ availability_zone = (known after apply)
+ cpu_core_count = (known after apply)
+ cpu_threads_per_core = (known after apply)
+ disable_api_stop = (known after apply)
+ disable_api_termination = (known after apply)
+ ebs_optimized = (known after apply)
+ get_password_data = false
+ host_id = (known after apply)
+ host_resource_group_arn = (known after apply)
+ iam_instance_profile = (known after apply)
+ id = (known after apply)
+ instance_initiated_shutdown_behavior = (known after apply)
+ instance_lifecycle = (known after apply)
+ instance_state = (known after apply)
+ instance_type = "t2.micro"
+ ipv6_address_count = (known after apply)
+ ipv6_addresses = (known after apply)
+ key_name = (known after apply)
+ monitoring = (known after apply)
+ outpost_arn = (known after apply)
+ password_data = (known after apply)
+ placement_group = (known after apply)
+ placement_partition_number = (known after apply)
+ primary_network_interface_id = (known after apply)
+ private_dns = (known after apply)
+ private_ip = (known after apply)
+ public_dns = (known after apply)
+ public_ip = (known after apply)
+ secondary_private_ips = (known after apply)
+ security_groups = (known after apply)
+ source_dest_check = true
+ spot_instance_request_id = (known after apply)
+ subnet_id = (known after apply)
+ tags = {
+ "Name" = "SoftwareMillTest"
}
+ tags_all = {
+ "Name" = "SoftwareMillTest"
}
+ tenancy = (known after apply)
+ user_data = (known after apply)
+ user_data_base64 = (known after apply)
+ user_data_replace_on_change = false
+ vpc_security_group_ids = (known after apply)
+ capacity_reservation_specification {
+ capacity_reservation_preference = (known after apply)
+ capacity_reservation_target {
+ capacity_reservation_id = (known after apply)
+ capacity_reservation_resource_group_arn = (known after apply)
}
}
+ cpu_options {
+ amd_sev_snp = (known after apply)
+ core_count = (known after apply)
+ threads_per_core = (known after apply)
}
+ ebs_block_device {
+ delete_on_termination = (known after apply)
+ device_name = (known after apply)
+ encrypted = (known after apply)
+ iops = (known after apply)
+ kms_key_id = (known after apply)
+ snapshot_id = (known after apply)
+ tags = (known after apply)
+ throughput = (known after apply)
+ volume_id = (known after apply)
+ volume_size = (known after apply)
+ volume_type = (known after apply)
}
+ enclave_options {
+ enabled = (known after apply)
}
+ ephemeral_block_device {
+ device_name = (known after apply)
+ no_device = (known after apply)
+ virtual_name = (known after apply)
}
+ instance_market_options {
+ market_type = (known after apply)
+ spot_options {
+ instance_interruption_behavior = (known after apply)
+ max_price = (known after apply)
+ spot_instance_type = (known after apply)
+ valid_until = (known after apply)
}
}
+ maintenance_options {
+ auto_recovery = (known after apply)
}
+ metadata_options {
+ http_endpoint = (known after apply)
+ http_protocol_ipv6 = (known after apply)
+ http_put_response_hop_limit = (known after apply)
+ http_tokens = (known after apply)
+ instance_metadata_tags = (known after apply)
}
+ network_interface {
+ delete_on_termination = (known after apply)
+ device_index = (known after apply)
+ network_card_index = (known after apply)
+ network_interface_id = (known after apply)
}
+ private_dns_name_options {
+ enable_resource_name_dns_a_record = (known after apply)
+ enable_resource_name_dns_aaaa_record = (known after apply)
+ hostname_type = (known after apply)
}
+ root_block_device {
+ delete_on_termination = (known after apply)
+ device_name = (known after apply)
+ encrypted = (known after apply)
+ iops = (known after apply)
+ kms_key_id = (known after apply)
+ tags = (known after apply)
+ throughput = (known after apply)
+ volume_id = (known after apply)
+ volume_size = (known after apply)
+ volume_type = (known after apply)
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
As you can see, most of the fields are marked (known after apply): Terraform fills them in automatically once we run terraform apply. To clean up, we can run terraform destroy, which only deletes the resources recorded in the state file.
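AMI IDs differ between regions, so a hardcoded ID tends to break when you switch regions. One common alternative is to look the image up with the aws_ami data source. The sketch below is a variant of the earlier example, not an addition to it, since it redefines the same instance. It assumes you want the latest standard Amazon Linux 2023 image published by Amazon. The name pattern is an assumption worth double-checking against the AMIs in your region. The sketch also declares the root disk’s type and size, and it exposes the public IP, one of the (known after apply) values, as an output.

```hcl
provider "aws" {
  region = "eu-west-1"
}

# Look up the newest matching AMI in the provider's region
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023*-x86_64"] # assumed naming pattern for Amazon Linux 2023
  }
}

resource "aws_instance" "instance" {
  ami           = data.aws_ami.al2023.id # resolved per region instead of hardcoded
  instance_type = "t2.micro"

  # Disk type and size, declared just like any other property
  root_block_device {
    volume_type = "gp3"
    volume_size = 8 # GiB
  }

  tags = {
    Name = "SoftwareMillTest"
  }
}

# Printed after terraform apply, once AWS has assigned the address
output "public_ip" {
  value = aws_instance.instance.public_ip
}
```

After terraform apply, you can also print the value again at any time with terraform output public_ip.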
Don't delay, introduce IaC today!
There’s never a bad time to introduce infrastructure as code to your team. It’s certainly much easier to use it from the very beginning of development, but either way, the sooner you start, the better. At SoftwareMill, we treat IaC as a standard practice that allows us to develop efficiently. If you have any questions, feel free to contact us.