Beyond Terraform: A Comprehensive Guide to Infrastructure as Code Solutions
The landscape of infrastructure management is as diverse as the organizations it serves. From solo developers tinkering with personal projects to startups scaling rapidly, from e-commerce platforms handling seasonal traffic spikes to enterprise tech giants managing global operations — each scenario demands a tailored approach. The right infrastructure strategy isn’t just about the technology; it’s shaped by the organization’s size, team composition, leadership vision, and past experiences. In this complex terrain, Infrastructure as Code (IAC) has emerged as a powerful tool, but its implementation requires careful consideration of your unique context.
Infrastructure as Code has been heavily promoted in recent years, and it’s crucial to understand its core purpose. At its heart, IAC is about risk mitigation and efficiency. It aims to accelerate our ability to reproduce cloud infrastructure safely and consistently. However, we must be pragmatic about its application in our specific context. For instance, in the early stages of a startup or a personal project, diving into complex IAC tools like Terraform might be premature. It could distract you and your team from delivering core value — shipping an MVP and finding product-market fit. There may come a time when you realize, “this needs to be managed by IAC,” and transitioning at that point will be challenging — but that’s a reality you’ll need to accept.
When is it time for IAC?
This decision is inevitably a judgment call. Unless you need to incorporate compliance standards into your Software Development Lifecycle (SDLC) early on, you must weigh the benefits of quickly standing up infrastructure through click-ops against the long-term advantages of IAC.
Disaster Recovery (DR) is often cited as a reason to implement IAC. However, it’s important to note that while IAC can help quickly recreate infrastructure, it doesn’t solve the problem of data recovery. I’ve observed that the data recovery aspect of DR often takes a backseat or is overlooked entirely.
Team dynamics also play a crucial role in the IAC decision. As multiple teams begin working within the same cloud account, the risk of “orphaned resources” increases. These are infrastructure components created by one team but forgotten or abandoned, potentially inflating your cloud bill. While IAC can help by associating resources with specific projects or teams, it’s not a complete solution to this problem.
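To make the ownership idea concrete, here is a minimal sketch of how you might flag resources that nobody has claimed. It assumes you can export an inventory of resources and their tags; the `inventory` data, the `team` tag name, and the `find_unowned` helper are all hypothetical and not part of any cloud SDK.

```python
from typing import Dict, List

# Hypothetical inventory: resource ID -> tags. In practice this would come
# from your cloud provider's inventory or billing export.
Inventory = Dict[str, Dict[str, str]]


def find_unowned(inventory: Inventory, required_tag: str = "team") -> List[str]:
    """Return IDs of resources with no non-empty value for required_tag."""
    return sorted(
        resource_id
        for resource_id, tags in inventory.items()
        if not tags.get(required_tag, "").strip()
    )


inventory: Inventory = {
    "bucket-reports": {"team": "analytics", "project": "dashboards"},
    "vm-scratch-01": {},
    "db-legacy": {"project": "migration"},
}

unowned = find_unowned(inventory)
print(unowned)  # ['db-legacy', 'vm-scratch-01']
```

A report like this only tells you what is untagged. Deciding whether a resource is truly abandoned still takes a human, which is why IAC alone does not solve the problem.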
The complexity and frequency of your infrastructure changes should also factor into your decision. If your infrastructure is relatively stable and changes infrequently, the overhead of maintaining IAC might outweigh its benefits. Conversely, if you’re constantly evolving your infrastructure, IAC can provide significant advantages in terms of version control and change management.
Lastly, consider the governance aspect. IAC’s value often extends beyond rapid resource creation to the crucial task of managing IAM roles and policies for service accounts. While click-ops often leads to overly permissive access rights, IAC can help enforce the principle of least privilege more effectively. As your organization’s focus on governance increases, IAC becomes an increasingly valuable tool for managing these critical security and access control aspects.
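Once policies live in code, they can be checked automatically before they are applied. The sketch below assumes an AWS-style policy document with `Statement`, `Effect`, `Action`, and `Resource` keys and flags statements that allow a bare `*`. The `overly_permissive_statements` helper and the sample policy are illustrative assumptions, not a complete least-privilege audit.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_permissive_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements whose Action or Resource is a bare '*'."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policy for a service account.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

flagged = overly_permissive_statements(policy)
print(len(flagged))  # 1
```

Running a check like this in code review or CI is one way IAC makes least privilege enforceable rather than aspirational.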
What kind of IAC should I choose?
There’s no universal answer; it depends on various factors. Let’s examine some solutions and their strengths and weaknesses:
Terraform
If you have read my last blog post on Terraform, you already know my opinion on it. Terraform is the IAC standard across the industry. It has wide adoption and supports nearly everything. It does, however, have some downsides, as do all of these solutions.
Terraform manages its own record of your infrastructure, separate from the actual state within your cloud provider. This approach can sometimes lead to discrepancies between what Terraform thinks exists and what actually exists in your cloud environment, a problem known as “drift” or “split brain.” Because of this, Terraform is best suited to managing resources that don’t change frequently or unexpectedly, such as S3 buckets or IAM roles in AWS. These types of resources are considered “strongly consistent” because their state is typically stable and predictable. While keeping a separate record is common to most Infrastructure as Code tools, some alternatives, particularly those designed for dynamic environments like Kubernetes, can handle frequently changing resources more effectively. For instance, tools specifically built for Kubernetes can often manage deployments more smoothly, adapting better to the constantly evolving nature of containerized environments.
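The drift problem is easier to see with a toy model. The sketch below compares a recorded state against what actually exists and reports the differences. It is not how Terraform is implemented; the `detect_drift` helper, the resource IDs, and the attribute names are all hypothetical.

```python
from typing import Any, Dict

# Resource ID -> attributes. Both maps are simplified stand-ins for
# a tool's recorded state and the provider's real state.
State = Dict[str, Dict[str, Any]]


def detect_drift(recorded: State, actual: State) -> Dict[str, str]:
    """Classify every resource whose recorded and actual state disagree."""
    report: Dict[str, str] = {}
    for resource_id in sorted(set(recorded) | set(actual)):
        if resource_id not in actual:
            report[resource_id] = "recorded but missing from the cloud"
        elif resource_id not in recorded:
            report[resource_id] = "exists in the cloud but is unmanaged"
        elif recorded[resource_id] != actual[resource_id]:
            report[resource_id] = "modified outside the tool"
    return report


recorded = {
    "bucket-logs": {"versioning": True},
    "role-deployer": {"max_session_hours": 1},
}
actual = {
    "bucket-logs": {"versioning": False},  # someone changed this in the console
    "role-deployer": {"max_session_hours": 1},
    "vm-debug": {"size": "small"},  # created by hand
}

drift = detect_drift(recorded, actual)
for resource_id, reason in drift.items():
    print(f"{resource_id}: {reason}")
```

The more often resources change outside the tool, the more entries a report like this contains, which is why slow-moving resources are the comfortable case.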
Terraform is widely used by companies both big and small, and it may be the best solution for small organizations looking to minimize costs. Terraform Enterprise can also be beneficial for large companies with substantial budgets. However, once you start scaling, problems can arise quickly. Terraform is not ideal if your organization cannot provide a dedicated team to support the Terraform stack and instead uses a DevOps pattern where every engineer contributes to the stack. This approach can lead to version management issues, proliferation of bad patterns through code copying, and challenges with drifted state remediation. This model often results in a great deal of inefficiency as developers can spend a significant amount of time dealing with Terraform as opposed to working on actual product or feature development.
CloudFormation and Google Cloud Deployment Manager
These are the native IAC solutions for AWS and GCP respectively. They’re tightly integrated with their respective cloud platforms, which can be both an advantage and a limitation.
CloudFormation is powerful within the AWS ecosystem but can be verbose and lacks support for other cloud providers. It’s a solid choice for AWS-only shops that want deep integration with AWS services.
Google Cloud Deployment Manager offers similar benefits for GCP users. Like CloudFormation, it lacks multi-cloud support. And while it’s well-suited for GCP-centric projects, it’s worth noting that Google Cloud Deployment Manager receives minimal investment as Google’s focus has shifted to Config Connector (discussed below). Even Google internally recommends Terraform over it. For this reason, I would caution against using it.
Both of these solutions can be verbose, especially CloudFormation, and have limited support for custom resource types. They also tend to lag significantly behind the underlying cloud APIs, whereas the Terraform providers usually add support sooner. Because the state is managed internally by the cloud provider, your options are limited when you need to remediate issues with a stack. Unlike with Terraform, you cannot simply modify the state yourself.
For the reasons mentioned above, using these products in a DevOps organization may prove difficult, as engineers unfamiliar with them are likely to get frustrated. However, I think these tools are primed to benefit from AI assistance, since most of the context lives in a single file and all the APIs are publicly available.
AWS CDK and Pulumi
These tools represent a more programmatic approach to IAC. They allow developers to use familiar programming languages to define infrastructure, which can be a significant advantage for teams already proficient in these languages.
AWS CDK (Cloud Development Kit) is specific to AWS and integrates well with other AWS services. The code synthesizes to CloudFormation templates, so it is particularly strong for AWS-centric projects and teams familiar with TypeScript or Python. Because it is actual code, it’s much easier to write unit tests to assert or validate different parts of your infrastructure configuration.
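To illustrate the testing benefit without pulling in any dependencies, here is a plain-Python sketch. It deliberately does not use the CDK library itself (the CDK ships its own assertions module for this purpose). Instead, a hypothetical `build_template` function returns a CloudFormation-shaped dictionary, and a test asserts that every bucket has versioning enabled. The function names and the bucket name are assumptions made for the example.

```python
from typing import Any, Dict


def build_template(bucket_name: str, versioned: bool = True) -> Dict[str, Any]:
    """Build a CloudFormation-shaped template containing one S3 bucket."""
    properties: Dict[str, Any] = {"BucketName": bucket_name}
    if versioned:
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "Resources": {
            "ReportsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": properties,
            }
        }
    }


def test_buckets_are_versioned() -> None:
    """Fail if any S3 bucket in the template lacks versioning."""
    template = build_template("example-reports")
    for resource in template["Resources"].values():
        if resource["Type"] == "AWS::S3::Bucket":
            versioning = resource["Properties"].get("VersioningConfiguration", {})
            assert versioning.get("Status") == "Enabled"


test_buckets_are_versioned()
```

The point is that the infrastructure definition is an ordinary value your test framework can inspect, so organizational rules become assertions instead of review comments.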
Pulumi, on the other hand, is cloud-agnostic and supports multiple cloud providers. It’s a good choice for multi-cloud environments or for teams that want the flexibility to switch between cloud providers.
Both tools excel in complex, programmatic infrastructure definitions but may have a steeper learning curve for those new to IAC. While they can be used by the average DevOps organization, I think they are best suited for platform teams trying to provide a declarative abstraction unique to their organization.
GCP Config Connector and AWS Controllers for Kubernetes (ACK)
These solutions bring cloud resource management into the Kubernetes ecosystem. They’re ideal for organizations heavily invested in Kubernetes and looking to manage cloud resources using the same tools and processes they use for application deployment.
Config Connector and ACK allow you to define cloud resources as Kubernetes custom resources, providing a unified management control plane for both application and infrastructure components. This approach can simplify operations for Kubernetes-centric teams but may not be suitable for organizations not using Kubernetes extensively.
One major benefit of these solutions is that they are great at remediating drift. The Kubernetes control loop pattern periodically reconciles the actual state with the desired state, so changes made in the console that differ from what is declared in the YAML get corrected automatically.
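The control loop idea can be sketched in a few lines. The example below is a simplified, single-pass model of reconciliation rather than the actual Config Connector or ACK implementation. The `reconcile` helper and the resource data are hypothetical, and a real controller would call cloud APIs and run the loop on a schedule or in response to events.

```python
from typing import Dict, List

# Resource name -> declared settings. A stand-in for custom resources
# (desired) and what the cloud provider currently reports (actual).
Spec = Dict[str, Dict[str, str]]


def reconcile(desired: Spec, actual: Spec) -> List[str]:
    """Bring actual in line with desired and return the actions taken."""
    actions: List[str] = []
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = dict(spec)
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actual[name] = dict(spec)
            actions.append(f"update {name}")
    return actions


desired = {"bucket-a": {"versioning": "enabled"}}
# Someone disabled versioning in the console.
actual = {"bucket-a": {"versioning": "disabled"}}

first_pass = reconcile(desired, actual)
second_pass = reconcile(desired, actual)
print(first_pass)   # ['update bucket-a']
print(second_pass)  # []
```

The second pass doing nothing is the important property: reconciliation is idempotent, so running it repeatedly is safe and drift is corrected as a side effect of normal operation.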
However, if your organization is not currently using Kubernetes, going down this path may not be worth the investment due to the level of expertise required around Kubernetes. Google has created GKE Autopilot to solve a lot of the operational overhead, but that can be quite expensive, especially if you are using it just for your infrastructure resources. Another downside of these is that short of making your own custom controllers, these tools are limited to their respective cloud providers. That being said, many infrastructure vendors, such as MongoDB and Confluent, have begun offering operators to manage their products, and the operator pattern has established itself as an industry standard within the Kubernetes ecosystem.
Crossplane
Crossplane takes the Kubernetes-based approach a step further by offering a cloud-agnostic solution. It allows you to define and manage resources across multiple cloud providers using Kubernetes custom resources.
This tool is particularly strong for multi-cloud scenarios and for organizations looking to standardize their infrastructure management across different cloud providers. However, it requires a significant investment in Kubernetes expertise and may be overkill for simpler, single-cloud setups.
To say that Crossplane is complex would be an understatement. It is incredibly powerful and composable, and it is the best fit for teams that can dedicate substantial resources to their internal developer platform (IDP). It’s best to think of Crossplane as an SDK for building your IDP.
While Crossplane has a promising future, the frequent changes in its APIs, documentation, and best practices suggest that it is still maturing. It may not be the best choice for those who prefer stability and are unwilling to be on the bleeding edge.
Every IAC solution comes with trade-offs
Choosing the right IAC solution depends on various factors including your cloud provider(s), team expertise, project complexity, and long-term infrastructure goals. While Terraform remains a popular choice due to its versatility, other solutions may better fit specific use cases.
For AWS-centric projects, CDK or CloudFormation might be preferable. GCP users might lean towards Terraform or Config Connector. Teams heavily invested in Kubernetes might find ACK, Config Connector, or Crossplane more aligned with their existing workflows.
Ultimately, the best IAC solution is one that aligns with your team’s skills, meets your current needs, and provides room for future growth. As with many technology choices, there’s no one-size-fits-all answer. It’s crucial to assess your specific situation, experiment with different tools, and choose the one that best fits your organization’s unique context. Do you favor DevOps and developer autonomy, or do you favor standardization through a golden path or platform-centric model? It’s important to understand the trade-offs involved with these different approaches.
Remember, the goal of IAC is to enhance your ability to manage and deploy infrastructure efficiently and consistently. Whichever tool you choose, ensure it serves this primary objective without becoming a burden on your development process.
IAC is not just a technical solution, but also a cultural shift. Successful implementation often requires changes in processes, collaboration methods, and even team structures. Be prepared to invest in training and to evolve your practices as you adopt IAC.
Need help sorting out your infrastructure as code options? Real Kinetic specializes in guiding enterprises through these types of challenges. We’d love to chat.