Making Sense of Environments
Determining how to structure your environments is a frequent topic of discussion for us. It comes up in almost every client engagement we have, and we have needed to make these decisions ourselves many times. We frequently see two patterns. One option is to bucket everyone, or large groups, into the same environments, which means your teams share the same development, staging, and production environments. The other option is to have smaller groups, possibly each team, own their own development, staging, and production environments.
Unfortunately, there is no obvious correct choice because both options have unique benefits, costs, and complexities. We are generally advocates of fewer environments, particularly where cross-team integrations are needed. Determining how to set up, manage, and configure environments requires understanding your underlying platform(s), product/service architecture, organization structure, development processes, operational processes, compliance obligations, tooling, and cost-management concerns. For established companies, organizational inertia is often the biggest factor, followed by service architecture, underlying platform choice, and compliance requirements.
Before we get into specific suggestions and recommendations, let’s step back and review the purpose of different environments. There are four core dimensions we need to address:
Developer Efficiency: Developers need to be able to safely iterate on their systems and try new ideas with low risk of negative business impact due to outages or issues.
Confidence: Several goals center on the idea of “validation.” We need a way to have confidence in the code we are shipping and to validate the integrations between the services and code we are releasing.
Security: Most systems store, process, produce, or have access to data that is sensitive or has some type of value. After all, computers compute; programs take an input and produce some type of output. The sensitivity and value of the data a system processes determine the security expectations.
Reliability: Users, internal or external, have certain expectations around availability, quality, and accuracy of our systems.
We need to be able to address all of these concerns: provide developers a place to work, have confidence in our changes and improvements, meet uptime expectations, and ensure our users’ data is secure.
There are many ways to meet these needs, but any environment strategy has to address each of the areas of concern above.
There are various combinations, but we generally suggest business applications start from the following structure:
Playground / Sandbox / Experiments (Minimal Security):
Purpose: Developers use this environment to explore new ideas and technology, such as new architectures, languages, and databases. Developers should have no expectation that anything running here will ever run in production. It is a scratch pad for trying out crazy new ideas. It should never contain sensitive data.
Oversight: This environment should have only the minimum security controls needed to protect the data stored within it. The controls should generally focus on preventing abuse and managing cost.
Development (Low Security):
Purpose: This is used for normal day-to-day work that is expected to go to production. Systems should be designed and built in a way that will support the goal of going to production.
Oversight: This environment should have lower security controls than a production environment, but we want it to be production-like in key areas that would fundamentally impact development decisions. It should have sufficient security controls to protect the data stored within, but no sensitive data should be used within this environment.
Staging (Medium Security):
Purpose: This is used for testing and validating systems. It might be used to conduct user-tests on new products or major features in early discovery phases, for running smoke tests, penetration tests, integration tests, and so on.
Oversight: This environment should have security controls appropriate to protect the level of data stored within it and sufficient to highlight deficiencies in support tools and automation. If you are SSHing into boxes to debug, you have more work to do.
Production (High Security):
Purpose: Serve business-critical workloads and traffic. This is where the systems our users interact with run.
Oversight: This environment should have security controls appropriate to protect the users’ data stored within it and sufficient to meet customer expectations.
We suggest setting your security restrictions based on the sensitivity of the data expected to be in each environment. Extremely valuable or sensitive intellectual property (IP) should factor in the same way. The secondary consideration when setting controls is pushing developer tooling and automation to meet the needs of the most restrictive environment. For example, if developers will have no direct access to production, they should use the same tools in less restrictive environments that they will rely on in production.
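To make the rule that controls follow data sensitivity concrete, the sketch below encodes a ceiling on data classification for each of the tiers described above and checks where a dataset may live. It is a minimal Python sketch under stated assumptions: the four-level `Sensitivity` scale, the tier names as dictionary keys, and the staging ceiling are all hypothetical, and real classification schemes vary by organization and compliance regime.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical data-classification scale; higher means more sensitive."""
    PUBLIC = 0        # synthetic or openly available data
    INTERNAL = 1      # non-sensitive business data
    CONFIDENTIAL = 2  # assumption: e.g. masked or anonymized copies
    RESTRICTED = 3    # real user data


# Highest classification each tier may hold. The staging ceiling is an
# assumption; the guidance above only says controls should match the data.
ENVIRONMENT_MAX_SENSITIVITY = {
    "playground": Sensitivity.PUBLIC,
    "development": Sensitivity.INTERNAL,
    "staging": Sensitivity.CONFIDENTIAL,
    "production": Sensitivity.RESTRICTED,
}


def check_placement(environment: str, data: Sensitivity) -> bool:
    """Return True if data of this classification may live in the environment."""
    return data <= ENVIRONMENT_MAX_SENSITIVITY[environment]


def required_controls(datasets: list[Sensitivity]) -> Sensitivity:
    """Controls are driven by the most sensitive data present (PUBLIC if none)."""
    return max(datasets, default=Sensitivity.PUBLIC)
```

For example, `check_placement("development", Sensitivity.RESTRICTED)` returns `False`, flagging an attempt to use real user data outside production.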
Not every environment is required. Ideally, you would eliminate the staging environment, but doing so requires excellent testing, operational tooling, and lifecycle-management practices. The playground environment could potentially be rolled into development, but we like to keep the distinction, mostly for clarity of purpose.
Depending on your development practices, you might require additional environments. For instance, if you conduct user-testing or discovery in a staging environment, we suggest using a separate environment for activities such as pen-testing. You will also need to consider different regulatory regions, such as EU versus North American production. We often run dedicated EU environments to ensure user data is kept within the same regulatory region as the customer.
In our experience, the key is to drive towards the minimum number of environments possible. It helps minimize cost, it improves your practices, it shifts integrations earlier in the development process, and it forces you to improve tooling. When running many development environments, it is too easy to defer integration pain that will cause problems later. Many environments can too easily lead to “it works on my machine” type issues and pain-driven development practices.
With that context, here is the hard question: how should you structure your environments relative to your teams? Should there be one set of environments for everyone, or a set of environments per team? The answer depends on two factors: how you weigh cost minimization against developer control, and how your services interact with each other.
We generally suggest that highly interconnected services share environments. Sharing environments forces everyone to deal with integration pains at development time rather than in production. Separate environments for highly interconnected services lead to a lot of “it worked in my environment” problems unless you have a very mature API versioning practice supported by API contract testing. If you want to minimize cost, shared environments allow you to bin-pack workloads to achieve more cost-effective levels of utilization and other economies of scale.
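One way to apply the guideline that interconnected services share environments is to treat services as nodes in a dependency graph and group any that are tightly coupled, directly or transitively. The following Python sketch uses union-find for this. The `weight` on each integration (for example, a count of API dependencies between two services) and the `min_weight` cutoff are hypothetical stand-ins for however you judge “highly interconnected”; this is a heuristic for discussion, not a prescribed method.

```python
from collections import defaultdict


def group_services(
    services: list[str],
    integrations: list[tuple[str, str, int]],
    min_weight: int = 1,
) -> list[set[str]]:
    """Group services whose coupling meets min_weight into shared environment sets.

    integrations: hypothetical (service_a, service_b, weight) tuples; every
    service named here must also appear in `services`.
    """
    parent = {s: s for s in services}

    def find(s: str) -> str:
        # Walk to the root of s's group, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b, weight in integrations:
        if weight < min_weight:
            continue  # loosely coupled: no need to share an environment
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: dict[str, set[str]] = defaultdict(set)
    for s in services:
        groups[find(s)].add(s)
    return list(groups.values())
```

With the hypothetical integrations `("orders", "billing", 5)`, `("orders", "inventory", 3)`, and `("inventory", "reporting", 1)` and `min_weight=2`, `orders`, `billing`, and `inventory` would share one set of environments while the loosely coupled `reporting` service could get its own; lowering the cutoff to 1 merges everything into a single shared set.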
If you are running many environments, you will need tools that facilitate configuration management and policy enforcement across them. Some cloud vendors are building features that help when running many environments. For instance, Google Cloud offers a nested structure of folders and projects that allows you to give each team ownership of its environments while preserving top-level controls around networking and rolled-up billing. To scale this, you will need robust automation and monitoring tools to help manage your environments. If your tooling makes it easy to deploy to many environments, per-team environments can be a highly effective way to tip the balance towards developer efficiency.
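When each team owns its environments, the top-level controls still need to be verified continuously. The Python sketch below checks every environment’s settings against a central baseline and reports drift. The `Environment` class and the setting keys and values (`billing_account`, `network`, `audit_logging`) are hypothetical; in practice you would pull this data from your cloud provider’s configuration and policy tooling rather than from hand-written dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical top-level controls every team environment must keep,
# mirroring the idea of preserving shared networking and rolled-up billing.
BASELINE_POLICY = {
    "billing_account": "central-billing",
    "network": "shared-vpc",
    "audit_logging": True,
}


@dataclass
class Environment:
    name: str
    owner_team: str
    settings: dict = field(default_factory=dict)


def find_violations(envs: list[Environment], policy: dict) -> dict[str, list[str]]:
    """Map each non-compliant environment name to the policy keys it violates.

    A missing key counts as a violation, since the setting cannot be confirmed.
    """
    violations: dict[str, list[str]] = {}
    for env in envs:
        bad = [key for key, required in policy.items() if env.settings.get(key) != required]
        if bad:
            violations[env.name] = bad
    return violations
```

Running a check like this on a schedule turns “top-level controls” from a convention into something the platform team can verify across every team’s environments.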
Our recommendation is generally to focus on minimizing the number of environments. Running many environments can increase indirect operational “day-two” costs (image management, container-lifecycle management, network management, etc.). Our first criterion for drawing the lines is which services will interact and work together. Shared environments cause some additional up-front developer pain because they require more discipline around API definitions and versioning. However, the downstream environments will be more stable because integration must be dealt with directly at development time. The next criterion we use is minimizing operational costs, which include both compute resources and human costs.
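The cost argument can be made explicit with a back-of-the-envelope model: the workload itself costs roughly the same however it is divided, but every additional environment set adds fixed infrastructure and “day-two” operational effort. The Python function below is a deliberately simplified sketch. All parameter names and example figures are hypothetical inputs, not measured data, and the model ignores the utilization gains from bin-packing, which would favor shared environments further.

```python
def monthly_cost(
    num_environment_sets: int,
    workload_cost: float,
    fixed_cost_per_env: float,
    ops_hours_per_env: float,
    hourly_rate: float,
) -> float:
    """Rough monthly cost of running one workload across N environment sets.

    workload_cost: compute for the services themselves, assumed constant
        regardless of how many environment sets it is spread across.
    fixed_cost_per_env: baseline infrastructure each set needs regardless of load.
    ops_hours_per_env: day-two human effort per set (images, networks, upgrades).
    hourly_rate: loaded cost of one hour of operations work.
    """
    per_env_overhead = fixed_cost_per_env + ops_hours_per_env * hourly_rate
    return workload_cost + num_environment_sets * per_env_overhead
```

For instance, with a hypothetical $10,000 monthly workload, $500 of fixed infrastructure, and 10 operational hours at $100/hour per environment set, one shared set comes to $11,500 while eight per-team sets come to $22,000. The overhead term grows linearly with the number of sets, which is why the number of environments matters even when the workload does not change.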
When planning your environments, you should consider what factors you are optimizing for. Do you want to maximize developer throughput in a loosely integrated system? Do you use sensitive data during development? How important is cost minimization to you? If you’re just migrating to the cloud or moving to a (micro)service-based architecture, which practices do you want to improve and change? Don’t let organizational inertia prevent you from implementing better practices. Many policies exist to solve historical concerns or treat symptoms of poor practices that are better addressed in other ways.
There are many other considerations, but this post walks through the high-level considerations, discussion points, and concerns we take into account. There is no perfect solution because these decisions are all about balancing trade-offs, and every organization is different. Architectural patterns, development practices, and tools are constantly evolving in this space. Like everything in software, you should periodically revisit and optimize your environment structure and management practices.
Are you moving to the cloud, implementing microservices, or breaking a monolith up? We help clients work through the associated challenges, such as defining their environment structure. Contact us to learn more.