Takeaways from the Capital One data breach

Nick Joyce
Published in Real Kinetic Blog
5 min read · Sep 13, 2019

In July 2019, Capital One was breached: roughly 30 GB of credit application data was exfiltrated, affecting approximately 106 million people.

There are plenty of sites that can give you an in-depth technical breakdown of how the breach occurred so I won’t go too far into the issue here. This is what we know:

  • A misconfigured firewall unintentionally exposed a service to the public internet.
  • A vulnerability in this service allowed the attacker to execute arbitrary commands remotely. Some signs point to SSRF (Server-Side Request Forgery), but the public details aren’t specific.
  • By querying the internal metadata service that AWS provides, the attacker was able to gain the credentials associated with the instance that was executing the commands.
  • The IAM role configured for the server was overly broad and allowed read and sync access to S3.
  • Using these credentials, the attacker was able to sync the S3 buckets containing PII (Personally Identifiable Information) to their local machine.

Note that this information was gleaned from the case filing. At the time of writing, the case is ongoing, so details are few and far between.
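To make the SSRF step above concrete: a service that fetches user-supplied URLs can be tricked into reading the link-local instance metadata endpoint and returning its credentials. Below is a minimal, hypothetical validator sketching the kind of check that blocks the most obvious targets; the function name and host list are mine for illustration, not from the case filing, and a production defense would also resolve DNS and re-check the resolved address.

```python
import ipaddress
from urllib.parse import urlparse

# Link-local metadata endpoints used by major clouds.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}

def is_safe_url(url: str) -> bool:
    """Reject URLs that target metadata, private, loopback, or link-local hosts.

    A deliberately minimal sketch; real SSRF defenses should also resolve
    hostnames and validate the resulting IP, since DNS can point anywhere.
    """
    host = urlparse(url).hostname or ""
    if host in METADATA_HOSTS:
        return False
    try:
        addr = ipaddress.ip_address(host)
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    except ValueError:
        pass  # A hostname, not an IP literal; DNS-resolution check omitted here.
    return True
```

With a check like this in front of the URL fetcher, a request for `http://169.254.169.254/latest/meta-data/...` is refused before any network call is made.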

It’s an interesting attack because it requires in-depth knowledge of several AWS subsystems: security groups, IAM, the instance metadata service, and S3. Thankfully, Capital One had the foresight to enable CloudTrail auditing, so it spotted the breach quickly and responded appropriately. Otherwise, the attack could have gone unnoticed or, worse, continued.

Lessons learned

Principle of least privilege

This means giving a user or process only the privileges essential to performing its intended function. In this case, there should have been a review of all IAM policies, and engineers should have confirmed that the IAM role assigned to each Auto Scaling group or instance was the minimal set it could be.

We see this a lot when people start with Kubernetes. As applications are first introduced into a cluster, the per-service management and storage of secrets is burdensome for engineers and slows them down. To side-step this, the underlying nodes are assigned the IAM roles required to make the application function. This works “fine” until more applications are introduced to the same cluster. Now every pod that runs on those nodes has access to the same set of roles as the original application, even when it shouldn’t. Not a great idea.

A best practice here is to scope each IAM policy to the smallest possible context; by context, I mean a process, container, or pod. Each policy should contain only the minimal roles and permissions required to perform the function. Credentials should be rolled regularly, ideally by an automated system (most internal AWS services do this automatically). Note that auditing is key here: compromised credentials can be used immediately, and on AWS and GCP there is no revocation as such (although you can remove the permissions associated with the credentials). These credentials typically have a one-hour lifetime before they need to be renewed, and rolling them regularly can also help mitigate ongoing attacks.
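To illustrate the difference, here is a sketch contrasting an overly broad S3 policy with a least-privilege one scoped to a single bucket and prefix. The bucket name and prefix are hypothetical; the policy document structure (Version, Statement, Action, Resource, Condition) is standard AWS IAM JSON.

```python
import json

BUCKET = "example-app-reports"  # hypothetical bucket for illustration

# Overly broad: any S3 action on any bucket -- the kind of policy that
# lets a compromised instance sync buckets it should never have seen.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# Least privilege: only the actions the workload needs, only on its bucket.
least_privilege = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/reports/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
            "Condition": {"StringLike": {"s3:prefix": ["reports/*"]}},
        },
    ],
}

print(json.dumps(least_privilege, indent=2))
```

With the second policy, credentials stolen from this instance can read one prefix of one bucket, nothing more. The blast radius of a compromise is bounded by the policy, which is the whole point of least privilege.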

Defense in depth

This is an essential part of breach containment. If the public-facing instance’s IAM policy did not have direct access to S3, this situation could have been avoided. This might have required deploying a separate internal service that the requests could have been forwarded to, providing more control and security.
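Another layer worth noting: AWS has since introduced IMDSv2, which requires a short-lived session token, minted with an HTTP PUT, before any metadata can be read, so a simple SSRF GET like the one used in this breach no longer returns credentials. The sketch below only builds the two requests involved (it doesn’t hit a real metadata service); the helper names are mine, but the header names and paths are IMDSv2’s actual ones.

```python
# IMDSv2 requires a session token fetched via PUT before any metadata read.
METADATA_BASE = "http://169.254.169.254"

def token_request() -> tuple[str, str, dict]:
    """Build the PUT request that mints a short-lived IMDSv2 session token."""
    return (
        "PUT",
        f"{METADATA_BASE}/latest/api/token",
        {"X-aws-ec2-metadata-token-ttl-seconds": "21600"},  # 6-hour max TTL
    )

def metadata_request(path: str, token: str) -> tuple[str, str, dict]:
    """Build a metadata GET that presents the session token."""
    return (
        "GET",
        f"{METADATA_BASE}/latest/{path}",
        {"X-aws-ec2-metadata-token": token},
    )
```

Because typical SSRF bugs let an attacker control only the URL of a GET, not the method or headers, requiring a PUT-minted token is itself a defense-in-depth measure.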

Given the sensitivity of the data, the S3 contents should have been encrypted with a customer-supplied encryption key (CSEK in GCP terms; SSE-C is the S3 equivalent). That way, even if the data in the S3 buckets were exfiltrated, it would still need to be decrypted, making the breach far less impactful.
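As a sketch of what this looks like in practice: with SSE-C, every S3 request for a protected object must carry the key in three headers, so instance credentials alone are not enough to read the plaintext. The helper below builds those headers with the stdlib only; the header names and the AES-256/32-byte-key requirement are S3’s real SSE-C contract, while the function name is mine.

```python
import base64
import hashlib
import secrets

def sse_c_headers(key: bytes) -> dict:
    """Build the request headers S3 expects for SSE-C uploads/downloads.

    S3 uses the supplied key for AES-256 and discards it after the request,
    so an attacker who can call GetObject but lacks the key gets nothing.
    """
    if len(key) != 32:  # SSE-C mandates a 256-bit key
        raise ValueError("SSE-C requires a 32-byte key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }

headers = sse_c_headers(secrets.token_bytes(32))
```

Of course, the key then has to live somewhere the compromised instance cannot reach, such as a separate KMS or secrets-management boundary, or the layer adds nothing.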

Architecture reviews

Architecture and security reviews are a core part of any application and should be performed regularly (not just the first time it is going to be deployed). Security should be a core part of this process, and a thorough review should be performed on all IAM configuration and networking.

When a change is proposed to any part of the infrastructure, a subject matter expert should be added to the code review to ensure the impact of the change is understood. This includes:

  • Networking topology
  • Firewall rules
  • IAM
  • Any code dealing with authentication/authorization

Automation

The automated deployment of infrastructure and applications is a critical part of maintaining a strong security posture. Having deployments and rollbacks of infrastructure managed by automated systems frees engineers to focus on the important stuff: providing business value. Avoiding the loss of customer confidence that follows a breach is a big part of delivering that value.

Auditing and alerting

The complexity of applications and infrastructure grows over time. Thorough code reviews before deployment are essential to catch potential issues early, but monitoring your existing infrastructure for changes, and alerting on them, is just as crucial. Services like AWS CloudTrail and GCP Cloud Audit Logs are very valuable but can get noisy. Tools such as Amazon GuardDuty can help: they provide continuous monitoring and threat detection, surfacing the events that are potentially significant.

Some areas to focus on are:

  • Any non-sanctioned IAM/infrastructure changes (see the automation section above).
  • Regular auditing of IAM policies to ensure that the principle of least privilege is being adhered to.
  • Regular auditing of networking topology and firewall rules.
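As a toy example of the kind of signal worth alerting on, here is a hypothetical scan over CloudTrail-style records that flags sensitive API calls coming from source IPs outside known ranges. The field names (`eventName`, `sourceIPAddress`) match CloudTrail’s real schema; the allow-list, event set, and function name are assumptions for illustration, and a real pipeline would feed something like this from S3-delivered log files or EventBridge.

```python
import ipaddress

# Hypothetical allow-list of corporate/VPC egress ranges.
KNOWN_RANGES = [ipaddress.ip_network("10.0.0.0/8")]

# Calls that touch data or change trust relationships.
SENSITIVE_EVENTS = {"GetObject", "ListObjects", "AssumeRole", "PutBucketPolicy"}

def suspicious(records: list[dict]) -> list[dict]:
    """Flag records for sensitive API calls made from unrecognized IPs."""
    flagged = []
    for rec in records:
        if rec.get("eventName") not in SENSITIVE_EVENTS:
            continue
        try:
            ip = ipaddress.ip_address(rec.get("sourceIPAddress", ""))
        except ValueError:
            continue  # AWS-internal calls report a service name, not an IP.
        if not any(ip in net for net in KNOWN_RANGES):
            flagged.append(rec)
    return flagged
```

A sync of PII buckets from an unfamiliar external address, as in this breach, would show up as a burst of flagged `GetObject` events, exactly the kind of anomaly worth paging on.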

Pen testing

Not often mentioned, but contracting third-party experts to attempt to penetrate your systems is extremely valuable. We’ve worked with these companies in the past; when used correctly, they have been instrumental in finding weaknesses that in-house engineers lack the specialized expertise to uncover.

Ongoing Training

Attack types and vectors change over time, and engineers need to be kept up to date with security training to understand the methodologies, mindsets, and techniques of attackers and to engineer resilient solutions. The training needs to be ongoing, with regular feedback loops to ensure value is being extracted. We’ve found internal hands-on “hacking exercises” are a great way to help engineers develop a deeper understanding and keep them current.

Conclusion

Securing custom infrastructure and applications is hard. Vigilance needs to be maintained at the engineering, deployment, and monitoring levels to minimize the chance of a breach. If a breach does occur, then there should be multiple levels of defense that the attacker must get through before getting access to anything sensitive.

We deal with these kinds of situations all the time; come talk to us.
