cloud. automation. security. resourcing. learning. sustainability. exhaustion.
These are all words that describe thousands of hours over the past year related to cloud security. This blog post will articulate:
- enterprise cloud security challenges
- lessons learned
- predictions about what will set companies apart in security posture
- a part 2 article that will provide information for current and upcoming security practitioners to prepare themselves for the next few years of talent demand.
I began focusing on cloud security about 8 years ago, and my role today is unrecognizable compared with what it was at the beginning. The growth from single deployments to hundreds of global environments, along with the evolution of serverless options, pipelines, managed services, and open source content, has added significant complexity to the field.
I've often referred to cloud as a destination, which is contrary to the common description of it as a "journey". Why is that?
Learning "cloud" is like wanting to learn "technology". Depending on your company's adoption and use cases for cloud, it could mean purchasing hundreds of niche Software as a Service (SaaS) providers, it could mean extending a virtualization cluster into an Infrastructure as a Service (IaaS) provider to support a lift-and-shift initiative, or commonly it could refer to a large modernization effort to adopt a full service IaaS/PaaS/SaaS provider such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP). The remainder of this post will focus on the last use case: modernization, DevSecOps, and automation.
Cloud is not a dependency for automation.
Cloud is not a dependency for integration.
Cloud is not a dependency for containers.
Cloud is not a dependency for virtualization.
Cloud is not a dependency for analytics and measurement.
For most security related use cases, there is nothing stopping us from auto-remediation on premise (we do that to a degree with group policy). Many of our security software solutions have existing API integrations which are available for automation and integration. We have been virtualizing servers and running Docker locally for years. Why is cloud different?
Cloud provides the features and services to be successful in automating, scaling, and integrating security components into an entire event-driven ecosystem.
My perspective on cloud has shifted extensively over the past few years. When approaching cloud security for an enterprise, the most common depiction is the following:
This visualization is the "security by design" model where applications must incorporate the controls as part of their own design and not rely on bolt-on capabilities because they are now beyond the walls of the "on premise perimeter". I've only found this to be partially true.
Then I took this belief a step further to a multi-cloud ecosystem that looks like this:
In the cloud, there are all of these individual, modular applications communicating between clouds, on premise, and within the same cloud. This is often where vendor marketing highlights that their single solutions could solve cloud security for any cloud.
This brings us to now, where I believe the following to best describe the state of cloud security. If you are beginning to adopt cloud or if you are already operating within a cloud environment, I am curious whether this resonates with you.
What I have found, at least with the large cloud providers, is that we have never truly moved away from the "bolt-on" security of on premise. Every cloud provider has a base set of foundational pre-work that needs to be implemented to support workloads. This includes numerous security related services that teams must learn, configure, and maintain. The further up the stack a service sits (IaaS --> PaaS --> SaaS, with SaaS being the furthest up), the less there is to do on the customer side. However, it is difficult to avoid at least some level of IaaS within the major providers, and the full set of foundational security controls is ultimately inevitable. Since some level of office space will often exist, varying foundations are now necessary across on premise, cloud 1, and cloud 2. Once implemented, they must be operationalized and supported, yet by whom? At the same time, application and development teams need to incorporate service specific hardening configurations unique to their use case to maintain a security by design model.
A robust cloud foundation typically utilizes the following components and can be unique implementations for each provider.
Challenges of multi-cloud environments
The challenge of cloud vs on premise is that the security fundamentals (firewalls, authentication, authorization, monitoring, and logging) are common across all environments yet deployed differently in each. Each provider is nuanced and has a unique set of security services, guardrails, and configurations that must be implemented. Another complexity is shared responsibility. However, I'm not referring to the shared responsibility model between cloud provider and cloud consumer. This is a secondary, internal shared responsibility model that adds complexity for cloud developers: they need to understand what the internal core cloud engineering teams provide for security, and what the cloud software and development teams are expected to embed directly into their often full stack solutions.
I wish I had done more with the internal shared responsibility model sooner. In reality, teams were often learning on the fly (just as the security and infrastructure teams were), and they naturally inferred that security was already covered if they introduced a misconfiguration.
Misconfigurations are what keep me up at night.
The cloud principles, strategy, and objectives are typically the same, but learning a secondary cloud provider requires dedicated technology specific training time, hands on learning, and focus time for an engineer. Ultimately, it is unrealistic for a cloud engineer who is an expert in one cloud provider to jump to another and hit the ground running. These are some of the outcomes that typically play out:
- Just do both (absorb the effort): Often, the expectation to pick up a new cloud provider falls on an already stretched cloud team supporting the current environment. It will require tradeoffs and prioritization that will likely impact maturity for coverage. The common maturity elements for cloud adoption are cost automation/optimization (e.g. resource clean-up, right-sizing, off-hours development shutdowns), infrastructure as code (CloudFormation/CDK/SAM/Azure Resource Manager/Google Deployment Manager, or is it time to pick up a cloud abstraction language such as Terraform to provide consistency across providers?), security auto-remediation, and further analysis of services that were turned on but never touched (e.g. Access Analyzer, Config, Personal Health Dashboard, CIS baseline benchmarks). The other piece of this is that these tradeoff components are often the primary benefits, advantages, and exciting parts of cloud that draw the interest of ambitious talent and lead to industry leading program transformation.
- Knowledge - breadth vs depth: There are so many services in each provider that practitioners can easily make a career in a single cloud. Is the mental drain and increased workload worth it to the engineers required to support multiple clouds? Avoiding burnout is extremely important while maintaining job satisfaction, or it may lead to the next reality.
- Pay to play: Opportunities abound for cloud experts, even those with knowledge of only one provider. Factoring in the remote landscape, opportunities expand for experts previously in locality restricted roles, which can be damaging to less flexible or hostile environments. It is not uncommon for recruiting companies to easily top existing salaries by large percentages. The grass is not always greener on the other side, but when teams are operating at burnout, that leaves much more upside for the new opportunity.
- Service sustainability: The velocity of new services and capabilities released is difficult to maintain even with just one provider. Who is triaging service consumption to ensure appropriate guardrails? Who is reading the updates across hundreds of service updates to make sure new features do not inadvertently expand the attack surface? Is there a team to call if there is an issue with a service that has been enabled? Does that team know they are accountable?
Cloud vendor curveball
Now that we have enumerated a support, resourcing, and sustainability challenge with cloud, this is where the vendors come in to save the day. Just buy X product from Y provider and it will secure the cloud. First off, some of the solution prices are astronomical. Information Security has had a shelf-ware problem for the past decade, where technology and purchase budgets have increased, leading to untouched licensing expiration, unused capability, and increasing numbers of data breaches. Management: I thought we needed to move faster, why would we not want this product?
This is where I think there is another technology challenge. There are some absolutely amazing cloud security companies and startups. Some of them effectively solve small niche problems that offer tremendous benefit. The unfortunate reality for many of these companies is that the obstacle is purely a resource limitation on the buying side. My theory for why this is a losing proposition for many of these exceptional niche startups is that, for core security teams, cloud has introduced tens of new native services that must be enabled as part of the foundation. These services need ownership, maintenance, and support. Therefore, these cloud native services are going to take first priority for available bandwidth. Although these products are beneficial, the core teams are treading water and cannot purchase more technologies purely due to capacity. Or if a product is purchased, it won't be utilized within the first year, leading to an immediate sunk cost (maybe this is just my pessimism), because what must come before any technology investment is a control objective or use case that the technology must solve. If this process goes in reverse, it will take the first year to sort through and create projects to solve the unplanned use cases.
Living off the cloud
What the majority of cloud security products do at their core is integrate with the cloud provider's APIs. Therefore, the practitioners running these tools need further underlying knowledge of the cloud services to mature and tune capabilities. To some degree, cloud services are opinionated and there is a finite number of methods to interact with and control them. Although the providers have CLIs, SDKs, and consoles, these are typically abstractions over the same core APIs.
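To make the "it is all APIs underneath" point concrete, here is a minimal sketch of the kind of check a tool can build once it has a policy document in hand. It assumes the standard AWS IAM policy JSON shape (`Statement`, `Effect`, `Action`, `Resource`). The sample policy, the `find_wildcard_actions` helper, and the `as_list` helper are hypothetical and written only for illustration. In practice the document would come from the provider's API rather than a hard-coded string, and real analyzers do far more than this.

```python
import json

# Sample policy in the JSON shape used by AWS IAM policy documents.
# The policy content itself is made up for illustration.
SAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": ["iam:*", "ec2:DescribeInstances"],
     "Resource": "*"},
    {"Effect": "Deny", "Action": "*", "Resource": "*"}
  ]
}
"""


def as_list(value):
    """Action may be a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def find_wildcard_actions(policy_document):
    """Return (statement_index, action) pairs for Allow statements
    whose actions contain a wildcard."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements do not grant access
        for action in as_list(statement.get("Action", [])):
            if "*" in action:
                findings.append((index, action))
    return findings


findings = find_wildcard_actions(json.loads(SAMPLE_POLICY))
for index, action in findings:
    print(f"Statement {index}: Allow with wildcard action {action!r}")
```

The check is only as useful as the practitioner's understanding of why `iam:*` matters in a given environment, which is the knowledge gap described above.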
Out-of-box security products are missing context without tuning, enrichment, and underlying practitioner knowledge of the services and architectural patterns that they are protecting. Thus, even with vendor acceleration, it is only a compliance checkbox until context is introduced. Furthermore, there is a thriving open source catalog of solutions available that also utilize the core APIs in order to extrapolate insights and information. Some examples worth investigating for augmentation include:
- CloudMapper - https://github.com/duo-labs/cloudmapper
- Parliament - https://github.com/duo-labs/parliament
- Cartography - https://github.com/lyft/cartography
- Cloudsplaining - https://github.com/salesforce/cloudsplaining
- Cloud Guardrails - https://github.com/salesforce/cloud-guardrails
- Cloud Custodian - https://github.com/cloud-custodian/cloud-custodian
- Pacu - https://github.com/RhinoSecurityLabs/pacu
- Cfn-nag - https://github.com/stelligent/cfn_nag
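As a hedged illustration of the point that out-of-box findings stay a compliance checkbox until context is introduced, the sketch below joins raw findings with an inventory of resource context before assigning a priority. Everything in it is an assumption for the sake of the example: the finding shape, the inventory fields (`owner`, `environment`, `data_class`), and the priority rules. None of it reflects the output format of any tool listed above.

```python
# Hypothetical raw findings, shaped like what a scanner might emit.
RAW_FINDINGS = [
    {"resource_id": "bucket-a", "issue": "public-read"},
    {"resource_id": "bucket-b", "issue": "public-read"},
    {"resource_id": "bucket-c", "issue": "public-read"},
]

# Hypothetical inventory, e.g. built from resource tags.
INVENTORY = {
    "bucket-a": {"owner": "web-team", "environment": "prod",
                 "data_class": "public"},
    "bucket-b": {"owner": "data-team", "environment": "prod",
                 "data_class": "confidential"},
}


def prioritize(finding):
    """Illustrative rules only; a real program would define its own."""
    if finding["data_class"] == "public":
        return "low"  # intentionally public content, e.g. a static website
    if finding["data_class"] == "unknown":
        return "investigate"  # no context yet, so a human needs to look
    if finding["environment"] == "prod":
        return "high"
    return "medium"


def enrich(finding, inventory):
    """Attach owner, environment, and data classification to a finding."""
    context = inventory.get(finding["resource_id"], {})
    enriched = {
        **finding,
        "owner": context.get("owner", "unknown"),
        "environment": context.get("environment", "unknown"),
        "data_class": context.get("data_class", "unknown"),
    }
    enriched["priority"] = prioritize(enriched)
    return enriched


enriched_findings = [enrich(f, INVENTORY) for f in RAW_FINDINGS]
for finding in enriched_findings:
    print(finding["resource_id"], finding["owner"], finding["priority"])
```

Three identical raw findings end up with three different priorities, which is the difference between a checkbox and a risk-based decision.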
Overall, it is not that these third-party tools do not have a place in the ecosystem, but it is important to not prematurely "buy" security add-ons. There is an extensive engineering opportunity within the realms of the unique cloud foundational environments and services mentioned earlier that can completely transform, modernize, and establish a risk-based security program.
Everything translates into one common plane - cloud native APIs.
Living off the land has been a highly effective strategy utilized by adversaries for years. Their toolsets consist of modular scripts, bash, CLI tools, reusable payloads, and automation tuned specifically to the target environment. Similarly, these cloud APIs provide defenders with an opportunity to engineer and scale a highly tuned ecosystem of services to provide real-time telemetry, detection, and response to a degree that has never historically been available.
Attackers and defenders are constrained to the same attack surface - cloud native APIs.
This is fascinating.
Sure, there are custom code and operating systems not specific to cloud that introduce additional variables to the environment, but in the context of pure cloud security we have the same base functionality available to both sides. Now the advantage is based on how we utilize the data, insights, and functionality available.
What we need is effective cloud governance.
For every cloud guardrail and requirement, we can approach the control in up to 4 different ways.
- Directive - Published baselines that can be consumed by cloud developers for secure implementation of cloud services. These will initially be published on an internal wiki although at an increased maturity level, these would be available as linting checks within the developer IDEs to detect vulnerabilities pre-deploy (shift-left).
- Preventive - Secure design patterns, guardrails, and authorization boundaries that restrict cloud functionality or prevent insecure configurations.
- Detective - Codified configuration management checks that can evaluate the security posture of the resources instantiated from the services and signal non-compliant configurations.
- Responsive - Upon detection, the automated remediation or response actions initiated against a non-compliant resource (shift-right).
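The four approaches above can be sketched as one control definition used in several places. This is a minimal sketch under stated assumptions, not a reference to any provider's service: the `Control` class, the `STORAGE-01` identifier, the `encrypted` field, and the helper functions are all hypothetical. The idea it shows is that a single codified compliance test can back the directive text, a preventive pre-deploy gate, a detective scan, and a responsive remediation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]


@dataclass
class Control:
    control_id: str
    directive: str                             # published baseline text
    is_compliant: Callable[[Resource], bool]   # shared by prevent and detect
    remediate: Callable[[Resource], Resource]  # responsive action


def require_encryption() -> Control:
    """Hypothetical control: storage must be encrypted at rest."""
    return Control(
        control_id="STORAGE-01",
        directive="Storage buckets must have encryption at rest enabled.",
        is_compliant=lambda r: r.get("encrypted") is True,
        remediate=lambda r: {**r, "encrypted": True},
    )


def preventive_gate(control: Control, requested: Resource) -> Resource:
    """Preventive: reject a non-compliant request before deployment."""
    if not control.is_compliant(requested):
        raise ValueError(
            f"{control.control_id} blocked deployment: {control.directive}"
        )
    return requested


def detect_and_respond(control: Control,
                       deployed: List[Resource]) -> List[Resource]:
    """Detective plus responsive: flag non-compliant resources, then fix."""
    result = []
    for resource in deployed:
        if control.is_compliant(resource):
            result.append(resource)
        else:
            print(f"{control.control_id} non-compliant: {resource['name']}")
            result.append(control.remediate(resource))
    return result


control = require_encryption()
deployed = [
    {"name": "logs", "encrypted": True},
    {"name": "uploads", "encrypted": False},
]
remediated = detect_and_respond(control, deployed)
```

In a real environment the remediation would call the provider's API instead of returning a modified dictionary, and whether to auto-remediate at all is a risk decision made per control.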
Some services may have 15-20 hardening controls while others may have 2-3. This triage can become tedious yet necessary. Additionally, advice I have always given is that to effectively secure a service or software, a base understanding of its functionality, capabilities, and contextual usage is necessary to apply risk-based, relevant controls. This is often a team effort of collaboration with the consuming development team. Getting this governance capability established is key to the scalability and velocity of secure cloud adoption.
Buying security has never worked. Investing in the development and well-being of our talent will pay dividends for the future of the security industry.
I am optimistic that cloud provides an opportunity for security programs to innovate far beyond their traditional capabilities. A primary differentiator for effective security programs will be the investment into capabilities tailored to their ecosystems in lieu of off the shelf add-ons.
Modern ecosystems are an interwoven mesh of services, APIs, software, and process. The core software or applications may be 10% of the overall architecture, whereas the remainder is the processes surrounding infrastructure as code, deployment pipelines, services such as Step Functions workflows, S3 data storage, Lambdas for automation/unit tests, ETL processes to transform outputs into actionable metrics and intelligence, and log monitoring tuned for both security and operational availability. The future of solutions is not limited to a single compiled executable, but is a mesh of integrations that provide end-to-end automation, scalability, and reporting. If done correctly, the closest manual interaction for app support with the production deployment (besides reporting, metrics, notifications, dashboards) is via a GitHub pull request.
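As a small, hedged example of the "ETL processes to transform outputs into actionable metrics" piece, the sketch below rolls hypothetical configuration check results up into a compliance percentage per service. The record shape and the `compliance_by_service` function are assumptions made for illustration. The same aggregation could be done with Python Pandas, but the standard library is used here to keep the example dependency-free.

```python
from collections import defaultdict

# Hypothetical check results, e.g. collected from detective controls.
CHECK_RESULTS = [
    {"service": "s3", "resource": "bucket-a", "compliant": True},
    {"service": "s3", "resource": "bucket-b", "compliant": False},
    {"service": "lambda", "resource": "fn-a", "compliant": True},
    {"service": "lambda", "resource": "fn-b", "compliant": True},
]


def compliance_by_service(results):
    """Return {service: percent of checked resources that are compliant}."""
    totals = defaultdict(lambda: {"checked": 0, "compliant": 0})
    for result in results:
        bucket = totals[result["service"]]
        bucket["checked"] += 1
        bucket["compliant"] += int(result["compliant"])
    return {
        service: round(100 * counts["compliant"] / counts["checked"], 1)
        for service, counts in sorted(totals.items())
    }


metrics = compliance_by_service(CHECK_RESULTS)
for service, percent in metrics.items():
    print(f"{service}: {percent}% compliant")
```

A number like this becomes useful when it is tracked over time and fed into the dashboards and notifications mentioned above.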
The barrier is that there are new skills, behaviors, and mentalities to be adopted by our teams. There is too much to do for two or three ambitious engineers (or the single voluntold "cloud person" at a company) to take on themselves. Success requires building new knowledge and challenging the status quo. It will require leadership to carve business hours for training, learning, ideation, as well as the freedom to fail. It will require open, inclusive, and internally and externally collaborative relationships to drive, motivate, and optimize success. Just as importantly, we have to balance the workloads of our teams to ensure that we are not running at the expense of our mental health. This means that as much as our builders, developers, engineers, architects, and analysts have to transform, our leadership and management must also transform to provide training opportunities during work, empower teams to create, and protect our teams from burnout.
How does a security practitioner begin to learn the skills to modernize, automate, and scale a cloud security program?
There is no easy button. I've often found that there are extensive resources on how to secure services, yet there is a gap around how to utilize these services for security. I will share a collection of knowledge from the learning journey that I have taken over the past few years, which has tremendously increased my effectiveness in building and implementing a modern security vision. I will share some of the "art of the possible" with cloud automation related to security. Part 2 will be a starting point to become comfortable working with:
- Data science concepts, notebooks, and datasets.
- Automation, integration, and scalability.
- Programmatically interacting with APIs.
- Data enrichment and presentation.
- Native cloud service examples and usage.
- Infrastructure as Code (IaC), GitHub, pipelines, deployments.
- Continuous measurement, reporting, and dashboards.
The draft of Part 2 is in progress and will be published in the coming week. It is an extension of a talk that I gave in 2020 at the Red Team Village June'gle event. While waiting on Part 2, I recommend checking out this talk, as it introduces the power of Jupyter notebooks in combination with libraries such as Python Pandas and NumPy:
The example notebooks are at: https://github.com/brevityinmotion/straylight/tree/main/notebooks.
Also, if you're interested in topics such as bug bounty and AWS, I recommend checking out a walkthrough on building an automated recon workflow using cloud native capabilities which was initially built on Jupyter notebooks and relies heavily on Python Pandas running within Lambda functions.
There are tons of Python Pandas examples throughout the code at https://github.com/brevityinmotion/brevityrecon.
Thank you for taking the time to read these thoughts and feel free to reach out with any questions, alternative perspectives, or further ideas on how we can mature cloud security together.