AWS Step Functions to accelerate bug bounty recon workflows

Well... I've been talking about how great AWS Step Functions are for over a year, but have struggled to implement a working solution until now. As I work daily in the cloud security space, I have found it is easy to assume that being an AWS expert means you automatically know how to do everything within AWS. This is far from the truth. Many of the services are like learning brand new technologies. Sure, some of the foundational elements such as IAM, networking, and the interface (or code) are the same, but becoming effective with the 400+ services that AWS offers requires an investment of time per service.

I went deep with AWS Step Functions over the past month and converted my bug bounty recon framework to fully adopt Step Functions. This enhancement included converting all of the event triggers (mostly S3 file writes/modifications) as well as the chained API calls made through API Gateway. Now, I can initiate the Step Functions workflow with a simple GET request to AWS API Gateway backed by a Lambda. All the Lambda does is ingest the parameters and start the Step Functions execution, passing along the parameter information that drives the workflow decisions.

If you have not read my prior blog about "Bug Bounty Cloud Automation At Scale", I would advise you to begin there for more detailed background on the overall ecosystem.

The remainder of this article is not a step-by-step guide for AWS Step Functions but instead attempts to fill in the knowledge and documentation gaps that I struggled with throughout the implementation of the service. Hopefully the accelerators and caveats identified through the process will save you hours of time and give you some ideas for improving the reliability of your workflows.

How I am using AWS Step Functions

AWS Step Functions is now used to fully orchestrate my automated recon workflow. The example demonstrated in this post shows a few common recon steps, although the workflow makes it extremely easy to add modular components/tools/APIs into the overall process.

If you have worked with the service previously, you will know the pure joy, exhilaration, and excitement when a workflow completes like this...

18 steps - 60 state changes - 3 callbacks - 15 minutes - 1 successful recon workflow

Successful Workflow
State Changes

However, let's not forget that there were many failures along the way...

For pre-staging public programs, I have a bulk-load Lambda function which parses the raw JSON program files from the amazing work of Arkadiy Tetelman (https://github.com/arkadiyt/bounty-targets-data) and loads them into an AWS DynamoDB table.
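As a rough illustration, a minimal sketch of that bulk-load Lambda might look like the following (the bucket, key, table name, and field names are assumptions based on the HackerOne data format and will likely need adjusting):

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('bugbounty-programs')  # hypothetical table name

def lambda_handler(event, context):
    # Assumes the raw bounty-targets-data JSON has already been staged in S3.
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket='brevity-recon-data', Key='bounty-targets/hackerone_data.json')
    programs = json.loads(obj['Body'].read())

    # Load each program and its in-scope targets into DynamoDB.
    with table.batch_writer() as batch:
        for program in programs:
            batch.put_item(Item={
                'program': program['handle'],
                'scope': program['targets']['in_scope']
            })
    return {'statusCode': 200, 'loaded': len(programs)}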

To initiate a Recon workflow, all I do is send a GET request to an API that contains two parameters: the program name and the operation that I want to perform. The API has a Lambda backend which queries the DynamoDB table for the scope information and then initiates the Step Functions workflow.

At a high level, the recon workflow does the following:

  • Queries the program details
  • Generates the build scripts so that only existing or dynamically generated, program-specific files and data are pulled as inputs for the tooling.
  • If the scope contains any wildcard values, it will query Rapid7 Project Sonar for subdomains using Athena (more details on this in a previous blog post - https://www.brevityinmotion.com/external-ip-domain-reconnaissance-and-attack-surface-visualization-in-under-2-minutes/). Amass is not currently depicted in the subdomain workflow although I do have it ready for integration.
  • A Lambda converts the domains into URLs (eventually I will incorporate port scans within this section) and stores them in S3 (a minimal sketch of this step follows the list).
  • A Digital Ocean droplet is dynamically created to run HTTPX against all of the discovered subdomains and then the output is uploaded to S3. A callback is utilized with the Droplet so that the Step Function remains paused during execution. The callback token is passed to the Droplet by dynamically creating the execution files immediately before creating the Droplet. After the processes on the Droplet complete and everything is uploaded to S3, the callback is initiated via a shell script that runs the aws cli command and unpauses the Step Function.
  • The HTTPX output is processed and normalized (de-duplication for base URLs). The results of the initial HTTPX run will seed the crawling operation.
  • Another Digital Ocean droplet is generated to run GoSpider with the seed URLs and leverages the same callback approach to pause/unpause the step functions.
  • The URLs from the crawl are parsed, normalized, and uploaded to S3.
  • The URL file is fed back into HTTPX to retrieve and save the responses and fingerprint information for every website that was crawled. This also utilizes a callback.
  • The final HTTPX output is normalized, de-duplicated, and then loaded into an S3 bucket where it is automatically loaded into a Quicksight dataset for visualization and analysis.
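As a rough sketch of the domain-to-URL conversion step mentioned above (the bucket name and key layout are hypothetical):

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    program = event['program']

    # Read the discovered subdomains (one per line) from S3 - bucket/key are hypothetical.
    obj = s3.get_object(Bucket='brevity-recon-data', Key=f'{program}/subdomains.txt')
    domains = obj['Body'].read().decode().splitlines()

    # Convert each domain into candidate URLs; port scan results could later expand this list.
    urls = []
    for domain in domains:
        urls.append(f'http://{domain}')
        urls.append(f'https://{domain}')

    # Store the URL list back in S3 for the HTTPX step to consume.
    s3.put_object(Bucket='brevity-recon-data', Key=f'{program}/urls.txt', Body='\n'.join(urls).encode())
    return {'program': program, 'urlCount': len(urls)}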

Drawing from the overall success of this Recon workflow, I will be developing a content discovery workflow that performs more active assessment as well as analysis against the stored data. Tools will include but are not limited to Ffuf, Sift, and Semgrep.

The best features of AWS Step Functions

After learning and using Step Functions, it is now very high on my list of favorite AWS services. Here are some of my favorite features that either made life much easier or have left me with exciting ideas as I mature my usage of the service.

  • Workflow Studio - This is a newer feature of Step Functions, and honestly, without this, I am not convinced that I would have been successful with the service. Is it a game changer? Probably. Step Functions utilizes its own "Amazon States Language", touted as a "JSON-based, structured language used to define your state machine". I had little desire to learn an additional language (it's easier to find snippets of Python on StackOverflow than it is to find Amazon States Language). As I develop the automated recon workflow, one of the principles is to ensure it is re-deployable via code. Working directly within the AWS Console would typically be a non-starter beyond troubleshooting or investigating a service. With Workflow Studio, the JSON template is generated alongside the drag-and-drop graphical interface. From there, I could simply update the workflow and then copy the template file into the GitHub repository.
AWS Workflow Studio
  • Visual Interface - Step Functions provides a visual interface to watch the workflow as it executes. Each step is color-coded in real time as it executes, making it very easy to see where the workflow is.
  • Input/Output for each step - Not only is the visual depiction valuable, but by clicking on each step, the side panel displays the input and output details for that step, making it much easier to troubleshoot the variables as they progress through the workflow. Since integration was one of the trickiest and most error-prone parts, having this information readily available improved troubleshooting.
Step input and output details
  • Detailed Error Logs - Creating the workflows resulted in many failures. Fortunately, the log information is readily available below the visual display. It contains error logs, stack traces, and output monitoring for consumption on a per-step basis. Adding to the convenience, for services that utilize CloudWatch logging (i.e. Lambda functions), there is an embedded link to directly open either the Lambda function or the detailed Lambda logs for further insight.
Readily available error logs (yes, troubleshooting step 47 is frustrating)
Embedded links to the Lambda function and CloudWatch logs

The Challenges of AWS Step Functions

There were a handful of challenges faced throughout the learning process. AWS does an excellent job with documentation, although with such an open framework of services and integrations, it is difficult to cover everything. I faced this earlier in the learning process with the variation in processing AWS triggers sent to Lambda functions - for example, finding the code snippets for the event and context parameters to process GET requests, POST requests, and S3 bucket triggers, as they are all different. Similarly, my largest challenge by far was the first item below:

  • Service Integrations - Writing the code for the service integrations between Lambda and Step Functions. The individual documentation is thorough for both Step Functions and Lambda, but there was limited combined documentation for making the services communicate effectively. Just when I thought I had found the perfect example, it would be in something like Node rather than in Python, which was what I needed. There was quite a bit of trial and error, but once a workable pattern for passing information to and from a Lambda function was solidified, reuse became much simpler.
  • Callback Functionality - Another component was the usage of callbacks to the Step Functions. This is an extremely valuable feature as it pauses the workflow until an out-of-band process completes. It solved prior challenges where I did not want a Lambda to be running/waiting (expensive!) for the duration of a web crawl or an Amass run. Historically, to work around this issue, I was using AWS S3 triggers that would initiate new Lambdas once a file was written. I am not a fan of S3 file triggers, as I learned quickly from a mistake that I caught immediately but that still resulted in $40 of damage, and I have been bitter ever since. The workflow alleviates much of the risk of that occurring (it is still possible to create a cyclical loop, but it requires much more intention). Even with the documentation, it was tricky to fully grasp where to obtain the callback tokens and how to effectively leverage them.
  • Saving within Workflow Studio is a two-step process - Early on, I lost about an hour of time in Workflow Studio because saving is a two-step process and I only completed the first step. Step 1: Click "Apply and exit". Step 2: Click Save.
Step 1
Step 2
  • Enabling the Callback function in Workflow Studio auto-enables the Output filter - I spent almost two hours troubleshooting Lambda code because I overlooked this setting after enabling the callback functionality. I am not sure why the filter gets enabled by default, but it was breaking the $.Payload parameters. Unchecking the output filter box resolved the issue.
Intended functionality
Buried checkbox that was auto-selected after enabling callback.

Tips and Accelerators for using Step Functions

For the remainder of the blog, I would like to share the nuggets of information, code snippets, and core concepts that helped make the adoption and consumption of the service successful.

Initiating Step Functions from a Lambda

The API-based Lambda that initiates the Step Functions workflow is provided in the following gist. The most important part, beyond the actual boto3 code to begin the workflow, is the stateInput JSON, which contains the initial parameters passed into the Step Functions workflow as the input to the first step.
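A minimal sketch of such an initiating Lambda, assuming an API Gateway GET request with program and operation query parameters (the state machine ARN, table name, and field names below are placeholders), looks like this:

import json
import boto3

sfn = boto3.client('stepfunctions')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('bugbounty-programs')  # hypothetical table name

def lambda_handler(event, context):
    # API Gateway passes GET parameters in queryStringParameters.
    params = event['queryStringParameters']
    programName = params['program']
    operationName = params['operation']

    # Look up the stored scope information for the program.
    program = table.get_item(Key={'program': programName}).get('Item', {})

    # Initial parameters passed into the workflow as the input to the first step.
    stateInput = {
        'program': programName,
        'operation': operationName,
        'scope': program.get('scope', [])
    }

    response = sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:brevity-recon',  # placeholder ARN
        input=json.dumps(stateInput)
    )
    return {'statusCode': 200, 'body': json.dumps({'executionArn': response['executionArn']})}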

Using callbacks (also covering basic inputs)

In my use case, I was utilizing a Lambda function to initiate the tooling processes on Digital Ocean Droplet servers. While the processes were running on the Droplets, I paused the workflow using the callback functionality. To add the task token for the callback into the payload passed to the Lambda, the step's Payload setting needs to be changed using the Payload dropdown from "Use state input as payload" to "Enter payload". Since the input state is no longer passed automatically, you need to define those values directly within the text box. In my case, I had to define the program, the operation, and the task token; the task token was not part of the original input state.

The syntax for accessing the task token is: "token.$": "$$.Task.Token". The "token" value on the left can be whatever variable name (JSON key) you want to assign to the task token value. The "$$.Task.Token" is static and will never change. From the documentation, I often struggled to determine which parts were dependent on or unique to the environment vs. always static. Your payload should look similar to the following:
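
{
  "program.$": "$.program",
  "operation.$": "$.operation",
  "token.$": "$$.Task.Token"
}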

The program and operation lines are unique to the input values; those were my previous input values "program" and "operation". The "$." prefix will always go in front of whatever input variable name you are using. Make sure to use the double "$$." prefix when accessing the Task Token.

Now, what does the Lambda see on the other side of this? This step invokes a Lambda called "brevity-operation-httpx", so let's take a look at the Lambda function.

The values passed from the Step Function are contained within the Lambda's event parameter and can be retrieved and assigned to variables as follows:

  • operationName = str(event['operation'])
  • programName = str(event['program'])
  • taskToken = str(event['token'])

Once you have the Task Token, you need to utilize it. The below example builds an entire shell script and saves it to an S3 bucket, which is then synchronized with the Droplet server and executed as the final step of the process. The most important piece of this is the aws cli syntax.

# Build the aws cli callback command that the Droplet runs after uploading its results to S3.
stateInput = '{"program":"' + programName + '","operation":"' + operationName + '","statusCode":200}'
callbackCommand = f"aws stepfunctions send-task-success --task-token {taskToken} --task-output '{stateInput}'"

The stateInput variable is important as it provides the output information for the current step and the input information for the next step in the workflow. I had trouble figuring out how to make the inline JSON work directly in the cli command, so I manipulated it in the variable using an embarrassingly novice approach of adding in my own quotes/etc., whereas I'm sure a function like json.loads or json.dumps would accomplish this much more cleanly.
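For reference, the same stateInput could be built more cleanly with json.dumps:

import json

# Equivalent construction of the stateInput without manual quoting.
stateInput = json.dumps({
    "program": programName,
    "operation": operationName,
    "statusCode": 200
})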

One caveat that I was stuck on for a while is that you still need to provide authentication for the Step Functions callback, which is where you can see me retrieving an AWS access key/secret key from AWS Secrets Manager and then adding it to the script (avoiding the need to hardcode it within source code management, although it is still written to the script).
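A minimal sketch of that retrieval, assuming the credentials are stored as a JSON secret (the secret name and key names are hypothetical, and callbackCommand refers to the command built above):

import json
import boto3

secrets = boto3.client('secretsmanager')

# Secret name and field names are hypothetical - adjust to your own secret layout.
secret = json.loads(secrets.get_secret_value(SecretId='brevity-droplet-aws-creds')['SecretString'])

# Export the credentials inside the generated shell script so the Droplet can run the aws cli callback.
scriptLines = [
    f"export AWS_ACCESS_KEY_ID={secret['accessKeyId']}",
    f"export AWS_SECRET_ACCESS_KEY={secret['secretAccessKey']}",
    callbackCommand
]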

If you need to manually troubleshoot the aws cli command, the Task Token can be retrieved from the execution event history logs of the running state machine and you can attempt to run the cli command directly yourself to see if it will progress the state.

I have barely scratched the surface of the options, integrations, and flow selections available. The entire recon workflow is built using only the AWS Lambda Invoke action and the Choice, Pass, Success, and Stop flow options. All of this is done by dragging and dropping onto the diagram and then completing the supplementary form with the choice, naming, integration, and payload information. Although there are integrations with numerous other native services, I am not sure how locked in I want to be with Step Functions vs. keeping the integrations primarily limited to Lambda in combination with Python Boto3. This generalized approach balances the benefits of Step Functions with the agility of moving the functions elsewhere without needing to translate them out of the Step Functions JSON language.

Other useful features

I am very excited to track the metrics for the workflows as I begin using them more extensively. Since the timing for each state change is captured, I can track exactly how long recon takes for each program as well as how long each tool/technology takes to complete.
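As a rough sketch, the per-execution timing can be pulled from the execution event history with boto3 (the execution ARN below is a placeholder):

import boto3

sfn = boto3.client('stepfunctions')

# Placeholder execution ARN - substitute a real execution of the recon state machine.
executionArn = 'arn:aws:states:us-east-1:123456789012:execution:brevity-recon:example'

# Each event carries a timestamp, so total duration is the spread between the first and last events.
history = sfn.get_execution_history(executionArn=executionArn, maxResults=1000)
events = history['events']
duration = events[-1]['timestamp'] - events[0]['timestamp']
print(f'{len(events)} events over {duration}')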

Next Steps

I will continue to build and develop these workflows and will share progress as it is made. I was initially going to add additional post-processing steps into this workflow, but have decided to build the content discovery workflow separately so that the two can run more independently and concurrently, reduce the complexity of this workflow, and maintain modularity between recon and content discovery. The content discovery idea was inspired by Nahamsec's most recent Live Recon session with @zseano (https://www.youtube.com/watch?v=8Sqp_kryB4E&t=2375s), which I highly recommend checking out. Another principle that I attempt to stay true to is discussed within Daniel Miessler's "Mechanizing the Methodology" framework, which I also highly encourage checking out.

If you've enjoyed this, feel free to follow me on Twitter @ryanelkins or subscribe to the blog for future content. Thank you!