Deploy at Scale
This section provides helpful information for deploying your survey at scale. Up to this point, your application has only experienced a few requests at a time. When you deploy your survey, the application will need to handle multiple requests at once. Depending on the complexity of your configuration and the expected number of users, you may need to adjust computational resources to handle increased traffic at scale. In this section, we walk through the process for load testing your application to see how your application scales.
The following parameters can be tuned to affect the speed of your application.
AWS Lambda Global Parameters
Specify global parameters for your Lambda function in template.yaml:

```yaml
Globals:
  Function:
    Timeout: 600     # Time (in seconds) before the function times out.
    MemorySize: 512  # Memory (in MB) allocated to the function.
```
See the AWS documentation on setting memory and computing power for more details on balancing cost and speed. At the time of writing, you can allocate between 128 MB and 10,240 MB of memory to your functions.
Check Lambda Concurrency Quota
New AWS accounts may have reduced concurrency and memory quotas. If your function requires more than 3,008 MB of memory, you may need to request a quota increase. Standard applications of BACE should operate smoothly with less than 3,008 MB of memory.
You can also increase the number of machines that run concurrently to scale your application. The default limit on the number of Lambda containers that can run concurrently is 1,000. However, in some regions this limit is throttled to a lower value for new AWS accounts.
You can check your quota for concurrent AWS Lambda executions by opening the Service Quotas console at https://console.aws.amazon.com/servicequotas/home. Select AWS Lambda and check the applied quota value for Concurrent executions. This value is the maximum number of Lambda containers that can run simultaneously.
If you anticipate having more people take the survey simultaneously than the number that is listed, you should request a concurrency limit increase.
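If you prefer the command line, you can read (and request an increase to) the same quota with the AWS CLI. This is a sketch that assumes the CLI is installed and configured; L-B99A9384 is the quota code for Lambda's Concurrent executions quota.

```shell
# Read the applied concurrency quota (quota code L-B99A9384 = "Concurrent executions").
aws service-quotas get-service-quota \
    --service-code lambda \
    --quota-code L-B99A9384 \
    --query 'Quota.Value'

# Request an increase, e.g. to 1,000 concurrent executions.
aws service-quotas request-service-quota-increase \
    --service-code lambda \
    --quota-code L-B99A9384 \
    --desired-value 1000
```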
Application Tuning Parameters
You can also tune the speed and precision of your application by specifying parameters in /app/bace/user_config.py.
```python
size_thetas = 2500  # Size of sample drawn from the prior distribution over preference parameters.
max_opt_time = 5    # Stop the Bayesian Optimization process after max_opt_time and return the best design.

# Configuration dictionary for Bayesian Optimization.
# See https://github.com/ARM-software/mango#6-optional-configurations for details.
# Constraints and early stopping rules can also be added here.
conf_dict = dict(
    domain_size=1000,
    initial_random=1,
    num_iteration=20
)
```
- `max_opt_time`: Sets the maximum time to spend on the Bayesian Optimization step. If the optimization step takes longer than `max_opt_time`, the process is stopped and the design with the highest mutual information found up to that point is returned.
- `size_thetas`: Governs the number of points that are sampled from `theta_parameters` to form your prior distribution. As you increase `size_thetas`, the number of points used to estimate the posterior increases, and estimates become more precise. However, increasing `size_thetas` also increases the computation time required to update beliefs and choose new designs, so users must trade off precision and speed.
- Bayesian Optimization parameters can be set by specifying a `conf_dict` (dict) object in your `user_config.py` file. See the Mango documentation for details on how to specify this object. To choose the optimal next design, the Bayesian Optimization algorithm selects `initial_random` points at random and computes the mutual information at these initial points. For `num_iteration` subsequent iterations, a Gaussian process is fit to the data; the design that offers the highest expected improvement is chosen, and the mutual information is computed at that design. This observation is added to the model's data, and the fitting process is repeated. Increasing `num_iteration` weakly improves optimization performance at the expense of computation time.
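To make the `size_thetas` trade-off concrete, here is a self-contained toy sketch (not BACE's actual implementation): it draws `size_thetas` points from an assumed normal prior, weights each draw by the likelihood of simulated logit choice data, and returns the weighted posterior mean. Larger samples give a more stable posterior estimate at the cost of proportionally more likelihood evaluations.

```python
import math
import random

random.seed(0)

def posterior_mean(size_thetas, true_theta=1.0, n_obs=50):
    """Toy importance-sampling posterior update over a preference parameter."""
    # Draw size_thetas points from an assumed N(0, 2) prior.
    thetas = [random.gauss(0.0, 2.0) for _ in range(size_thetas)]
    # Simulate binary choices from a logit model with the true parameter.
    p_true = 1.0 / (1.0 + math.exp(-true_theta))
    data = [1 if random.random() < p_true else 0 for _ in range(n_obs)]
    # Weight each prior draw by its likelihood of the simulated data.
    weights = []
    for t in thetas:
        p = 1.0 / (1.0 + math.exp(-t))
        log_lik = sum(math.log(p) if y else math.log(1.0 - p) for y in data)
        weights.append(math.exp(log_lik))
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / total

# Doubling size_thetas roughly doubles the work in the loop above,
# but stabilizes the posterior estimate.
print(posterior_mean(100))
print(posterior_mean(2500))
```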
Load Testing
We provide a script to help you test how your application will run at scale. We use Locust, an open-source load testing tool, to see how your program will handle multiple users at once. Follow the instructions to install Locust on your local computer prior to working through this section.
The key file is `run_load_test.py`, which is located in the root of your directory. This file defines the `appUser` class, which simulates an individual taking the survey. The basic script can be updated to issue additional calls or requests depending on how you have coded your application.
Make sure to set up Locust for your environment by following the link above. Start a new load testing session by running:

```shell
locust -f run_load_test.py
```

Example output:

```
[2022-06-07 12:59:35,463] DESKTOP-VBEES43/INFO/locust.main: Starting web interface at http://0.0.0.0:8089 (accepting connections from all network interfaces)
[2022-06-07 12:59:35,499] DESKTOP-VBEES43/INFO/locust.main: Starting Locust 2.9.0
```
The command will tell you how to access the locally hosted web interface, which is typically available at http://0.0.0.0:8089 or localhost:8089 in your browser.
Specify the number of users, the spawn rate, and the host website for your application (`<your-URL>`). Do not include a trailing forward slash (/) when typing the web address. Click Start swarming, and a new test will begin. The web page records the response times for each type of request. You can scale to multiple users by clicking Edit and changing the Number of users and Spawn rate.
To exit the Locust test from the command line, press Ctrl + C, or click Stop in your web browser. Aggregate timing statistics will be printed.
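You can also run the same test without the web interface by passing Locust's standard headless flags. The user count, spawn rate, run time, and placeholder host below are illustrative values.

```shell
# Headless run: 50 users, spawning 5 per second, for 2 minutes.
# Replace <your-URL> with your application's host (no trailing slash).
locust -f run_load_test.py --headless -u 50 -r 5 -t 2m --host https://<your-URL>
```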
Note that this process simulates real users, so you will be charged if the number of test calls exceeds the free tier limits for AWS.
Pricing
For typical applications, most costs will be covered by the AWS Free Tier. For example, DynamoDB offers up to 25 GB of free storage, and Lambda allows 1 million free requests per month. If you are worried about exceeding these limits, you can use the AWS Pricing Calculator to estimate costs.
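To get a rough sense of Lambda costs before opening the calculator, you can sketch the arithmetic yourself. The rates and free-tier figures below ($0.20 per million requests, roughly $0.0000167 per GB-second, with 1 million free requests and 400,000 free GB-seconds per month) reflect published x86 pricing in us-east-1 at the time of writing; verify them against current AWS pricing before relying on this estimate.

```python
def lambda_monthly_cost(requests, avg_duration_s, memory_mb,
                        price_per_million_requests=0.20,
                        price_per_gb_second=0.0000166667,
                        free_requests=1_000_000,
                        free_gb_seconds=400_000):
    """Estimate monthly Lambda cost in USD (rates are assumptions; verify)."""
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    request_cost = max(requests - free_requests, 0) / 1e6 * price_per_million_requests
    compute_cost = max(gb_seconds - free_gb_seconds, 0) * price_per_gb_second
    return request_cost + compute_cost

# 1,000 respondents x 20 requests each, 3 s per request at 512 MB
# stays entirely inside the free tier.
print(lambda_monthly_cost(20_000, 3.0, 512))  # -> 0.0
```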