Configuring your BACE Application

In this section, we walk through the required components that must be specified to run your own experiment. For a more detailed explanation of the mathematical foundations of BACE and examples, see Drake, Payró, Thakral, and Tô (2024).

To configure BACE for your own adaptive choice experiment, update the configuration file at /app/bace/user_config.py.

Model Components

BACE is characterized by four main components:

answers - The set of discrete answers that can be observed.
theta_params - A dictionary specifying the preference parameters that you want to estimate. Each key in the dictionary is the name of the parameter, and each value specifies the prior distribution over that preference parameter. Prior distributions should be scipy.stats distributions with .rvs() and .logpdf() methods. See https://docs.scipy.org/doc/scipy/reference/stats.html for a list of distributions.
design_params - A dictionary specifying the design space. Each key is the name of a design parameter. Each value specifies a continuous distribution or discrete set of values that the parameter can take on. See https://github.com/ARM-software/mango#DomainSpace for details on specifying designs.
likelihood_pdf - A function that takes as input an answer, thetas, and a design (with one entry per key in design_params) and returns the likelihood of observing that answer to the design given the preference parameters thetas.

These four components are required and will characterize your BACE application.

Specify Model Components

Specify Designs

Researchers should start by specifying the design space, which characterizes the range of values that each design parameter can take on during an experiment.

We use Mango for performing Bayesian Optimization to select optimal designs. See their documentation for more details on how to specify designs.

Specify the design_params dictionary:

Each key should be the parameter name.
Each value should be the parameter's distribution.
Distributions - Specify a distribution with a .rvs method. All scipy.stats distributions are supported.
Categorical - Specify an array of discrete values.

design_params = dict(
    # continuous_param = scipy.stats.norm(0, 1),
    # categorical_param = ['a', 'b', 'c']
)
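As a concrete illustration, the sketch below fills in the template with one continuous and one categorical design parameter and draws a few candidate values from each. The parameter names `price_diff` and `option_shown` and the helper `draw_design_values` are hypothetical, chosen only for this example; in BACE, Mango performs the actual search over the design space.

```python
import numpy as np
import scipy.stats

# Hypothetical design space: names and ranges are illustrative assumptions.
design_params = dict(
    price_diff=scipy.stats.uniform(loc=0, scale=10),  # continuous on [0, 10]
    option_shown=[0, 1],                              # categorical
)

def draw_design_values(params, n, seed=0):
    """Draw n candidate values per design parameter (illustration only)."""
    rng = np.random.default_rng(seed)
    draws = {}
    for name, spec in params.items():
        if hasattr(spec, "rvs"):
            # scipy.stats frozen distributions expose .rvs for sampling.
            draws[name] = spec.rvs(size=n, random_state=rng)
        else:
            # Categorical parameters are plain lists of allowed values.
            draws[name] = rng.choice(spec, size=n)
    return draws

samples = draw_design_values(design_params, n=5)
```

Drawing a handful of values this way is a quick check that each entry in `design_params` covers the range you intend before running an experiment.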

Specify Thetas and Prior Distributions

Next, the researcher will specify a prior distribution over each preference parameter (including a choice inconsistency parameter) that they want to estimate. This object will be used to sample a dataframe thetas that is used to estimate the posterior distribution.

The user should specify theta_params as a dictionary.

Each key should be a preference parameter name.
Each value should be the prior distribution over that preference parameter.
Most scipy.stats distributions are supported. In general, the distribution should have a .rvs method for sampling points and a .logpdf method for evaluating the log-density.

In general, prior distributions can be informed from previous results from the literature or pilot experiments.

Note that posterior estimates can only take on values that fall within the support of the prior distribution. For example, if the parameter x is distributed according to a uniform distribution over [0, 1], then posterior estimates can only take on values between 0 and 1. Researchers should therefore select informed priors whose support covers the full range of values they want to estimate.

An example where a researcher is estimating two parameters is shown below. param_1 is distributed according to a standard normal distribution. param_2 is distributed according to a uniform distribution over 0 to 10.

import scipy.stats

theta_params = dict(
    param_1 = scipy.stats.norm(),
    param_2 = scipy.stats.uniform(loc=0, scale=10)
)
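To see what these priors imply, the sketch below draws a DataFrame of candidate parameter values from `theta_params`, one column per parameter, and checks the support of `param_2`. The helper `sample_thetas` and the sample size of 1000 are assumptions made for illustration; BACE performs its own sampling internally.

```python
import numpy as np
import pandas as pd
import scipy.stats

theta_params = dict(
    param_1=scipy.stats.norm(),
    param_2=scipy.stats.uniform(loc=0, scale=10),
)

def sample_thetas(params, size, seed=0):
    """Draw `size` points from each prior (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {name: dist.rvs(size=size, random_state=rng) for name, dist in params.items()}
    )

thetas = sample_thetas(theta_params, size=1000)

# Every draw of param_2 lies inside the prior's support [0, 10].
in_support = thetas["param_2"].between(0, 10).all()

# Outside the support the log-density is -inf, so such values
# can never receive posterior weight.
logpdf_outside = theta_params["param_2"].logpdf(11.0)
```

If you expect a parameter could plausibly exceed 10, widen the `scale` of the uniform prior or choose a distribution with unbounded support.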

Specify Answer Space

answers is an array that represents the possible answers you can observe to each question. For a binary discrete choice experiment, this is simply [0, 1]. answers must be a discrete set of values, but it can contain two or more values.

answers = [0, 1] # Binary choice experiment

or

answers = ['low', 'medium', 'high'] # Multiple discrete options

Specify Likelihood Function

Finally, specify the pdf of the likelihood function, likelihood_pdf. This should return the likelihood of choosing each answer given a specific set of preference parameters, thetas, and a given design.

It takes as input:

answer - Must be an element in answers.
thetas - DataFrame of preference parameters. This will correspond to the DataFrame generated from theta_params above. Each parameter has a separate column and can be accessed by name. For example, to access param_1, you can select thetas['param_1'].
design - A single design, corresponding to one row of the DataFrame of designs generated from design_params above. Each design parameter can be accessed by name. For example, to access continuous_param, you can select design['continuous_param'].
profile - Dictionary of user profile components. This will correspond to information stored about the user. For example, to access income, you can select profile['income']. This input defaults to None when not used.

This function then returns the probability of selecting each answer given preference parameters thetas for a given design.

We provide an example configuration below for a binary discrete choice experiment estimating one preference parameter, wtp_b_over_a, together with a choice consistency parameter, p.

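import scipy.stats

answers = [0, 1]

design_params = dict(
    diff_price_b_vs_a = scipy.stats.uniform(0, 10),  # uniform on [0, 10]
    b_instead_of_a = [0, 1]
)

theta_params = dict(
    wtp_b_over_a = scipy.stats.uniform(0, 10),  # uniform on [0, 10]
    p = scipy.stats.uniform(0.5, 0.5)           # loc=0.5, scale=0.5: uniform on [0.5, 1]
)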

def likelihood_pdf(answer, thetas, design, profile=None):
    """
    Returns P(answer | thetas, design, profile)
    For an individual with preference parameters `thetas`, this function outputs the likelihood of observing `answer` for the given `design`.

    Input:
        `answer`: Must be an element in `answers`.
        `thetas`: DataFrame of preference parameters. This will correspond to the DataFrame generated from `theta_params` above. Each parameter has a separate column and can be accessed by name. For example, to access parameter `x`, you can select `thetas['x']`.
        `design`: Single `design`; this corresponds to a single row from `designs`. Each design parameter has a separate column and can be accessed by name. For example, to access the `alpha` design parameter, you can type `design['alpha']`.
        `profile`: A dictionary with a user's information that is passed through the API call `create_profile` or added through `add_to_profile()`. When not used, this is set to None.
    Returns:
        likelihood (float): Likelihood of observing `answer`.

    Note: Take care to guarantee that likelihood lies strictly in (0, 1) for all values of answers/thetas/designs.
    """

    base_utility_difference = - design['diff_price_b_vs_a'] + thetas['wtp_b_over_a'] * design['b_instead_of_a']

    # Return preferred option with probability p. Choose randomly otherwise.
    likelihood = (base_utility_difference > 0) * thetas['p'] + (1/2) * ( 1 - thetas['p'] )

    # Take care to guarantee likelihood to be strictly between 0 and 1 for all values of answers/thetas/designs
    eps = 1e-10
    likelihood[likelihood < eps] = eps
    likelihood[likelihood > (1 - eps)] = 1 - eps

    return likelihood if str(answer) == '1' else (1 - likelihood)
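The sketch below shows one way to sanity-check a likelihood function before deploying it. It restates the example likelihood in compact form, evaluates it on a small hand-written `thetas` DataFrame for a single design, and confirms that the probabilities of the two answers sum to one. It then reweights the candidate `thetas` by the likelihood of an observed answer, which is the basic Bayes-rule step behind posterior estimation. The specific values and the reweighting code are illustrative assumptions, not BACE's internal implementation.

```python
import numpy as np
import pandas as pd

def likelihood_pdf(answer, thetas, design, profile=None):
    # Compact restatement of the example likelihood above.
    diff = -design["diff_price_b_vs_a"] + thetas["wtp_b_over_a"] * design["b_instead_of_a"]
    prob_1 = (diff > 0) * thetas["p"] + 0.5 * (1 - thetas["p"])
    prob_1 = prob_1.clip(1e-10, 1 - 1e-10)  # keep strictly inside (0, 1)
    return prob_1 if str(answer) == "1" else 1 - prob_1

# Three hypothetical candidate parameter vectors with equal prior weight.
thetas = pd.DataFrame({"wtp_b_over_a": [2.0, 5.0, 8.0], "p": [0.9, 0.9, 0.9]})
# One hypothetical design: option B is offered at a price premium of 4.
design = {"diff_price_b_vs_a": 4.0, "b_instead_of_a": 1}

like_1 = likelihood_pdf(1, thetas, design)
like_0 = likelihood_pdf(0, thetas, design)
total = like_1 + like_0  # should equal 1 for every row

# Bayes rule on the candidate points: posterior weight is proportional to
# prior weight (uniform here) times the likelihood of the observed answer.
posterior = like_1 / like_1.sum()
```

In this example, observing answer 1 at a premium of 4 shifts weight toward the candidates whose `wtp_b_over_a` exceeds 4, while the candidate with a value of 2 keeps only a small weight because of the choice inconsistency allowed by `p`.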

Together, these components characterize a BACE application.

Other BACE Configurations

The optimization process is complex and can be time-consuming. The BACE package handles this for you by implementing Monte Carlo techniques. More details can be found in Drake, Payró, Thakral, and Tô (2024).

Beyond the components above, additional values at the top of /app/bace/user_config.py can be set to tune the speed and precision of the algorithm.

In particular, users can set:

author: Your name.
size_thetas: Specifies the number of points sampled from the prior distribution.
conf_dict: Specifies Bayesian Optimization hyperparameters. (See https://github.com/ARM-software/mango#6-optional-configurations)
- We recommend doing simulations to understand the tradeoff between speed and precision when changing domain_size and num_iteration for your specific BACE application.
max_opt_time: Specifies the maximum time in seconds allowed for the Bayesian optimization process. If the algorithm takes longer than max_opt_time seconds, the program returns the design with the highest gain in mutual information found so far.
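For orientation, the top of the configuration file might look like the sketch below. All values shown are placeholders rather than recommended or default settings; `domain_size` and `num_iteration` are the Mango options mentioned above.

```python
# Illustrative values only; tune them for your own application.
author = "Your Name"

# Number of points sampled from the prior distribution.
size_thetas = 2500

# Bayesian Optimization hyperparameters (see the Mango documentation).
conf_dict = dict(
    domain_size=5000,
    num_iteration=30,
)

# Maximum time in seconds allowed for the optimization step.
max_opt_time = 5
```

Larger values of `size_thetas`, `domain_size`, and `num_iteration` generally trade speed for precision, so it is worth simulating a few settings before fielding a survey.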

Save File and Push Changes

After making the changes above, your application will be set up for your own experiment.

To push these changes to the cloud, run sam build and sam deploy.

To further customize the output format for integration on different survey platforms and add more details to a respondent's profile to be stored in the data, please refer to the next section.