Configuring your BACE Application
In this section, we walk through the required components that must be specified to run your own experiment. For a more detailed explanation of the mathematical foundations of BACE and examples, see Drake, Payró, Thakral, and Tô (2024).
To configure BACE to run your own adaptive choice experiment, update the configuration file at /app/bace/user_config.py.
Model Components
BACE is characterized by four main components:
- answers - The set of discrete answers that can be observed.
- theta_params - A dictionary specifying the preference parameters that you want to estimate. Each key in the dictionary is the name of the parameter, and each value specifies the prior distribution over that preference parameter. Prior distributions should be scipy.stats distributions with .rvs() and .logpdf() methods. See https://docs.scipy.org/doc/scipy/reference/stats.html for a list of distributions.
- design_params - A dictionary specifying the design space. Each key is the name of a design parameter. Each value specifies a continuous distribution or discrete set of values that the parameter can take on. See https://github.com/ARM-software/mango#DomainSpace for details on specifying designs.
- likelihood_pdf - A function that takes as input an answer, thetas, and each key in design_params, and returns the likelihood of observing answer to a design given the preference parameters thetas.
These four components are required and will characterize your BACE application.
Specify Model Components
Specify Designs
Researchers should start by specifying the design space, which characterizes the range of values that each design parameter can take on during an experiment.
We use Mango for performing Bayesian Optimization to select optimal designs. See their documentation for more details on how to specify designs.
Specify the design_params dictionary:
- Each key should be the parameter name.
- Each value should be the parameter's distribution.
  - Distributions - Specify a distribution with an .rvs method. All scipy.stats distributions are supported.
  - Categorical - Specify an array of discrete values.
design_params = dict(
    # continuous_param = scipy.stats.norm(0, 1),
    # categorical_param = ['a', 'b', 'c']
)
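As a rough illustration of the design space described above, the sketch below fills in the template with two made-up parameters and draws one candidate design from it. The parameter names (`price_diff`, `show_b`) and the helper `sample_design` are hypothetical and not part of BACE; in the actual application, Mango selects designs from this dictionary.

```python
import numpy as np
import scipy.stats

# Hypothetical design space: one continuous and one categorical parameter.
design_params = dict(
    price_diff=scipy.stats.uniform(loc=0, scale=10),  # continuous on [0, 10]
    show_b=[0, 1],                                     # discrete set of values
)

def sample_design(params, seed=0):
    """Illustrative helper (not part of BACE): draw one random design."""
    rng = np.random.default_rng(seed)
    design = {}
    for name, spec in params.items():
        if hasattr(spec, "rvs"):
            # Distribution: draw a single value with its .rvs method.
            design[name] = float(spec.rvs(random_state=rng))
        else:
            # Categorical: pick one element of the array uniformly.
            design[name] = spec[rng.integers(len(spec))]
    return design

example_design = sample_design(design_params)
print(example_design)
```

The `hasattr(spec, "rvs")` check mirrors the two kinds of values the dictionary accepts: anything with an `.rvs` method is treated as a distribution, and anything else as a discrete set.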
Specify Thetas and Prior Distributions
Next, the researcher will specify a prior distribution over each preference parameter (including a choice inconsistency parameter) that they want to estimate. This object will be used to sample a DataFrame, thetas, that is used to estimate the posterior distribution.
The user should specify theta_params as a dictionary.
- Each key should be a preference parameter name.
- Each value should be the prior distribution over that preference parameter.
  - Most scipy.stats distributions are supported. In general, the distribution should have an .rvs method for sampling points and a .logpdf method.
In general, prior distributions can be informed from previous results from the literature or pilot experiments.
Note that posterior estimates can only take on values that fall within the support of the prior distribution. For example, if the parameter x is distributed according to a uniform distribution over [0, 1], then posterior estimates can only take on values between 0 and 1. In general, researchers should be careful to select informed priors whose support covers the range of values that they want to estimate.
An example where a researcher is estimating two parameters is shown below. param_1 is distributed according to a standard normal distribution. param_2 is distributed according to a uniform distribution over 0 to 10.
import scipy.stats

theta_params = dict(
    param_1 = scipy.stats.norm(),
    param_2 = scipy.stats.uniform(loc=0, scale=10)
)
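To make the role of the prior concrete, here is a minimal sketch of how a thetas DataFrame could be drawn from theta_params, and why the prior's support bounds the posterior. The helper `sample_thetas` is a hypothetical stand-in for whatever BACE does internally; only the `.rvs` and `.logpdf` calls are standard scipy.stats methods.

```python
import numpy as np
import pandas as pd
import scipy.stats

theta_params = dict(
    param_1=scipy.stats.norm(),
    param_2=scipy.stats.uniform(loc=0, scale=10),
)

def sample_thetas(params, size, seed=0):
    """Hypothetical helper: one column per parameter, one row per prior draw."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {name: dist.rvs(size=size, random_state=rng) for name, dist in params.items()}
    )

thetas = sample_thetas(theta_params, size=1000)

# Every draw of param_2 lies in [0, 10], so no posterior estimate built
# from these draws can fall outside that interval.
print(thetas["param_2"].min(), thetas["param_2"].max())

# The prior assigns zero density (log-density of -inf) outside its support.
print(theta_params["param_2"].logpdf(11.0))  # -inf
```

If pilot data suggest that plausible values of param_2 exceed 10, the prior would need a wider `scale` before running the experiment.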
Specify Answer Space
answers is an array that represents the possible answers you can observe to each question. For a binary discrete choice experiment, this is simply [0, 1]. answers must be a discrete set of values, but it can take on two or more values.
Specify Likelihood Function
Finally, specify the pdf of the likelihood function, likelihood_pdf. This should return the likelihood of choosing each answer given a specific set of preference parameters, thetas, and a given design.
It takes as input:
- answer - Must be an element in answers.
- thetas - DataFrame of preference parameters. This will correspond to the DataFrame generated from theta_params above. Each parameter has a separate column and can be accessed by name. For example, to access param_1, you can select thetas['param_1'].
- design - DataFrame of design parameters. This will correspond to the DataFrame generated from design_params above. Each parameter has a separate column and can be accessed by name. For example, to access continuous_param, you can select design['continuous_param'].
- profile - Dictionary of user profile components. This will correspond to information stored about the user. For example, to access income, you can select profile['income']. This input defaults to None when not used.
This function then returns the probability of selecting each answer given preference parameters thetas for a given design.
We provide an example configuration below for a binary discrete choice experiment estimating one preference parameter, wtp_b_over_a, together with a choice consistency parameter, p.
import scipy.stats

answers = [0, 1]
design_params = dict(diff_price_b_vs_a = scipy.stats.uniform(0, 10), b_instead_of_a = [0, 1])
theta_params = dict(wtp_b_over_a = scipy.stats.uniform(0, 10), p = scipy.stats.uniform(0.5, 0.5))

def likelihood_pdf(answer, thetas, design, profile=None):
    """
    Returns P(answer | thetas, design, profile)

    For an individual with preference parameters `thetas`, this function outputs the likelihood of observing `answer` for a given `design`.

    Input:
        `answer`: Must be an element in `answers`.
        `thetas`: DataFrame of preference parameters. This will correspond to the DataFrame generated from `theta_params` above. Each parameter has a separate column and can be accessed by name. For example, to access parameter `x`, you can select `thetas['x']`.
        `design`: Single `design`; this corresponds to a single row from `designs`. Each design parameter has a separate column and can be accessed by name. For example, to access the `alpha` design parameter, you can type `design['alpha']`.
        `profile`: A dictionary with a user's information that is passed through the API call `create_profile` or added through `add_to_profile()`. When not used, this is set to None.

    Returns:
        likelihood (float): Likelihood of observing `answer`.

    Notes: Take care to guarantee that the likelihood lies strictly in (0, 1) for all values of answers/thetas/designs.
    """
    # Utility of option B relative to option A under this design.
    base_utility_difference = - design['diff_price_b_vs_a'] + thetas['wtp_b_over_a'] * design['b_instead_of_a']

    # Return preferred option with probability p. Choose randomly otherwise.
    likelihood = (base_utility_difference > 0) * thetas['p'] + (1/2) * ( 1 - thetas['p'] )

    # Take care to guarantee likelihood to be strictly between 0 and 1 for all values of answers/thetas/designs
    eps = 1e-10
    likelihood[likelihood < eps] = eps
    likelihood[likelihood > (1 - eps)] = 1 - eps

    return likelihood if str(answer) == '1' else (1 - likelihood)
Together, these components characterize a BACE application.
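Before deploying, it can help to sanity-check a likelihood function locally. The sketch below repeats a condensed copy of the example likelihood, evaluates it on prior draws for one design, and checks that the probabilities across answers sum to one and stay strictly inside (0, 1). It assumes the design can be passed as a plain dictionary and uses `Series.clip` in place of the two-step clamp; how BACE itself constructs `thetas` and `design` may differ.

```python
import numpy as np
import pandas as pd
import scipy.stats

answers = [0, 1]
theta_params = dict(
    wtp_b_over_a=scipy.stats.uniform(0, 10),
    p=scipy.stats.uniform(0.5, 0.5),
)

def likelihood_pdf(answer, thetas, design, profile=None):
    # Condensed copy of the example likelihood, for local testing only.
    diff = -design["diff_price_b_vs_a"] + thetas["wtp_b_over_a"] * design["b_instead_of_a"]
    likelihood = (diff > 0) * thetas["p"] + 0.5 * (1 - thetas["p"])
    likelihood = likelihood.clip(1e-10, 1 - 1e-10)
    return likelihood if str(answer) == "1" else 1 - likelihood

# Draw prior samples (assumed layout: one column per parameter).
rng = np.random.default_rng(0)
thetas = pd.DataFrame(
    {name: dist.rvs(size=500, random_state=rng) for name, dist in theta_params.items()}
)

# One hypothetical design: option B costs 4 more than option A.
design = {"diff_price_b_vs_a": 4.0, "b_instead_of_a": 1}

probs = {a: likelihood_pdf(a, thetas, design) for a in answers}
total = probs[0] + probs[1]

print(np.allclose(total, 1.0))
print(all(((p > 0) & (p < 1)).all() for p in probs.values()))
```

Running the same check over a handful of extreme designs (for example, the smallest and largest price difference) is a cheap way to catch likelihoods that reach exactly 0 or 1.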
Other BACE Configurations
The optimization process is complex and can be time-consuming. The BACE package handles this for you by implementing Monte Carlo techniques. More details can be found in Drake, Payró, Thakral, and Tô (2024).
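For intuition only, the following sketch shows one generic way Monte Carlo draws can be used for both steps: updating beliefs by reweighting prior samples, and scoring a design by the mutual information between the answer and the parameters. This is not BACE's actual implementation (see the paper for that); the function names `posterior_weights` and `mutual_information` are hypothetical.

```python
import numpy as np

def posterior_weights(weights, likelihoods):
    """Bayes update on a fixed sample: reweight each draw by its likelihood."""
    w = weights * likelihoods
    return w / w.sum()

def mutual_information(weights, lik_by_answer):
    """Estimate I(answer; theta) for one design.

    lik_by_answer has shape (n_answers, n_draws) and holds
    P(answer | theta_draw, design), strictly between 0 and 1.
    """
    marginal = lik_by_answer @ weights                      # P(answer | design)
    h_marginal = -np.sum(marginal * np.log(marginal))       # entropy of the answer
    h_given_theta = -np.sum(lik_by_answer * np.log(lik_by_answer), axis=0)
    return h_marginal - np.sum(weights * h_given_theta)     # expected entropy reduction

# Two equally likely parameter draws and two answers.
weights = np.array([0.5, 0.5])
informative = np.array([[0.9, 0.1], [0.1, 0.9]])  # answers depend on theta
flat = np.full((2, 2), 0.5)                       # answers ignore theta

print(mutual_information(weights, informative) > mutual_information(weights, flat))
print(posterior_weights(weights, informative[0]))
```

The logarithms are why the likelihood must stay strictly between 0 and 1: a value of exactly 0 would produce `log(0)` in the entropy terms.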
Beyond the components above, additional values can be specified at the top of /app/bace/user_config.py to tune the speed and precision of the algorithm.
In particular, users can set:
- author: Your name.
- size_thetas: Specifies the number of points sampled from the prior distribution.
- conf_dict: Specifies Bayesian Optimization hyperparameters. (See https://github.com/ARM-software/mango#6-optional-configurations)
  - We recommend doing simulations to understand the tradeoff between speed and precision when changing domain_size and num_iteration for your specific BACE application.
- max_opt_time: Specifies the maximum time in seconds allowed for the Bayesian optimization process. If the algorithm takes longer than max_opt_time seconds, then the program returns the design that offers the highest gain in mutual information so far.
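As a sketch, the top of user_config.py might look like the following. All numeric values here are placeholders chosen for illustration, not recommended or default settings; tune them through simulation for your own application.

```python
# Placeholder values for illustration only; not BACE defaults.
author = "Your Name"

# Number of points sampled from the prior distribution.
size_thetas = 2000

# Maximum time, in seconds, allowed for the Bayesian optimization process.
max_opt_time = 5

# Bayesian Optimization hyperparameters passed to Mango.
conf_dict = dict(
    domain_size=1000,   # candidate designs considered per iteration
    num_iteration=10,   # optimization iterations
)
```

Larger values of size_thetas, domain_size, and num_iteration generally trade speed for precision, which is why simulating a few settings before fielding the survey is worthwhile.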
Save File and Push Changes
After making the changes above, your application will be set up for your own experiment.
To push these changes to the cloud, run sam build and sam deploy.
To further customize the output format for integration on different survey platforms and add more details to a respondent's profile to be stored in the data, please refer to the next section.