Bandit
- See IDL definition for detailed specification.
Configuration
Configuration is given as a JSON file. We show each field below:
- method
  Specify the bandit algorithm. You can use the algorithms below.

  Value                   Method
  "epsilon_greedy"        Use epsilon-greedy.
  "epsilon_decreasing"    Use Greedy Mix.
  "ucb1"                  Use UCB1.
  "softmax"               Use softmax.
  "exp3"                  Use exp3.
  "ts"                    Use Thompson sampling. [1]

  [1] Note that reward in the register_reward API must be 0 or 1 when you use the Thompson sampling method.
- parameter
  Specify parameters for the algorithm. The format differs for each method.
- common
  assume_unrewarded: Specify whether the call to register_reward may be omitted when the reward is zero. When it is True, register_reward can be omitted for zero rewards, but every call to register_reward must be associated with the result of a select_arm call. When it is False, register_reward must be called even when the reward is zero, but it can be called independently of select_arm. (Boolean)
- epsilon_greedy
  epsilon: The probability of choosing an arm at random. With probability epsilon, an arm is chosen according to the uniform distribution. With probability 1 - epsilon, the arm with the highest expected value is chosen. (Float)
  - Range: 0.0 <= epsilon <= 1.0
  seed (optional): Specify the seed used to generate random numbers. If not specified, the system clock is used as the seed, so each experiment will produce different results.
  - Range: 0 <= seed <= \(2^{32} - 1\)
- epsilon_decreasing
  decreasing_rate: Decreasing rate for the probability of selecting arms randomly. The bigger this parameter is, the more slowly the probability decreases. (Float)
  - Range: 0 < decreasing_rate < 1
  seed (optional): Specify the seed used to generate random numbers. If not specified, the system clock is used as the seed, so each experiment will produce different results.
  - Range: 0 <= seed <= \(2^{32} - 1\)
- ucb1
- None
- softmax
  tau: Temperature parameter. At high temperature, all arms are selected with nearly equal probability. At low temperature, arms with higher expected value are selected more frequently. (Float)
  - Range: 0.0 < tau
  seed (optional): Specify the seed used to generate random numbers. If not specified, the system clock is used as the seed, so each experiment will produce different results.
  - Range: 0 <= seed <= \(2^{32} - 1\)
- exp3
  gamma: Mixture rate of the constant weight and each arm’s weight. The higher gamma is, the larger the share of the constant weight. The lower gamma is, the larger the share of each arm’s weight. (Float)
  - Range: 0.0 < gamma <= 1.0
  seed (optional): Specify the seed used to generate random numbers. If not specified, the system clock is used as the seed, so each experiment will produce different results.
  - Range: 0 <= seed <= \(2^{32} - 1\)
- ts
  seed (optional): Specify the seed used to generate random numbers. If not specified, the system clock is used as the seed, so each experiment will produce different results.
  - Range: 0 <= seed <= \(2^{32} - 1\)
- Example:
  {
    "method" : "epsilon_greedy",
    "parameter" : {
      "assume_unrewarded" : false,
      "epsilon" : 0.1
    }
  }
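To make the roles of epsilon, tau, and seed more concrete, the following is a minimal Python sketch of the textbook forms of epsilon-greedy and softmax (Boltzmann) selection. It is not the service's implementation: the function names, the `expected` dictionary of per-arm expected rewards, and the arm IDs are all hypothetical, and the softmax form exp(value / tau) is the standard formulation assumed here rather than something this page specifies.

```python
import math
import random


def epsilon_greedy_select(expected, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must satisfy 0.0 <= epsilon <= 1.0")
    arms = sorted(expected)  # fixed order keeps seeded runs reproducible
    if rng.random() < epsilon:
        return rng.choice(arms)  # uniform choice over all arms
    return max(arms, key=lambda arm: expected[arm])  # highest expected value


def softmax_probabilities(expected, tau):
    """Selection probabilities of standard softmax with temperature tau."""
    if tau <= 0.0:
        raise ValueError("tau must satisfy 0.0 < tau")
    top = max(expected.values())  # subtract the maximum for numerical stability
    weights = {arm: math.exp((value - top) / tau) for arm, value in expected.items()}
    total = sum(weights.values())
    return {arm: weight / total for arm, weight in weights.items()}


# Hypothetical per-arm expected rewards, for illustration only.
expected = {"arm_a": 0.2, "arm_b": 0.8}

# Two generators created with the same seed produce the same selections.
rng_1, rng_2 = random.Random(42), random.Random(42)
run_1 = [epsilon_greedy_select(expected, 0.5, rng_1) for _ in range(20)]
run_2 = [epsilon_greedy_select(expected, 0.5, rng_2) for _ in range(20)]

# High temperature flattens the distribution; low temperature sharpens it.
hot = softmax_probabilities(expected, tau=1000.0)
cold = softmax_probabilities(expected, tau=0.01)
```

In this sketch, fixing the seed makes `run_1` and `run_2` identical, which mirrors why an unspecified seed leads to different results in each experiment.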
Data Structures
Methods
- service bandit
  - bool register_arm(0: string arm_id)
    Parameters:
    - arm_id – ID of the new arm to be registered
    Returns: True if the arm was registered successfully. False if registration failed.
    Register a new arm with the name arm_id.
  - bool delete_arm(0: string arm_id)
    Parameters:
    - arm_id – ID of the arm to be deleted
    Returns: True if the arm was deleted successfully. False if deletion failed.
    Delete the arm with the name arm_id.
  - string select_arm(0: string player_id)
    Parameters:
    - player_id – ID of the player whose arm is to be selected
    Returns: arm_id selected by the bandit algorithm.
    Select an arm for the player according to the current state.
  - bool register_reward(0: string player_id, 1: string arm_id, 2: double reward)
    Parameters:
    - player_id – ID of the player whose arm receives the reward
    - arm_id – ID of the arm with which the reward is registered
    - reward – amount of the reward
    Returns: True if the reward was registered successfully. False if registration failed.
    Register a reward with the specified arm of the specified player.
  - map<string, arm_info> get_arm_info(0: string player_id)
    Parameters:
    - player_id – ID of the player
    Returns: arm information of the specified player
    Get information on all arms of the specified player.
  - bool reset(0: string player_id)
    Parameters:
    - player_id – ID of the player whose arms are to be reset
    Returns: True if the arms were reset successfully. False if the reset failed.
    Reset the information on all arms of the specified player.
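The typical call sequence of the methods above can be sketched with an in-memory stand-in. This is not the Jubatus client or server: `InMemoryBandit` is a hypothetical class that only mirrors the method names and return types listed here, uses epsilon-greedy selection for illustration, raises an error of its own choosing when no arms are registered, and assumes that arm information consists of a `trial_count` and a `weight` field (the Data Structures section of this page does not show the actual fields). It also illustrates one plausible reading of assume_unrewarded, where the trial is counted in select_arm when the option is true and in register_reward when it is false.

```python
import random


class InMemoryBandit:
    """Hypothetical stand-in mirroring the bandit service's method names."""

    def __init__(self, epsilon=0.1, assume_unrewarded=False, seed=None):
        self.epsilon = epsilon
        self.assume_unrewarded = assume_unrewarded
        self.rng = random.Random(seed)
        self.arm_ids = []
        self.players = {}  # player_id -> {arm_id -> {"trial_count", "weight"}}

    def _stats(self, player_id, arm_id):
        arms = self.players.setdefault(player_id, {})
        return arms.setdefault(arm_id, {"trial_count": 0, "weight": 0.0})

    def _mean(self, player_id, arm_id):
        stats = self._stats(player_id, arm_id)
        return stats["weight"] / stats["trial_count"] if stats["trial_count"] else 0.0

    def register_arm(self, arm_id):
        if arm_id in self.arm_ids:
            return False  # already registered
        self.arm_ids.append(arm_id)
        return True

    def delete_arm(self, arm_id):
        if arm_id not in self.arm_ids:
            return False  # nothing to delete
        self.arm_ids.remove(arm_id)
        for arms in self.players.values():
            arms.pop(arm_id, None)
        return True

    def select_arm(self, player_id):
        if not self.arm_ids:
            raise RuntimeError("no arms registered")
        if self.rng.random() < self.epsilon:
            arm_id = self.rng.choice(self.arm_ids)
        else:
            arm_id = max(self.arm_ids, key=lambda a: self._mean(player_id, a))
        if self.assume_unrewarded:
            # The trial is counted here, so zero rewards need no extra call.
            self._stats(player_id, arm_id)["trial_count"] += 1
        return arm_id

    def register_reward(self, player_id, arm_id, reward):
        if arm_id not in self.arm_ids:
            return False
        stats = self._stats(player_id, arm_id)
        if not self.assume_unrewarded:
            stats["trial_count"] += 1  # every trial must be reported, even reward 0
        stats["weight"] += reward
        return True

    def get_arm_info(self, player_id):
        return {a: dict(self._stats(player_id, a)) for a in self.arm_ids}

    def reset(self, player_id):
        self.players.pop(player_id, None)
        return True


# Register arms, select one for a player, report the reward, inspect the state.
bandit = InMemoryBandit(epsilon=0.1, assume_unrewarded=False, seed=0)
for arm_id in ("arm_a", "arm_b"):
    bandit.register_arm(arm_id)
arm = bandit.select_arm("player_1")
bandit.register_reward("player_1", arm, 1.0)
info = bandit.get_arm_info("player_1")
```

Note that state is kept per player: resetting "player_1" in this sketch clears only that player's statistics and leaves the registered arms in place.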
-
bool