Bandit

Configuration

Configuration is given as a JSON file. We show each field below:

method

Specify the bandit algorithm. The following algorithms are available.

Value                  Method
"epsilon_greedy"       Use epsilon-greedy.
"epsilon_decreasing"   Use Greedy Mix.
"ucb1"                 Use UCB1.
"softmax"              Use softmax.
"exp3"                 Use Exp3.
"ts"                   Use Thompson sampling. [1]

[1] Note that the reward passed to the register_reward API must be 0 or 1 when you use the Thompson sampling method.

parameter

Specify the parameters for the algorithm. The format differs for each method.

common
assume_unrewarded:
Specify whether the call to register_reward may be omitted when the reward is zero. When it is True, register_reward may be omitted for a zero reward, but every call to register_reward must correspond to an arm returned by select_arm. When it is False, register_reward must be called even when the reward is zero, but it may be called independently of select_arm. (Boolean)
epsilon_greedy
epsilon:

The probability of choosing an arm at random. With probability epsilon, an arm is chosen uniformly at random. With probability 1 - epsilon, the arm with the highest expected value is chosen. (Float)

  • Range: 0.0 <= epsilon <= 1.0
seed(optional):

Specify the seed used for random number generation. If it is not specified, the system clock is used as the seed, so results will differ from run to run.

  • Range: 0 <= seed <= \(2^{32} - 1\)
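
The selection rule described for epsilon can be sketched as follows. This is a minimal illustration in Python, not the server's implementation: the function name select_epsilon_greedy and the dictionary of per-arm estimates are hypothetical, and the fixed random.Random seed only plays a role analogous to the seed field.

```python
import random

def select_epsilon_greedy(expected, epsilon, rng):
    """Pick an arm ID from `expected` (arm_id -> estimated expected reward)."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must satisfy 0.0 <= epsilon <= 1.0")
    arms = sorted(expected)  # fixed order so that a seeded run is repeatable
    if rng.random() < epsilon:
        return rng.choice(arms)  # explore: uniform over all arms
    return max(arms, key=lambda arm_id: expected[arm_id])  # exploit: best estimate

rng = random.Random(42)  # fixed seed, analogous to the "seed" field (assumption)
estimates = {"a": 0.2, "b": 0.7, "c": 0.5}  # hypothetical estimates
picks = [select_epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
print(picks.count("b") / len(picks))  # fraction of selections that went to "b"
```

With epsilon set to 0.0 the sketch always returns the arm with the best estimate, and with 1.0 it ignores the estimates entirely.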
epsilon_decreasing
decreasing_rate:
 

The rate at which the probability of selecting an arm at random decreases. The larger this parameter is, the more slowly the probability decreases. (Float)

  • Range: 0 < decreasing_rate < 1
seed(optional):

Specify the seed used for random number generation. If it is not specified, the system clock is used as the seed, so results will differ from run to run.

  • Range: 0 <= seed <= \(2^{32} - 1\)
ucb1
None (this method has no method-specific parameters).
softmax
tau:

Temperature parameter. At a high temperature, all arms are selected with nearly equal probability. At a low temperature, arms with a higher expected value are selected more often. (Float)

  • Range: 0.0 < tau
seed(optional):

Specify the seed used for random number generation. If it is not specified, the system clock is used as the seed, so results will differ from run to run.

  • Range: 0 <= seed <= \(2^{32} - 1\)
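
The effect of tau can be illustrated with the standard Boltzmann (softmax) form, in which an arm's selection probability is proportional to exp(expected value / tau). The sketch below assumes that form; the function name and the estimates are hypothetical, and the server's exact computation may differ.

```python
import math

def softmax_probabilities(expected, tau):
    """Map arm_id -> selection probability for the given temperature tau."""
    if tau <= 0.0:
        raise ValueError("tau must be greater than 0.0")
    top = max(expected.values())  # subtract the maximum for numerical stability
    weights = {arm_id: math.exp((value - top) / tau)
               for arm_id, value in expected.items()}
    total = sum(weights.values())
    return {arm_id: weight / total for arm_id, weight in weights.items()}

estimates = {"a": 0.2, "b": 0.7, "c": 0.5}  # hypothetical estimates
hot = softmax_probabilities(estimates, 100.0)  # high temperature: near uniform
cold = softmax_probabilities(estimates, 0.01)  # low temperature: favors "b"
print(hot)
print(cold)
```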
exp3
gamma:

Mixing rate between the constant (uniform) weight and each arm’s own weight. The higher gamma is, the larger the share of the constant weight. The lower gamma is, the larger the share of each arm’s weight. (Float)

  • Range: 0.0 < gamma <= 1.0
seed(optional):

Specify the seed used for random number generation. If it is not specified, the system clock is used as the seed, so results will differ from run to run.

  • Range: 0 <= seed <= \(2^{32} - 1\)
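
The role of gamma as a mixing rate can be sketched with the textbook Exp3 selection probability, (1 - gamma) * w_i / sum(w) + gamma / K for K arms. The code below assumes that textbook form; the function name and the weights are hypothetical, and how the server updates the weights is not shown.

```python
def exp3_probabilities(weights, gamma):
    """Mix each arm's normalized weight with a uniform share of size gamma."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must satisfy 0.0 < gamma <= 1.0")
    arm_count = len(weights)
    total = sum(weights.values())
    return {
        arm_id: (1.0 - gamma) * weight / total + gamma / arm_count
        for arm_id, weight in weights.items()
    }

weights = {"a": 1.0, "b": 8.0, "c": 1.0}  # hypothetical arm weights
print(exp3_probabilities(weights, 0.1))  # mostly follows the arm weights
print(exp3_probabilities(weights, 1.0))  # uniform: the constant share is everything
```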
ts
seed(optional):

Specify the seed used for random number generation. If it is not specified, the system clock is used as the seed, so results will differ from run to run.

  • Range: 0 <= seed <= \(2^{32} - 1\)
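
The requirement that rewards be 0 or 1 under Thompson sampling fits the common Beta-Bernoulli formulation, sketched below. This assumes a Beta(successes + 1, failures + 1) posterior per arm; the function names and counters are hypothetical and the server's internals may differ.

```python
import random

def record_reward(successes, failures, arm_id, reward):
    """Count a binary reward; anything other than 0 or 1 is rejected."""
    if reward not in (0, 1):
        raise ValueError("reward must be 0 or 1 for Thompson sampling")
    if reward == 1:
        successes[arm_id] += 1
    else:
        failures[arm_id] += 1

def select_thompson(successes, failures, rng):
    """Sample each arm's Beta posterior and return the arm with the largest draw."""
    samples = {
        arm_id: rng.betavariate(successes[arm_id] + 1, failures[arm_id] + 1)
        for arm_id in sorted(successes)
    }
    return max(samples, key=samples.get)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
successes = {"a": 0, "b": 0}
failures = {"a": 0, "b": 0}
for _ in range(50):
    record_reward(successes, failures, "a", 0)  # arm "a" never pays
    record_reward(successes, failures, "b", 1)  # arm "b" always pays
picks = [select_thompson(successes, failures, rng) for _ in range(200)]
print(picks.count("b"))  # number of times "b" was selected out of 200
```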
Example:
{
  "method" : "epsilon_greedy",
  "parameter" : {
    "assume_unrewarded" : false,
    "epsilon" : 0.1
  }
}
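
A configuration can be checked against the ranges listed above before it is handed to the server. The following Python sketch is only an illustration: validate_config is a hypothetical helper, not part of any client library, and treating assume_unrewarded as a required field is an assumption.

```python
import json

MAX_SEED = 2**32 - 1

# Range checks copied from the parameter descriptions in this section.
CHECKS = {
    "epsilon_greedy": {"epsilon": lambda v: 0.0 <= v <= 1.0},
    "epsilon_decreasing": {"decreasing_rate": lambda v: 0.0 < v < 1.0},
    "ucb1": {},
    "softmax": {"tau": lambda v: v > 0.0},
    "exp3": {"gamma": lambda v: 0.0 < v <= 1.0},
    "ts": {},
}

def validate_config(text):
    """Parse a JSON configuration and raise ValueError if a field is invalid."""
    config = json.loads(text)
    method = config["method"]
    if method not in CHECKS:
        raise ValueError(f"unknown method: {method}")
    params = config.get("parameter", {})
    # Assumption: assume_unrewarded is always required.
    if not isinstance(params.get("assume_unrewarded"), bool):
        raise ValueError("assume_unrewarded must be a Boolean")
    for name, in_range in CHECKS[method].items():
        if name not in params or not in_range(params[name]):
            raise ValueError(f"{name} is missing or out of range")
    seed = params.get("seed")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError("seed is out of range")
    return config

example = """
{
  "method" : "epsilon_greedy",
  "parameter" : {
    "assume_unrewarded" : false,
    "epsilon" : 0.1
  }
}
"""
print(validate_config(example)["method"])
```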

Data Structures

message arm_info

The state of an arm.

0: int trial_count

Number of times the arm has been selected.

1: double weight

The weight of an arm.

Methods

service bandit
bool register_arm(0: string arm_id)
Parameters:
  • arm_id – ID of the new arm to be registered
Returns:

True if the arm was registered successfully. False if registration failed.

Register a new arm with the ID arm_id.

bool delete_arm(0: string arm_id)
Parameters:
  • arm_id – ID of the arm to be deleted
Returns:

True if the arm was deleted successfully. False if deletion failed.

Delete the arm with the ID arm_id.

string select_arm(0: string player_id)
Parameters:
  • player_id – ID of the player whose arm is to be selected
Returns:

The arm_id selected by the bandit algorithm.

Select an arm for the player according to the player’s current state.

bool register_reward(0: string player_id, 1: string arm_id, 2: double reward)
Parameters:
  • player_id – ID of the player whose arm receives the reward
  • arm_id – ID of the arm with which the reward is registered
  • reward – amount of the reward
Returns:

True if the reward was registered successfully. False if registration failed.

Register a reward for the specified arm of the specified player.

map<string, arm_info> get_arm_info(0: string player_id)
Parameters:
  • player_id – ID of the player
Returns:

Arm information of the specified player, keyed by arm_id.

Get information on all arms of the specified player.

bool reset(0: string player_id)
Parameters:
  • player_id – ID of the player whose arms are to be reset
Returns:

True if the arms were reset successfully. False if the reset failed.

Reset the information on all arms of the specified player.
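
The typical call sequence for the methods above can be sketched with an in-memory stand-in. InMemoryBandit is hypothetical: it only mirrors the method signatures listed here so that the flow can be run locally, and it is not the real client or server. Treating weight as the cumulative reward, using epsilon-greedy selection, and counting trials in select_arm when assume_unrewarded is True are all assumptions made for the illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class ArmInfo:
    trial_count: int = 0
    weight: float = 0.0  # assumption: cumulative reward of the arm

class InMemoryBandit:
    """Hypothetical stand-in mirroring the method list above."""

    def __init__(self, assume_unrewarded=True, epsilon=0.1, seed=0):
        self._assume_unrewarded = assume_unrewarded
        self._epsilon = epsilon
        self._rng = random.Random(seed)
        self._arm_ids = []
        self._players = {}  # player_id -> {arm_id: ArmInfo}

    def _arms_of(self, player_id):
        arms = self._players.setdefault(player_id, {})
        for arm_id in self._arm_ids:
            arms.setdefault(arm_id, ArmInfo())
        return arms

    def register_arm(self, arm_id):
        if arm_id in self._arm_ids:
            return False
        self._arm_ids.append(arm_id)
        return True

    def delete_arm(self, arm_id):
        if arm_id not in self._arm_ids:
            return False
        self._arm_ids.remove(arm_id)
        for arms in self._players.values():
            arms.pop(arm_id, None)
        return True

    def select_arm(self, player_id):
        arms = self._arms_of(player_id)

        def mean(arm_id):
            info = arms[arm_id]
            return info.weight / info.trial_count if info.trial_count else 0.0

        if self._rng.random() < self._epsilon:
            chosen = self._rng.choice(self._arm_ids)
        else:
            chosen = max(self._arm_ids, key=mean)
        if self._assume_unrewarded:
            arms[chosen].trial_count += 1  # the trial is counted at selection time
        return chosen

    def register_reward(self, player_id, arm_id, reward):
        if arm_id not in self._arm_ids:
            return False
        info = self._arms_of(player_id)[arm_id]
        if not self._assume_unrewarded:
            info.trial_count += 1  # the trial is counted when the reward arrives
        info.weight += reward
        return True

    def get_arm_info(self, player_id):
        return dict(self._arms_of(player_id))

    def reset(self, player_id):
        self._players.pop(player_id, None)
        return True

bandit = InMemoryBandit(assume_unrewarded=True, epsilon=0.1, seed=0)
for arm_id in ("banner_a", "banner_b"):
    bandit.register_arm(arm_id)
for _ in range(100):
    arm = bandit.select_arm("player_1")
    if arm == "banner_b":  # zero rewards are simply not reported
        bandit.register_reward("player_1", arm, 1.0)
print(bandit.get_arm_info("player_1"))
bandit.reset("player_1")
```

Because assume_unrewarded is True in this run, register_reward is called only for non-zero rewards and always for an arm that select_arm has just returned.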