Regression

  • See IDL definition for detailed specification.
  • See Algorithms for detailed description of algorithms used in this server.

Configuration

Configuration is given as a JSON file. Each field is described below:

method

Specify the regression algorithm. The following algorithms are available:

Value          Method                                                        Regression type
"perceptron"   Use perceptron.                                               linear regression
"PA"           Use Passive Aggressive (PA). [Crammer06]                      linear regression
"PA1"          Use PA-I. [Crammer06]                                         linear regression
"PA2"          Use PA-II. [Crammer06]                                        linear regression
"CW"           Use Confidence Weighted Learning. [Dredze08]                  linear regression
"AROW"         Use Adaptive Regularization of Weight vectors. [Crammer09b]   linear regression
"NHERD"        Use Normal Herd. [Crammer10]                                  linear regression
"NN"           Use an implementation of nearest_neighbor.                    k-Nearest Neighbor
"cosine"       Use the result of nearest neighbor search by cosine similarity.    k-Nearest Neighbor
"euclidean"    Use the result of nearest neighbor search by Euclidean distance.   k-Nearest Neighbor
parameter

Specify parameters for the algorithm. Its format differs for each method.

perceptron
learning_rate:

The ratio between the error value and the step width of the weight update. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. (Float)

  • Range: 0.0 < learning_rate
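To make the role of learning_rate concrete, here is a minimal sketch of a perceptron-style regression update over sparse feature dictionaries. This is an illustration only; the server's internal update may differ in detail.

```python
def perceptron_update(w, x, y, learning_rate):
    """Apply one perceptron-style regression update to the weight dict w.

    x is a sparse feature dict, y is the target value.
    Sketch only -- not the server's actual implementation.
    """
    y_hat = sum(w.get(k, 0.0) * v for k, v in x.items())
    error = y - y_hat
    for k, v in x.items():
        # A larger learning_rate takes bigger steps toward reducing the
        # error, which trains faster but amplifies noisy examples.
        w[k] = w.get(k, 0.0) + learning_rate * error * v
    return w

# Fit y = 2 * x from repeated examples.
w = {}
for _ in range(200):
    for xv, yv in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
        perceptron_update(w, {"f": xv}, yv, learning_rate=0.1)
```

After enough passes the single weight converges to the slope of the data (here, 2.0).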
PA
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
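The role of sensitivity can be sketched with the epsilon-insensitive loss used by Passive Aggressive regression [Crammer06]: errors smaller than sensitivity are tolerated and trigger no update. This is an illustration of the update rule, not the server's exact code.

```python
def pa_update(w, x, y, sensitivity):
    """One Passive Aggressive regression update (sketch).

    Errors within `sensitivity` are acceptable and cause no update;
    otherwise the weights move just enough to land on the margin.
    """
    y_hat = sum(w.get(k, 0.0) * v for k, v in x.items())
    error = y - y_hat
    loss = max(0.0, abs(error) - sensitivity)
    if loss == 0.0:
        return w  # within the acceptable margin: no update
    sq_norm = sum(v * v for v in x.values())
    tau = loss / sq_norm  # step size that exactly meets the margin
    sign = 1.0 if error > 0 else -1.0
    for k, v in x.items():
        w[k] = w.get(k, 0.0) + sign * tau * v
    return w

w = {"f": 0.0}
pa_update(w, {"f": 1.0}, 2.0, sensitivity=0.5)  # residual shrinks to 0.5
pa_update(w, {"f": 1.0}, 2.0, sensitivity=0.5)  # now within margin: no-op
```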
PA1
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to the learning rate. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

  • Range: 0.0 < regularization_weight
PA2
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to the learning rate. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

  • Range: 0.0 < regularization_weight
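To see how \(C\) (regularization_weight) moderates the update, here is a sketch of the PA-I and PA-II step sizes from [Crammer06]: PA-I caps the step at \(C\), while PA-II shrinks it smoothly by a \(1/(2C)\) term.

```python
def step_size_pa1(loss, sq_norm, c):
    """PA-I step size: hard-capped at C (sketch of [Crammer06])."""
    return min(c, loss / sq_norm)

def step_size_pa2(loss, sq_norm, c):
    """PA-II step size: softly shrunk by the 1/(2C) term (sketch)."""
    return loss / (sq_norm + 1.0 / (2.0 * c))

# With a large loss, a small C keeps both updates conservative:
loss, sq_norm = 10.0, 1.0
small_c = (step_size_pa1(loss, sq_norm, 0.1), step_size_pa2(loss, sq_norm, 0.1))
large_c = (step_size_pa1(loss, sq_norm, 100.0), step_size_pa2(loss, sq_norm, 100.0))
```

With C = 0.1 the steps stay small regardless of the loss; with C = 100 both approach the unregularized step loss / sq_norm, learning faster but following noisy examples more aggressively.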
CW
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to the learning rate. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)

  • Range: 0.0 < regularization_weight
AROW
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to the learning rate. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)

  • Range: 0.0 < regularization_weight
NHERD
sensitivity:

Upper bound of the acceptable margin. The bigger it is, the more robust the model is to noise, but the more residual error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to the learning rate. The bigger it is, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)

  • Range: 0.0 < regularization_weight
NN
method:

Specify algorithm for nearest neighbor. Refer to Nearest Neighbor for the list of algorithms available.

parameter:

Specify parameters for the algorithm. Refer to Nearest Neighbor for the list of parameters.

nearest_neighbor_num:

Number of nearest data points used for calculating scores. (Integer)

  • Range: 1 <= nearest_neighbor_num
cosine
nearest_neighbor_num:

Number of nearest data points used for calculating scores. (Integer)

  • Range: 1 <= nearest_neighbor_num
euclidean
nearest_neighbor_num:

Number of nearest data points used for calculating scores. (Integer)

  • Range: 1 <= nearest_neighbor_num
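For the nearest-neighbor based methods, the estimate is computed from the scores of the nearest_neighbor_num closest training points. A minimal sketch using Euclidean distance and a plain average (illustration only; the server's nearest neighbor engine and neighbor weighting may differ):

```python
import math

def knn_estimate(train, query, k):
    """Estimate a value by averaging the scores of the k nearest
    stored points (sketch).

    `train` is a list of (feature_vector, score) pairs; distances
    are Euclidean.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbors = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return sum(score for _, score in neighbors) / len(neighbors)

train = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 100.0)]
est = knn_estimate(train, (1.1,), k=2)  # neighbors: scores 2.0 and 3.0
```

A larger nearest_neighbor_num smooths the estimate over more stored points but can pull in less relevant data (here, the outlier at 10.0).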
converter

Specify configuration for data conversion. Its format is described in Data Conversion.

Example:
{
  "method": "PA1",
  "parameter" : {
    "sensitivity" : 0.1,
    "regularization_weight" : 3.402823e+38
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types": {},
    "string_rules": [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
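A configuration like the one above can be sanity-checked against the documented parameter ranges before handing it to the server. A small sketch (the config is embedded as a string here, with the converter section abridged, purely for illustration):

```python
import json

# The example configuration from above (converter section abridged).
config_text = """
{
  "method": "PA1",
  "parameter": {
    "sensitivity": 0.1,
    "regularization_weight": 3.402823e+38
  },
  "converter": {
    "string_rules": [
      {"key": "*", "type": "str", "sample_weight": "bin", "global_weight": "bin"}
    ],
    "num_rules": [{"key": "*", "type": "num"}]
  }
}
"""

config = json.loads(config_text)
# Check the documented ranges for the PA1 parameters.
assert config["parameter"]["sensitivity"] >= 0.0
assert config["parameter"]["regularization_weight"] > 0.0
```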

Data Structures

message scored_datum

Represents a datum with its label.

0: float score

Represents the target value (label) of this datum.

1: datum data

Represents a datum.

message scored_datum {
  0: float score
  1: datum data
}

Methods

service regression
int train(0: list<scored_datum> train_data)
Parameters:
  • train_data – list of tuples of label and datum
Returns:

Number of trained data (i.e., the length of train_data)

Trains the model with the given data and updates it. This method is designed to allow bulk updates with a list of scored_datum.

list<float> estimate(0: list<datum> estimate_data)
Parameters:
  • estimate_data – list of datum to estimate
Returns:

List of estimated values, in the same order as the given data

Estimates values for the given estimate_data. This method is designed to allow bulk estimation with a list of datum.
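To make the train/estimate contract concrete, here is a toy in-memory stand-in mimicking the two signatures. This is not the actual RPC client: the real service is called over the network against a running server, and it trains the configured algorithm rather than the simple update used below.

```python
class ToyRegression:
    """In-memory stand-in mimicking the train/estimate contract (sketch).

    Uses a simple error-driven update purely for illustration; the real
    server trains the algorithm named in the configuration.
    """
    def __init__(self):
        self.w = {}

    def train(self, train_data):
        """train_data: list of (score, datum) pairs; returns count trained."""
        for score, datum in train_data:
            y_hat = sum(self.w.get(k, 0.0) * v for k, v in datum.items())
            error = score - y_hat
            for k, v in datum.items():
                self.w[k] = self.w.get(k, 0.0) + 0.1 * error * v
        return len(train_data)

    def estimate(self, estimate_data):
        """Returns estimated values, in the same order as the given data."""
        return [sum(self.w.get(k, 0.0) * v for k, v in d.items())
                for d in estimate_data]

client = ToyRegression()
n = client.train([(2.0, {"x": 1.0}), (4.0, {"x": 2.0})] * 100)
preds = client.estimate([{"x": 3.0}])
```

Note that both calls take lists, matching the bulk-oriented design of the service: one round trip can train or estimate many data points.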