Regression

  • See IDL definition for detailed specification.
  • See Algorithms for detailed description of algorithms used in this server.

Configuration

Configuration is given as a JSON file. We show each field below:

method

Specifies the regression algorithm. The following algorithms are available.

Value         Method                                                            Regression type
"perceptron"  Use perceptron.                                                   linear regression
"PA"          Use Passive Aggressive (PA). [Crammer06]                          linear regression
"PA1"         Use PA-I. [Crammer06]                                             linear regression
"PA2"         Use PA-II. [Crammer06]                                            linear regression
"CW"          Use Confidence Weighted Learning. [Dredze08]                      linear regression
"AROW"        Use Adaptive Regularization of Weight vectors. [Crammer09b]       linear regression
"NHERD"       Use Normal Herd. [Crammer10]                                      linear regression
"NN"          Use an implementation of nearest_neighbor.                        k-Nearest Neighbor
"cosine"      Use the result of nearest neighbor search by cosine similarity.   k-Nearest Neighbor
"euclidean"   Use the result of nearest neighbor search by Euclidean distance.  k-Nearest Neighbor
parameter

Specify parameters for the algorithm. Its format differs for each method.

perceptron
learning_rate:

The ratio between the error value and the step width of a weight update. The larger it is, the faster the model trains, but the more sensitive it is to noise. (Float)

  • Range: 0.0 < learning_rate
PA
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
PA1
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to learning rate. The larger it is, the faster the model trains, but the more sensitive it is to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

  • Range: 0.0 < regularization_weight
PA2
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to learning rate. The larger it is, the faster the model trains, but the more sensitive it is to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

  • Range: 0.0 < regularization_weight
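
To illustrate how sensitivity and regularization_weight interact, the following is a minimal sketch of a single regression update for PA, PA-I and PA-II as formulated in [Crammer06]. It assumes that sensitivity plays the role of \(\epsilon\) in the \(\epsilon\)-insensitive loss and that regularization_weight plays the role of \(C\). The function pa_regression_step and its signature are hypothetical and not part of this server; the server's actual implementation may differ.

```python
def pa_regression_step(w, x, y, sensitivity, variant="PA",
                       regularization_weight=1.0):
    """Return the weight vector after one passive-aggressive regression update.

    Hypothetical helper for illustration only. w and x are lists of floats
    of equal length, y is the target value.
    """
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    error = y - prediction
    # Epsilon-insensitive loss: errors within `sensitivity` are ignored.
    loss = max(0.0, abs(error) - sensitivity)
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive step: nothing to correct

    if variant == "PA":
        tau = loss / sq_norm
    elif variant == "PA1":
        # PA-I clips the step size at C.
        tau = min(regularization_weight, loss / sq_norm)
    elif variant == "PA2":
        # PA-II damps the step size by 1 / (2C).
        tau = loss / (sq_norm + 1.0 / (2.0 * regularization_weight))
    else:
        raise ValueError("unknown variant: " + variant)

    direction = 1.0 if error > 0 else -1.0
    return [wi + direction * tau * xi for wi, xi in zip(w, x)]


w0 = [0.0, 0.0]
x = [1.0, 0.0]
print(pa_regression_step(w0, x, 2.0, 0.5, "PA"))
print(pa_regression_step(w0, x, 2.0, 0.5, "PA1", regularization_weight=0.5))
print(pa_regression_step(w0, x, 2.0, 0.5, "PA2", regularization_weight=0.5))
```

In this sketch, a very large regularization_weight means the PA-I step is never clipped, so it coincides with the plain PA step.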
CW
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to learning rate. The larger it is, the faster the model trains, but the more sensitive it is to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)

  • Range: 0.0 < regularization_weight
AROW
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to learning rate. The larger it is, the faster the model trains, but the more sensitive it is to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)

  • Range: 0.0 < regularization_weight
NHERD
sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

  • Range: 0.0 <= sensitivity
regularization_weight:

Sensitivity to learning rate. The larger it is, the faster the model trains, but the more sensitive it is to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)

  • Range: 0.0 < regularization_weight
NN
method:

Specify algorithm for nearest neighbor. Refer to Nearest Neighbor for the list of algorithms available.

parameter:

Specify parameters for the algorithm. Refer to Nearest Neighbor for the list of parameters.

nearest_neighbor_num:

Number of nearest neighbors used to calculate the score. (Integer)

  • Range: 1 <= nearest_neighbor_num
weight:

Specify the method used to weight each neighbor point. The following methods are available.

Value       Method
"distance"  Weights neighbors by their distance or similarity. Closer neighbors have greater influence than those far from the query.
"uniform"   Weights all neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.
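
The difference between "uniform" and "distance" can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and inverse-distance weights; the exact weighting formula used by this server is not stated here, so the formula, the function knn_regress and its arguments are assumptions for illustration only.

```python
import math


def knn_regress(train, query, nearest_neighbor_num, weight="uniform"):
    """Estimate a value for `query` from its nearest training points.

    Hypothetical helper. `train` is a list of (vector, score) pairs and
    `query` is a vector of the same dimension.
    """
    # Rank training points by Euclidean distance and keep the k nearest.
    ranked = sorted((math.dist(vec, query), score) for vec, score in train)
    neighbors = ranked[:nearest_neighbor_num]

    if weight == "uniform":
        return sum(score for _, score in neighbors) / len(neighbors)
    if weight == "distance":
        # An exact match dominates; avoid dividing by zero.
        exact = [score for dist, score in neighbors if dist == 0.0]
        if exact:
            return sum(exact) / len(exact)
        # Assumed scheme: weight each neighbor by 1 / distance.
        weights = [1.0 / dist for dist, _ in neighbors]
        total = sum(w * score for w, (_, score) in zip(weights, neighbors))
        return total / sum(weights)
    raise ValueError("weight must be 'uniform' or 'distance'")


train = [([0.0], 10.0), ([2.0], 30.0), ([10.0], 100.0)]
print(knn_regress(train, [0.5], 2, "uniform"))
print(knn_regress(train, [0.5], 2, "distance"))
```

With "distance", the estimate is pulled toward the score of the closer neighbor, whereas "uniform" returns the plain average of the selected neighbors.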

cosine
nearest_neighbor_num:

Number of nearest neighbors used to calculate the score. (Integer)

  • Range: 1 <= nearest_neighbor_num
weight:

Specify the method used to weight each neighbor point. The following methods are available.

Value       Method
"distance"  Weights neighbors by their similarity. Closer neighbors have greater influence than those far from the query.
"uniform"   Weights all neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.

euclidean
nearest_neighbor_num:

Number of nearest neighbors used to calculate the score. (Integer)

  • Range: 1 <= nearest_neighbor_num
weight:

Specify the method used to weight each neighbor point. The following methods are available.

Value       Method
"distance"  Weights neighbors by their distance. Closer neighbors have greater influence than those far from the query.
"uniform"   Weights all neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.

converter

Specify configuration for data conversion. Its format is described in Data Conversion.

Example:
{
  "method": "PA1",
  "parameter" : {
    "sensitivity" : 0.1,
    "regularization_weight" : 3.402823e+38
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types": {},
    "string_rules": [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
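
A configuration like the one above can also be generated and checked programmatically. The following is a minimal sketch that builds the same structure in Python, checks the parameter values against the ranges documented above, and serializes the result to JSON. The helper range_problems is hypothetical and only checks keys that are present; it is not part of this server.

```python
import json


def range_problems(parameter):
    """Return messages for parameter values outside the documented ranges."""
    problems = []
    # Parameters documented as strictly positive.
    for key in ("learning_rate", "regularization_weight"):
        if key in parameter and not parameter[key] > 0.0:
            problems.append(key + " must be > 0.0")
    if "sensitivity" in parameter and not parameter["sensitivity"] >= 0.0:
        problems.append("sensitivity must be >= 0.0")
    if ("nearest_neighbor_num" in parameter
            and not parameter["nearest_neighbor_num"] >= 1):
        problems.append("nearest_neighbor_num must be >= 1")
    # weight may be omitted, in which case "uniform" applies.
    if parameter.get("weight", "uniform") not in ("distance", "uniform"):
        problems.append('weight must be "distance" or "uniform"')
    return problems


config = {
    "method": "PA1",
    "parameter": {"sensitivity": 0.1, "regularization_weight": 3.402823e+38},
    "converter": {
        "string_filter_types": {},
        "string_filter_rules": [],
        "num_filter_types": {},
        "num_filter_rules": [],
        "string_types": {},
        "string_rules": [
            {"key": "*", "type": "str",
             "sample_weight": "bin", "global_weight": "bin"}
        ],
        "num_types": {},
        "num_rules": [{"key": "*", "type": "num"}],
    },
}

print(range_problems(config["parameter"]))
config_text = json.dumps(config, indent=2)  # text to save as the JSON file
```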

Data Structures

message scored_datum

Represents a datum with its label.

0: double score

Represents a label of this datum.

1: datum data

Represents a datum.

message scored_datum {
  0: double score
  1: datum data
}

Methods

service regression
int train(0: list<scored_datum> train_data)
Parameters:
  • train_data – list of scored_datum (pairs of score and datum)
Returns:

Number of trained data (i.e., the length of train_data)

Trains and updates the model. This function is designed to allow bulk updates with a list of scored_datum.

list<double> estimate(0: list<datum> estimate_data)
Parameters:
  • estimate_data – list of datum to estimate
Returns:

List of estimated values, in the same order as the given datum list

Estimates values for the given estimate_data. This API is designed to allow bulk estimation with a list of datum.
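
The contract of train and estimate can be illustrated with an in-memory stand-in. The sketch below is not a client for this server: ScoredDatum and ToyRegression are hypothetical names, a datum is approximated by a plain dict, and the "model" simply predicts the mean of all scores seen so far. It only demonstrates that train returns the number of items it received and that estimate returns one value per input, in input order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredDatum:
    """Hypothetical mirror of `message scored_datum`."""
    score: float             # 0: double score
    data: Dict[str, float]   # 1: datum data (approximated by a dict here)


class ToyRegression:
    """Hypothetical stand-in that predicts the mean of the trained scores."""

    def __init__(self) -> None:
        self._score_sum = 0.0
        self._count = 0

    def train(self, train_data: List[ScoredDatum]) -> int:
        # Bulk update: consume every item, then report how many were used.
        for item in train_data:
            self._score_sum += item.score
            self._count += 1
        return len(train_data)

    def estimate(self, estimate_data: List[Dict[str, float]]) -> List[float]:
        # One estimate per input datum, in the order given.
        mean = self._score_sum / self._count if self._count else 0.0
        return [mean for _ in estimate_data]


model = ToyRegression()
trained = model.train([ScoredDatum(1.0, {"x": 1.0}),
                       ScoredDatum(3.0, {"x": 2.0})])
estimates = model.estimate([{"x": 0.0}, {"x": 5.0}])
print(trained, estimates)
```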