Regression

See the IDL definition for the detailed specification.
See Algorithms for a detailed description of the algorithms used in this server.

Configuration

Configuration is given as a JSON file. We show each field below:

method

Specify the regression algorithm. The following algorithms are available.

Value	Method	Regression type
`"perceptron"`	Use perceptron.	linear regression
`"PA"`	Use Passive Aggressive (PA). [Crammer06]	linear regression
`"PA1"`	Use PA-I. [Crammer06]	linear regression
`"PA2"`	Use PA-II. [Crammer06]	linear regression
`"CW"`	Use Confidence Weighted Learning. [Dredze08]	linear regression
`"AROW"`	Use Adaptive Regularization of Weight vectors. [Crammer09b]	linear regression
`"NHERD"`	Use Normal Herd. [Crammer10]	linear regression
`"NN"`	Use an implementation of `nearest_neighbor`	k-Nearest Neighbor
`"cosine"`	Use the result of nearest neighbor search by cosine similarity	k-Nearest Neighbor
`"euclidean"`	Use the result of nearest neighbor search by euclidean distance	k-Nearest Neighbor

parameter

Specify parameters for the algorithm. Its format differs for each method.

perceptron

learning_rate:

The ratio between the error value and the step width of a weight update. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. (Float)

Range: 0.0 < learning_rate
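To illustrate how `learning_rate` scales each update, here is a minimal sketch of an error-driven online linear regressor. The function name `perceptron_update` and the exact update rule are assumptions made for illustration; they are not taken from the server's implementation.

```python
def perceptron_update(weights, features, target, learning_rate):
    """Return new weights after one error-driven update (illustrative only)."""
    if learning_rate <= 0.0:
        raise ValueError("learning_rate must be greater than 0.0")
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    # The step width is the error scaled by learning_rate.
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
small_step = perceptron_update(weights, [1.0, 2.0], 3.0, learning_rate=0.1)
large_step = perceptron_update(weights, [1.0, 2.0], 3.0, learning_rate=0.5)
```

With the same sample, the larger `learning_rate` moves the weights five times as far, which is why it trains faster but also reacts more strongly to a noisy sample.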

PA

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity
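The following sketch shows one PA regression update as described in [Crammer06], assuming that `sensitivity` plays the role of ε in the paper's ε-insensitive loss. The function name `pa_update` is hypothetical, and the code is an illustration rather than the server's implementation.

```python
def pa_update(weights, features, target, sensitivity):
    """One Passive Aggressive regression step with an epsilon-insensitive loss."""
    if sensitivity < 0.0:
        raise ValueError("sensitivity must be 0.0 or greater")
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    # Errors within the sensitivity band incur no loss and cause no update.
    loss = max(0.0, abs(error) - sensitivity)
    norm_sq = sum(x * x for x in features)
    if loss == 0.0 or norm_sq == 0.0:
        return list(weights)
    tau = loss / norm_sq
    sign = 1.0 if error > 0 else -1.0
    return [w + sign * tau * x for w, x in zip(weights, features)]


updated = pa_update([0.0, 0.0], [1.0, 2.0], 3.0, sensitivity=0.5)
unchanged = pa_update([0.0, 0.0], [1.0, 2.0], 3.0, sensitivity=5.0)
```

After the first call the prediction lands exactly on the edge of the sensitivity band; in the second call the error is already inside the band, so the weights are left as they are.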

PA1

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity

regularization_weight:

Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

Range: 0.0 < regularization_weight

PA2

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity

regularization_weight:

Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)

Range: 0.0 < regularization_weight
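The PA, PA1, and PA2 variants differ only in how the step size is limited by `regularization_weight` (\(C\)). The sketch below follows the step-size formulas in [Crammer06]; the helper name `pa_step_size` is hypothetical, and the code is an illustration, not the server's implementation.

```python
def pa_step_size(loss, norm_sq, variant, regularization_weight=None):
    """Step size for one update; loss is the epsilon-insensitive loss,
    norm_sq is the squared L2 norm of the feature vector."""
    if norm_sq <= 0.0:
        raise ValueError("norm_sq must be positive")
    if variant == "PA":
        return loss / norm_sq
    if regularization_weight is None or regularization_weight <= 0.0:
        raise ValueError("regularization_weight must be greater than 0.0")
    if variant == "PA1":
        # PA-I caps the step size at C.
        return min(regularization_weight, loss / norm_sq)
    if variant == "PA2":
        # PA-II shrinks the step smoothly through the 1/(2C) term.
        return loss / (norm_sq + 1.0 / (2.0 * regularization_weight))
    raise ValueError(f"unknown variant: {variant!r}")


capped = pa_step_size(2.5, 5.0, "PA1", regularization_weight=0.1)
uncapped = pa_step_size(2.5, 5.0, "PA1", regularization_weight=3.402823e+38)
smoothed = pa_step_size(2.5, 5.0, "PA2", regularization_weight=0.5)
```

Under these formulas, a very large `regularization_weight` makes PA1 take the same step as plain PA, while a small value limits how far a single noisy sample can move the model.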

CW

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity

regularization_weight:

Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)

Range: 0.0 < regularization_weight

AROW

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity

regularization_weight:

Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)

Range: 0.0 < regularization_weight

NHERD

sensitivity:

Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)

Range: 0.0 <= sensitivity

regularization_weight:

Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model becomes to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)

Range: 0.0 < regularization_weight

NN

method:

Specify the nearest neighbor algorithm. Refer to Nearest Neighbor for the list of available algorithms.

parameter:

Specify parameters for the nearest neighbor algorithm. Refer to Nearest Neighbor for the list of parameters.

nearest_neighbor_num:

Number of nearest data points used to calculate the score. (Integer)

Range: 1 <= nearest_neighbor_num

weight:

Specify the method used to weight each neighbor. The following methods are available.

Value	Method
`"distance"`	Weights neighbors by their distance or similarity. Closer neighbors have greater influence than those that are farther from the query.
`"uniform"`	Weights neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.
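Because the NN method nests a second `method` and `parameter` pair inside the top-level `parameter`, a sketch of the resulting structure may help. The inner method name `euclid_lsh` and its `hash_num` parameter are assumed values used only for illustration; consult Nearest Neighbor for the methods and parameters that are actually available.

```python
import json

# "euclid_lsh" and "hash_num" are assumptions for illustration only;
# the Nearest Neighbor page is the reference for valid values.
nn_regression_config = {
    "method": "NN",
    "parameter": {
        "method": "euclid_lsh",           # nearest neighbor algorithm
        "parameter": {"hash_num": 64},    # parameters of that algorithm
        "nearest_neighbor_num": 5,
        "weight": "distance",             # may be omitted ("uniform")
    },
    "converter": {
        "string_filter_types": {},
        "string_filter_rules": [],
        "num_filter_types": {},
        "num_filter_rules": [],
        "string_types": {},
        "string_rules": [],
        "num_types": {},
        "num_rules": [{"key": "*", "type": "num"}],
    },
}

config_text = json.dumps(nn_regression_config, indent=2)
print(config_text)
```

The outer `method` selects the regression type, while the inner `method` and `parameter` configure the nearest neighbor search that the regression is built on.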

cosine

nearest_neighbor_num:

Number of nearest data points used to calculate the score. (Integer)

Range: 1 <= nearest_neighbor_num

weight:

Specify the method used to weight each neighbor. The following methods are available.

Value	Method
`"distance"`	Weights neighbors by their similarity. Closer neighbors have greater influence than those that are farther from the query.
`"uniform"`	Weights neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.

euclidean

nearest_neighbor_num:

Number of nearest data points used to calculate the score. (Integer)

Range: 1 <= nearest_neighbor_num

weight:

Specify the method used to weight each neighbor. The following methods are available.

Value	Method
`"distance"`	Weights neighbors by their distance. Closer neighbors have greater influence than those that are farther from the query.
`"uniform"`	Weights neighbors equally.

Note: The weight option can be omitted, in which case "uniform" is used.
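The difference between the two `weight` options can be sketched with a small brute-force k-nearest-neighbor regressor over Euclidean distance. The exact weighting formula is not specified here, so the inverse-distance weighting below is an assumption, as is the function name `knn_estimate`.

```python
import math


def knn_estimate(neighbors, query, k, weight="uniform"):
    """Estimate a score from the k nearest (vector, score) pairs."""
    if k < 1:
        raise ValueError("nearest_neighbor_num must be at least 1")
    if not neighbors:
        raise ValueError("no training data")
    ranked = sorted(
        ((math.dist(vector, query), score) for vector, score in neighbors),
        key=lambda pair: pair[0],
    )[:k]
    if weight == "uniform":
        return sum(score for _, score in ranked) / len(ranked)
    if weight == "distance":
        # Exact matches would divide by zero, so they decide the result alone.
        exact = [score for distance, score in ranked if distance == 0.0]
        if exact:
            return sum(exact) / len(exact)
        # Assumed scheme: weight each neighbor by the inverse of its distance.
        weights = [1.0 / distance for distance, _ in ranked]
        weighted = sum(w * score for w, (_, score) in zip(weights, ranked))
        return weighted / sum(weights)
    raise ValueError(f"unknown weight: {weight!r}")


data = [([0.0], 0.0), ([4.0], 40.0)]
uniform_score = knn_estimate(data, [1.0], k=2, weight="uniform")
distance_score = knn_estimate(data, [1.0], k=2, weight="distance")
```

With the query at 1.0, the uniform estimate is the plain mean of both neighbors, whereas the distance-weighted estimate is pulled toward the score of the closer point at 0.0.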

converter

Specify configuration for data conversion. Its format is described in Data Conversion.

Example:

{
  "method": "PA1",
  "parameter" : {
    "sensitivity" : 0.1,
    "regularization_weight" : 3.402823e+38
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types": {},
    "string_rules": [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
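Before starting a server, it can be useful to check a configuration against the ranges listed for each method. The following sketch does that; the function name `check_config` is hypothetical, and treating every listed parameter and the `converter` field as required is an assumption.

```python
# Ranges copied from the parameter descriptions: (name, minimum, inclusive).
_SENSITIVITY = ("sensitivity", 0.0, True)
_REG_WEIGHT = ("regularization_weight", 0.0, False)

LINEAR_PARAMS = {
    "perceptron": [("learning_rate", 0.0, False)],
    "PA": [_SENSITIVITY],
    "PA1": [_SENSITIVITY, _REG_WEIGHT],
    "PA2": [_SENSITIVITY, _REG_WEIGHT],
    "CW": [_SENSITIVITY, _REG_WEIGHT],
    "AROW": [_SENSITIVITY, _REG_WEIGHT],
    "NHERD": [_SENSITIVITY, _REG_WEIGHT],
}
NEIGHBOR_METHODS = {"NN", "cosine", "euclidean"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    method = config.get("method")
    params = config.get("parameter", {})
    if method in LINEAR_PARAMS:
        for name, minimum, inclusive in LINEAR_PARAMS[method]:
            value = params.get(name)
            if not _is_number(value):
                problems.append(f"{name}: missing or not a number")
            elif value < minimum or (value == minimum and not inclusive):
                problems.append(f"{name}: {value} is out of range")
    elif method in NEIGHBOR_METHODS:
        k = params.get("nearest_neighbor_num")
        if not isinstance(k, int) or isinstance(k, bool) or k < 1:
            problems.append("nearest_neighbor_num: must be an integer >= 1")
        if params.get("weight", "uniform") not in ("distance", "uniform"):
            problems.append('weight: must be "distance" or "uniform"')
    else:
        problems.append(f"method: unknown value {method!r}")
    if "converter" not in config:
        problems.append("converter: missing")
    return problems


# Same method and parameters as the example above; converter abbreviated.
example = {
    "method": "PA1",
    "parameter": {"sensitivity": 0.1, "regularization_weight": 3.402823e+38},
    "converter": {"num_rules": [{"key": "*", "type": "num"}]},
}
print(check_config(example))  # prints []
```

The check covers only the value ranges and option names documented on this page; it does not validate the contents of `converter` or the nested nearest neighbor settings.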

Data Structures

message scored_datum

Represents a datum with its label.

0: double score: Represents the label of this datum.

1: datum data: Represents the datum.

message scored_datum {
  0: double score
  1: datum data
}

Methods

service regression

int train(0: list<scored_datum> train_data)

Parameters:	train_data – list of tuples of label and `datum`
Returns:	Number of trained data (i.e., the length of `train_data`)

Trains and updates the model. This function is designed to allow bulk updates with a list of scored_datum.

list<double> estimate(0: list<datum> estimate_data)

Parameters:	estimate_data – list of `datum` to estimate
Returns:	List of estimated values, in the order of the given `datum`

Estimates the values for the given estimate_data. This API is designed to allow bulk estimation with a list of datum.
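The call shape of the two methods can be sketched with a small in-memory stand-in. `ToyRegression` and the Python `ScoredDatum` class below are hypothetical and are not the client library: a datum is simplified to a dictionary of numeric features, and the model is a plain error-driven linear regressor. The sketch only mirrors the documented contract, where `train` returns the number of items it received and `estimate` returns one value per datum in the same order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredDatum:
    """Simplified stand-in for scored_datum: a label plus a datum."""
    score: float
    data: Dict[str, float]


class ToyRegression:
    """In-memory stand-in that mimics the train/estimate call shape."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}

    def _predict(self, data: Dict[str, float]) -> float:
        return sum(self.weights.get(key, 0.0) * value for key, value in data.items())

    def train(self, train_data: List[ScoredDatum]) -> int:
        for item in train_data:
            error = item.score - self._predict(item.data)
            for key, value in item.data.items():
                step = self.learning_rate * error * value
                self.weights[key] = self.weights.get(key, 0.0) + step
        # Mirrors the documented return value: the length of train_data.
        return len(train_data)

    def estimate(self, estimate_data: List[Dict[str, float]]) -> List[float]:
        # One estimate per datum, in the order given.
        return [self._predict(data) for data in estimate_data]


model = ToyRegression()
trained = model.train([ScoredDatum(1.0, {"x": 1.0}), ScoredDatum(2.0, {"x": 2.0})])
estimates = model.estimate([{"x": 1.0}, {"x": 2.0}])
```

Passing several items in one call, as shown here, is the bulk usage that both methods are designed for.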