Regression
- See IDL definition for detailed specification.
- See Algorithms for detailed description of algorithms used in this server.
Configuration
Configuration is given as a JSON file. We show each field below:
- method
Specify the regression algorithm. The following algorithms are available.

| Value | Method | Regression type |
|---|---|---|
| "perceptron" | Use perceptron. | linear regression |
| "PA" | Use Passive Aggressive (PA). [Crammer06] | linear regression |
| "PA1" | Use PA-I. [Crammer06] | linear regression |
| "PA2" | Use PA-II. [Crammer06] | linear regression |
| "CW" | Use Confidence Weighted Learning. [Dredze08] | linear regression |
| "AROW" | Use Adaptive Regularization of Weight vectors. [Crammer09b] | linear regression |
| "NHERD" | Use Normal Herd. [Crammer10] | linear regression |
| "NN" | Use an implementation of nearest_neighbor. | k-Nearest Neighbor |
| "cosine" | Use the result of nearest neighbor search by cosine similarity. | k-Nearest Neighbor |
| "euclidean" | Use the result of nearest neighbor search by Euclidean distance. | k-Nearest Neighbor |
- parameter
Specify parameters for the algorithm. Its format differs for each method.
- perceptron
learning_rate: The ratio of the error value to the step width of the weight update. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. (Float)
- Range: 0.0 < learning_rate
- PA
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
- PA1
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
regularization_weight: Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- PA2
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
regularization_weight: Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- CW
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
regularization_weight: Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)
- Range: 0.0 < regularization_weight
- AROW
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
regularization_weight: Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)
- Range: 0.0 < regularization_weight
- NHERD
sensitivity: Upper bound of the acceptable margin. The larger it is, the more robust the model is to noise, but the more error remains. (Float)
- Range: 0.0 <= sensitivity
regularization_weight: Sensitivity to the learning rate. The larger it is, the faster training proceeds, but the more sensitive the model is to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)
- Range: 0.0 < regularization_weight
- NN
method: Specify the algorithm for nearest neighbor search. Refer to Nearest Neighbor for the list of available algorithms.
parameter: Specify parameters for the algorithm. Refer to Nearest Neighbor for the list of parameters.
nearest_neighbor_num: Number of neighboring data points used to calculate the score. (Integer)
- Range: 1 <= nearest_neighbor_num
weight: Specify how each neighbor point is weighted. The following methods are available.

| Value | Method |
|---|---|
| "distance" | Weights neighbors by their distance or similarity. Closer neighbors have greater influence than those far from the query. |
| "uniform" | Weights neighbors equally. |

Note: the weight option can be omitted, in which case "uniform" is used.
- cosine
nearest_neighbor_num: Number of neighboring data points used to calculate the score. (Integer)
- Range: 1 <= nearest_neighbor_num
weight: Specify how each neighbor point is weighted. The following methods are available.

| Value | Method |
|---|---|
| "distance" | Weights neighbors by their similarity. Closer neighbors have greater influence than those far from the query. |
| "uniform" | Weights neighbors equally. |

Note: the weight option can be omitted, in which case "uniform" is used.
- euclidean
nearest_neighbor_num: Number of neighboring data points used to calculate the score. (Integer)
- Range: 1 <= nearest_neighbor_num
weight: Specify how each neighbor point is weighted. The following methods are available.

| Value | Method |
|---|---|
| "distance" | Weights neighbors by their distance. Closer neighbors have greater influence than those far from the query. |
| "uniform" | Weights neighbors equally. |

Note: the weight option can be omitted, in which case "uniform" is used.
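To make the difference between the two weight settings concrete, here is a minimal sketch. It assumes, without confirmation from the server's source, that the estimate is a weighted mean of the neighbors' scores and that "distance" means inverse Euclidean distance. The function `knn_predict` and its arguments are hypothetical names used only for illustration.

```python
import math


def knn_predict(query, points, k, weight="uniform"):
    """Estimate a score as a weighted mean over the k nearest points.

    points: list of (vector, score) pairs; vectors are equal-length lists.
    weight: "uniform" or "distance" (inverse-distance weighting, an assumption).
    """
    if k < 1:
        raise ValueError("nearest_neighbor_num must be >= 1")
    # Sort candidates by Euclidean distance to the query and keep the k closest.
    nearest = sorted(
        ((math.dist(query, vec), score) for vec, score in points),
        key=lambda pair: pair[0],
    )[:k]
    if weight == "uniform":
        weights = [1.0] * len(nearest)
    elif weight == "distance":
        # The small constant avoids division by zero for exact matches.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    else:
        raise ValueError("unknown weight: %r" % weight)
    total = sum(weights)
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / total


points = [([0.0, 0.0], 1.0), ([1.0, 0.0], 2.0), ([4.0, 0.0], 10.0)]
uniform = knn_predict([0.1, 0.0], points, k=2, weight="uniform")
by_distance = knn_predict([0.1, 0.0], points, k=2, weight="distance")
```

With the query close to the first point, the "distance" estimate is pulled toward that point's score, while the "uniform" estimate is the plain average of the two nearest scores.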
- converter
Specify configuration for data conversion. Its format is described in Data Conversion.
- Example:
{
  "method": "PA1",
  "parameter" : {
    "sensitivity" : 0.1,
    "regularization_weight" : 3.402823e+38
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types": {},
    "string_rules": [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
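As a rough illustration of how the documented ranges apply to such a file, the sketch below parses a configuration and checks its parameter block for the linear methods. The `check_parameter` helper and the `RANGES` and `DOCUMENTED` tables are hypothetical and written only from the parameter descriptions in this section; they are not part of the server. Treating every documented parameter as required is also an assumption, and the converter block is abbreviated to `{}` for brevity.

```python
import json

# Lower bounds from the parameter descriptions: (bound, inclusive).
RANGES = {
    "learning_rate": (0.0, False),          # 0.0 < learning_rate
    "sensitivity": (0.0, True),             # 0.0 <= sensitivity
    "regularization_weight": (0.0, False),  # 0.0 < regularization_weight
}

# Parameters documented for each linear method.
BOTH = ["sensitivity", "regularization_weight"]
DOCUMENTED = {
    "perceptron": ["learning_rate"],
    "PA": ["sensitivity"],
    "PA1": BOTH,
    "PA2": BOTH,
    "CW": BOTH,
    "AROW": BOTH,
    "NHERD": BOTH,
}


def check_parameter(config):
    """Return a list of problems found in config["parameter"] (empty if none)."""
    params = config.get("parameter", {})
    problems = []
    for name in DOCUMENTED.get(config["method"], []):
        if name not in params:
            problems.append("missing " + name)
            continue
        bound, inclusive = RANGES[name]
        value = params[name]
        in_range = value >= bound if inclusive else value > bound
        if not in_range:
            problems.append("%s out of range: %r" % (name, value))
    return problems


config = json.loads("""
{
  "method": "PA1",
  "parameter": {"sensitivity": 0.1, "regularization_weight": 3.402823e+38},
  "converter": {}
}
""")
problems = check_parameter(config)
```

Running a check like this before starting the server can surface a zero or negative value early, although the server's own validation remains authoritative.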
Data Structures
Methods
- service regression
- int train(0: list<scored_datum> train_data)
Parameters:
- train_data – list of tuples of label and datum
Returns: Number of trained data (i.e., the length of train_data)
Trains and updates the model. This function is designed to allow bulk updates with a list of scored_datum.
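The bulk-update contract of train can be sketched with a small in-memory stand-in. Everything here is hypothetical: `ScoredDatum` is a local mirror of scored_datum with an assumed `score` and `data` layout, `ToyRegression` is not the server's implementation, and the update is the textbook PA-I step from [Crammer06], with sensitivity playing the role of the insensitivity margin and regularization_weight the role of \(C\). The server's actual update may differ.

```python
from dataclasses import dataclass


@dataclass
class ScoredDatum:
    """Local stand-in for scored_datum: a score plus numeric features."""
    score: float
    data: dict  # feature name -> numeric value


class ToyRegression:
    """In-memory stand-in for the service (illustration only)."""

    def __init__(self, sensitivity, regularization_weight):
        self.sensitivity = sensitivity
        self.c = regularization_weight
        self.weights = {}

    def _predict(self, data):
        # Dot product of the current weights and the feature values.
        return sum(self.weights.get(k, 0.0) * v for k, v in data.items())

    def train(self, train_data):
        """Apply one PA-I update per item and return the number of items."""
        for item in train_data:
            error = item.score - self._predict(item.data)
            # Errors within the sensitivity margin cause no update.
            loss = max(0.0, abs(error) - self.sensitivity)
            norm_sq = sum(v * v for v in item.data.values())
            if loss == 0.0 or norm_sq == 0.0:
                continue
            tau = min(self.c, loss / norm_sq)  # PA-I step size, capped by C
            sign = 1.0 if error > 0 else -1.0
            for k, v in item.data.items():
                self.weights[k] = self.weights.get(k, 0.0) + sign * tau * v
        return len(train_data)


model = ToyRegression(sensitivity=0.1, regularization_weight=1.0)
batch = [ScoredDatum(2.0, {"x": 1.0}), ScoredDatum(4.0, {"x": 2.0})]
trained = model.train(batch)
```

The return value equals the length of the batch regardless of how many items actually changed the weights, which matches the description of the return value as the length of train_data.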
-
int