Classifier
- See IDL definition for detailed specification.
- See Algorithms for detailed description of algorithms used in this server.
Configuration
Configuration is given as a JSON file. We show each field below:
-
method Specifies the classification algorithm. The following algorithms are available.

| Value | Method | Classifier type |
|---|---|---|
| "perceptron" | Use perceptron. | linear classifier |
| "PA" | Use Passive Aggressive (PA). [Crammer06] | linear classifier |
| "PA1" | Use PA-I. [Crammer06] | linear classifier |
| "PA2" | Use PA-II. [Crammer06] | linear classifier |
| "CW" | Use Confidence Weighted Learning. [Dredze08] | linear classifier |
| "AROW" | Use Adaptive Regularization of Weight vectors. [Crammer09b] | linear classifier |
| "NHERD" | Use Normal Herd. [Crammer10] | linear classifier |
| "NN" | Use an implementation of nearest_neighbor. | k-Nearest Neighbor |
| "cosine" | Use the result of nearest neighbor search by cosine similarity. [1] | k-Nearest Neighbor |
| "euclidean" | Use the result of nearest neighbor search by Euclidean distance. [1] | k-Nearest Neighbor |

[1] These algorithms do not support the delete_label API or the unlearner option.
-
parameter Specifies parameters for the algorithm. Its format differs for each method. Note that the appropriate value for regularization_weight differs for each algorithm.
- common
unlearner: Specifies the unlearner strategy. You can specify any unlearner strategy described in Unlearner. Labels are deleted according to the strategy specified here. When method is "NN", individual data (labeled_datum, not labels) are deleted instead.
unlearner_parameter: Specifies the parameters for the unlearner, as described in Unlearner. This parameter is required if you specify unlearner. Labels (or data) in excess of the number given here are deleted automatically.
note: If you do not use the unlearner function, both unlearner and unlearner_parameter can be omitted.
- perceptron
- None
- PA
- None
- PA1
regularization_weight: Sensitivity to learning rate. The larger the value, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- PA2
regularization_weight: Sensitivity to learning rate. The larger the value, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- CW
regularization_weight: Sensitivity to learning rate. The larger the value, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)
- Range: 0.0 < regularization_weight
- AROW
regularization_weight: Sensitivity to learning rate. The larger the value, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)
- Range: 0.0 < regularization_weight
- NHERD
regularization_weight: Sensitivity to learning rate. The larger the value, the faster the model learns, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)
- Range: 0.0 < regularization_weight
- NN
method: Specify algorithm for nearest neighbor. Refer to Nearest Neighbor for the list of algorithms available.
parameter: Specify parameters for the algorithm. Refer to Nearest Neighbor for the list of parameters.
nearest_neighbor_num: Number of nearest data points used to calculate scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used to calculate scores. The larger the value, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
- cosine
nearest_neighbor_num: Number of nearest data points used to calculate scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used to calculate scores. The larger the value, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
- euclidean
nearest_neighbor_num: Number of nearest data points used to calculate scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used to calculate scores. The larger the value, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
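To make the roles of nearest_neighbor_num and local_sensitivity concrete, here is a minimal sketch of distance-weighted voting. The weighting exp(-local_sensitivity * distance) is an assumption chosen only because it matches the behaviour described above (larger values favour nearby data, 0 gives equal weight); the source does not state the actual formula, and the function name knn_label_scores is hypothetical.

```python
import math
from collections import defaultdict


def knn_label_scores(neighbors, nearest_neighbor_num, local_sensitivity):
    """Score labels from (label, distance) pairs.

    Hypothetical helper, not part of the server. Only the closest
    nearest_neighbor_num entries contribute to the scores.
    """
    if nearest_neighbor_num < 1:
        raise ValueError("nearest_neighbor_num must satisfy 1 <= value")
    if local_sensitivity < 0.0:
        raise ValueError("local_sensitivity must satisfy 0.0 <= value")

    # Keep only the k closest neighbours.
    nearest = sorted(neighbors, key=lambda n: n[1])[:nearest_neighbor_num]

    scores = defaultdict(float)
    for label, distance in nearest:
        # Assumed weighting: equals 1.0 for every neighbour when
        # local_sensitivity is 0, and decays with distance otherwise.
        scores[label] += math.exp(-local_sensitivity * distance)
    return dict(scores)


neighbors = [("a", 0.1), ("b", 0.5), ("b", 0.6), ("a", 2.0)]
equal_weight = knn_label_scores(neighbors, 3, 0.0)
distance_weighted = knn_label_scores(neighbors, 3, 5.0)
```

With local_sensitivity set to 0 the two "b" neighbours outvote the single close "a"; with a larger value the closest neighbour dominates under this assumed weighting.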
-
converter Specify configuration for data conversion. Its format is described in Data Conversion.
- Example:
    {
      "method" : "AROW",
      "parameter" : {
        "regularization_weight" : 1.0
      },
      "converter" : {
        "string_filter_types" : {},
        "string_filter_rules" : [],
        "num_filter_types" : {},
        "num_filter_rules" : [],
        "string_types" : {},
        "string_rules" : [
          { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
        ],
        "num_types" : {},
        "num_rules" : [
          { "key" : "*", "type" : "num" }
        ]
      }
    }
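The ranges listed for each method can be checked before a configuration file is written. The following is a minimal sketch of such a check; check_classifier_config and the constant names are hypothetical, and treating nearest_neighbor_num and local_sensitivity as required fields is an assumption of this sketch rather than something the source states.

```python
import json

# Method groups taken from the parameter list in this section.
NEEDS_REGULARIZATION_WEIGHT = {"PA1", "PA2", "CW", "AROW", "NHERD"}
NO_SPECIFIC_PARAMETER = {"perceptron", "PA"}
KNN_SHORTCUTS = {"cosine", "euclidean"}


def check_classifier_config(config):
    """Hypothetical helper: raise ValueError if a documented range is violated."""
    method = config["method"]
    param = config.get("parameter", {})

    if method in NEEDS_REGULARIZATION_WEIGHT:
        weight = param.get("regularization_weight")
        if weight is None or not weight > 0.0:
            raise ValueError("regularization_weight must satisfy 0.0 < value")
    elif method in KNN_SHORTCUTS or method == "NN":
        # Assumption: both fields are treated as required in this sketch.
        if param.get("nearest_neighbor_num", 0) < 1:
            raise ValueError("nearest_neighbor_num must satisfy 1 <= value")
        if param.get("local_sensitivity", -1.0) < 0.0:
            raise ValueError("local_sensitivity must satisfy 0.0 <= value")
    elif method not in NO_SPECIFIC_PARAMETER:
        raise ValueError("unknown method: %r" % (method,))

    if "unlearner" in param:
        if method in KNN_SHORTCUTS:
            raise ValueError("%s does not support the unlearner option" % method)
        if "unlearner_parameter" not in param:
            raise ValueError("unlearner requires unlearner_parameter")
    return config


config = {
    "method": "AROW",
    "parameter": {"regularization_weight": 1.0},
    "converter": {
        "string_filter_types": {}, "string_filter_rules": [],
        "num_filter_types": {}, "num_filter_rules": [],
        "string_types": {},
        "string_rules": [{"key": "*", "type": "str",
                          "sample_weight": "bin", "global_weight": "bin"}],
        "num_types": {},
        "num_rules": [{"key": "*", "type": "num"}],
    },
}
config_json = json.dumps(check_classifier_config(config), indent=2)
```

The resulting config_json string can be saved to a file and used as the configuration. The check covers only the ranges and constraints listed in this section; the contents of converter and of the nested NN parameter are not validated.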
Data Structures
Methods
-
service classifier
-
int train(0: list<labeled_datum> data)
Parameters:
- data – list of tuples of label and datum
Returns: Number of trained data (i.e., the length of data)
Trains and updates the model. labeled_datum is a tuple of datum and its label. This API is designed to accept bulk updates with a list of labeled_datum.
-
list<list<estimate_result>> classify(0: list<datum> data)
Parameters:
- data – list of datum to classify
Returns: List of lists of estimate_result, in the order of the given datum
Estimates labels from the given data. This API is designed to accept bulk classification with a list of datum.
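The bulk contract of train and classify can be illustrated with a toy in-memory model. This is a sketch only: ToyPerceptron is a hypothetical stand-in, not the server or its client library, datum is simplified to a dict of numeric features, and EstimateResult is assumed to carry a label and a score.

```python
from collections import namedtuple

# Hypothetical stand-in for estimate_result: a label and its score.
EstimateResult = namedtuple("EstimateResult", ["label", "score"])


class ToyPerceptron:
    """Toy multiclass perceptron; datum is simplified to {feature: float}."""

    def __init__(self):
        self.weights = {}  # label -> {feature: weight}

    def _score(self, label, datum):
        w = self.weights[label]
        return sum(w.get(k, 0.0) * v for k, v in datum.items())

    def train(self, data):
        """data: list of (label, datum). Returns the number of trained items."""
        for label, datum in data:
            self.weights.setdefault(label, {})
            rivals = [l for l in self.weights if l != label]
            if not rivals:
                continue  # nothing to separate from yet
            rival = max(rivals, key=lambda l: self._score(l, datum))
            # Update when the true label does not strictly beat the best rival.
            if self._score(label, datum) <= self._score(rival, datum):
                for k, v in datum.items():
                    self.weights[label][k] = self.weights[label].get(k, 0.0) + v
                    self.weights[rival][k] = self.weights[rival].get(k, 0.0) - v
        return len(data)

    def classify(self, data):
        """Returns one list of EstimateResult per datum, in input order."""
        return [
            [EstimateResult(l, self._score(l, d)) for l in self.weights]
            for d in data
        ]


model = ToyPerceptron()
batch = [("spam", {"buy": 1.0}), ("ham", {"hello": 1.0})] * 2
trained = model.train(batch)
results = model.classify([{"buy": 1.0}, {"hello": 1.0}])
best = [max(r, key=lambda e: e.score).label for r in results]
```

Here trained equals the length of the batch, and results holds one inner list per input datum, each with one entry per known label. Picking the highest score from each inner list gives the estimated label.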
-
map<string, ulong> get_labels()
Returns: Pairs of label and the number of trained data
Returns the number of trained data for each label. If method is NN, trained data that have been deleted by unlearner are not included in this count.
-
bool set_label(0: string new_label)
Parameters:
- new_label – name of the new label
Returns: True if the label did not exist and was added. False if the label already exists.
Adds a new label. The call fails if the label already exists. A new label is also added when it appears in the argument of the train method.
-
bool delete_label(0: string target_label)
Parameters:
- target_label – name of the label to delete
Returns: True if Jubatus succeeds in deleting the label. False if the label does not exist.
Deletes a label.
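The return values of get_labels, set_label, and delete_label can be summarised with a small in-memory model. This is a sketch under assumptions: ToyLabelStore is hypothetical and not part of Jubatus, and reporting a count of 0 for a label added through set_label is an assumption, since the source does not say what get_labels returns for a label with no trained data.

```python
class ToyLabelStore:
    """Toy model of the label bookkeeping described above."""

    def __init__(self):
        self.counts = {}  # label -> number of trained data

    def train(self, data):
        """data: list of (label, datum). Unseen labels are added implicitly."""
        for label, _datum in data:
            self.counts[label] = self.counts.get(label, 0) + 1
        return len(data)

    def get_labels(self):
        """Pairs of label and number of trained data."""
        return dict(self.counts)

    def set_label(self, new_label):
        """True if the label was added, False if it already exists."""
        if new_label in self.counts:
            return False
        self.counts[new_label] = 0  # assumed initial count
        return True

    def delete_label(self, target_label):
        """True if the label was deleted, False if it does not exist."""
        return self.counts.pop(target_label, None) is not None


store = ToyLabelStore()
added = store.set_label("spam")          # new label
added_again = store.set_label("spam")    # already exists
store.train([("spam", {}), ("ham", {})])  # "ham" is added by train
labels = store.get_labels()
deleted = store.delete_label("ham")
deleted_again = store.delete_label("ham")
```

In this run the second set_label call and the second delete_label call both report failure, and get_labels lists "ham" even though it was never passed to set_label.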
-
int