Classifier¶
- See IDL definition for detailed specification.
- See Algorithms for detailed description of algorithms used in this server.
Configuration¶
Configuration is given as a JSON file. We show each field below:
-
method
Specify the classification algorithm. The following algorithms are available.

| Value | Method | Classifier type |
|---|---|---|
| "perceptron" | Use perceptron. | linear classifier |
| "PA" | Use Passive Aggressive (PA). [Crammer06] | linear classifier |
| "PA1" | Use PA-I. [Crammer06] | linear classifier |
| "PA2" | Use PA-II. [Crammer06] | linear classifier |
| "CW" | Use Confidence Weighted Learning. [Dredze08] | linear classifier |
| "AROW" | Use Adaptive Regularization of Weight vectors. [Crammer09b] | linear classifier |
| "NHERD" | Use Normal Herd. [Crammer10] | linear classifier |
| "NN" | Use an implementation of nearest_neighbor. | k-Nearest Neighbor |
| "cosine" | Use the result of nearest neighbor search by cosine similarity. [1] | k-Nearest Neighbor |
| "euclidean" | Use the result of nearest neighbor search by euclidean distance. [1] | k-Nearest Neighbor |

[1] These algorithms don’t support the delete_label API or the unlearner option.
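The table above can be read as a simple lookup. The following is a minimal sketch of that lookup in Python. The dictionary layout and the helper names (`classifier_type`, `supports_unlearner`) are illustrative assumptions and are not part of Jubatus.

```python
# Lookup table transcribed from the method table above.
# The layout and helper names are hypothetical, not a Jubatus API.
METHODS = {
    "perceptron": "linear classifier",
    "PA": "linear classifier",
    "PA1": "linear classifier",
    "PA2": "linear classifier",
    "CW": "linear classifier",
    "AROW": "linear classifier",
    "NHERD": "linear classifier",
    "NN": "k-Nearest Neighbor",
    "cosine": "k-Nearest Neighbor",
    "euclidean": "k-Nearest Neighbor",
}

# Methods marked with footnote [1]: no delete_label API, no unlearner option.
NO_DELETE_OR_UNLEARNER = {"cosine", "euclidean"}


def classifier_type(method: str) -> str:
    """Return the classifier type for a method value, or raise ValueError."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return METHODS[method]


def supports_unlearner(method: str) -> bool:
    """True unless the method is marked with footnote [1] in the table."""
    classifier_type(method)  # rejects unknown method values
    return method not in NO_DELETE_OR_UNLEARNER
```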
-
parameter
Specify parameters for the algorithm. The format differs for each method. Note that an adequate value for regularization_weight differs for each algorithm.
- common
unlearner: Specify the unlearner strategy. If you don’t use the unlearner function, you can omit this parameter. You can specify any unlearner strategy described in Unlearner. Labels will be deleted based on the strategy specified here. When method is "NN", individual data (labeled_datum, not labels) will be deleted.
unlearner_parameter: Specify the unlearner parameter. You can specify the unlearner_parameter described in Unlearner. You cannot omit this parameter if you specify unlearner. Labels (or data) in excess of this number will be deleted automatically.
note: unlearner and unlearner_parameter can be omitted.
- perceptron
- None
- PA
- None
- PA1
regularization_weight: Sensitivity to learning rate. The bigger it is, the faster the model trains, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- PA2
regularization_weight: Sensitivity to learning rate. The bigger it is, the faster the model trains, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer06]. (Float)
- Range: 0.0 < regularization_weight
- CW
regularization_weight: Sensitivity to learning rate. The bigger it is, the faster the model trains, but the more sensitive it becomes to noise. It corresponds to \(\phi\) in the original paper [Dredze08]. (Float)
- Range: 0.0 < regularization_weight
- AROW
regularization_weight: Sensitivity to learning rate. The bigger it is, the faster the model trains, but the more sensitive it becomes to noise. It corresponds to \(1/r\) in the original paper [Crammer09b]. (Float)
- Range: 0.0 < regularization_weight
- NHERD
regularization_weight: Sensitivity to learning rate. The bigger it is, the faster the model trains, but the more sensitive it becomes to noise. It corresponds to \(C\) in the original paper [Crammer10]. (Float)
- Range: 0.0 < regularization_weight
- NN
method: Specify algorithm for nearest neighbor. Refer to Nearest Neighbor for the list of algorithms available.
parameter: Specify parameters for the algorithm. Refer to Nearest Neighbor for the list of parameters.
nearest_neighbor_num: Number of data points used for calculating scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used for calculating scores. The bigger it is, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
- cosine
nearest_neighbor_num: Number of data points used for calculating scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used for calculating scores. The bigger it is, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
- euclidean
nearest_neighbor_num: Number of data points used for calculating scores. (Integer)
- Range: 1 <= nearest_neighbor_num
local_sensitivity: Sensitivity used for calculating scores. The bigger it is, the more heavily nearby data are weighted. When it is 0, all data are given the same weight. (Float)
- Range: 0.0 <= local_sensitivity
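The ranges listed for each method can be checked before a configuration is handed to the server. The sketch below is one way to do that in Python. It is not Jubatus code: the function name `check_parameter` is hypothetical, and it assumes that every parameter listed for a method is required, which the text above does not state explicitly.

```python
from typing import Any, Dict, List

# Methods whose method-specific parameter is regularization_weight (> 0.0).
REGULARIZED = {"PA1", "PA2", "CW", "AROW", "NHERD"}
# Methods that take nearest_neighbor_num (>= 1) and local_sensitivity (>= 0.0).
NEIGHBOR_BASED = {"NN", "cosine", "euclidean"}
# Methods that, per footnote [1], do not support the unlearner option.
NO_UNLEARNER = {"cosine", "euclidean"}


def _is_number(value: Any) -> bool:
    """True for int or float, but not for bool (a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_parameter(method: str, parameter: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if method in REGULARIZED:
        weight = parameter.get("regularization_weight")
        if not _is_number(weight) or weight <= 0.0:
            problems.append("regularization_weight must be a number > 0.0")
    if method in NEIGHBOR_BASED:
        num = parameter.get("nearest_neighbor_num")
        if not isinstance(num, int) or isinstance(num, bool) or num < 1:
            problems.append("nearest_neighbor_num must be an integer >= 1")
        sensitivity = parameter.get("local_sensitivity")
        if not _is_number(sensitivity) or sensitivity < 0.0:
            problems.append("local_sensitivity must be a number >= 0.0")
    if "unlearner" in parameter:
        if method in NO_UNLEARNER:
            problems.append(f"{method} does not support the unlearner option")
        if "unlearner_parameter" not in parameter:
            problems.append("unlearner_parameter is required with unlearner")
    return problems
```

A call such as `check_parameter("PA1", {"regularization_weight": 0.0})` reports one problem, because the range excludes 0.0.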
-
converter
Specify configuration for data conversion. Its format is described in Data Conversion.
- Example:
    {
      "method" : "AROW",
      "parameter" : {
        "regularization_weight" : 1.0
      },
      "converter" : {
        "string_filter_types" : {},
        "string_filter_rules" : [],
        "num_filter_types" : {},
        "num_filter_rules" : [],
        "string_types" : {},
        "string_rules" : [
          { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
        ],
        "num_types" : {},
        "num_rules" : [
          { "key" : "*", "type" : "num" }
        ]
      }
    }
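A configuration like the example above can also be generated from a script. The following is a minimal sketch using only the Python standard library. The helper name `build_config` is hypothetical, and the converter block is copied from the example rather than derived from the Data Conversion rules.

```python
import json
from typing import Any, Dict


def build_config(method: str, parameter: Dict[str, Any]) -> Dict[str, Any]:
    """Build a classifier configuration with the converter from the example."""
    return {
        "method": method,
        "parameter": parameter,
        "converter": {
            "string_filter_types": {},
            "string_filter_rules": [],
            "num_filter_types": {},
            "num_filter_rules": [],
            "string_types": {},
            # Treat every string value as a whole-string feature.
            "string_rules": [
                {"key": "*", "type": "str",
                 "sample_weight": "bin", "global_weight": "bin"}
            ],
            "num_types": {},
            # Use every numeric value as-is.
            "num_rules": [{"key": "*", "type": "num"}],
        },
    }


# Serialize to the JSON text that would be written to the configuration file.
config_text = json.dumps(
    build_config("AROW", {"regularization_weight": 1.0}), indent=2
)
```

Building the dictionary in code makes it easy to vary `method` and `parameter` while keeping the converter fixed.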
Data Structures¶
Methods¶
-
service
classifier
-
int train(0: list<labeled_datum> data)¶
Parameters: - data – list of tuples of label and datum
Returns: Number of trained data (i.e., the length of data)
Trains and updates the model. labeled_datum is a tuple of datum and its label. This API is designed to accept bulk updates with a list of labeled_datum.
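Because train accepts a list of labeled_datum and returns the length of that list, a large data set can be sent in several bulk calls and the returned counts summed. The sketch below illustrates this in Python. The `Datum` alias is a plain-dictionary stand-in for the real datum type, and `batches` and `train_in_batches` are hypothetical helpers. The `train` argument is any callable with the return behavior described above, not the Jubatus client itself.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Datum = Dict[str, object]          # hypothetical stand-in for datum
LabeledDatum = Tuple[str, Datum]   # (label, datum)


def batches(data: List[LabeledDatum], size: int) -> Iterator[List[LabeledDatum]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def train_in_batches(
    train: Callable[[List[LabeledDatum]], int],
    data: List[LabeledDatum],
    size: int,
) -> int:
    """Call `train` once per batch and return the total number trained."""
    return sum(train(batch) for batch in batches(data, size))
```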
-
list<list<estimate_result>> classify(0: list<datum> data)¶
Parameters: - data – list of datum to classify
Returns: List of lists of estimate_result, in the order of the given datum
Estimates labels from the given data. This API is designed to accept bulk classification with a list of datum.
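The return value of classify holds one inner list per input datum, in input order. The sketch below shows one way to reduce that nested result to a single label per datum. It assumes that each estimate_result carries a `label` and a `score`. That is an assumption, since the Data Structures section does not list its fields here. `EstimateResult` and `best_labels` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class EstimateResult(NamedTuple):
    """Hypothetical stand-in for estimate_result (fields are assumed)."""
    label: str
    score: float


def best_labels(results: List[List[EstimateResult]]) -> List[Optional[str]]:
    """Pick the highest-scoring label for each datum, or None if it has none."""
    picked = []
    for per_datum in results:
        if per_datum:
            picked.append(max(per_datum, key=lambda r: r.score).label)
        else:
            picked.append(None)
    return picked
```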
-
map<string, ulong> get_labels()¶
Returns: Pairs of label and the number of trained data
Returns the number of trained data for each label. If method is NN, trained data that have been deleted by unlearner are not included in this count.
-
bool set_label(0: string new_label)¶
Parameters: - new_label – name of the new label
Returns: True if the new label did not exist. False if the label already exists.
Appends a new label. If the label already exists, the call fails. A new label is also added when it appears in the argument of the train method.
-
bool delete_label(0: string target_label)¶
Parameters: - target_label – name of the label to delete
Returns: True if Jubatus succeeds in deleting the label. False if the label does not exist.
Deletes a label.
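The return values of get_labels, set_label, and delete_label can be summarized with a small in-memory model. The following Python sketch is a toy stand-in and not the Jubatus implementation. The class name `LabelRegistry` is hypothetical, and starting a label added by set_label at a count of 0 is an assumption.

```python
from typing import Dict, List, Tuple


class LabelRegistry:
    """Toy model of the label bookkeeping described above."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def train(self, data: List[Tuple[str, dict]]) -> int:
        """Count each label seen; unseen labels are added implicitly."""
        for label, _datum in data:
            self._counts[label] = self._counts.get(label, 0) + 1
        return len(data)

    def set_label(self, new_label: str) -> bool:
        """Return True if the label was added, False if it already existed."""
        if new_label in self._counts:
            return False
        self._counts[new_label] = 0  # assumed initial count
        return True

    def delete_label(self, target_label: str) -> bool:
        """Return True if the label was deleted, False if it did not exist."""
        return self._counts.pop(target_label, None) is not None

    def get_labels(self) -> Dict[str, int]:
        """Return a copy of the label-to-count mapping."""
        return dict(self._counts)
```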
-
int