Anomaly
- See IDL definition for detailed specification.
Configuration
Configuration is given as a JSON file. Each field is described below:
-
method
Specify the algorithm for anomaly detection. The following algorithms are available:
Value        Method
"lof"        Use Local Outlier Factor based on recommender. [Breunig2000]
"light_lof"  Use a variant of LOF based on nearest neighbor.
-
parameter
Specify parameters for the algorithm. The format differs for each method.
- common
unlearner: Specify the unlearner strategy. Omit this parameter if you do not use an unlearner. You can specify any unlearner strategy described in Unlearner. Data is deleted by ID according to the strategy specified here.
unlearner_parameter: Specify the unlearner parameter, as described in Unlearner. You cannot omit this parameter when you specify unlearner. Data in excess of the number given in this parameter is deleted automatically.
note: unlearner and unlearner_parameter can be omitted.
- lof
nearest_neighbor_num: Number of neighbors. The larger the value, the fewer false positives are found, but the more false negatives are found. (Integer)
- Range: 2 <= nearest_neighbor_num
reverse_nearest_neighbor_num: Number of reverse neighbors to update when anomaly measure values are updated. The larger the value, the more accurately the measures are updated, but the longer the update takes. (Integer)
- Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
ignore_kth_same_point: Prevents scores from going to inf by limiting the number of duplicate records to nearest_neighbor_num - 1. This parameter is optional and is false (disabled) by default. (Boolean)
method: Algorithm name of the recommender used for nearest neighbor search. Refer to method in Recommender.
parameter: Parameters of the recommender used for nearest neighbor search. Refer to parameter in Recommender.
- light_lof
nearest_neighbor_num: Number of neighbors. The larger the value, the fewer false positives are found, but the more false negatives are found. (Integer)
- Range: 2 <= nearest_neighbor_num
reverse_nearest_neighbor_num: Number of reverse neighbors to update when anomaly measure values are updated. The larger the value, the more accurately the measures are updated, but the longer the update takes. (Integer)
- Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
ignore_kth_same_point: Prevents scores from going to inf by limiting the number of duplicate records to nearest_neighbor_num - 1. This parameter is optional and is false (disabled) by default. (Boolean)
method: Algorithm name of the nearest neighbor used for nearest neighbor search. Refer to method in Nearest Neighbor.
parameter: Parameters of the nearest neighbor used for nearest neighbor search. Refer to parameter in Nearest Neighbor.
-
converter
Specify configuration for data conversion. Its format is described in Data Conversion.
- Example:
{
  "method" : "lof",
  "parameter" : {
    "nearest_neighbor_num" : 10,
    "reverse_nearest_neighbor_num" : 30,
    "method" : "euclid_lsh",
    "parameter" : {
      "hash_num" : 64,
      "table_num" : 4,
      "seed" : 1091,
      "probe_num" : 64,
      "bin_width" : 100
    }
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types" : {},
    "string_rules" : [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
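The range constraints and the unlearner rule described above can be checked before a configuration file is handed to the server. The following is a minimal sketch in Python and is not part of Jubatus itself: the function name validate_anomaly_config and the error messages are hypothetical, the converter in the sample is abbreviated, and the checks cover only the rules stated in this section.

```python
import json

KNOWN_METHODS = ("lof", "light_lof")  # the two values listed for "method"


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_anomaly_config(config):
    """Return a list of problems found in an anomaly config dict.

    Hypothetical helper: it checks only the constraints stated in this
    section, not everything the server may enforce.
    """
    errors = []
    if config.get("method") not in KNOWN_METHODS:
        errors.append("method must be one of %s" % (KNOWN_METHODS,))

    param = config.get("parameter", {})
    nn = param.get("nearest_neighbor_num")
    rnn = param.get("reverse_nearest_neighbor_num")

    # Range: 2 <= nearest_neighbor_num
    if nn is not None and (not _is_int(nn) or nn < 2):
        errors.append("nearest_neighbor_num must be an integer >= 2")
    # Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
    if rnn is not None and _is_int(nn) and (not _is_int(rnn) or rnn < nn):
        errors.append(
            "reverse_nearest_neighbor_num must be an integer "
            ">= nearest_neighbor_num"
        )

    # unlearner_parameter cannot be omitted when unlearner is specified.
    if "unlearner" in param and "unlearner_parameter" not in param:
        errors.append("unlearner requires unlearner_parameter")

    return errors


example = json.loads("""
{
  "method": "lof",
  "parameter": {
    "nearest_neighbor_num": 10,
    "reverse_nearest_neighbor_num": 30,
    "method": "euclid_lsh",
    "parameter": {"hash_num": 64, "table_num": 4, "seed": 1091,
                  "probe_num": 64, "bin_width": 100}
  },
  "converter": {"num_rules": [{"key": "*", "type": "num"}]}
}
""")
print(validate_anomaly_config(example))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.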
Data Structures
Methods
-
service anomaly
-
bool clear_row(0: string id)
Parameters:
- id – point ID to be removed
Returns: True when the point was cleared successfully
Clears the point data with ID id.
-
id_with_score add(0: datum row)
Parameters:
- row – datum for the point
Returns: Tuple of the point ID and the anomaly measure value
Adds the point data row.
-
list<string> add_bulk(0: list<datum> data)
Parameters:
- data – List of datum for the points
Returns: The list of successfully added IDs.
Adds a bulk of points. In contrast to add, this API doesn’t return anomaly measure values.
-
double update(0: string id, 1: datum row)
Parameters:
- id – point ID to update
- row – new datum for the point
Returns: Anomaly measure value
Updates the point id with the data row.
-
double overwrite(0: string id, 1: datum row)
Parameters:
- id – point ID to overwrite
- row – new datum for the point
Returns: Anomaly measure value
Overwrites the point id with the data row.
-
double calc_score(0: datum row)
Parameters:
- row – datum
Returns: Anomaly measure value for given row
Calculates an anomaly measure value for the point data row without adding a point. At this time, extremely large numbers can be returned. For details, please refer to FAQs: anomaly detection.
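The ignore_kth_same_point description in the Configuration section attributes inf scores to duplicate records. The sketch below illustrates why, using a plain textbook LOF [Breunig2000] over 2-D tuples in pure Python. It is an illustration only and makes no claim about how Jubatus computes or approximates the measure; the function names and the brute-force neighbor search are assumptions made for the example, and it expects more than k points.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof_score(points, i, k):
    """Textbook LOF of points[i] with brute-force neighbor search."""
    n = len(points)
    neighbors = [k_nearest(points, p, k) for p in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [math.dist(points[p], points[neighbors[p][-1]]) for p in range(n)]

    def lrd(p):
        # Local reachability density: inverse of the mean reachability distance.
        reach = sum(max(k_dist[o], math.dist(points[p], points[o]))
                    for o in neighbors[p])
        return math.inf if reach == 0 else k / reach

    # LOF: mean lrd of the neighbors divided by the point's own lrd.
    # (If both are inf the result is nan; this sketch does not handle that.)
    return sum(lrd(o) for o in neighbors[i]) / k / lrd(i)


k = 2
# Three identical records: each has k duplicates, so its k-distance is 0
# and its local reachability density becomes infinite.
many_duplicates = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
# Only two identical records: each has k - 1 duplicates.
few_duplicates = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]

print(lof_score(many_duplicates, 3, k))  # inf
print(lof_score(few_duplicates, 3, k))   # a finite value
```

In the first data set the query point's neighbors all have infinite density, so the ratio diverges. With at most k - 1 duplicates per record, every k-distance stays positive and the score remains finite.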
-
list<string> get_all_rows()
Returns: List of all point IDs
Returns the list of all point IDs.
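Because add returns both the point ID and its anomaly measure value, a caller can flag suspicious points as they arrive. The sketch below assumes a client object whose add method mirrors the add API above and returns an object with id and score attributes. Those attribute names, the helper name flag_anomalies, and the threshold are assumptions for illustration, so check them against the client library you use. Non-finite scores are flagged as well, since scores can go to inf (see ignore_kth_same_point).

```python
import math


def flag_anomalies(client, rows, threshold):
    """Add each row through client.add and return IDs scoring above threshold.

    `client` is any object with an add(row) method returning a result that
    has `id` and `score` attributes (an assumption of this sketch).
    """
    flagged = []
    for row in rows:
        result = client.add(row)
        # Treat inf/nan as anomalous rather than comparing them numerically.
        if not math.isfinite(result.score) or result.score > threshold:
            flagged.append(result.id)
    return flagged
```

Passing the client in as an argument keeps the helper independent of any particular connection setup and makes it easy to exercise with a stub.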
-
bool