Anomaly

Configuration

The configuration is given as a JSON file. Each field is described below.

method

Specifies the algorithm used for anomaly detection. The following algorithms are available.

Value        Method
"lof"        Local Outlier Factor based on the recommender. [Breunig2000]
"light_lof"  A variant of LOF based on nearest neighbor.
parameter

Specifies the parameters for the algorithm. The format differs for each method.

common
unlearner:
 Specifies the unlearner strategy, chosen from the strategies described in Unlearner. Data is deleted by ID according to the strategy specified here. Omit this parameter if you do not use an unlearner.
unlearner_parameter:
 Specifies the parameters for the unlearner, in the format described in Unlearner. This parameter is required whenever unlearner is specified. Data in excess of the number given here is deleted automatically.

note: unlearner and unlearner_parameter can both be omitted.
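
As an illustration of where the common parameters go, the following Python sketch builds a parameter object that enables an unlearner. The strategy name "lru" and the "max_size" key are assumptions here and should be checked against the Unlearner page; check_unlearner is a hypothetical helper that only encodes the rule stated above.

```python
import json

# Hypothetical values: "lru" and "max_size" are assumed to be a strategy and
# an option documented on the Unlearner page.
parameter = {
    "nearest_neighbor_num": 10,
    "reverse_nearest_neighbor_num": 30,
    "unlearner": "lru",
    "unlearner_parameter": {"max_size": 4096},
}


def check_unlearner(param):
    """Return True when unlearner_parameter accompanies unlearner, as required."""
    return "unlearner" not in param or "unlearner_parameter" in param


print(json.dumps(parameter, indent=2))
```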

lof
nearest_neighbor_num:

Number of neighbors. The larger the value, the fewer false positives but the more false negatives occur. (Integer)

  • Range: 2 <= nearest_neighbor_num
reverse_nearest_neighbor_num:

Number of reverse neighbors to update when anomaly measure values are updated. The larger the value, the more accurately the measures are updated, but the longer each update takes. (Integer)

  • Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
ignore_kth_same_point:

Prevents scores from becoming inf by limiting the number of duplicate records to nearest_neighbor_num - 1. This parameter is optional and defaults to false (disabled). (Boolean)

method:

Algorithm name of the recommender used for nearest neighbor search. Refer to method in Recommender.

parameter:

Parameters of the recommender used for nearest neighbor search. Refer to parameter in Recommender.
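
To see why duplicate records can push a score to inf, which is what ignore_kth_same_point guards against, consider the textbook LOF formulas from [Breunig2000]. The sketch below is not the server's implementation; the function names and the sample distances are made up for illustration.

```python
import math


def local_reachability_density(reach_dists):
    """Inverse of the mean reachability distance to the k nearest neighbors."""
    mean_reach = sum(reach_dists) / len(reach_dists)
    # A mean of zero means every one of the k neighbors is an exact duplicate.
    return math.inf if mean_reach == 0.0 else 1.0 / mean_reach


def lof_score(own_lrd, neighbor_lrds):
    """Mean density of the neighbors divided by the point's own density."""
    return (sum(neighbor_lrds) / len(neighbor_lrds)) / own_lrd


# k = 3: a point whose three nearest neighbors are exact duplicates of it.
duplicate_lrd = local_reachability_density([0.0, 0.0, 0.0])
# An ordinary point that has two of those duplicated points among its neighbors.
ordinary_lrd = local_reachability_density([1.0, 1.0, 2.0])
score = lof_score(ordinary_lrd, [duplicate_lrd, duplicate_lrd, 0.5])
print(duplicate_lrd, score)
```

Once k identical records exist, their density is unbounded and the score of any point that counts them as neighbors becomes unbounded as well. Capping duplicates at nearest_neighbor_num - 1 keeps at least one neighbor at a non-zero distance.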

light_lof
nearest_neighbor_num:

Number of neighbors. The larger the value, the fewer false positives but the more false negatives occur. (Integer)

  • Range: 2 <= nearest_neighbor_num
reverse_nearest_neighbor_num:

Number of reverse neighbors to update when anomaly measure values are updated. The larger the value, the more accurately the measures are updated, but the longer each update takes. (Integer)

  • Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
ignore_kth_same_point:

Prevents scores from becoming inf by limiting the number of duplicate records to nearest_neighbor_num - 1. This parameter is optional and defaults to false (disabled). (Boolean)

method:

Algorithm name of the nearest neighbor engine used for the search. Refer to method in Nearest Neighbor.

parameter:

Parameters of the nearest neighbor engine used for the search. Refer to parameter in Nearest Neighbor.
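
The range constraints for lof and light_lof are the same, so they can be checked before a configuration is deployed. The following is a minimal sketch of such a check. validate_lof_parameter is a hypothetical helper, not part of the server or its clients, and it assumes that both neighbor counts are mandatory.

```python
def validate_lof_parameter(param):
    """Return a list of problems found in a lof / light_lof parameter object."""
    problems = []
    nn = param.get("nearest_neighbor_num")
    rnn = param.get("reverse_nearest_neighbor_num")

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    if not is_int(nn) or nn < 2:
        problems.append("nearest_neighbor_num must be an integer >= 2")
    if not is_int(rnn):
        problems.append("reverse_nearest_neighbor_num must be an integer")
    elif is_int(nn) and rnn < nn:
        problems.append(
            "reverse_nearest_neighbor_num must be >= nearest_neighbor_num"
        )
    # ignore_kth_same_point is optional and defaults to false.
    if not isinstance(param.get("ignore_kth_same_point", False), bool):
        problems.append("ignore_kth_same_point must be a boolean")
    return problems


print(validate_lof_parameter(
    {"nearest_neighbor_num": 10, "reverse_nearest_neighbor_num": 30}
))
```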

converter

Specifies the configuration for data conversion. The format is described in Data Conversion.

Example:
{
  "method" : "lof",
  "parameter" : {
    "nearest_neighbor_num" : 10,
    "reverse_nearest_neighbor_num" : 30,
    "method" : "euclid_lsh",
    "parameter" : {
      "hash_num" : 64,
      "table_num" : 4,
      "seed" : 1091,
      "probe_num" : 64,
      "bin_width" : 100
    }
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types" : {},
    "string_rules" : [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
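
For comparison, a light_lof configuration has the same shape, except that the nested method and parameter refer to Nearest Neighbor rather than Recommender. The Python sketch below builds one and prints it as JSON. The nested "euclid_lsh" method and its "hash_num" key are assumptions to be confirmed on the Nearest Neighbor page, and the converter is abbreviated to the rules used in the example above.

```python
import json

# Sketch of a light_lof configuration. The inner "method"/"parameter" pair is
# assumed to follow the Nearest Neighbor page ("euclid_lsh", "hash_num").
config = {
    "method": "light_lof",
    "parameter": {
        "nearest_neighbor_num": 10,
        "reverse_nearest_neighbor_num": 30,
        "method": "euclid_lsh",
        "parameter": {"hash_num": 64},
    },
    "converter": {
        "string_rules": [
            {"key": "*", "type": "str",
             "sample_weight": "bin", "global_weight": "bin"}
        ],
        "num_rules": [{"key": "*", "type": "num"}],
    },
}

print(json.dumps(config, indent=2))
```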

Data Structures

message id_with_score

Represents ID with its score.

0: string id

Data ID.

1: double score

Score for the data. Negative (normal) data score around 1.0; a higher score means higher abnormality.

message id_with_score {
  0: string id
  1: double score
}
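
The meaning of the score can be illustrated with a brute-force version of the textbook LOF computation from [Breunig2000]. This is only a sketch for intuition: it is not how the server computes scores (the server uses approximate neighbor search), it uses exactly k neighbors, and it assumes there are no duplicate points. IdWithScore and lof_scores are illustrative names.

```python
import math
from typing import NamedTuple


class IdWithScore(NamedTuple):
    """Python-side mirror of the id_with_score message."""
    id: str
    score: float


def lof_scores(points, k):
    """Brute-force LOF over a dict of id -> coordinate tuple."""
    ids = list(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # k nearest neighbors of each point, ties broken by ID for determinism.
    knn = {
        p: sorted((q for q in ids if q != p), key=lambda q: (dist(p, q), q))[:k]
        for p in ids
    }
    k_distance = {p: dist(p, knn[p][-1]) for p in ids}

    def density(p):
        # Local reachability density: inverse mean reachability distance.
        reach = [max(k_distance[o], dist(p, o)) for o in knn[p]]
        return len(reach) / sum(reach)

    lrd = {p: density(p) for p in ids}
    return [
        IdWithScore(p, sum(lrd[o] for o in knn[p]) / (k * lrd[p])) for p in ids
    ]


# Ten evenly spaced points plus one point far away from all of them.
points = {f"p{i}": (float(i),) for i in range(10)}
points["far"] = (30.0,)
scores = {s.id: s.score for s in lof_scores(points, k=3)}
print(scores["p5"], scores["far"])
```

A point in the middle of the evenly spaced run has the same density as its neighbors and scores about 1.0, while the isolated point scores well above 1.0.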

Methods

service anomaly
bool clear_row(0: string id)
Parameters:
  • id – point ID to be removed
Returns:

True when the point was cleared successfully

Clears the point data with ID id.

id_with_score add(0: datum row)
Parameters:
  • row – datum for the point
Returns:

Tuple of the point ID and the anomaly measure value

Adds a point with the data row.

list<string> add_bulk(0: list<datum> data)
Parameters:
  • data – List of datum for the points
Returns:

The list of successfully added IDs.

Adds a bulk of points. In contrast to add, this API doesn’t return anomaly measure values.

double update(0: string id, 1: datum row)
Parameters:
  • id – point ID to update
  • row – new datum for the point
Returns:

Anomaly measure value

Updates the point id with the data row.

double overwrite(0: string id, 1: datum row)
Parameters:
  • id – point ID to overwrite
  • row – new datum for the point
Returns:

Anomaly measure value

Overwrites the point id with the data row.

double calc_score(0: datum row)
Parameters:
  • row – datum for the point
Returns:

Anomaly measure value for given row

Calculates an anomaly measure value for the point data row without adding a point.

Note that extremely large values can be returned. For details, refer to FAQs: anomaly detection.

list<string> get_all_rows()
Returns:

List of all point IDs

Returns the list of all point IDs.
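
The signatures above can be mirrored on the client side as a typed interface. The following Python sketch does that and adds a small helper that adds rows and collects the IDs whose score exceeds a threshold. The names AnomalyService, add_and_flag and FakeClient, and the threshold of 2.0, are hypothetical; FakeClient returns canned scores so the sketch runs without a server. A real client object exposing the same add method could be passed instead.

```python
import math
from typing import Any, List, NamedTuple, Protocol


class IdWithScore(NamedTuple):
    id: str
    score: float


class AnomalyService(Protocol):
    """Typed mirror of the RPC methods of service anomaly."""
    def clear_row(self, id: str) -> bool: ...
    def add(self, row: Any) -> IdWithScore: ...
    def add_bulk(self, data: List[Any]) -> List[str]: ...
    def update(self, id: str, row: Any) -> float: ...
    def overwrite(self, id: str, row: Any) -> float: ...
    def calc_score(self, row: Any) -> float: ...
    def get_all_rows(self) -> List[str]: ...


def add_and_flag(client, rows, threshold=2.0):
    """Add each row and return the IDs whose score looks anomalous."""
    flagged = []
    for row in rows:
        result = client.add(row)
        # Scores can be extremely large or inf, so treat non-finite as anomalous.
        if not math.isfinite(result.score) or result.score > threshold:
            flagged.append(result.id)
    return flagged


class FakeClient:
    """Stand-in that returns canned scores; not a real client."""
    def __init__(self, scores):
        self._scores = iter(scores)
        self._next_id = 0

    def add(self, row):
        self._next_id += 1
        return IdWithScore(str(self._next_id), next(self._scores))


flagged = add_and_flag(FakeClient([1.02, 0.97, 8.5, math.inf]), [{}, {}, {}, {}])
print(flagged)
```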