Anomaly

See the IDL definition for the detailed specification.

Configuration

Configuration is given as a JSON file. We show each field below:

method

Specify the algorithm for anomaly detection. The following algorithms are available:

Value	Method
`"lof"`	Use Local Outlier Factor based on recommender. [Breunig2000]
`"light_lof"`	Use a variant of LOF based on nearest neighbor.

parameter

Specify parameters for the algorithm. The format differs for each method.

common

unlearner:
	Specify the unlearner strategy. Omit this parameter if you do not use an unlearner. You can specify any `unlearner` strategy described in Unlearner. Data will be deleted by ID according to the strategy specified here.
unlearner_parameter:
	Specify the unlearner parameter. You can specify any `unlearner_parameter` described in Unlearner. You cannot omit this parameter when you specify `unlearner`. Data in excess of the number given here will be deleted automatically.

Note: `unlearner` and `unlearner_parameter` can be omitted.
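The pairing rule above (an `unlearner` must come with an `unlearner_parameter`) can be checked before a configuration is deployed. The following is a minimal sketch, not part of Jubatus: the function name `check_unlearner` is hypothetical, and the values `"lru"` and `max_size` in the usage lines are assumed placeholders for whatever the Unlearner documentation defines.

```python
def check_unlearner(parameter):
    """Return a list of problems with the unlearner settings in `parameter`.

    `parameter` is the dict stored under the top-level "parameter" field.
    Only the rule stated in this document is checked: `unlearner_parameter`
    cannot be omitted when `unlearner` is specified.
    """
    problems = []
    if "unlearner" in parameter and "unlearner_parameter" not in parameter:
        problems.append("unlearner is set but unlearner_parameter is missing")
    return problems


# Both fields omitted: allowed.
print(check_unlearner({"nearest_neighbor_num": 10}))
# Strategy without its parameter: reported.
# ("lru" and "max_size" are assumed placeholder values.)
print(check_unlearner({"unlearner": "lru"}))
print(check_unlearner({"unlearner": "lru",
                       "unlearner_parameter": {"max_size": 4096}}))
```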

lof

nearest_neighbor_num:
	Number of neighbors. The larger it is, the fewer false positives are found, but the more false negatives are found. (Integer) Range: 2 <= `nearest_neighbor_num`
reverse_nearest_neighbor_num:
	Number of reverse neighbors to update when anomaly measure values are updated. The larger it is, the more accurately the measures are updated, but the longer the update takes. (Integer) Range: `nearest_neighbor_num` <= `reverse_nearest_neighbor_num`
ignore_kth_same_point:
	Prevents scores from becoming `inf` by limiting the number of duplicate records to `nearest_neighbor_num - 1`. This parameter is optional and is `false` (disabled) by default. (Boolean)
method:	Algorithm name of the recommender used for nearest neighbor search. Refer to `method` in Recommender.
parameter:	Parameters of the recommender used for nearest neighbor search. Refer to `parameter` in Recommender.
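To make the role of `nearest_neighbor_num` more concrete, here is a small sketch of the textbook Local Outlier Factor computation with exact neighbor search. It is an illustration only and is not how Jubatus computes scores (Jubatus delegates the neighbor search to a recommender); the function name `lof_scores` and the sample points are made up for this example, and ties between equally distant neighbors are broken by input order for simplicity.

```python
import math


def lof_scores(points, k):
    """Return the LOF score of every point in `points` (textbook definition).

    `k` plays the role of `nearest_neighbor_num`.
    """
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_distance = [dist(i, neighbours[i][-1]) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance[j], dist(i, j)) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        # k or more exact duplicates give a mean of 0, hence an infinite density.
        return math.inf if mean_reach == 0 else 1.0 / mean_reach

    density = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(density[j] for j in neighbours[i]) / (len(neighbours[i]) * density[i])
        for i in range(n)
    ]


# A 3x3 grid of "normal" points plus one distant point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
print([round(s, 2) for s in scores])
```

In this sketch the grid points score close to 1.0 and the distant point scores much higher. The `math.inf` branch shows where duplicate records can push densities to infinity in the textbook definition, which is the kind of situation `ignore_kth_same_point` is described as guarding against.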

light_lof

nearest_neighbor_num:
	Number of neighbors. The larger it is, the fewer false positives are found, but the more false negatives are found. (Integer) Range: 2 <= `nearest_neighbor_num`
reverse_nearest_neighbor_num:
	Number of reverse neighbors to update when anomaly measure values are updated. The larger it is, the more accurately the measures are updated, but the longer the update takes. (Integer) Range: `nearest_neighbor_num` <= `reverse_nearest_neighbor_num`
ignore_kth_same_point:
	Prevents scores from becoming `inf` by limiting the number of duplicate records to `nearest_neighbor_num - 1`. This parameter is optional and is `false` (disabled) by default. (Boolean)
method:	Algorithm name of the nearest neighbor engine used for the neighbor search. Refer to `method` in Nearest Neighbor.
parameter:	Parameters of the nearest neighbor engine used for the neighbor search. Refer to `parameter` in Nearest Neighbor.
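The range constraints listed for `lof` and `light_lof` are the same, so a single check can cover both. The sketch below is a hypothetical helper (`check_neighbor_ranges` is not a Jubatus function) that takes the dict stored under the top-level `parameter` field.

```python
def check_neighbor_ranges(parameter):
    """Return a list of violated range constraints (empty when all hold)."""
    nn = parameter["nearest_neighbor_num"]
    rnn = parameter["reverse_nearest_neighbor_num"]
    problems = []
    # Range: 2 <= nearest_neighbor_num
    if nn < 2:
        problems.append("nearest_neighbor_num must be >= 2")
    # Range: nearest_neighbor_num <= reverse_nearest_neighbor_num
    if rnn < nn:
        problems.append(
            "reverse_nearest_neighbor_num must be >= nearest_neighbor_num")
    return problems


print(check_neighbor_ranges(
    {"nearest_neighbor_num": 10, "reverse_nearest_neighbor_num": 30}))
print(check_neighbor_ranges(
    {"nearest_neighbor_num": 10, "reverse_nearest_neighbor_num": 5}))
```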

converter

Specify configuration for data conversion. The format is described in Data Conversion.

Example:

{
  "method" : "lof",
  "parameter" : {
    "nearest_neighbor_num" : 10,
    "reverse_nearest_neighbor_num" : 30,
    "method" : "euclid_lsh",
    "parameter" : {
      "hash_num" : 64,
      "table_num" : 4,
      "seed" : 1091,
      "probe_num" : 64,
      "bin_width" : 100
    }
  },
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types" : {},
    "string_rules" : [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  }
}
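A configuration file like the one above can be sanity-checked before it is handed to the server. The following sketch uses only the Python standard library; `load_anomaly_config` is a hypothetical helper, and treating all three top-level fields as required is an assumption made for this example.

```python
import json

# Method names listed in this document.
KNOWN_METHODS = {"lof", "light_lof"}


def load_anomaly_config(text):
    """Parse a JSON configuration string and check its top-level fields."""
    config = json.loads(text)
    # Assumption: "method", "parameter" and "converter" are all required.
    missing = [key for key in ("method", "parameter", "converter")
               if key not in config]
    if missing:
        raise ValueError("missing top-level fields: " + ", ".join(missing))
    if config["method"] not in KNOWN_METHODS:
        raise ValueError("unknown method: " + repr(config["method"]))
    return config


sample = """
{
  "method" : "lof",
  "parameter" : { "nearest_neighbor_num" : 10,
                  "reverse_nearest_neighbor_num" : 30 },
  "converter" : { "num_rules" : [ { "key" : "*", "type" : "num" } ] }
}
"""
config = load_anomaly_config(sample)
print(config["method"], config["parameter"]["nearest_neighbor_num"])
```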

Data Structures

message id_with_score

Represents an ID together with its score.

0: string id: Data ID.

1: double score: Score for the data. Negative (normal) data score around 1.0. A higher score means higher abnormality.

message id_with_score {
  0: string id
  1: double score
}
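On the client side, the message maps naturally onto a small record type. The sketch below is an illustrative mirror of `id_with_score`, not the class shipped with any Jubatus client; `IdWithScore` and `most_anomalous` are names chosen for this example, and the sample IDs and scores are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdWithScore:
    """Illustrative mirror of the `id_with_score` message."""
    id: str        # field 0: data ID
    score: float   # field 1: anomaly score (normal data score around 1.0)


def most_anomalous(results, top=3):
    """Return the `top` results with the highest scores, highest first."""
    return sorted(results, key=lambda r: r.score, reverse=True)[:top]


results = [
    IdWithScore("a", 1.02),
    IdWithScore("b", 4.70),
    IdWithScore("c", 0.97),
]
for item in most_anomalous(results, top=2):
    print(item.id, item.score)
```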

Methods

service anomaly

bool clear_row(0: string id)

Parameters:	id – point ID to be removed
Returns:	True when the point was cleared successfully

Clears the point data with ID `id`.

id_with_score add(0: datum row)

Parameters:	row – `datum` for the point
Returns:	Tuple of the point ID and the anomaly measure value

Adds the point data `row`.

list<string> add_bulk(0: list<datum> data)

Parameters:	data – List of `datum` for the points
Returns:	The list of successfully added IDs

Adds points in bulk. In contrast to `add`, this API does not return anomaly measure values.

double update(0: string id, 1: datum row)

Parameters:	id – point ID to update
	row – new `datum` for the point
Returns:	Anomaly measure value

Updates the point `id` with the data `row`.

double overwrite(0: string id, 1: datum row)

Parameters:	id – point ID to overwrite
	row – new `datum` for the point
Returns:	Anomaly measure value

Overwrites the point `id` with the data `row`.

double calc_score(0: datum row)

Parameters:	row – `datum`
Returns:	Anomaly measure value for the given `row`

Calculates an anomaly measure value for the point data `row` without adding the point.

This call can return extremely large values. For details, refer to FAQs: anomaly detection.
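Because scores can be extremely large, client code that aggregates or plots them may want to bound them first. The sketch below is one possible approach, not a Jubatus feature: `clamp_score` and the cap of `1e6` are arbitrary choices for illustration, and the handling of NaN is a defensive assumption rather than documented server behavior.

```python
import math


def clamp_score(score, cap=1e6):
    """Bound an anomaly score so that downstream arithmetic stays finite.

    `cap` is an arbitrary, hypothetical upper limit chosen by the caller.
    """
    if math.isnan(score):
        # Assumption: treat an undefined score as maximally anomalous.
        return cap
    return min(score, cap)


for raw in (1.03, 7.5, 3.2e12, math.inf):
    print(raw, "->", clamp_score(raw))
```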

list<string> get_all_rows()

Returns:	List of all point IDs

Returns the list of all point IDs.