Burst

Configuration

Configuration is given as a JSON file. We show each field below:

method

Specifies the burst detection algorithm. The following algorithm is available.

Value Method
"burst" Use Kleinberg burst detection.
parameter

Specifies the parameters for the algorithm. The format differs for each method.

burst
window_batch_size:
 

Number of batches in a window. (Integer)

  • Range: 0 < window_batch_size
batch_interval:

Width of one batch, measured in position units. (Double)

  • Range: 0 < batch_interval
max_reuse_batch_num:
 

Number of batches to reuse. A larger value reduces the calculation cost. (Integer)

  • Range: 0 <= max_reuse_batch_num <= window_batch_size
costcut_threshold:
 

A threshold value for cost cut. A smaller value reduces the calculation cost. When 0 is specified, no cost cut is performed (DBL_MAX). (Double)

  • Range: 0 < costcut_threshold
result_window_rotate_size:
 

Total number of windows held in memory (including the current window). (Integer)

  • Range: 0 < result_window_rotate_size
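As a rough illustration of the fields above, the following sketch builds a configuration in Python, checks it against the listed ranges, and prints it as JSON. The numeric values are placeholders chosen for illustration, and check_burst_config is a hypothetical helper, not part of Jubatus. costcut_threshold is deliberately left unchecked in this sketch.

```python
import json

# Illustrative values only; tune them for your own data.
config = {
    "method": "burst",
    "parameter": {
        "window_batch_size": 5,
        "batch_interval": 10.0,
        "max_reuse_batch_num": 5,
        "costcut_threshold": 1.0,
        "result_window_rotate_size": 5,
    },
}


def check_burst_config(cfg):
    """Return a list of range violations, following the ranges listed above.

    Hypothetical helper for this sketch; costcut_threshold is not checked.
    """
    p = cfg["parameter"]
    errors = []
    if cfg["method"] != "burst":
        errors.append('method must be "burst"')
    if not p["window_batch_size"] > 0:
        errors.append("window_batch_size must be > 0")
    if not p["batch_interval"] > 0:
        errors.append("batch_interval must be > 0")
    if not 0 <= p["max_reuse_batch_num"] <= p["window_batch_size"]:
        errors.append("max_reuse_batch_num must be in [0, window_batch_size]")
    if not p["result_window_rotate_size"] > 0:
        errors.append("result_window_rotate_size must be > 0")
    return errors


print(check_burst_config(config))
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and used as the configuration described in this section.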

Data Structures

message keyword_with_params

Represents a keyword to be burst-detected, together with its parameters.

0: string keyword

The keyword to be burst-detected.

1: double scaling_param

A scaling parameter applied for this keyword.

  • Range: 1 < scaling_param

This parameter needs to be set according to the total number of documents and the number of relevant documents that contain the keyword in each window (called total_documents and relevant_documents below).

In jubaburst, to signal an inappropriate parameter setting, if

scaling_param > total_documents / relevant_documents

at each batch, burst_weight will be INF. (For more detail, see the original paper [Kleinberg02].)

2: double gamma

The γ value applied to this keyword. A higher value reduces the burst detection sensitivity.

  • Range: 0 < gamma
message keyword_with_params {
  0: string keyword
  1: double scaling_param
  2: double gamma
}
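The relationship between scaling_param and the document counts can be read through Kleinberg's two-state model, where the baseline rate is p0 = relevant_documents / total_documents and the burst-state rate is p1 = scaling_param * p0. A rate above 1 is not a valid probability. The sketch below checks that condition; the function names are hypothetical, and it assumes jubaburst follows this standard formulation.

```python
def burst_state_rate(scaling_param, total_documents, relevant_documents):
    """Return p1 = scaling_param * (relevant / total), the burst-state rate."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    p0 = relevant_documents / total_documents  # baseline rate
    return scaling_param * p0


def scaling_param_is_valid(scaling_param, total_documents, relevant_documents):
    """True when the burst-state rate is still a valid probability (<= 1)."""
    return burst_state_rate(scaling_param, total_documents, relevant_documents) <= 1.0


# 100 documents, 5 of which contain the keyword: total / relevant = 20.
print(scaling_param_is_valid(2.0, 100, 5))   # p1 is about 0.1
print(scaling_param_is_valid(25.0, 100, 5))  # p1 is about 1.25
```

Running such a check against typical batch counts before calling add_keyword may help avoid INF results.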
message batch

Represents the burst detection result for one batch range.

0: int all_data_count

Number of total documents in this batch.

  • Range: 0 < all_data_count
1: int relevant_data_count

Number of documents that contain the keyword in this batch.

  • Range: 0 <= relevant_data_count <= all_data_count
2: double burst_weight

Burst level of this batch. Burst level is a relative value that cannot be compared between keywords.

  • Range: 0 <= burst_weight
message batch {
  0: int all_data_count
  1: int relevant_data_count
  2: double burst_weight
}
message window

Represents the burst detection result.

0: double start_pos

Starting position of this window.

1: list<batch> batches

Batches that compose this window.

message window {
  0: double start_pos
  1: list<batch> batches
}
message document

Represents the document used for burst detection.

0: double pos

Position (time in many cases) of this document.

1: string text

Contents of this document. Keyword matching runs against this data using partial match.

message document {
  0: double pos
  1: string text
}
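To make the shape of these messages concrete, the sketch below mirrors them as Python dataclasses and scans a window for batches with a positive burst_weight. These classes are stand-ins written for this example, not the classes shipped with the Jubatus client, and treating a positive burst_weight as "bursting" is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeywordWithParams:
    keyword: str
    scaling_param: float
    gamma: float

    def __post_init__(self):
        # Ranges taken from the field descriptions above.
        if not self.scaling_param > 1:
            raise ValueError("scaling_param must be > 1")
        if not self.gamma > 0:
            raise ValueError("gamma must be > 0")


@dataclass
class Batch:
    all_data_count: int
    relevant_data_count: int
    burst_weight: float


@dataclass
class Window:
    start_pos: float
    batches: List[Batch] = field(default_factory=list)


@dataclass
class Document:
    pos: float
    text: str


# A hand-made window with two batches; the numbers are illustrative.
window = Window(0.0, [Batch(10, 1, 0.0), Batch(12, 6, 3.2)])
bursting = [i for i, b in enumerate(window.batches) if b.burst_weight > 0]
print(bursting)
```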

Methods

service burst
int add_documents(0: list<document> data)
Parameters:
  • data – list of documents to be added
Returns:

number of documents successfully registered (will be the length of data if all documents are registered successfully)

Registers documents for burst detection. This API is designed to accept bulk updates as a list of documents.

You need to register the keyword via the add_keyword method before adding documents.

A document whose location (pos) is out of range of the current window cannot be registered.
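One way to picture how registered documents end up in batches is the sketch below. It assumes a window covers the half-open range from start_pos to start_pos + window_batch_size * batch_interval, that a document falls into the batch given by floor((pos - start_pos) / batch_interval), and that partial match means a plain substring test. These are assumptions for illustration; count_batches is a hypothetical helper, and how the server advances the current window is not modeled.

```python
import math


def count_batches(documents, keyword, start_pos, batch_interval, window_batch_size):
    """Return (all_counts, relevant_counts, accepted) for one fixed window.

    documents is a list of (pos, text) pairs.
    """
    all_counts = [0] * window_batch_size
    relevant_counts = [0] * window_batch_size
    accepted = 0
    for pos, text in documents:
        index = math.floor((pos - start_pos) / batch_interval)
        if not 0 <= index < window_batch_size:
            continue  # outside the window: not registered
        all_counts[index] += 1
        if keyword in text:  # partial (substring) match
            relevant_counts[index] += 1
        accepted += 1
    return all_counts, relevant_counts, accepted


docs = [
    (1.0, "heavy rain today"),
    (12.0, "rain and wind"),
    (14.0, "sunny"),
    (99.0, "rain"),  # falls outside the window [0, 50)
]
all_counts, relevant_counts, accepted = count_batches(docs, "rain", 0.0, 10.0, 5)
print(all_counts)       # [1, 2, 0, 0, 0]
print(relevant_counts)  # [1, 1, 0, 0, 0]
print(accepted)         # 3
```

In this sketch the accepted count plays the role of the return value of add_documents: it is smaller than the length of the input whenever some documents fall outside the window.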

window get_result(0: string keyword)
Parameters:
  • keyword – keyword to get burst detection result
Returns:

burst detection result

Returns the burst detection result of the current window for the pre-registered keyword specified by keyword.

window get_result_at(0: string keyword, 1: double pos)
Parameters:
  • keyword – keyword to get burst detection result
  • pos – position
Returns:

burst detection result

Returns the burst detection result at the specified position pos for the pre-registered keyword specified by keyword.

map<string, window> get_all_bursted_results()
Returns:

pairs of keyword and its burst detection result

Returns the burst detection result of the current window for all pre-registered keywords.

map<string, window> get_all_bursted_results_at(0: double pos)
Parameters:
  • pos – position
Returns:

pairs of keyword and its burst detection result

Returns the burst detection result at the specified position pos for all pre-registered keywords.

list<keyword_with_params> get_all_keywords()
Returns:

list of keywords and their parameters

Returns the list of keywords registered for burst detection.

bool add_keyword(0: keyword_with_params keyword)
Parameters:
  • keyword – keyword and parameters to be added
Returns:

True if Jubatus succeeds in adding the keyword

Registers the keyword keyword for burst detection.

bool remove_keyword(0: string keyword)
Parameters:
  • keyword – keyword to be removed
Returns:

True if Jubatus succeeds in deleting the keyword

Removes the keyword keyword from burst detection.

bool remove_all_keywords()
Returns:

True if Jubatus succeeds in deleting the keywords

Removes all the keywords from burst detection.