jubakit package¶
jubakit.anomaly module¶
-
class
jubakit.anomaly.
Anomaly
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Anomaly service.
-
add
(dataset)[source]¶ Adds data points to the anomaly model using the given dataset and returns LOF scores.
-
add_bulk
(dataset)[source]¶ Adds data points to the anomaly model using the given dataset and returns a list of data point IDs.
-
-
class
jubakit.anomaly.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Anomaly service.
-
class
jubakit.anomaly.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Anomaly service.
-
class
jubakit.anomaly.
Schema
(mapping, fallback=None)[source]¶ Bases:
jubakit.base.GenericSchema
Schema for Anomaly service.
-
FLAG
= u'f'¶
-
ID
= u'i'¶
-
jubakit.bandit module¶
-
class
jubakit.bandit.
Bandit
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Bandit service.
-
class
jubakit.bandit.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Bandit service.
jubakit.base module¶
-
class
jubakit.base.
BaseConfig
(*args, **kwargs)[source]¶ Bases:
dict
Config is a convenient class to build new config.
-
class
jubakit.base.
BaseDataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
object
Dataset is an abstract representation of a set of data.
-
__init__
(loader, schema=None, static=None, _data=None)[source]¶ Defines a new dataset. Datasets are immutable and cannot be modified.
Data will be loaded from the given loader using schema.
When static is set to True (which is the default for non-infinite loaders), data will be loaded into memory immediately; otherwise data will be loaded one-by-one from the loader, which may be better when processing a large dataset. For “infinite” loaders (like MQ and Twitter stream), static cannot be set to True. Note that some capabilities (e.g., index access) are not available for non-static datasets, and features such as cross-validation may need them.
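The difference between static and non-static loading can be pictured with a small stand-in. This is a minimal sketch in plain Python, not jubakit's implementation: `TinyDataset` and `row_source` are hypothetical names, and raising `RuntimeError` on index access is an assumption made only for illustration.

```python
class TinyDataset(object):
    """Hypothetical stand-in for illustration; NOT jubakit's BaseDataset."""

    def __init__(self, rows, static=True):
        self._static = static
        # Static: materialize every row up front.
        # Non-static: keep the iterable and consume it lazily.
        self._rows = list(rows) if static else rows

    def __iter__(self):
        return iter(self._rows)

    def __getitem__(self, index):
        if not self._static:
            # Assumption: this sketch signals the limitation with RuntimeError.
            raise RuntimeError('index access requires a static dataset')
        return self._rows[index]


def row_source():
    """Hypothetical loader: yields three rows one by one."""
    for i in range(3):
        yield {'value': i}


static_ds = TinyDataset(row_source(), static=True)
stream_ds = TinyDataset(row_source(), static=False)

first_static = static_ds[0]                         # index access works
streamed = [row['value'] for row in stream_ds]      # rows arrive one by one

try:
    stream_ds[0]
    index_error = None
except RuntimeError as e:
    index_error = str(e)
```

The static variant trades memory for random access, which is why features that revisit rows by index would need it.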
-
-
class
jubakit.base.
BaseLoader
[source]¶ Bases:
object
Loader loads rows from various data sources.
-
class
jubakit.base.
BaseSchema
(mapping, fallback=None)[source]¶ Bases:
object
Schema defines data types for each key of the data.
BaseSchema defines the fundamental 3 data types.
- IGNORE: ignores the key (mainly intended for fallback)
- AUTO: use the type of the key as its data type
- INFER: guess the type of the key from its value; note that this is discouraged as it may produce unstable results.
-
AUTO
= u'.'¶
-
IGNORE
= u'_'¶
-
INFER
= u'?'¶
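How a mapping and a fallback might interact can be sketched as follows. The marker values are the ones listed for BaseSchema; `resolve_key_type` is a hypothetical helper, and treating an unknown key with no fallback as an error is an assumption, not confirmed library behavior.

```python
# Type markers as listed for BaseSchema.
AUTO, IGNORE, INFER = '.', '_', '?'


def resolve_key_type(mapping, fallback, key):
    """Hypothetical helper: return the type marker that applies to `key`."""
    if key in mapping:
        return mapping[key]
    if fallback is None:
        # Assumption: an unknown key with no fallback is treated as an error.
        raise KeyError('no schema entry for key: {0}'.format(key))
    return fallback


mapping = {'name': AUTO, 'age': INFER}

known = resolve_key_type(mapping, IGNORE, 'name')       # declared in mapping
unknown = resolve_key_type(mapping, IGNORE, 'comment')  # falls back to IGNORE
```

Using IGNORE as the fallback lets a schema name only the keys of interest and silently drop the rest.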
-
class
jubakit.base.
BaseService
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
object
Service provides an interface to machine learning features.
-
__init__
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Creates a new service that connects to the existing server.
-
get_status
()[source]¶ Returns the status of this server. In distributed mode, returns statuses of all members.
-
load
(name, path=None)[source]¶ Loads the model using name. If path is specified, copies the model file from the local path to the remote location.
-
classmethod
name
()[source]¶ Subclasses (Classifier, NearestNeighbor, … etc.) must override this method and return its service name (classifier, nearest_neighbor, … etc.)
-
classmethod
run
(config, port=None, embedded=False)[source]¶ Runs a new standalone server or embedded instance and returns the service instance.
-
-
class
jubakit.base.
GenericConfig
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.BaseConfig
GenericConfig is a base Config class for generic services that have converter, method and parameter in its config data.
-
class
jubakit.base.
GenericSchema
(mapping, fallback=None)[source]¶ Bases:
jubakit.base.BaseSchema
GenericSchema is a base Schema class for all engines using Datum.
GenericSchema defines 3 data types:
- STRING: string features (string_values)
- NUMBER: numeric features (num_values)
- BINARY: binary features (binary_values)
-
BINARY
= u'b'¶
-
NUMBER
= u'n'¶
-
STRING
= u's'¶
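The three GenericSchema types correspond to the three value groups named above (string_values, num_values, binary_values). A minimal sketch of that grouping, assuming a hypothetical `split_row` helper and plain (key, value) tuples rather than the actual Datum class:

```python
# Type markers as listed for GenericSchema, plus IGNORE from BaseSchema.
STRING, NUMBER, BINARY, IGNORE = 's', 'n', 'b', '_'


def split_row(row, mapping, fallback=IGNORE):
    """Hypothetical helper: group a row's values by their declared type."""
    groups = {'string_values': [], 'num_values': [], 'binary_values': []}
    targets = {
        STRING: 'string_values',
        NUMBER: 'num_values',
        BINARY: 'binary_values',
    }
    for key in sorted(row):
        kind = mapping.get(key, fallback)
        if kind == IGNORE:
            continue  # keys without a usable type are dropped
        groups[targets[kind]].append((key, row[key]))
    return groups


row = {'title': 'hello', 'price': 9.5, 'thumb': b'\x00\x01', 'debug': 'x'}
mapping = {'title': STRING, 'price': NUMBER, 'thumb': BINARY}

datum = split_row(row, mapping)  # 'debug' is ignored via the fallback
```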
jubakit.burst module¶
-
class
jubakit.burst.
Burst
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Burst service.
-
DEFAULT_GAMMA
= 0.1¶
-
DEFAULT_SCALING
= 1.1¶
-
get_all_bursted_results
()[source]¶ Returns the burst detection result of the current window for all pre-registered keywords.
-
get_all_bursted_results_at
(pos)[source]¶ Returns the burst detection result at the specified position for all pre-registered keywords.
-
get_result
(keyword)[source]¶ Returns the burst detection result of the current window for pre-registered keyword keyword.
-
-
class
jubakit.burst.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Burst service.
-
class
jubakit.burst.
DocumentDataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Document dataset for Burst service.
-
class
jubakit.burst.
DocumentSchema
(mapping, fallback=None)[source]¶ Bases:
jubakit.base.GenericSchema
Document schema for Burst service.
-
POSITION
= u'p'¶
-
TEXT
= u't'¶
-
-
class
jubakit.burst.
KeywordDataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Keyword dataset for Burst service.
jubakit.classifier module¶
-
class
jubakit.classifier.
Classifier
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Classifier service.
-
class
jubakit.classifier.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Classifier service.
-
class
jubakit.classifier.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Classifier service.
-
classmethod
from_array
(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶ Converts two arrays (data and its associated labels) to Dataset.
- data : array of shape [n_samples, n_features]
- labels : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
- label_names : array of shape [n_labels], optional
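The shapes above imply one row per sample, with one entry per feature and an optional label. The following plain-Python sketch shows that pairing; `rows_from_array` is a hypothetical helper, and both the `v0`, `v1`, … default names and the `label` key are assumptions chosen for illustration rather than the names jubakit uses.

```python
def rows_from_array(data, labels=None, feature_names=None):
    """Hypothetical helper: turn [n_samples, n_features] data into row dicts."""
    n_features = len(data[0]) if data else 0
    if feature_names is None:
        # Assumption: generated placeholder names for unnamed features.
        feature_names = ['v{0}'.format(i) for i in range(n_features)]
    rows = []
    for i, sample in enumerate(data):
        row = dict(zip(feature_names, sample))
        if labels is not None:
            row['label'] = labels[i]  # assumption: key name is illustrative
        rows.append(row)
    return rows


data = [[5.1, 3.5], [6.2, 2.9]]

rows = rows_from_array(data, labels=['a', 'b'],
                       feature_names=['length', 'width'])
unnamed = rows_from_array(data)
```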
-
classmethod
from_data
(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶ Converts two arrays or a sparse matrix data and its associated label array to Dataset.
- data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
- labels : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
- label_names : array of shape [n_labels], optional
-
classmethod
from_matrix
(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶ Converts a sparse matrix data and its associated label array to Dataset.
- data : scipy 2-D sparse matrix of shape [n_samples, n_features]
- labels : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
- label_names : array of shape [n_labels], optional
-
jubakit.clustering module¶
-
class
jubakit.clustering.
Clustering
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Clustering service.
-
get_nearest_center
(dataset)[source]¶ Returns nearest cluster center without adding points to cluster.
-
-
class
jubakit.clustering.
Config
(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Clustering service.
-
class
jubakit.clustering.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Clustering service.
-
classmethod
from_array
(data, ids=None, feature_names=None, static=True)[source]¶ Converts two arrays (data and its associated ids) to Dataset.
- data : array of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_data
(data, ids=None, feature_names=None, static=True)[source]¶ Converts two arrays or a sparse matrix data and its associated id array to Dataset.
- data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_matrix
(data, ids=None, feature_names=None, static=True)[source]¶ Converts a sparse matrix data and its associated id array to Dataset.
- data : scipy 2-D sparse matrix of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
jubakit.compat module¶
jubakit.dumb module¶
Dumb Services are temporary implementations of Services. They are defined just for convenience.
Unlike Real Services (Classifier, Anomaly, …) which are defined in each file (classifier.py, anomaly.py, …), Dumb Services cannot handle Datasets and Schemas.
Each service has a field called CONFIG, which provides a default config data structure for the service. So you can use jubakit to start a Jubatus server process, then directly use the raw Client class to make RPC calls.
>>> from jubakit.dumb import Stat
>>> service = Stat.run(Stat.CONFIG)
>>> client = service._client()
>>> client.push('x', 12)
-
class
jubakit.dumb.
Bandit
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}¶
-
-
class
jubakit.dumb.
Burst
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'method': u'burst', u'parameter': {u'costcut_threshold': -1, u'window_batch_size': 5, u'max_reuse_batch_num': 5, u'result_window_rotate_size': 5, u'batch_interval': 10}}¶
-
-
class
jubakit.dumb.
Clustering
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'compressor_method': u'simple', u'compressor_parameter': {u'bucket_size': 1000}, u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'distance': u'euclidean', u'method': u'kmeans', u'parameter': {u'k': 3, u'seed': 0}}¶
-
-
class
jubakit.dumb.
Graph
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'method': u'graph_wo_index', u'parameter': {u'damping_factor': 0.9, u'landmark_num': 5}}¶
-
-
class
jubakit.dumb.
NearestNeighbor
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'lsh', u'parameter': {u'hash_num': 64}}¶
-
-
class
jubakit.dumb.
Recommender
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'inverted_index'}¶
-
-
class
jubakit.dumb.
Regression
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
-
CONFIG
= {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'PA1', u'parameter': {u'sensitivity': 0.1, u'regularization_weight': 3.402823e+38}}¶
-
jubakit.logger module¶
jubakit.model module¶
This module provides features to manipulate model files.
-
class
jubakit.model.
GenericTransformer
(_m)[source]¶ Bases:
jubakit.model.BaseTransformer
Transformation for services having generic 2-element model data structure (service model and weight manager model). It can be converted to Weight model.
-
class
jubakit.model.
JubaDump
[source]¶ Bases:
object
JubaDump provides a high-level dump of Jubatus models. The jubadump command must be installed.
-
class
jubakit.model.
JubaModel
[source]¶ Bases:
object
JubaModel provides features to perform low-level manipulation of Jubatus model data structure.
-
data
()[source]¶ Returns the actual model data part. This method is a quick shortcut for self.user.user_data.
-
classmethod
load_binary
(f, validate=True)[source]¶ Loads a Jubatus binary model file from binary stream f. When validate is True, the model file format is strictly validated.
-
jubakit.nearest_neighbor module¶
-
class
jubakit.nearest_neighbor.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Nearest Neighbor service.
-
class
jubakit.nearest_neighbor.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Nearest Neighbor service.
-
classmethod
from_array
(data, ids=None, feature_names=None, static=True)[source]¶ Converts two arrays (data and its associated ids) to Dataset.
- data : array of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_data
(data, ids=None, feature_names=None, static=True)[source]¶ Converts two arrays or a sparse matrix data and its associated id array to Dataset.
- data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_matrix
(data, ids=None, feature_names=None, static=True)[source]¶ Converts a sparse matrix data and its associated id array to Dataset.
- data : scipy 2-D sparse matrix of shape [n_samples, n_features]
- ids : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
-
class
jubakit.nearest_neighbor.
NearestNeighbor
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Nearest Neighbor service.
-
neighbor_row_from_datum
(dataset, size=10)[source]¶ Returns at most size rows whose datum are most similar to the query, along with their distance values.
-
neighbor_row_from_id
(dataset, size=10)[source]¶ Returns at most size rows whose datum are most similar to that of the row id, along with their distance values.
-
set_row
(dataset)[source]¶ Updates the row whose id is id with the given row. If a row with the same id already exists, it is overwritten with row (note that this behavior is different from that of recommender). Otherwise, a new row entry will be created. If the server that manages the row and the server that received this RPC request are the same, this operation is reflected instantly. If not, the update is reflected after mix.
-
-
class
jubakit.nearest_neighbor.
Schema
(mapping, fallback=None)[source]¶ Bases:
jubakit.base.GenericSchema
Schema for Nearest Neighbor service.
-
ID
= u'i'¶
-
jubakit.recommender module¶
-
class
jubakit.recommender.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Recommender service.
-
class
jubakit.recommender.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Recommender service.
-
class
jubakit.recommender.
Recommender
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Recommender service.
-
complete_row_from_datum
(dataset)[source]¶ Returns data points from the datum in the recommender model, with missing values completed by predicted values.
-
complete_row_from_id
(dataset)[source]¶ Returns data points from the row id in the recommender model, with missing values completed by predicted values.
-
similar_row_from_datum
(dataset, size=10)[source]¶ Returns similar data points from the datum in the recommender model.
-
similar_row_from_datum_and_rate
(dataset, rate=0.1)[source]¶ Returns the top rate of all the rows which are most similar to row. For example, return the top 10% of all the rows when 0.1 is specified as rate.
The rate must be in (0, 1].
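The meaning of rate can be illustrated with a small selection routine. This is only a sketch: `top_rate` is a hypothetical helper working on (row id, similarity) pairs, and rounding the result size up with `ceil` is an assumption, since the rounding rule is not stated here.

```python
import math


def top_rate(scored_rows, rate):
    """Hypothetical helper: keep the most similar `rate` fraction of rows."""
    if not 0 < rate <= 1:
        raise ValueError('rate must be in (0, 1]')
    # Most similar rows first.
    ranked = sorted(scored_rows, key=lambda pair: pair[1], reverse=True)
    # Assumption: the number of rows to keep is rounded up.
    size = int(math.ceil(len(ranked) * rate))
    return ranked[:size]


# Eight rows with similarity scores 0.0, 0.125, ..., 0.875.
scored = [('r{0}'.format(i), i / 8.0) for i in range(8)]

top = top_rate(scored, 0.25)  # a quarter of eight rows
```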
-
similar_row_from_datum_and_score
(dataset, score=0.8)[source]¶ Returns rows which are most similar to row and have a greater similarity score than score.
-
similar_row_from_id
(dataset, size=10)[source]¶ Returns similar data points from the row id in the recommender model.
-
similar_row_from_id_and_rate
(dataset, rate=0.1)[source]¶ Returns the top rate of all the rows which are most similar to the row id. For example, return the top 10% of all the rows when 0.1 is specified as rate.
The rate must be in (0, 1].
-
jubakit.regression module¶
-
class
jubakit.regression.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Regression service.
-
class
jubakit.regression.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Regression service.
-
classmethod
from_array
(data, targets=None, feature_names=None, static=True)[source]¶ Converts two arrays (data and its associated targets) to Dataset.
- data : array of shape [n_samples, n_features]
- targets : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_data
(data, targets=None, feature_names=None, static=True)[source]¶ Converts two arrays or a sparse matrix data and its associated target array to Dataset.
- data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
- targets : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
classmethod
from_matrix
(data, targets=None, feature_names=None, static=True)[source]¶ Converts a sparse matrix data and its associated target array to Dataset.
- data : scipy 2-D sparse matrix of shape [n_samples, n_features]
- targets : array of shape [n_samples], optional
- feature_names : array of shape [n_features], optional
-
-
class
jubakit.regression.
Regression
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Regression service.
jubakit.shell module¶
-
class
jubakit.shell.
JubaShell
(host, port, cluster, service, **kwargs)[source]¶ Bases:
object
JubaShell provides a shell environment to call Jubatus RPC API.
The interactive interface is provided in the cli submodule.
-
__init__
(host, port, cluster, service, **kwargs)[source]¶ Creates a new shell environment using parameters specified.
If service is None, it will be automatically probed.
-
connect
()[source]¶ Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection will not be established until an RPC method is called.
-
is_connected
()[source]¶ Returns True if the client exists. Note that its backend TCP connection may already be closed.
-
jubakit.weight module¶
-
class
jubakit.weight.
Config
(method=None, parameter=None, converter=None)[source]¶ Bases:
jubakit.base.GenericConfig
Configuration to run Weight service.
-
class
jubakit.weight.
Dataset
(loader, schema=None, static=None, _data=None)[source]¶ Bases:
jubakit.base.BaseDataset
Dataset for Weight service.
-
class
jubakit.weight.
Schema
(mapping, fallback=None)[source]¶ Bases:
jubakit.base.GenericSchema
Schema for Weight service.
-
class
jubakit.weight.
Weight
(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶ Bases:
jubakit.base.BaseService
Weight service.