jubakit package
jubakit.anomaly module
- 
class jubakit.anomaly.Anomaly(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Anomaly service.
- 
add(dataset)
Adds data points to the anomaly model using the given dataset and returns LOF scores.
- 
add_bulk(dataset)
Adds data points to the anomaly model using the given dataset and returns a list of data point IDs.
- 
 
- 
class jubakit.anomaly.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Anomaly service.
- 
class jubakit.anomaly.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Anomaly service.
- 
class jubakit.anomaly.Schema(mapping, fallback=None)
Bases: jubakit.base.GenericSchema
Schema for Anomaly service.
- 
FLAG = u'f'
- 
ID = u'i'
- 
 
jubakit.bandit module
- 
class jubakit.bandit.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Bandit service.
- 
class jubakit.bandit.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Bandit service.
jubakit.base module
- 
class jubakit.base.BaseConfig(*args, **kwargs)
Bases: dict
Config is a convenient class to build new config.
- 
class jubakit.base.BaseDataset(loader, schema=None, static=None, _data=None)
Bases: object
Dataset is an abstract representation of a set of data.
- 
__init__(loader, schema=None, static=None, _data=None)
Defines a new dataset. Datasets are immutable and cannot be modified.
Data will be loaded from the given loader using schema.
When static is set to True (the default for non-infinite loaders), data is loaded into memory immediately; otherwise data is loaded one by one from the loader, which may be better when processing a large dataset. For "infinite" loaders (like MQ and Twitter stream), static cannot be set to True. Note that non-static datasets do not support some features (e.g., index access) that other features, such as cross-validation, may require.
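The difference between static and non-static loading can be pictured with plain Python lists and generators. This is only an illustrative sketch: the `rows` and `infinite_rows` functions are hypothetical stand-ins for loaders, and nothing here uses jubakit itself.

```python
import itertools


def rows():
    """Hypothetical finite loader: yields one row (a dict) at a time."""
    for i in range(5):
        yield {"id": i, "value": i * i}


def infinite_rows():
    """Hypothetical unbounded loader, standing in for a message queue."""
    for i in itertools.count():
        yield {"id": i}


# Static-style loading: every row is materialized in memory up front,
# so the length is known and index access works.
static_rows = list(rows())

# Non-static-style loading: rows are pulled one by one on demand.
# A generator offers neither len() nor index access.
lazy_rows = rows()
first_two = list(itertools.islice(lazy_rows, 2))

# An unbounded source can only be consumed lazily; calling list() on it
# would never finish, which mirrors why static loading is unavailable.
head = list(itertools.islice(infinite_rows(), 3))
```

Anything that needs random access over the rows, such as splitting them into folds, is straightforward on `static_rows` but not on `lazy_rows`.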
- 
 
- 
class jubakit.base.BaseLoader
Bases: object
Loader loads rows from various data sources.
- 
class jubakit.base.BaseSchema(mapping, fallback=None)
Bases: object
Schema defines data types for each key of the data.
BaseSchema defines three fundamental data types:
- IGNORE: ignores the key (mainly intended for fallback)
- AUTO: use the type of the key as its data type
- INFER: guess the type of the key from its value; note that this is discouraged as it may produce unstable results.

- 
AUTO = u'.'
- 
IGNORE = u'_'
- 
INFER = u'?'
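One way to read the mapping and fallback arguments is as a lookup with a default. The sketch below is an assumption about that behavior, not jubakit's implementation: `resolve_type` and `keep_keys` are hypothetical helpers, and raising an error when no fallback is given is a guess. Only the three type codes are taken from the constants above.

```python
# Type codes copied from the BaseSchema constants listed above.
IGNORE, AUTO, INFER = u'_', u'.', u'?'


def resolve_type(mapping, fallback, key):
    """Hypothetical helper: return the type code that applies to `key`.

    Keys named in `mapping` use their declared type; every other key
    gets `fallback`. Raising when there is no fallback is an assumption.
    """
    if key in mapping:
        return mapping[key]
    if fallback is None:
        raise KeyError("no type declared for key %r and no fallback" % key)
    return fallback


def keep_keys(mapping, fallback, row):
    """Drop every key of `row` whose resolved type is IGNORE."""
    return {k: v for k, v in row.items()
            if resolve_type(mapping, fallback, k) != IGNORE}


mapping = {"age": AUTO, "comment": INFER}
row = {"age": 31, "comment": "12", "debug": "x"}

# With IGNORE as the fallback, keys not named in the mapping are dropped.
kept = keep_keys(mapping, IGNORE, row)
```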
- 
class jubakit.base.BaseService(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: object
Service provides an interface to machine learning features.
- 
__init__(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Creates a new service that connects to the existing server.
- 
get_status()
Returns the status of this server. In distributed mode, returns the statuses of all members.
- 
load(name, path=None)
Loads the model using name. If path is specified, copies the model file from the local path to the remote location.
- 
classmethod name()
Subclasses (Classifier, NearestNeighbor, etc.) must override this method and return their service name (classifier, nearest_neighbor, etc.).
- 
classmethod run(config, port=None, embedded=False)
Runs a new standalone server or embedded instance and returns the service instance.
- 
 
- 
class jubakit.base.GenericConfig(method=None, parameter=None, converter=None)
Bases: jubakit.base.BaseConfig
GenericConfig is a base Config class for generic services that have converter, method and parameter in their config data.
- 
class jubakit.base.GenericSchema(mapping, fallback=None)
Bases: jubakit.base.BaseSchema
GenericSchema is a base Schema class for all engines using Datum.
GenericSchema defines 3 data types:
- STRING: string features (string_values)
- NUMBER: numeric features (num_values)
- BINARY: binary features (binary_values)

- 
BINARY = u'b'
- 
NUMBER = u'n'
- 
STRING = u's'
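The correspondence between the three type codes and the Datum fields named above can be sketched as a simple grouping step. The `to_datum_fields` function and the plain-dict result are hypothetical; the real Datum class is not used here, and only the type codes and field names come from the list above.

```python
# Type codes copied from the GenericSchema constants listed above.
STRING, NUMBER, BINARY = u's', u'n', u'b'

# Datum field that each type code feeds, following the list above.
FIELD_FOR_TYPE = {
    STRING: "string_values",
    NUMBER: "num_values",
    BINARY: "binary_values",
}


def to_datum_fields(mapping, row):
    """Hypothetical helper: group a row's (key, value) pairs by field."""
    datum = {"string_values": [], "num_values": [], "binary_values": []}
    for key in sorted(row):  # sorted only to make the output order stable
        field = FIELD_FOR_TYPE[mapping[key]]
        datum[field].append((key, row[key]))
    return datum


mapping = {"name": STRING, "height": NUMBER, "icon": BINARY}
row = {"name": "alice", "height": 1.6, "icon": b"\x89PNG"}
datum = to_datum_fields(mapping, row)
```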
jubakit.burst module
- 
class jubakit.burst.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Burst service.
- 
DEFAULT_GAMMA = 0.1
- 
DEFAULT_SCALING = 1.1
- 
get_all_bursted_results()
Returns the burst detection result of the current window for all pre-registered keywords.
- 
get_all_bursted_results_at(pos)
Returns the burst detection result at the specified position for all pre-registered keywords.
- 
get_result(keyword)
Returns the burst detection result of the current window for the pre-registered keyword given as keyword.
- 
 
- 
class jubakit.burst.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Burst service.
- 
class jubakit.burst.DocumentDataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Document dataset for Burst service.
- 
class jubakit.burst.DocumentSchema(mapping, fallback=None)
Bases: jubakit.base.GenericSchema
Document schema for Burst service.
- 
POSITION = u'p'
- 
TEXT = u't'
- 
 
- 
class jubakit.burst.KeywordDataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Keyword dataset for Burst service.
jubakit.classifier module
- 
class jubakit.classifier.Classifier(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Classifier service.
- 
class jubakit.classifier.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Classifier service.
- 
class jubakit.classifier.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Classifier service.
- 
classmethod from_array(data, labels=None, feature_names=None, label_names=None, static=True)
Converts two arrays (data and its associated labels) to Dataset.
data : array of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
- 
classmethod from_data(data, labels=None, feature_names=None, label_names=None, static=True)
Converts two arrays, or a sparse matrix and its associated label array, to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
- 
classmethod from_matrix(data, labels=None, feature_names=None, label_names=None, static=True)
Converts a sparse matrix and its associated label array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
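The shape conventions above can be checked with plain nested lists before any data is handed to a from_* method. This is a minimal sketch under stated assumptions: `check_shapes` is a hypothetical validator, the sample values are made up for illustration, and jubakit is not called.

```python
def check_shapes(data, labels=None, feature_names=None):
    """Hypothetical validator for the documented shape conventions.

    data          : n_samples rows, each with n_features entries
    labels        : n_samples entries, optional
    feature_names : n_features entries, optional
    """
    n_samples = len(data)
    n_features = len(data[0]) if n_samples else 0
    for row in data:
        if len(row) != n_features:
            raise ValueError("every row of data must have n_features entries")
    if labels is not None and len(labels) != n_samples:
        raise ValueError("labels must have one entry per sample")
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError("feature_names must have one entry per feature")
    return n_samples, n_features


# Illustrative values only.
data = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]]
labels = ["setosa", "setosa", "virginica"]
feature_names = ["sepal_length", "sepal_width"]

shape = check_shapes(data, labels, feature_names)

# Pairing feature names with each row gives one dict per sample.
rows = [dict(zip(feature_names, row)) for row in data]
```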
 
jubakit.clustering module
- 
class jubakit.clustering.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Clustering service.
- 
class jubakit.clustering.Config(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)
Bases: jubakit.base.GenericConfig
Configuration to run Clustering service.
- 
class jubakit.clustering.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Clustering service.
- 
classmethod from_array(data, ids=None, feature_names=None, static=True)
Converts two arrays (data and its associated IDs) to Dataset.
data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_data(data, ids=None, feature_names=None, static=True)
Converts two arrays, or a sparse matrix and its associated ID array, to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_matrix(data, ids=None, feature_names=None, static=True)
Converts a sparse matrix and its associated ID array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
 
jubakit.compat module
jubakit.dumb module
Dumb Services are temporary implementations of Services, defined just for convenience.
Unlike Real Services (Classifier, Anomaly, …), each of which is defined in its own file (classifier.py, anomaly.py, …), Dumb Services cannot handle Datasets and Schemas.
Each service has a field called CONFIG, which provides a default config data structure for the service. So you can use jubakit to start a Jubatus server process, then directly use the raw Client class to make RPC calls.
>>> from jubakit.dumb import Stat
>>> service = Stat.run(Stat.CONFIG)
>>> client = service._client()
>>> client.push('x', 12)
- 
class jubakit.dumb.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}
- 
 
- 
class jubakit.dumb.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'method': u'burst', u'parameter': {u'costcut_threshold': -1, u'window_batch_size': 5, u'max_reuse_batch_num': 5, u'result_window_rotate_size': 5, u'batch_interval': 10}}
- 
 
- 
class jubakit.dumb.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'compressor_method': u'simple', u'compressor_parameter': {u'bucket_size': 1000}, u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'distance': u'euclidean', u'method': u'kmeans', u'parameter': {u'k': 3, u'seed': 0}}
- 
 
- 
class jubakit.dumb.Graph(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'method': u'graph_wo_index', u'parameter': {u'damping_factor': 0.9, u'landmark_num': 5}}
- 
 
- 
class jubakit.dumb.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'lsh', u'parameter': {u'hash_num': 64}}
- 
 
- 
class jubakit.dumb.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'inverted_index'}
- 
 
- 
class jubakit.dumb.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
- 
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'PA1', u'parameter': {u'sensitivity': 0.1, u'regularization_weight': 3.402823e+38}}
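Each CONFIG value above is an ordinary nested dict, so a customized configuration can be derived from it without touching the class-level default. The sketch below copies the Bandit CONFIG literal from this page; `with_overrides` is a hypothetical helper, and whether a given parameter value is accepted by the server is not checked here.

```python
import copy
import json

# Default copied from the jubakit.dumb.Bandit CONFIG entry above.
BANDIT_CONFIG = {u'method': u'ucb1',
                 u'parameter': {u'assume_unrewarded': False}}


def with_overrides(default, method=None, parameter=None):
    """Hypothetical helper: return a modified deep copy of `default`.

    Deep-copying matters because the nested 'parameter' dict would
    otherwise be shared with the original default.
    """
    config = copy.deepcopy(default)
    if method is not None:
        config[u'method'] = method
    if parameter:
        config.setdefault(u'parameter', {}).update(parameter)
    return config


custom = with_overrides(BANDIT_CONFIG,
                        parameter={u'assume_unrewarded': True})

# The config is plain data, so it serializes directly to JSON.
as_json = json.dumps(custom, sort_keys=True)
```

Because the helper works on a copy, the default dict is left unchanged and can still be passed to run() as in the doctest above.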
- 
 
jubakit.logger module
jubakit.model module
This module provides features to manipulate model files.
- 
class jubakit.model.AnomalyTransformer(_m)
Bases: jubakit.model.GenericTransformer
- 
class jubakit.model.ClassifierTransformer(_m)
Bases: jubakit.model.GenericTransformer
- 
class jubakit.model.ClusteringTransformer(_m)
Bases: jubakit.model.GenericTransformer
- 
class jubakit.model.GenericTransformer(_m)
Bases: jubakit.model.BaseTransformer
Transformation for services having a generic 2-element model data structure (service model and weight manager model). It can be converted to a Weight model.
- 
class jubakit.model.JubaDump
Bases: object
JubaDump provides a high-level dump of Jubatus models. The jubadump command must be installed.
- 
class jubakit.model.JubaModel
Bases: object
JubaModel provides features to perform low-level manipulation of the Jubatus model data structure.
- 
data()
Returns the actual model data part. This method is a quick shortcut for self.user.user_data.
- 
classmethod load_binary(f, validate=True)
Loads a Jubatus binary model file from the binary stream f. When validate is True, the model file format is strictly validated.
- 

- 
class jubakit.model.RecommenderTransformer(_m)
Bases: jubakit.model.GenericTransformer
jubakit.nearest_neighbor module
- 
class jubakit.nearest_neighbor.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Nearest Neighbor service.
- 
class jubakit.nearest_neighbor.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Nearest Neighbor service.
- 
classmethod from_array(data, ids=None, feature_names=None, static=True)
Converts two arrays (data and its associated IDs) to Dataset.
data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_data(data, ids=None, feature_names=None, static=True)
Converts two arrays, or a sparse matrix and its associated ID array, to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_matrix(data, ids=None, feature_names=None, static=True)
Converts a sparse matrix and its associated ID array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
 
- 
class jubakit.nearest_neighbor.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Nearest Neighbor service.
- 
neighbor_row_from_datum(dataset, size=10)
Returns at most size rows whose datum are most similar to the query, together with their distance values.
- 
neighbor_row_from_id(dataset, size=10)
Returns at most size rows whose datum are most similar to that of the row id, together with their distance values.
- 
set_row(dataset)
Updates the row whose id is id with the given row. If a row with the same id already exists, it is overwritten with row (note that this behavior is different from that of recommender). Otherwise, a new row entry is created. If the server that manages the row and the server that received this RPC request are the same, the operation is reflected instantly. If not, the update is reflected after mix.
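The overwrite-on-same-id behavior of set_row and the "at most size rows, with distances" result shape can be illustrated with a tiny in-memory index. This sketch deliberately swaps in an exact brute-force Euclidean search; it is not how the service computes neighbors, and the `BruteForceIndex` class and its method names are hypothetical.

```python
import math


class BruteForceIndex(object):
    """Hypothetical in-memory index; exact search, for illustration only."""

    def __init__(self):
        self._rows = {}

    def set_row(self, row_id, vector):
        # Assigning to an existing key replaces the stored vector,
        # mirroring the overwrite semantics described above.
        self._rows[row_id] = list(vector)

    def neighbors(self, query, size=10):
        """Return up to `size` (row_id, distance) pairs, nearest first."""
        scored = []
        for row_id, vector in self._rows.items():
            distance = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(query, vector)))
            scored.append((row_id, distance))
        # Sort by distance, then by id so that ties are deterministic.
        scored.sort(key=lambda pair: (pair[1], pair[0]))
        return scored[:size]


index = BruteForceIndex()
index.set_row("a", [0.0, 0.0])
index.set_row("b", [3.0, 4.0])
index.set_row("a", [0.0, 1.0])  # same id: the earlier vector is replaced

nearest = index.neighbors([0.0, 0.0], size=1)
everything = index.neighbors([0.0, 0.0], size=10)
```

Asking for more rows than exist simply returns all of them, which is the sense of "at most size".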
- 
 
- 
class jubakit.nearest_neighbor.Schema(mapping, fallback=None)
Bases: jubakit.base.GenericSchema
Schema for Nearest Neighbor service.
- 
ID = u'i'
- 
 
jubakit.recommender module
- 
class jubakit.recommender.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Recommender service.
- 
class jubakit.recommender.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Recommender service.
- 
class jubakit.recommender.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Recommender service.
- 
complete_row_from_datum(dataset)
Returns data points from the datum in the recommender model, with missing values completed by predicted values.
- 
complete_row_from_id(dataset)
Returns data points from the row id in the recommender model, with missing values completed by predicted values.
- 
similar_row_from_datum(dataset, size=10)
Returns similar data points from the datum in the recommender model.
- 
similar_row_from_datum_and_rate(dataset, rate=0.1)
Returns the top rate fraction of all rows, ranked by similarity to row. For example, returns the top 10% of all rows when rate is 0.1.
The rate must be in (0, 1].
- 
similar_row_from_datum_and_score(dataset, score=0.8)
Returns the rows that are most similar to row and have a similarity score greater than score.
- 
similar_row_from_id(dataset, size=10)
Returns similar data points from the row id in the recommender model.
- 
similar_row_from_id_and_rate(dataset, rate=0.1)
Returns the top rate fraction of all rows, ranked by similarity to the row id. For example, returns the top 10% of all rows when rate is 0.1.
The rate must be in (0, 1].
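The rate and score variants above differ only in how the ranked result is cut off. The sketch below works on a ready-made list of (row_id, similarity) pairs with made-up values. `top_by_rate` and `above_score` are hypothetical helpers, and rounding the row count up is an assumption; the service may round differently.

```python
import math


def top_by_rate(scored_rows, rate):
    """Keep the top `rate` fraction of rows, highest similarity first."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ranked = sorted(scored_rows, key=lambda pair: pair[1], reverse=True)
    # Assumption: round up, so any valid rate keeps at least one row.
    count = int(math.ceil(len(ranked) * rate))
    return ranked[:count]


def above_score(scored_rows, score):
    """Keep rows whose similarity is strictly greater than `score`."""
    ranked = sorted(scored_rows, key=lambda pair: pair[1], reverse=True)
    return [pair for pair in ranked if pair[1] > score]


# Illustrative (row_id, similarity) pairs.
scored_rows = [("a", 0.9), ("b", 0.5), ("c", 0.85), ("d", 0.1)]

half = top_by_rate(scored_rows, 0.5)
strong = above_score(scored_rows, 0.8)
```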
- 
 
jubakit.regression module
- 
class jubakit.regression.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Regression service.
- 
class jubakit.regression.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Regression service.
- 
classmethod from_array(data, targets=None, feature_names=None, static=True)
Converts two arrays (data and its associated targets) to Dataset.
data : array of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_data(data, targets=None, feature_names=None, static=True)
Converts two arrays, or a sparse matrix and its associated target array, to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
- 
classmethod from_matrix(data, targets=None, feature_names=None, static=True)
Converts a sparse matrix and its associated target array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
 
- 
class jubakit.regression.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Regression service.
jubakit.shell module
- 
class jubakit.shell.JubaShell(host, port, cluster, service, **kwargs)
Bases: object
JubaShell provides a shell environment to call Jubatus RPC API.
The interactive interface is provided in the cli submodule.
- 
__init__(host, port, cluster, service, **kwargs)
Creates a new shell environment using the specified parameters.
If service is None, it will be automatically probed.
- 
connect()
Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection will not be established until an RPC method is called.
- 
is_connected()
Returns True if the client exists. Note that its backend TCP connection may already be closed.
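The notes on connect() and is_connected() describe a lazily opened connection: a client object exists before any transport does. The sketch below illustrates that pattern only. `LazyClient`, `Shell`, and `fake_opener` are hypothetical names, no real socket is opened, and this is not JubaShell's implementation.

```python
class LazyClient(object):
    """Hypothetical RPC client that opens its transport on first call."""

    def __init__(self, opener):
        self._opener = opener    # callable that opens the transport
        self._transport = None   # nothing is opened at construction time

    def call(self, method, *args):
        if self._transport is None:
            self._transport = self._opener()
        return self._transport(method, *args)


class Shell(object):
    """Hypothetical shell holding at most one client instance."""

    def __init__(self, opener):
        self._opener = opener
        self._client = None

    def connect(self):
        # Drop any existing client and create a fresh one; no transport
        # is opened here.
        self._client = LazyClient(self._opener)

    def is_connected(self):
        # True as soon as a client object exists, whether or not its
        # transport has been opened or is still alive.
        return self._client is not None

    def call(self, method, *args):
        return self._client.call(method, *args)


opened = []  # records each time the fake transport is opened


def fake_opener():
    """Stands in for opening a TCP connection."""
    opened.append("tcp")
    return lambda method, *args: (method, args)


shell = Shell(fake_opener)
shell.connect()
connected_before_call = shell.is_connected()
opens_before_call = len(opened)
reply = shell.call("get_status")
opens_after_call = len(opened)
```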
- 
 
- 
exception jubakit.shell.JubaShellAssertionError
Bases: jubakit.shell.JubaShellException
- 
exception jubakit.shell.JubaShellRPCError(msg, host, port, e=None)
Bases: jubakit.shell.JubaShellException
jubakit.weight module
- 
class jubakit.weight.Config(method=None, parameter=None, converter=None)
Bases: jubakit.base.GenericConfig
Configuration to run Weight service.
- 
class jubakit.weight.Dataset(loader, schema=None, static=None, _data=None)
Bases: jubakit.base.BaseDataset
Dataset for Weight service.
- 
class jubakit.weight.Schema(mapping, fallback=None)
Bases: jubakit.base.GenericSchema
Schema for Weight service.
- 
class jubakit.weight.Weight(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)
Bases: jubakit.base.BaseService
Weight service.