jubakit package
jubakit.anomaly module
-
class jubakit.anomaly.Anomaly(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Anomaly service.
-
add(dataset) [source]
Adds data points to the anomaly model using the given dataset and returns their LOF scores.
-
add_bulk(dataset) [source]
Adds data points to the anomaly model using the given dataset and returns a list of data point IDs.
-
-
class jubakit.anomaly.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Anomaly service.
-
class jubakit.anomaly.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Anomaly service.
-
class jubakit.anomaly.Schema(mapping, fallback=None) [source]
Bases: jubakit.base.GenericSchema
Schema for the Anomaly service.
-
FLAG = u'f'
-
ID = u'i'
-
jubakit.bandit module
-
class jubakit.bandit.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Bandit service.
-
class jubakit.bandit.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Bandit service.
jubakit.base module
-
class jubakit.base.BaseConfig(*args, **kwargs) [source]
Bases: dict
Config is a convenience class for building a new config.
-
class jubakit.base.BaseDataset(loader, schema=None, static=None, _data=None) [source]
Bases: object
Dataset is an abstract representation of a set of data.
-
__init__(loader, schema=None, static=None, _data=None) [source]
Defines a new dataset. Datasets are immutable and cannot be modified.
Data will be loaded from the given loader using schema.
When static is set to True (the default for non-infinite loaders), data is loaded into memory immediately. Otherwise, data is loaded from the loader one row at a time, which may be better when processing a large dataset. For "infinite" loaders (such as MQ and Twitter streams), static cannot be set to True. Note that some features (e.g., index access) are not available for non-static datasets, and other features such as cross-validation may require them.
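The difference between static and non-static loading can be sketched in plain Python. SketchDataset is a hypothetical class, not jubakit's BaseDataset. The sketch assumes only that a loader is an iterable of row dicts.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


class SketchDataset:
    """Hypothetical illustration of static vs. non-static loading (not jubakit code)."""

    def __init__(self, loader: Iterable[Row], static: bool = True) -> None:
        self._loader = loader
        # Static: pull every row into memory right away so index access works.
        # Non-static: keep only the loader and read rows while iterating.
        self._rows: Optional[List[Row]] = list(loader) if static else None

    def __iter__(self) -> Iterator[Row]:
        if self._rows is not None:
            return iter(self._rows)
        return iter(self._loader)

    def __getitem__(self, index: int) -> Row:
        if self._rows is None:
            raise TypeError("index access requires a static dataset")
        return self._rows[index]
```

When a generator is the loader, a non-static SketchDataset can be iterated only once and cannot be indexed. Features that revisit rows, such as cross-validation, therefore need the static form.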
-
-
class jubakit.base.BaseLoader [source]
Bases: object
Loader loads rows from various data sources.
-
class jubakit.base.BaseSchema(mapping, fallback=None) [source]
Bases: object
Schema defines data types for each key of the data.
BaseSchema defines three fundamental data types:
- IGNORE: ignores the key (mainly intended for fallback)
- AUTO: uses the type of the key as its data type
- INFER: guesses the type of the key from its value; note that this is discouraged as it may produce unstable results.
-
AUTO = u'.'
-
IGNORE = u'_'
-
INFER = u'?'
-
class jubakit.base.BaseService(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: object
Service provides an interface to machine learning features.
-
__init__(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Creates a new service that connects to the existing server.
-
get_status() [source]
Returns the status of this server. In distributed mode, returns the statuses of all members.
-
load(name, path=None) [source]
Loads the model using name. If path is specified, copies the model file from the local path to the remote location.
-
classmethod name() [source]
Subclasses (Classifier, NearestNeighbor, …) must override this method and return their service name (classifier, nearest_neighbor, …).
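The override contract can be illustrated with plain classmethods. SketchService, SketchClassifier, SketchNearestNeighbor and describe() are hypothetical names used only for this sketch. The real base class is jubakit.base.BaseService.

```python
class SketchService:
    """Hypothetical stand-in for a service base class (not jubakit.base.BaseService)."""

    @classmethod
    def name(cls) -> str:
        # The base class has no service of its own, so subclasses must override this.
        raise NotImplementedError("subclasses must return their service name")

    @classmethod
    def describe(cls) -> str:
        # Generic code can rely on name() without knowing the concrete subclass.
        return "jubatus service: " + cls.name()


class SketchClassifier(SketchService):
    @classmethod
    def name(cls) -> str:
        return "classifier"


class SketchNearestNeighbor(SketchService):
    @classmethod
    def name(cls) -> str:
        return "nearest_neighbor"
```

Because name() is a classmethod, it can be called on the class before any instance or server exists. That fits classmethods such as run, which act before an instance is available.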
-
classmethod run(config, port=None, embedded=False) [source]
Runs a new standalone server or an embedded instance and returns the service instance.
-
-
class jubakit.base.GenericConfig(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.BaseConfig
GenericConfig is a base Config class for generic services that have converter, method and parameter in their config data.
-
class jubakit.base.GenericSchema(mapping, fallback=None) [source]
Bases: jubakit.base.BaseSchema
GenericSchema is a base Schema class for all engines using Datum.
GenericSchema defines three data types:
- STRING: string features (string_values)
- NUMBER: numeric features (num_values)
- BINARY: binary features (binary_values)
-
BINARY = u'b'
-
NUMBER = u'n'
-
STRING = u's'
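The sketch below shows how a mapping from keys to the type codes above might route a row into the three Datum value lists. to_datum_lists is a hypothetical helper, not part of jubakit. Several behaviors are assumptions of the sketch only:
- the INFER rule (bytes to binary, int/float to number, everything else to string);
- raising KeyError for an unmapped key when there is no fallback;
- leaving AUTO unmodeled.

```python
from typing import Any, Dict, List, Optional, Tuple

# Type codes as documented for BaseSchema and GenericSchema.
IGNORE, INFER = u'_', u'?'
STRING, NUMBER, BINARY = u's', u'n', u'b'


def to_datum_lists(row: Dict[str, Any], mapping: Dict[str, str],
                   fallback: Optional[str] = None) -> Dict[str, List[Tuple[str, Any]]]:
    """Hypothetical helper: split one row into Datum-style (key, value) lists."""
    datum: Dict[str, List[Tuple[str, Any]]] = {
        "string_values": [], "num_values": [], "binary_values": []}
    for key, value in row.items():
        code = mapping.get(key, fallback)
        if code is None:
            raise KeyError("no type for key %r and no fallback" % key)
        if code == IGNORE:
            continue
        if code == INFER:
            # Guess from the Python type; fragile, since "1" and 1 land in different lists.
            if isinstance(value, bytes):
                code = BINARY
            elif isinstance(value, (int, float)) and not isinstance(value, bool):
                code = NUMBER
            else:
                code = STRING
        if code == STRING:
            datum["string_values"].append((key, str(value)))
        elif code == NUMBER:
            datum["num_values"].append((key, float(value)))
        elif code == BINARY:
            datum["binary_values"].append((key, bytes(value)))
        else:
            raise ValueError("unsupported type code %r" % code)
    return datum
```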
jubakit.burst module
-
class jubakit.burst.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Burst service.
-
DEFAULT_GAMMA = 0.1
-
DEFAULT_SCALING = 1.1
-
get_all_bursted_results() [source]
Returns the burst detection results of the current window for all pre-registered keywords.
-
get_all_bursted_results_at(pos) [source]
Returns the burst detection results at the specified position for all pre-registered keywords.
-
get_result(keyword) [source]
Returns the burst detection result of the current window for the pre-registered keyword keyword.
-
-
class jubakit.burst.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Burst service.
-
class jubakit.burst.DocumentDataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Document dataset for the Burst service.
-
class jubakit.burst.DocumentSchema(mapping, fallback=None) [source]
Bases: jubakit.base.GenericSchema
Document schema for the Burst service.
-
POSITION = u'p'
-
TEXT = u't'
-
-
class jubakit.burst.KeywordDataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Keyword dataset for the Burst service.
jubakit.classifier module
-
class jubakit.classifier.Classifier(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Classifier service.
-
class jubakit.classifier.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Classifier service.
-
class jubakit.classifier.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Classifier service.
-
classmethod from_array(data, labels=None, feature_names=None, label_names=None, static=True) [source]
Converts two arrays (data and its associated labels) to Dataset.
data : array of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
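The following plain-Python sketch illustrates the shape contract above: each of the n_samples rows is paired with its label, and feature_names names the n_features columns. rows_from_array is hypothetical. Two behaviors are assumptions, not confirmed jubakit behavior:
- the placeholder names v0, v1, … used when feature_names is omitted;
- treating labels as indices into label_names.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def rows_from_array(data: Sequence[Sequence[float]],
                    labels: Optional[Sequence[int]] = None,
                    feature_names: Optional[Sequence[str]] = None,
                    label_names: Optional[Sequence[str]] = None) -> List[Tuple[Any, Dict[str, float]]]:
    """Hypothetical illustration: pair each sample with its label and name its features."""
    n_features = len(data[0]) if data else 0
    if feature_names is None:
        # Placeholder naming scheme chosen for this sketch only.
        feature_names = ["v%d" % i for i in range(n_features)]
    if len(feature_names) != n_features:
        raise ValueError("feature_names must have n_features entries")
    if labels is not None and len(labels) != len(data):
        raise ValueError("labels must have n_samples entries")
    rows = []
    for i, sample in enumerate(data):
        label = None
        if labels is not None:
            # Map a label index to its name when label_names is given.
            label = label_names[labels[i]] if label_names is not None else labels[i]
        rows.append((label, dict(zip(feature_names, sample))))
    return rows
```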
-
classmethod from_data(data, labels=None, feature_names=None, label_names=None, static=True) [source]
Converts data (an array or a sparse matrix) and its associated label array to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
-
classmethod from_matrix(data, labels=None, feature_names=None, label_names=None, static=True) [source]
Converts a sparse matrix data and its associated label array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
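A CSR matrix stores only non-zero entries, so a natural conversion emits just those features for each row. The sketch below works on the raw CSR arrays (the indptr, indices and data attributes that scipy.sparse.csr_matrix exposes) so that it needs no SciPy. rows_from_csr is hypothetical, and omitting zero-valued features is an assumption about how such a conversion behaves.

```python
from typing import Dict, List, Sequence


def rows_from_csr(indptr: Sequence[int], indices: Sequence[int],
                  values: Sequence[float], feature_names: Sequence[str]) -> List[Dict[str, float]]:
    """Hypothetical helper: turn CSR arrays into one {feature_name: value} dict per row."""
    rows = []
    for r in range(len(indptr) - 1):
        # Row r's non-zero entries live in positions indptr[r] .. indptr[r + 1] - 1.
        start, end = indptr[r], indptr[r + 1]
        rows.append({feature_names[indices[k]]: values[k] for k in range(start, end)})
    return rows
```

As an example, the 2 × 3 matrix [[0, 2, 0], [1, 0, 3]] has indptr [0, 1, 3], indices [1, 0, 2] and data [2, 1, 3]. With feature names a, b, c it yields [{"b": 2}, {"a": 1, "c": 3}].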
-
jubakit.clustering module
-
class jubakit.clustering.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Clustering service.
-
class jubakit.clustering.Config(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Clustering service.
-
class jubakit.clustering.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Clustering service.
-
classmethod from_array(data, ids=None, feature_names=None, static=True) [source]
Converts two arrays (data and its associated IDs) to Dataset.
data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_data(data, ids=None, feature_names=None, static=True) [source]
Converts data (an array or a sparse matrix) and its associated ID array to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_matrix(data, ids=None, feature_names=None, static=True) [source]
Converts a sparse matrix data and its associated ID array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
jubakit.compat module
jubakit.dumb module
Dumb Services are temporary implementations of Services, defined purely for convenience.
Unlike the real Services (Classifier, Anomaly, …), which are each defined in their own file (classifier.py, anomaly.py, …), Dumb Services cannot handle Datasets or Schemas.
Each service has a field called CONFIG, which provides a default config data structure for the service. You can therefore use jubakit to start a Jubatus server process and then use the raw Client class directly to make RPC calls.
>>> from jubakit.dumb import Stat
>>> service = Stat.run(Stat.CONFIG)
>>> client = service._client()
>>> client.push('x', 12)
-
class jubakit.dumb.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}
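CONFIG is a plain dict shared at class level, so a customized configuration is safest when built from a deep copy. A shallow copy would still share the nested parameter dict with the default. The sketch below copies the Bandit default shown above into a local constant. customized_bandit_config is a hypothetical helper; in real code, its result would be passed to the documented Bandit.run(config) classmethod.

```python
import copy

# Local copy of the documented jubakit.dumb.Bandit.CONFIG default.
BANDIT_CONFIG = {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}


def customized_bandit_config(assume_unrewarded: bool) -> dict:
    """Hypothetical helper: return a modified config without touching the default."""
    config = copy.deepcopy(BANDIT_CONFIG)  # dict(BANDIT_CONFIG) would share 'parameter'
    config[u'parameter'][u'assume_unrewarded'] = assume_unrewarded
    return config
```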
-
-
class jubakit.dumb.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'method': u'burst', u'parameter': {u'costcut_threshold': -1, u'window_batch_size': 5, u'max_reuse_batch_num': 5, u'result_window_rotate_size': 5, u'batch_interval': 10}}
-
-
class jubakit.dumb.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'compressor_method': u'simple', u'compressor_parameter': {u'bucket_size': 1000}, u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'distance': u'euclidean', u'method': u'kmeans', u'parameter': {u'k': 3, u'seed': 0}}
-
-
class jubakit.dumb.Graph(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'method': u'graph_wo_index', u'parameter': {u'damping_factor': 0.9, u'landmark_num': 5}}
-
-
class jubakit.dumb.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'lsh', u'parameter': {u'hash_num': 64}}
-
-
class jubakit.dumb.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'inverted_index'}
-
-
class jubakit.dumb.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
-
CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'PA1', u'parameter': {u'sensitivity': 0.1, u'regularization_weight': 3.402823e+38}}
-
jubakit.logger module
jubakit.model module
This module provides features to manipulate model files.
-
class jubakit.model.AnomalyTransformer(_m) [source]
Bases: jubakit.model.GenericTransformer
-
class jubakit.model.ClassifierTransformer(_m) [source]
Bases: jubakit.model.GenericTransformer
-
class jubakit.model.ClusteringTransformer(_m) [source]
Bases: jubakit.model.GenericTransformer
-
class jubakit.model.GenericTransformer(_m) [source]
Bases: jubakit.model.BaseTransformer
Transformation for services that have the generic two-element model data structure (service model and weight manager model). Such models can be converted to a Weight model.
-
class jubakit.model.JubaDump [source]
Bases: object
JubaDump provides a high-level dump of Jubatus models. The jubadump command must be installed.
-
class jubakit.model.JubaModel [source]
Bases: object
JubaModel provides features to perform low-level manipulation of the Jubatus model data structure.
-
data() [source]
Returns the actual model data part. This method is a quick shortcut for return self.user.user_data.
-
classmethod load_binary(f, validate=True) [source]
Loads a Jubatus binary model file from the binary stream f. When validate is True, the model file format is strictly validated.
-
-
class jubakit.model.RecommenderTransformer(_m) [source]
Bases: jubakit.model.GenericTransformer
jubakit.nearest_neighbor module
-
class jubakit.nearest_neighbor.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Nearest Neighbor service.
-
class jubakit.nearest_neighbor.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Nearest Neighbor service.
-
classmethod from_array(data, ids=None, feature_names=None, static=True) [source]
Converts two arrays (data and its associated IDs) to Dataset.
data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_data(data, ids=None, feature_names=None, static=True) [source]
Converts data (an array or a sparse matrix) and its associated ID array to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_matrix(data, ids=None, feature_names=None, static=True) [source]
Converts a sparse matrix data and its associated ID array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
-
class jubakit.nearest_neighbor.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Nearest Neighbor service.
-
neighbor_row_from_datum(dataset, size=10) [source]
Returns at most size rows that are most similar to query, along with their distance values.
-
neighbor_row_from_id(dataset, size=10) [source]
Returns at most size rows whose datum is most similar to that of the row id, along with their distance values.
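An exact brute-force search can show the contract of these two methods: at most size rows, each with a distance. This is only an illustration. The service itself uses the configured method (the Dumb NearestNeighbor CONFIG above uses lsh), which is approximate. neighbor_rows and the Euclidean distance over sparse dicts are assumptions of the sketch.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def euclidean(a: Vector, b: Vector) -> float:
    # Missing keys count as 0.0, as in a sparse feature vector.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def neighbor_rows(query: Vector, rows: Dict[str, Vector], size: int = 10) -> List[Tuple[str, float]]:
    """Hypothetical exact search: return up to `size` (row_id, distance) pairs, nearest first."""
    scored = [(row_id, euclidean(query, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))  # ties broken by id for determinism
    return scored[:size]
```

The by-ID variant corresponds to calling neighbor_rows(rows[some_id], rows, size), that is, using a stored row as the query. In this sketch, that row itself comes back first at distance 0.0.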
-
set_row(dataset) [source]
Updates the row whose ID is id with the given row. If a row with the same id already exists, it is overwritten with row (note that this behavior differs from that of the recommender). Otherwise, a new row entry is created. If the server that manages the row and the server that received this RPC request are the same, the operation is reflected immediately; otherwise, the update is reflected after mix.
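Plain dicts are enough to sketch the difference between overwriting and the recommender's behavior. Both helpers are hypothetical. merge_row assumes that the recommender applies a differential update (keys in the new row are updated, other stored keys are kept), which is how the contrast above is read here.

```python
from typing import Dict

Row = Dict[str, float]


def overwrite_row(table: Dict[str, Row], row_id: str, row: Row) -> None:
    # Nearest Neighbor set_row as described: the stored row is replaced wholesale.
    table[row_id] = dict(row)


def merge_row(table: Dict[str, Row], row_id: str, row: Row) -> None:
    # Assumed recommender behavior: update the given keys and keep the rest.
    table.setdefault(row_id, {}).update(row)
```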
-
-
class jubakit.nearest_neighbor.Schema(mapping, fallback=None) [source]
Bases: jubakit.base.GenericSchema
Schema for the Nearest Neighbor service.
-
ID = u'i'
-
jubakit.recommender module
-
class jubakit.recommender.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Recommender service.
-
class jubakit.recommender.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Recommender service.
-
class jubakit.recommender.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Recommender service.
-
complete_row_from_datum(dataset) [source]
Returns data points from the datum in the recommender model, with missing values filled in with predicted values.
-
complete_row_from_id(dataset) [source]
Returns data points from the row id in the recommender model, with missing values filled in with predicted values.
-
similar_row_from_datum(dataset, size=10) [source]
Returns similar data points from the datum in the recommender model.
-
similar_row_from_datum_and_rate(dataset, rate=0.1) [source]
Returns the top rate fraction of all rows that are most similar to row. For example, when rate is 0.1, the top 10% of all rows are returned.
The rate must be in (0, 1].
-
similar_row_from_datum_and_score(dataset, score=0.8) [source]
Returns the rows most similar to row whose similarity score is greater than score.
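The two selection rules above, by rate and by score, can be sketched over a list of (row_id, similarity) pairs. Both helpers are hypothetical. In particular, rounding rate × n up with math.ceil is an assumption, since the rounding the server uses is not stated here.

```python
import math
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def top_by_rate(scored: Scored, rate: float) -> Scored:
    """Hypothetical: keep the most similar `rate` fraction of rows, rate in (0, 1]."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    # round() first so a product that lands a hair above an integer is not bumped up by ceil().
    k = math.ceil(round(rate * len(scored), 9))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def above_score(scored: Scored, score: float) -> Scored:
    """Hypothetical: keep rows whose similarity is strictly greater than `score`."""
    hits = [pair for pair in scored if pair[1] > score]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```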
-
similar_row_from_id(dataset, size=10) [source]
Returns similar data points from the row id in the recommender model.
-
similar_row_from_id_and_rate(dataset, rate=0.1) [source]
Returns the top rate fraction of all rows that are most similar to the row id. For example, when rate is 0.1, the top 10% of all rows are returned.
The rate must be in (0, 1].
-
jubakit.regression module
-
class jubakit.regression.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Regression service.
-
class jubakit.regression.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Regression service.
-
classmethod from_array(data, targets=None, feature_names=None, static=True) [source]
Converts two arrays (data and its associated targets) to Dataset.
data : array of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_data(data, targets=None, feature_names=None, static=True) [source]
Converts data (an array or a sparse matrix) and its associated target array to Dataset.
data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
classmethod from_matrix(data, targets=None, feature_names=None, static=True) [source]
Converts a sparse matrix data and its associated target array to Dataset.
data : scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
-
-
class jubakit.regression.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Regression service.
jubakit.shell module
-
class jubakit.shell.JubaShell(host, port, cluster, service, **kwargs) [source]
Bases: object
JubaShell provides a shell environment for calling the Jubatus RPC API.
The interactive interface is provided in the cli submodule.
-
__init__(host, port, cluster, service, **kwargs) [source]
Creates a new shell environment using the specified parameters.
If service is None, it is probed automatically.
-
connect() [source]
Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection is not established until an RPC method is called.
-
is_connected() [source]
Returns True if the client exists. Note that its backend TCP connection may already be closed.
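The connect() / is_connected() contract can be modeled as a holder that only tracks whether a client object exists. LazyClientHolder and its factory argument are hypothetical. As stated above, the sketch assumes that creating a client does not open a TCP connection, so is_connected() says nothing about the socket.

```python
from typing import Callable, Optional


class LazyClientHolder:
    """Hypothetical model of the connect()/is_connected() behavior described above."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._client: Optional[object] = None

    def connect(self) -> None:
        # Discard any current client, then build a new one. Building it does not
        # dial the server; a real RPC client would connect on its first call.
        self._client = None
        self._client = self._factory()

    def is_connected(self) -> bool:
        # True only means a client object exists; its socket may already be closed.
        return self._client is not None
```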
-
-
exception jubakit.shell.JubaShellAssertionError [source]
Bases: jubakit.shell.JubaShellException
-
exception jubakit.shell.JubaShellRPCError(msg, host, port, e=None) [source]
Bases: jubakit.shell.JubaShellException
jubakit.weight module
-
class jubakit.weight.Config(method=None, parameter=None, converter=None) [source]
Bases: jubakit.base.GenericConfig
Configuration to run the Weight service.
-
class jubakit.weight.Dataset(loader, schema=None, static=None, _data=None) [source]
Bases: jubakit.base.BaseDataset
Dataset for the Weight service.
-
class jubakit.weight.Schema(mapping, fallback=None) [source]
Bases: jubakit.base.GenericSchema
Schema for the Weight service.
-
class jubakit.weight.Weight(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0) [source]
Bases: jubakit.base.BaseService
Weight service.