jubakit package

jubakit.anomaly module

class jubakit.anomaly.Anomaly(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Anomaly service.

add(dataset)

Adds data points to the anomaly model using the given dataset and returns LOF scores.

add_bulk(dataset)

Adds data points to the anomaly model using the given dataset and returns a list of data point IDs.

calc_score(dataset)

Calculates LOF scores for the given dataset.
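As background on what a LOF (Local Outlier Factor) score expresses, here is a minimal brute-force sketch in plain Python. It is not the code the Jubatus anomaly engine runs. The function name lof_scores and the sample points are illustrative assumptions, and ties at the k-distance are simplified by always taking exactly k neighbors.

```python
import math

def lof_scores(points, k=2):
    """Return the Local Outlier Factor of each point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point (excluding itself) and its k-distance.
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]

# Four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = lof_scores(points, k=2)
```

Scores close to 1 indicate a point whose neighborhood is about as dense as its neighbors' neighborhoods; the distant point receives a much larger score.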

classmethod name()
overwrite(dataset)

Overwrites data points in the anomaly model using the given dataset and returns LOF scores.

update(dataset)

Updates data points in the anomaly model using the given dataset and returns LOF scores.

class jubakit.anomaly.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Anomaly service.

classmethod methods()
class jubakit.anomaly.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Anomaly service.

class jubakit.anomaly.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Anomaly service.

FLAG = u'f'
ID = u'i'
__init__(mapping, fallback=None)
transform(row)

Anomaly schema transforms the row into a Datum, its associated ID and a flag. The flag can be a value of any type; it is provided as a convenience for calculating precision.
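A minimal standalone sketch of how such a flag could be used to calculate precision. It is plain Python rather than jubakit code, and it assumes that the flag is a boolean marking known anomalies and that a record is reported as anomalous when its LOF score exceeds a hypothetical threshold.

```python
def precision(scores, flags, threshold):
    """Fraction of records reported as anomalous that are flagged as true anomalies."""
    reported = [flag for score, flag in zip(scores, flags) if score > threshold]
    if not reported:
        return 0.0  # nothing reported; 0.0 is used by convention in this sketch
    return sum(1 for flag in reported if flag) / len(reported)

# Hypothetical LOF scores and ground-truth flags (True = known anomaly).
scores = [0.9, 1.1, 3.2, 1.0, 2.5]
flags = [False, False, True, False, False]
result = precision(scores, flags, threshold=2.0)
```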

jubakit.bandit module

class jubakit.bandit.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Bandit service.

delete_arm(arm_id)
get_arm_info(player_id)
classmethod name()
register_arm(arm_id)
register_reward(player_id, arm_id, reward)
reset(player_id)
select_arm(player_id)
class jubakit.bandit.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Bandit service.

classmethod methods()

jubakit.base module

class jubakit.base.BaseConfig(*args, **kwargs)

Bases: dict

Config is a convenience class for building a new config.

__init__(*args, **kwargs)

Creates a new Config with default configuration.

classmethod default()

Returns a new default configuration.

class jubakit.base.BaseDataset(loader, schema=None, static=None, _data=None)

Bases: object

Dataset is an abstract representation of a set of data.

__init__(loader, schema=None, static=None, _data=None)

Defines a new dataset. Datasets are immutable and cannot be modified.

Data will be loaded from the given loader using schema.

When static is set to True (which is the default for non-infinite loaders), data is loaded into memory immediately; otherwise data is loaded one by one from the loader, which may be better when processing a large dataset. For "infinite" loaders (like MQ and Twitter stream), static cannot be set to True. Note that some features (e.g., index access) are not available for non-static datasets, and these may be needed by other features such as cross-validation.
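The difference between static and non-static loading can be illustrated with a plain Python generator. This is only an analogy under stated assumptions: the rows function here is a hypothetical stand-in for a loader, not a jubakit class.

```python
def rows():
    """Stand-in for a loader: yields one record at a time."""
    for i in range(5):
        yield {"value": i}

# Static: all records are materialized in memory, so index access works.
static_data = list(rows())
third = static_data[2]

# Non-static: records are pulled one by one; only iteration is possible.
streamed = rows()
first = next(streamed)
try:
    streamed[2]
    indexable = True
except TypeError:
    indexable = False  # generators do not support index access
```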

convert(func)

Applies the given callable (which is expected to perform batch pre-processing such as shuffling) to all data entries and returns a new immutable Dataset.

get(idx)

Returns the raw entry loaded by Loader.

get_schema()

Returns the Schema for this dataset.

is_static()

Returns True for static datasets.

shuffle(seed=None)

Returns a new immutable Dataset whose records are shuffled.
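The "return a new object, leave the original untouched" behavior can be sketched in plain Python as follows. The helper name shuffled is hypothetical, and using random.Random(seed) is an assumption about how a seed could make the result reproducible, not a description of jubakit's internals.

```python
import random

def shuffled(records, seed=None):
    """Return a new shuffled tuple; the input is left unchanged."""
    items = list(records)
    random.Random(seed).shuffle(items)
    return tuple(items)

original = ("a", "b", "c", "d", "e")
first = shuffled(original, seed=42)
second = shuffled(original, seed=42)  # same seed, same order
```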

class jubakit.base.BaseLoader

Bases: object

Loader loads rows from various data sources.

is_infinite()

Returns True if the length of the data source is indeterminate (e.g., MQ).

preprocess(ent)

Preprocesses the given dict-like object into another dict-like object. The default implementation does not alter the object. Users can override this method to perform custom processing. You can return None to skip the record.

rows()

Subclasses must override this method and yield each row of the data source as a flat dict-like object. You can yield None to skip the record.
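The rows()/preprocess() contract might look like this in practice. SketchLoader is a hypothetical standalone class that does not inherit from BaseLoader, and the __iter__ driver that combines the two methods is an assumption made so the example can run on its own.

```python
class SketchLoader:
    """Hypothetical stand-in that mimics the rows()/preprocess() contract."""

    def __init__(self, lines):
        self._lines = lines

    def rows(self):
        # Yield each record as a flat dict; yield None for records to skip.
        for line in self._lines:
            if not line.strip():
                yield None
                continue
            name, value = line.split(",")
            yield {"name": name, "value": value}

    def preprocess(self, ent):
        # Custom processing: convert the value to a number.
        return {"name": ent["name"], "value": float(ent["value"])}

    def __iter__(self):
        # Assumed driver: skip None from rows() and from preprocess().
        for ent in self.rows():
            if ent is None:
                continue
            processed = self.preprocess(ent)
            if processed is not None:
                yield processed

loaded = list(SketchLoader(["a,1", "", "b,2.5"]))
```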

class jubakit.base.BaseSchema(mapping, fallback=None)

Bases: object

Schema defines data types for each key of the data.

BaseSchema defines the 3 fundamental data types.

  • IGNORE: ignores the key (mainly intended for fallback)
  • AUTO: use the type of the key as its data type
  • INFER: guess the type of the key from its value; note that this is
    discouraged as it may produce unstable results.
AUTO = u'.'
IGNORE = u'_'
INFER = u'?'
__init__(mapping, fallback=None)

Defines a Schema. Schema is an immutable object and cannot be modified. mapping is a dict-like object that maps row keys to data types. Optionally you can assign an alias name for the key to handle different loaders with the same configuration.

classmethod predict(row, typed)

Predicts a Schema from a dict-like row object.

transform(row)

Transforms the row (dict-like) into the data structures required by the corresponding Service.

class jubakit.base.BaseService(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: object

Service provides an interface to machine learning features.

__init__(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Creates a new service that connects to the existing server.

clear()

Clears the model.

get_status()

Returns the status of this server. In distributed mode, returns statuses of all members.

load(name, path=None)

Loads the model using name. If path is specified, copies the model file from the local path to the remote location.

classmethod name()

Subclasses (Classifier, NearestNeighbor, etc.) must override this method and return their service name (classifier, nearest_neighbor, etc.).

classmethod run(config, port=None, embedded=False)

Runs a new standalone server or embedded instance and returns the service instance.

save(name, path=None)

Saves the model using name. If path is specified, copies the saved model file to the local path.

shell(**kwargs)

Starts an interactive shell session for this service.

stop()

Stops the backend process if it exists.

class jubakit.base.GenericConfig(method=None, parameter=None, converter=None)

Bases: jubakit.base.BaseConfig

GenericConfig is a base Config class for generic services that have converter, method and parameter in their config data.

__init__(method=None, parameter=None, converter=None)
add_mecab(name=u'mecab', arg=u'', ngram=1, base=False, include_features=u'*', exclude_features=u'')

Adds MeCab feature extraction to string_types.

clear_converter()

Initializes the converter section of the config with an empty template.

classmethod methods()

Subclasses must override this method and return methods available for this service.

class jubakit.base.GenericSchema(mapping, fallback=None)

Bases: jubakit.base.BaseSchema

GenericSchema is a base Schema class for all engines using Datum.

GenericSchema defines 3 data types:

  • STRING: string features (string_values)
  • NUMBER: numeric features (num_values)
  • BINARY: binary features (binary_values)
BINARY = u'b'
NUMBER = u'n'
STRING = u's'
classmethod predict(row, typed)

Predicts a schema from a dict-like row object.

transform(row)

Transforms the row (represented as a dict-like object) into a Datum. Subclasses that define their own data types should override this method and handle them.
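To make the role of the type codes concrete, here is a standalone sketch of a mapping-driven transform. It reuses the code characters listed above ('s', 'n', 'b', '_'), but the transform function, the dict used in place of a Datum, and the error raised for unmapped keys are all assumptions for illustration, not jubakit's implementation.

```python
STRING, NUMBER, BINARY, IGNORE = "s", "n", "b", "_"

def transform(row, mapping, fallback=None):
    """Group the row's values by the data type assigned to each key."""
    datum = {"string_values": [], "num_values": [], "binary_values": []}
    for key, value in row.items():
        kind = mapping.get(key, fallback)
        if kind == STRING:
            datum["string_values"].append((key, str(value)))
        elif kind == NUMBER:
            datum["num_values"].append((key, float(value)))
        elif kind == BINARY:
            datum["binary_values"].append((key, value))
        elif kind != IGNORE:
            # Assumed behavior: a key with no usable type is an error.
            raise KeyError("no type defined for key: %s" % key)
        # IGNORE: the key is dropped.
    return datum

row = {"name": "alice", "age": "30", "memo": "n/a"}
datum = transform(row, {"name": STRING, "age": NUMBER}, fallback=IGNORE)
```

Here memo has no entry in the mapping, so the fallback applies and the key is ignored.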

class jubakit.base.Utils

Bases: object

static softmax(x)

jubakit.burst module

class jubakit.burst.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Burst service.

DEFAULT_GAMMA = 0.1
DEFAULT_SCALING = 1.1
add_documents(document_dataset)

Registers the documents for burst detection.

add_keyword(keyword_dataset)

Registers the keyword for burst detection.

get_all_bursted_results()

Returns the burst detection result of the current window for all pre-registered keywords.

get_all_bursted_results_at(pos)

Returns the burst detection result at the specified position for all pre-registered keywords.

get_all_keywords()

Returns the list of keywords registered for burst detection.

get_result(keyword)

Returns the burst detection result of the current window for the pre-registered keyword keyword.

get_result_at(keyword, pos)

Returns the burst detection result at the specified position for the pre-registered keyword keyword.

classmethod name()
remove_all_keywords()

Removes all the keywords from burst detection.

remove_keyword(keyword)

Removes the keyword from burst detection.

class jubakit.burst.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Burst service.

__init__(method=None, parameter=None, converter=None)
classmethod methods()
class jubakit.burst.DocumentDataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Document dataset for Burst service.

class jubakit.burst.DocumentSchema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Document schema for Burst service.

POSITION = u'p'
TEXT = u't'
__init__(mapping, fallback=None)
transform(row)
class jubakit.burst.KeywordDataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Keyword dataset for Burst service.

class jubakit.burst.KeywordSchema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Keyword schema for Burst service.

GAMMA = u'g'
KEYWORD = u'k'
SCALING = u's'
__init__(mapping, fallback=None)
transform(row)

jubakit.classifier module

class jubakit.classifier.Classifier(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Classifier service.

classify(dataset, softmax=False)

Classifies the given dataset using this classifier. When softmax is set to True, softmax is applied to the resulting scores.
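For reference, softmax maps a list of raw scores to positive values that sum to 1 while preserving their order. The following is a generic, numerically stable sketch in plain Python; it is not necessarily how jubakit computes it, and the sample scores are made up.

```python
import math

def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])
```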

classmethod name()
train(dataset)

Trains the classifier using the given dataset.

classmethod train_and_classify(config, train_dataset, test_dataset, metric)

This is a utility method to perform bulk train-test. It runs a classifier using the given config, trains the classifier, classifies the test dataset with it, then returns the calculated metrics.

class jubakit.classifier.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Classifier service.

classmethod methods()
class jubakit.classifier.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Classifier service.

classmethod from_array(data, labels=None, feature_names=None, label_names=None, static=True)

Converts two arrays (data and its associated labels) to Dataset.

data : array of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional
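The expected shapes can be illustrated with a small standalone helper that pairs each sample with its label and names its features. The function rows_from_arrays, the default feature names and the _label key are all hypothetical; the real from_array may organize the data differently.

```python
def rows_from_arrays(data, labels=None, feature_names=None):
    """Pair each sample with its label and name its features."""
    n_features = len(data[0])
    if feature_names is None:
        # Hypothetical default naming; the real default may differ.
        feature_names = ["v%d" % i for i in range(n_features)]
    if labels is not None and len(labels) != len(data):
        raise ValueError("labels must have one entry per sample")
    rows = []
    for i, sample in enumerate(data):
        row = dict(zip(feature_names, sample))
        if labels is not None:
            row["_label"] = labels[i]  # hypothetical key name
        rows.append(row)
    return rows

data = [[5.1, 3.5], [6.2, 2.9]]  # shape [n_samples=2, n_features=2]
rows = rows_from_arrays(data, labels=["setosa", "virginica"],
                        feature_names=["sepal_length", "sepal_width"])
```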

classmethod from_data(data, labels=None, feature_names=None, label_names=None, static=True)

Converts two arrays, or a sparse matrix data and its associated label array, to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional

classmethod from_matrix(data, labels=None, feature_names=None, label_names=None, static=True)

Converts a sparse matrix data and its associated label array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional

get_labels()

Returns labels of each record in the dataset.

class jubakit.classifier.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Classifier service.

LABEL = u'l'
__init__(mapping, fallback=None)
transform(row)

Classifier schema transforms the row into Datum and its associated label.

jubakit.clustering module

class jubakit.clustering.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Clustering service.

get_core_members(light=False)

Returns the coreset of the clusters as datum.

get_k_center()

Returns k cluster centers.

get_nearest_center(dataset)

Returns the nearest cluster center without adding points to the cluster.

get_nearest_members(dataset, light=False)

Returns the nearest summary of the cluster (coreset) for each point.

get_revision()

Returns the revision of the clusters.

classmethod name()
push(dataset)

Adds data points.

class jubakit.clustering.Config(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)

Bases: jubakit.base.GenericConfig

Configuration to run Clustering service.

__init__(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)
classmethod compressor_methods()
classmethod distances()
classmethod methods()
class jubakit.clustering.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Clustering service.

classmethod from_array(data, ids=None, feature_names=None, static=True)

Converts two arrays (data and its associated IDs) to Dataset.

data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, ids=None, feature_names=None, static=True)

Converts two arrays, or a sparse matrix data and its associated ID array, to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, ids=None, feature_names=None, static=True)

Converts a sparse matrix data and its associated ID array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

get_ids()

Returns the ID of each record in the dataset.

class jubakit.clustering.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Clustering service.

ID = u'i'
__init__(mapping, fallback=None)
transform(row)

Clustering schema transforms the row into Datum and its associated ID.

jubakit.compat module

jubakit.dumb module

Dumb Services are a kind of temporary implementation of Services. They are defined just for convenience.

Unlike Real Services (Classifier, Anomaly, …), each of which is defined in its own file (classifier.py, anomaly.py, …), Dumb Services cannot handle Datasets and Schemas.

Each service has a field called CONFIG, which provides a default config data structure for the service. So you can use jubakit to start a Jubatus server process, then directly use the raw Client class to make RPC calls.

>>> from jubakit.dumb import Stat
>>> service = Stat.run(Stat.CONFIG)
>>> client = service._client()
>>> client.push('x', 12)
class jubakit.dumb.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}
classmethod name()
class jubakit.dumb.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'burst', u'parameter': {u'costcut_threshold': -1, u'window_batch_size': 5, u'max_reuse_batch_num': 5, u'result_window_rotate_size': 5, u'batch_interval': 10}}
classmethod name()
class jubakit.dumb.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'compressor_method': u'simple', u'compressor_parameter': {u'bucket_size': 1000}, u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'distance': u'euclidean', u'method': u'kmeans', u'parameter': {u'k': 3, u'seed': 0}}
classmethod name()
class jubakit.dumb.Graph(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'graph_wo_index', u'parameter': {u'damping_factor': 0.9, u'landmark_num': 5}}
classmethod name()
class jubakit.dumb.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'lsh', u'parameter': {u'hash_num': 64}}
classmethod name()
class jubakit.dumb.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'inverted_index'}
classmethod name()
class jubakit.dumb.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'PA1', u'parameter': {u'sensitivity': 0.1, u'regularization_weight': 3.402823e+38}}
classmethod name()
class jubakit.dumb.Stat(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

CONFIG = {u'window_size': 128}
classmethod name()

jubakit.logger module

jubakit.logger.get_logger(name=None)

Returns the logger. If name is specified, a child logger is returned. Otherwise the default jubakit logger is returned.

This is mainly intended for internal use, but users can get the logger to print their own logs.

jubakit.logger.setup_logger(level=30, f=<open file '<stderr>', mode 'w'>, log_format=u'[%(name)s] %(asctime)s: (%(levelname)s) %(message)s')

Convenience method to set up the logger.
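The defaults in the signature correspond to standard logging concepts: level 30 is logging.WARNING, and a child logger inherits its parent's level and handlers. The sketch below reproduces that setup with the standard library only. It does not call jubakit; the logger name "sketch" is hypothetical, and an in-memory stream stands in for stderr so the output can be inspected.

```python
import io
import logging

# Same format string as the log_format default shown in the signature.
LOG_FORMAT = "[%(name)s] %(asctime)s: (%(levelname)s) %(message)s"

stream = io.StringIO()  # stands in for sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

# "sketch" is a hypothetical logger name, not the one jubakit uses.
root = logging.getLogger("sketch")
root.setLevel(logging.WARNING)  # logging.WARNING == 30
root.addHandler(handler)

child = logging.getLogger("sketch.myapp")  # inherits level and handler
child.info("not shown: below WARNING")
child.warning("disk is almost full")
output = stream.getvalue()
```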

jubakit.model module

This module provides features to manipulate model files.

class jubakit.model.AnomalyTransformer(_m)

Bases: jubakit.model.GenericTransformer

transform(service)
class jubakit.model.BaseTransformer(_m)

Bases: object

__init__(_m)
transform(service)

Transforms the model into the specified service.

class jubakit.model.ClassifierTransformer(_m)

Bases: jubakit.model.GenericTransformer

transform(service)
class jubakit.model.ClusteringTransformer(_m)

Bases: jubakit.model.GenericTransformer

class jubakit.model.GenericTransformer(_m)

Bases: jubakit.model.BaseTransformer

Transformation for services having the generic 2-element model data structure (service model and weight manager model). It can be converted to a Weight model.

transform(service)
exception jubakit.model.InvalidModelFormatError

Bases: exceptions.Exception

class jubakit.model.JubaDump

Bases: object

JubaDump provides a high-level dump of Jubatus models. The jubadump command must be installed.

classmethod dump(data)

Returns the dumped model data structure of the raw model data.

classmethod dump_file(target)

Returns the dumped model data structure of the model file path target.

class jubakit.model.JubaModel

Bases: object

JubaModel provides features to perform low-level manipulation of the Jubatus model data structure.

class Container

Bases: jubakit.model.ModelPart

dump(f)
classmethod load(f)
class Header

Bases: jubakit.model.ModelPart

dump(f, checksum=True)
classmethod fields()
classmethod load(f)
class ModelPart

Bases: object

__init__()
dump(f, *args, **kwargs)
dumps(*args, **kwargs)
classmethod fields()

Returns the list of (property_name, data_type, default_value).

get()
classmethod load(f, *args, **kwargs)
classmethod loads(data, *args, **kwargs)
set(record)
class SystemContainer

Bases: jubakit.model.Container

classmethod fields()
class UserContainer

Bases: jubakit.model.Container

classmethod fields()
__init__()
data()

Returns the actual model data part. This method is a quick shortcut for self.user.user_data.

dump_binary(f)

Dumps the model as a Jubatus binary model file to binary stream f.

dump_json(f, without_raw=False)

Dumps the model as a JSON file to text stream f.

dump_text(f)

Dumps the model in human-readable text format to text stream f.

fix_header()

Repairs the header values.

classmethod load_binary(f, validate=True)

Loads a Jubatus binary model file from binary stream f. When validate is True, the model file format is strictly validated.

classmethod load_json(f)

Loads a model file saved as a JSON file from text stream f.

classmethod predict_format(filename)

Loads the model file named filename. Returns binary or json.

transform(service)
exception jubakit.model.JubaModelError(msg, e=None)

Bases: exceptions.Exception

__init__(msg, e=None)
class jubakit.model.RecommenderTransformer(_m)

Bases: jubakit.model.GenericTransformer

transform(service)
class jubakit.model.RegressionTransformer(_m)

Bases: jubakit.model.ClassifierTransformer

exception jubakit.model.UnsupportedTransformationError(service)

Bases: exceptions.Exception

__init__(service)

jubakit.nearest_neighbor module

class jubakit.nearest_neighbor.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Nearest Neighbor service.

classmethod methods()
class jubakit.nearest_neighbor.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Nearest Neighbor service.

classmethod from_array(data, ids=None, feature_names=None, static=True)

Converts two arrays (data and its associated IDs) to Dataset.

data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, ids=None, feature_names=None, static=True)

Converts two arrays, or a sparse matrix data and its associated ID array, to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, ids=None, feature_names=None, static=True)

Converts a sparse matrix data and its associated ID array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

get_ids()

Returns the ID of each record in the dataset.

class jubakit.nearest_neighbor.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Nearest Neighbor service.

get_all_rows()

Returns the list of all row IDs.

classmethod name()
neighbor_row_from_datum(dataset, size=10)

Returns up to size rows whose datum is most similar to the query, together with their distance values.

neighbor_row_from_id(dataset, size=10)

Returns up to size rows whose datum is most similar to that of the row id, together with their distance values.

set_row(dataset)

Updates the row whose id is id with the given row. If a row with the same id already exists, it is overwritten with row (note that this behavior is different from that of recommender). Otherwise, a new row entry is created. If the server that manages the row and the server that received this RPC request are the same, this operation is reflected instantly. If not, the update is reflected after mix.

similar_row_from_datum(dataset, size=10)

Returns up to size rows whose datum is most similar to the query, together with their similarity values.

similar_row_from_id(dataset, size=10)

Returns up to size rows whose datum is most similar to that of the row id, together with their similarity values.

class jubakit.nearest_neighbor.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Nearest Neighbor service.

ID = u'i'
__init__(mapping, fallback=None)
transform(row)

Nearest Neighbor schema transforms the row into Datum and its associated ID.

If row_id is not set, a UUID is assigned as row_id.
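The fallback to a generated ID can be sketched with the standard uuid module. The helper with_row_id and the "id" key are hypothetical, and the choice of a random (version 4) UUID is an assumption; the source only states that a UUID is assigned.

```python
import uuid

def with_row_id(row, id_key="id"):
    """Return (row_id, remaining fields), generating an ID when none is given."""
    fields = dict(row)
    row_id = fields.pop(id_key, None)
    if row_id is None:
        row_id = str(uuid.uuid4())  # the UUID variant is an assumption
    return row_id, fields

explicit_id, _ = with_row_id({"id": "user-1", "age": 30})
generated_id, fields = with_row_id({"age": 41})
```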

jubakit.recommender module

class jubakit.recommender.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Recommender service.

classmethod methods()
class jubakit.recommender.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Recommender service.

class jubakit.recommender.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Recommender service.

clear_row(dataset)

Removes the given rows from the recommendation table.

complete_row_from_datum(dataset)

Returns data points from the datum in the recommender model, with missing values completed by predicted values.

complete_row_from_id(dataset)

Returns data points from the row id in the recommender model, with missing values completed by predicted values.

decode_row(dataset)

Returns the data points in the row id.

classmethod name()
similar_row_from_datum(dataset, size=10)

Returns similar data points from the datum in the recommender model.

similar_row_from_datum_and_rate(dataset, rate=0.1)

Returns the top rate fraction of all rows, ranked by similarity to row. For example, returns the top 10% of all rows when 0.1 is specified as rate.

The rate must be in (0, 1].
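Selecting the top fraction of rows by similarity can be sketched as below. This runs locally on made-up similarity values and does not call the recommender; the function top_rate is hypothetical, and rounding the row count up is an assumption, since the source does not say how fractional counts are handled.

```python
import math

def top_rate(similarities, rate=0.1):
    """Return the top `rate` fraction of (row_id, similarity) pairs."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ranked = sorted(similarities, key=lambda pair: pair[1], reverse=True)
    # Rounding up is an assumption; the server may round differently.
    size = math.ceil(len(ranked) * rate)
    return ranked[:size]

# 20 hypothetical rows with increasing similarity values.
similarities = [("row%d" % i, i / 20) for i in range(20)]
best = top_rate(similarities, rate=0.1)
```

With 20 rows and a rate of 0.1, the two most similar rows are returned.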

similar_row_from_datum_and_score(dataset, score=0.8)

Returns rows which are most similar to row and have a similarity score greater than score.

similar_row_from_id(dataset, size=10)

Returns similar data points from the row id in the recommender model.

similar_row_from_id_and_rate(dataset, rate=0.1)

Returns the top rate fraction of all rows, ranked by similarity to the row id. For example, returns the top 10% of all rows when 0.1 is specified as rate.

The rate must be in (0, 1].

similar_row_from_id_and_score(dataset, score=0.8)

Returns rows which are most similar to the row id and have a similarity score greater than score.

update_row(dataset)

Updates data points in the recommender model using the given dataset.

class jubakit.recommender.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Recommender service.

ID = u'i'
__init__(mapping, fallback=None)
transform(row)

Recommender schema transforms the row into Datum and its associated ID.

jubakit.regression module

class jubakit.regression.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Regression service.

classmethod methods()
class jubakit.regression.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Regression service.

classmethod from_array(data, targets=None, feature_names=None, static=True)

Converts two arrays (data and its associated targets) to Dataset.

data : array of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, targets=None, feature_names=None, static=True)

Converts two arrays, or a sparse matrix data and its associated target array, to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, targets=None, feature_names=None, static=True)

Converts a sparse matrix data and its associated target array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

class jubakit.regression.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Regression service.

estimate(dataset)

Estimates target values of the given dataset using this Regression.

classmethod name()
train(dataset)

Trains the regression using the given dataset.

classmethod train_and_estimate(config, train_dataset, test_dataset, metric)

This is a utility method to perform bulk train-test. It runs a regression using the given config, trains the regression, estimates the test dataset with it, then returns the calculated metrics.

class jubakit.regression.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Regression service.

TARGET = u't'
__init__(mapping, fallback=None)
transform(row)

Regression schema transforms the row into Datum and its associated target value.

jubakit.shell module

class jubakit.shell.JubaShell(host, port, cluster, service, **kwargs)

Bases: object

JubaShell provides a shell environment to call Jubatus RPC API.

The interactive interface is provided in the cli submodule.

__init__(host, port, cluster, service, **kwargs)

Creates a new shell environment using the specified parameters.

If service is None, it will be automatically probed.

connect()

Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection will not be established until an RPC method is called.

disconnect()

Disconnects from the server (if connected).

classmethod get_cli_classes()

Returns a map of service name to CLI implementation class.

get_client()

Returns the client instance.

classmethod get_client_classes()

Returns a map of service name to Jubatus client instance.

get_timeout()

Returns the current client-side timeout value.

interact()

Starts the interactive shell environment.

is_connected()

Returns True if the client exists. Note that its backend TCP connection may already be closed.

classmethod probe_facts(host, port, cluster)

Probes the service name and remote server type. Returns a tuple of (service_name, is_proxy).

run(command)

Runs a one-shot command.

set_remote(host, port, cluster, service)

Switches to the new remote server.

set_timeout(timeout)

Sets a new client-side timeout value. The existing connection will be discarded.

exception jubakit.shell.JubaShellAssertionError

Bases: jubakit.shell.JubaShellException

exception jubakit.shell.JubaShellException

Bases: exceptions.Exception

exception jubakit.shell.JubaShellRPCError(msg, host, port, e=None)

Bases: jubakit.shell.JubaShellException

__init__(msg, host, port, e=None)

jubakit.weight module

class jubakit.weight.Config(method=None, parameter=None, converter=None)

Bases: jubakit.base.GenericConfig

Configuration to run Weight service.

classmethod methods()
class jubakit.weight.Dataset(loader, schema=None, static=None, _data=None)

Bases: jubakit.base.BaseDataset

Dataset for Weight service.

class jubakit.weight.Schema(mapping, fallback=None)

Bases: jubakit.base.GenericSchema

Schema for Weight service.

class jubakit.weight.Weight(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)

Bases: jubakit.base.BaseService

Weight service.

calc_weight(dataset)

Returns extracted feature vectors, without modifying the weight model.

classmethod name()
update(dataset)

Updates the weight using the given dataset and returns extracted feature vectors.