jubakit package¶

jubakit.anomaly module¶

class jubakit.anomaly.Anomaly(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Anomaly service.

add(dataset)[source]¶: Adds data points to the anomaly model using the given dataset and returns LOF scores.

add_bulk(dataset)[source]¶: Adds data points to the anomaly model using the given dataset and returns a list of data point IDs.

calc_score(dataset)[source]¶: Calculates LOF scores for the given dataset.

classmethod name()[source]¶

overwrite(dataset)[source]¶: Overwrites data points in the anomaly model using the given dataset and returns LOF scores.

update(dataset)[source]¶: Updates data points in the anomaly model using the given dataset and returns LOF scores.

class jubakit.anomaly.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Anomaly service.

classmethod methods()[source]¶

class jubakit.anomaly.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Anomaly service.

class jubakit.anomaly.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Anomaly service.

FLAG = u'f'¶

ID = u'i'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶: Anomaly schema transforms the row into Datum, its associated ID and flag. Flag can be a value of any type. It is provided for convenience to calculate precision.
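The flag is what makes a precision calculation possible once LOF scores come back. The following is a minimal sketch, assuming flags and scores are already paired up in plain Python lists. The helper name `precision_at_threshold` and the threshold value are hypothetical and not part of jubakit.

```python
def precision_at_threshold(flags, scores, threshold=1.5):
    """Fraction of points scored above `threshold` whose flag marks a true anomaly.

    `flags` holds truthy values for known anomalies; `scores` holds LOF scores.
    The function name and the default threshold are illustrative assumptions.
    """
    predicted = [bool(f) for f, s in zip(flags, scores) if s > threshold]
    if not predicted:
        return 0.0  # nothing was predicted to be anomalous
    return sum(predicted) / float(len(predicted))


flags = [False, False, True, True, False]
scores = [0.9, 1.0, 3.2, 1.1, 2.0]
precision = precision_at_threshold(flags, scores)
```

Two points exceed the threshold here and one of them is flagged, so the sketch reports a precision of one half.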

jubakit.bandit module¶

class jubakit.bandit.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Bandit service.

delete_arm(arm_id)[source]¶

get_arm_info(player_id)[source]¶

classmethod name()[source]¶

register_arm(arm_id)[source]¶

register_reward(player_id, arm_id, reward)[source]¶

reset(player_id)[source]¶

select_arm(player_id)[source]¶

class jubakit.bandit.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Bandit service.

classmethod methods()[source]¶

jubakit.base module¶

class jubakit.base.BaseConfig(*args, **kwargs)[source]¶

Bases: dict

Config is a convenience class for building a new configuration.

__init__(*args, **kwargs)[source]¶: Creates a new Config with default configuration.

classmethod default()[source]¶: Returns a new default configuration.

class jubakit.base.BaseDataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: object

Dataset is an abstract representation of a set of data.

__init__(loader, schema=None, static=None, _data=None)[source]¶

Defines a new dataset. Datasets are immutable and cannot be modified.

Data will be loaded from the given loader using schema.

When static is set to True (which is the default for non-infinite loaders), data will be loaded into memory immediately; otherwise data will be loaded one-by-one from the loader, which may be better when processing a large dataset. For “infinite” loaders (like MQ and Twitter stream), static cannot be set to True. Note that some features (e.g., index access) are not available for non-static datasets; these may be required by other features such as cross-validation.
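The difference between static and non-static loading can be sketched without jubakit itself. `SketchDataset` and `three_rows` below are hypothetical stand-ins, not the library's classes; they only mirror the behavior described above (eager loading with index access versus on-demand iteration).

```python
class SketchDataset(object):
    """Hypothetical stand-in showing static vs. non-static loading."""

    def __init__(self, row_source, static=True):
        self._static = static
        if static:
            # Static: pull every row into memory right away.
            self._rows = list(row_source())
            self._row_source = None
        else:
            # Non-static: keep the source and read rows on demand.
            self._rows = None
            self._row_source = row_source

    def is_static(self):
        return self._static

    def get(self, idx):
        if not self._static:
            raise RuntimeError('index access needs a static dataset')
        return self._rows[idx]

    def __iter__(self):
        if self._static:
            return iter(self._rows)
        return iter(self._row_source())


def three_rows():
    # A tiny finite "loader": yields flat dict rows.
    for i in range(3):
        yield {'x': i}


static_ds = SketchDataset(three_rows, static=True)
lazy_ds = SketchDataset(three_rows, static=False)
```

In this sketch `static_ds.get(1)` works, while `lazy_ds.get(0)` raises an error and the lazy dataset can only be iterated.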

convert(func)[source]¶: Applies the given callable (which is expected to perform batch pre-processing such as shuffling) to all data entries and returns a new immutable Dataset.

get(idx)[source]¶: Returns the raw entry loaded by Loader.

get_schema()[source]¶: Returns the Schema for this dataset.

is_static()[source]¶: Returns True for static datasets.

shuffle(seed=None)[source]¶: Returns a new immutable Dataset whose records are shuffled.
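Both convert and shuffle return a new dataset rather than changing the existing one. The sketch below illustrates that pattern with a hypothetical `FrozenRows` container. It assumes a seeded `random.Random` for reproducibility; the source does not state which generator jubakit actually uses.

```python
import random


class FrozenRows(object):
    """Hypothetical immutable container; convert() always builds a new one."""

    def __init__(self, rows):
        self._rows = tuple(rows)

    def rows(self):
        return list(self._rows)

    def convert(self, func):
        # func receives a copy of all entries and returns the new entries.
        return FrozenRows(func(list(self._rows)))

    def shuffle(self, seed=None):
        rng = random.Random(seed)

        def _shuffle(entries):
            rng.shuffle(entries)
            return entries

        return self.convert(_shuffle)


original = FrozenRows([1, 2, 3, 4, 5])
first = original.shuffle(seed=42)
second = original.shuffle(seed=42)
reversed_rows = original.convert(lambda entries: entries[::-1])
```

With the same seed, the two shuffled copies have the same order, and `original` keeps its order throughout.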

class jubakit.base.BaseLoader[source]¶

Bases: object

Loader loads rows from various data sources.

is_infinite()[source]¶: Returns True if the length of the data source is indeterminate (e.g., MQ.)

preprocess(ent)[source]¶: Preprocesses the given dict-like object into another dict-like object. The default implementation does not alter the object. Users can override this method to perform custom processing. You can yield None to skip the record.

rows()[source]¶: Subclasses must override this method and yield each row of the data source as a flat dict-like object. You can yield None to skip the record.
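The rows/preprocess contract can be sketched as follows. `SketchLoader` deliberately does not subclass the real BaseLoader so that it runs standalone, and the `load_all` driver loop is an assumption about how a consumer might combine the two methods, not jubakit's actual iteration code.

```python
class SketchLoader(object):
    """Hypothetical loader; it mirrors the rows()/preprocess() contract only."""

    def __init__(self, lines):
        self._lines = lines

    def rows(self):
        # Yield one flat dict per record.
        for number, line in enumerate(self._lines):
            yield {'number': number, 'text': line}

    def preprocess(self, ent):
        # Returning None skips the record; here blank lines are dropped.
        if not ent['text'].strip():
            return None
        ent = dict(ent)
        ent['text'] = ent['text'].strip().lower()
        return ent


def load_all(loader):
    """Assumed driver loop: apply preprocess() and drop None records."""
    for ent in loader.rows():
        if ent is None:
            continue
        processed = loader.preprocess(ent)
        if processed is not None:
            yield processed


records = list(load_all(SketchLoader(['Hello ', '', ' World'])))
```

The blank middle line is skipped, so only two normalized records remain.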

class jubakit.base.BaseSchema(mapping, fallback=None)[source]¶

Bases: object

Schema defines data types for each key of the data.

BaseSchema defines three fundamental data types:

IGNORE: ignores the key (mainly intended for fallback)
AUTO: uses the type of the key as its data type
INFER: guesses the type of the key from its value; note that this is discouraged as it may produce unstable results.

AUTO = u'.'¶

IGNORE = u'_'¶

INFER = u'?'¶

__init__(mapping, fallback=None)[source]¶: Defines a Schema. Schema is an immutable object and cannot be modified. mapping is a dict-like object that maps row keys to the data type. Optionally you can assign an alias name for the key to handle different loaders with the same configuration.

classmethod predict(row, typed)[source]¶: Predicts a Schema from dict-like row object.

transform(row)[source]¶: Transform the row (dict-like) into data structures required by the corresponding Service.

class jubakit.base.BaseService(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: object

Service provides an interface to machine learning features.

__init__(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶: Creates a new service that connects to the existing server.

clear()[source]¶: Clears the model.

get_status()[source]¶: Returns the status of this server. In distributed mode, returns statuses of all members.

load(name, path=None)[source]¶: Loads the model using name. If path is specified, copy the model file from local path to remote location.

classmethod name()[source]¶: Subclasses (Classifier, NearestNeighbor, … etc.) must override this method and return its service name (classifier, nearest_neighbor, … etc.)

classmethod run(config, port=None, embedded=False)[source]¶: Runs a new standalone server or embedded instance and returns the service instance.

save(name, path=None)[source]¶: Saves the model using name. If path is specified, copy the saved model file to local path.

shell(**kwargs)[source]¶: Starts an interactive shell session for this service.

stop()[source]¶: Stops the backend process if one exists.

class jubakit.base.GenericConfig(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.BaseConfig

GenericConfig is a base Config class for generic services that have converter, method and parameter in its config data.

__init__(method=None, parameter=None, converter=None)[source]¶

add_mecab(name=u'mecab', arg=u'', ngram=1, base=False, include_features=u'*', exclude_features=u'')[source]¶: Adds MeCab feature extraction to string_types.

clear_converter()[source]¶: Initializes the converter section of the config with an empty template.

classmethod methods()[source]¶: Subclasses must override this method and return methods available for this service.

class jubakit.base.GenericSchema(mapping, fallback=None)[source]¶

Bases: jubakit.base.BaseSchema

GenericSchema is a base Schema class for all engines using Datum.

GenericSchema defines 3 data types:

STRING: string features (string_values)
NUMBER: numeric features (num_values)
BINARY: binary features (binary_values)

BINARY = u'b'¶

NUMBER = u'n'¶

STRING = u's'¶

classmethod predict(row, typed)[source]¶: Predicts a schema from dict-like row object.

transform(row)[source]¶: Transforms the row (represented in dict-like object) as Datum. Subclasses that define their own data types should override this method and handle them.
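To make the mapping from schema types to Datum fields concrete, here is a minimal sketch. It is not jubakit's implementation. The type codes are the constant values listed above. The function `sketch_transform`, the use of (key, value) pairs for each value list, and the choice to raise an error for keys without a type are assumptions made for illustration.

```python
# Type codes as listed for GenericSchema and BaseSchema.
STRING, NUMBER, BINARY, IGNORE = 's', 'n', 'b', '_'


def sketch_transform(row, mapping, fallback=None):
    """Split a flat row into three value lists (illustrative only)."""
    string_values, num_values, binary_values = [], [], []
    for key, value in sorted(row.items()):
        data_type = mapping.get(key, fallback)
        if data_type is None:
            # Assumption: keys with no type and no fallback are an error.
            raise KeyError('no type for key: %s' % key)
        if data_type == IGNORE:
            continue
        elif data_type == STRING:
            string_values.append((key, str(value)))
        elif data_type == NUMBER:
            num_values.append((key, float(value)))
        elif data_type == BINARY:
            binary_values.append((key, value))
        else:
            raise ValueError('unknown type: %s' % data_type)
    return string_values, num_values, binary_values


row = {'name': 'alice', 'age': 31, 'comment': 'n/a'}
strings, numbers, binaries = sketch_transform(
    row, {'name': STRING, 'age': NUMBER}, fallback=IGNORE)
```

The `comment` key has no entry in the mapping, so the IGNORE fallback drops it.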

class jubakit.base.Utils[source]¶

Bases: object

static softmax(x)[source]¶

jubakit.burst module¶

class jubakit.burst.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Burst service.

DEFAULT_GAMMA = 0.1¶

DEFAULT_SCALING = 1.1¶

add_documents(document_dataset)[source]¶: Registers the documents for burst detection.

add_keyword(keyword_dataset)[source]¶: Registers the keyword for burst detection.

get_all_bursted_results()[source]¶: Returns the burst detection result of the current window for all pre-registered keywords.

get_all_bursted_results_at(pos)[source]¶: Returns the burst detection result at the specified position for all pre-registered keywords.

get_all_keywords()[source]¶: Returns the list of keywords registered for burst detection.

get_result(keyword)[source]¶: Returns the burst detection result of the current window for pre-registered keyword keyword.

get_result_at(keyword, pos)[source]¶: Returns the burst detection result at the specified position for pre-registered keyword.

classmethod name()[source]¶

remove_all_keywords()[source]¶: Removes all the keywords from burst detection.

remove_keyword(keyword)[source]¶: Removes the keyword from burst detection.

class jubakit.burst.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configurations to run Burst service.

__init__(method=None, parameter=None, converter=None)[source]¶

classmethod methods()[source]¶

class jubakit.burst.DocumentDataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Document dataset for Burst service.

class jubakit.burst.DocumentSchema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Document schema for Burst service.

POSITION = u'p'¶

TEXT = u't'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶

class jubakit.burst.KeywordDataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Keyword dataset for Burst service.

class jubakit.burst.KeywordSchema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Keyword schema for Burst service.

GAMMA = u'g'¶

KEYWORD = u'k'¶

SCALING = u's'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶

jubakit.classifier module¶

class jubakit.classifier.Classifier(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Classifier service.

classify(dataset, softmax=False)[source]¶: Classifies the given dataset using this classifier. When softmax is set to True, softmax is applied to the resulting scores.
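Softmax turns raw classifier scores into values that are positive and sum to one. The sketch below is a numerically stable version in plain Python. It is a standard formulation rather than a copy of jubakit's own softmax helper, whose implementation is not shown in this reference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    if not scores:
        return []
    peak = max(scores)
    # Subtracting the maximum avoids overflow in exp() for large scores.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
```

The ordering of the scores is preserved, so the label with the highest raw score still has the highest value after softmax.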

classmethod name()[source]¶

train(dataset)[source]¶: Trains the classifier using the given dataset.

classmethod train_and_classify(config, train_dataset, test_dataset, metric)[source]¶: This is a utility method to perform bulk train-test. Runs a classifier using the given config, trains the classifier, classifies the test dataset, then returns the calculated metrics.

class jubakit.classifier.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Classifier service.

classmethod methods()[source]¶

class jubakit.classifier.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Classifier service.

classmethod from_array(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶

Converts two arrays (data and its associated labels) to Dataset.

data : array of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional

classmethod from_data(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶

Converts two arrays or a sparse matrix data and its associated label array to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional

classmethod from_matrix(data, labels=None, feature_names=None, label_names=None, static=True)[source]¶

Converts a sparse matrix data and its associated label array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
labels : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional
label_names : array of shape [n_labels], optional

get_labels()[source]¶: Returns labels of each record in the dataset.

class jubakit.classifier.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Classifier service.

LABEL = u'l'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶: Classifier schema transforms the row into Datum and its associated label.

jubakit.clustering module¶

class jubakit.clustering.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Clustering service.

get_core_members(light=False)[source]¶: Returns coreset of cluster in datum.

get_k_center()[source]¶: Returns the k cluster centers.

get_nearest_center(dataset)[source]¶: Returns the nearest cluster center for each point without adding the points to the cluster.

get_nearest_members(dataset, light=False)[source]¶: Returns the nearest cluster summary (coreset) for each point.

get_revision()[source]¶: Returns the revision of the clusters.

classmethod name()[source]¶

push(dataset)[source]¶: Adds data points.

class jubakit.clustering.Config(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Clustering service.

__init__(method=None, parameter=None, compressor_method=None, compressor_parameter=None, converter=None, distance=None)[source]¶

classmethod compressor_methods()[source]¶

classmethod distances()[source]¶

classmethod methods()[source]¶

class jubakit.clustering.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Clustering service.

classmethod from_array(data, ids=None, feature_names=None, static=True)[source]¶

Converts two arrays (data and its associated ids) to Dataset.

data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, ids=None, feature_names=None, static=True)[source]¶

Converts two arrays or a sparse matrix data and its associated id array to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, ids=None, feature_names=None, static=True)[source]¶

Converts a sparse matrix data and its associated id array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

get_ids()[source]¶: Returns IDs of each record in the dataset.

class jubakit.clustering.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Clustering service.

ID = u'i'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶: Clustering schema transforms the row into Datum, its associated ID.

jubakit.compat module¶

jubakit.dumb module¶

Dumb Services are temporary implementations of Services. They are defined just for convenience.

Unlike Real Services (Classifier, Anomaly, …) which are defined in each file (classifier.py, anomaly.py, …), Dumb Services cannot handle Datasets and Schemas.

Each service has a field called CONFIG, which provides a default config data structure for the service. So you can use jubakit to start a Jubatus server process, then directly use the raw Client class to make RPC calls.

>>> from jubakit.dumb import Stat
>>> service = Stat.run(Stat.CONFIG)
>>> client = service._client()
>>> client.push('x', 12)

class jubakit.dumb.Bandit(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'ucb1', u'parameter': {u'assume_unrewarded': False}}¶

classmethod name()[source]¶

class jubakit.dumb.Burst(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'burst', u'parameter': {u'costcut_threshold': -1, u'window_batch_size': 5, u'max_reuse_batch_num': 5, u'result_window_rotate_size': 5, u'batch_interval': 10}}¶

classmethod name()[source]¶

class jubakit.dumb.Clustering(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'compressor_method': u'simple', u'compressor_parameter': {u'bucket_size': 1000}, u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'distance': u'euclidean', u'method': u'kmeans', u'parameter': {u'k': 3, u'seed': 0}}¶

classmethod name()[source]¶

class jubakit.dumb.Graph(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'method': u'graph_wo_index', u'parameter': {u'damping_factor': 0.9, u'landmark_num': 5}}¶

classmethod name()[source]¶

class jubakit.dumb.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'lsh', u'parameter': {u'hash_num': 64}}¶

classmethod name()[source]¶

class jubakit.dumb.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'inverted_index'}¶

classmethod name()[source]¶

class jubakit.dumb.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'converter': {u'num_rules': [{u'type': u'num', u'key': u'*'}], u'string_rules': [{u'type': u'bigram', u'sample_weight': u'tf', u'global_weight': u'idf', u'key': u'*'}], u'string_filter_rules': [], u'string_filter_types': {}, u'num_filter_types': {}, u'string_types': {u'trigram': {u'method': u'ngram', u'char_num': u'3'}, u'unigram': {u'method': u'ngram', u'char_num': u'1'}, u'bigram': {u'method': u'ngram', u'char_num': u'2'}}, u'num_types': {}, u'num_filter_rules': []}, u'method': u'PA1', u'parameter': {u'sensitivity': 0.1, u'regularization_weight': 3.402823e+38}}¶

classmethod name()[source]¶

class jubakit.dumb.Stat(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

CONFIG = {u'window_size': 128}¶

classmethod name()[source]¶

jubakit.logger module¶

jubakit.logger.get_logger(name=None)[source]¶

Returns the logger. If name is specified, a child logger is returned. Otherwise the default jubakit logger is returned.

This is mainly intended for internal use, but users can also get the logger to print their own logs.

jubakit.logger.setup_logger(level=30, f=<open file '<stderr>', mode 'w'>, log_format=u'[%(name)s] %(asctime)s: (%(levelname)s) %(message)s')[source]¶: Convenient method to setup the logger.
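The relationship between the default logger and a child logger can be sketched with the standard logging module alone. `sketch_setup_logger` is a hypothetical stand-in, not the real function. It assumes the default logger is named `jubakit`, and it reuses the default format string and level (30, which is `logging.WARNING`) from the signature above.

```python
import io
import logging

LOG_FORMAT = '[%(name)s] %(asctime)s: (%(levelname)s) %(message)s'


def sketch_setup_logger(stream, level=logging.WARNING, log_format=LOG_FORMAT):
    """Attach a stream handler to the 'jubakit' logger (stand-in only)."""
    logger = logging.getLogger('jubakit')  # assumed logger name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(log_format))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


buffer = io.StringIO()
parent = sketch_setup_logger(buffer)
child = parent.getChild('myapp')  # child logger named 'jubakit.myapp'
child.warning('disk almost full')
child.info('not shown at WARNING level')
output = buffer.getvalue()
```

The child logger inherits the parent's level and handler, so only the warning reaches the stream.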

jubakit.model module¶

This module provides features to manipulate model files.

class jubakit.model.AnomalyTransformer(_m)[source]¶

Bases: jubakit.model.GenericTransformer

transform(service)[source]¶

class jubakit.model.BaseTransformer(_m)[source]¶

Bases: object

__init__(_m)[source]¶

transform(service)[source]¶: Transforms the model into the specified service.

class jubakit.model.ClassifierTransformer(_m)[source]¶

Bases: jubakit.model.GenericTransformer

transform(service)[source]¶

class jubakit.model.ClusteringTransformer(_m)[source]¶: Bases: jubakit.model.GenericTransformer

class jubakit.model.GenericTransformer(_m)[source]¶

Bases: jubakit.model.BaseTransformer

Transformation for services having generic 2-element model data structure (service model and weight manager model). It can be converted to Weight model.

transform(service)[source]¶

exception jubakit.model.InvalidModelFormatError[source]¶: Bases: exceptions.Exception

class jubakit.model.JubaDump[source]¶

Bases: object

JubaDump provides a high-level dump of Jubatus models. jubadump command must be installed.

classmethod dump(data)[source]¶: Returns the dumped model data structure of the raw model data.

classmethod dump_file(target)[source]¶: Returns the dumped model data structure of the model file path target.

class jubakit.model.JubaModel[source]¶

Bases: object

JubaModel provides features to perform low-level manipulation of Jubatus model data structure.

class Container[source]¶

Bases: jubakit.model.ModelPart

dump(f)[source]¶

classmethod load(f)[source]¶

class Header[source]¶

Bases: jubakit.model.ModelPart

dump(f, checksum=True)[source]¶

classmethod fields()[source]¶

classmethod load(f)[source]¶

class ModelPart[source]¶

Bases: object

__init__()[source]¶

dump(f, *args, **kwargs)[source]¶

dumps(*args, **kwargs)[source]¶

classmethod fields()[source]¶: Returns the list of (property_name, data_type, default_value).

get()[source]¶

classmethod load(f, *args, **kwargs)[source]¶

classmethod loads(data, *args, **kwargs)[source]¶

set(record)[source]¶

class SystemContainer[source]¶

Bases: jubakit.model.Container

classmethod fields()[source]¶

class UserContainer[source]¶

Bases: jubakit.model.Container

classmethod fields()[source]¶

__init__()[source]¶

data()[source]¶: Returns the actual model data part. This method is a quick shortcut for return self.user.user_data.

dump_binary(f)[source]¶: Dumps the model as Jubatus binary model file to binary stream f.

dump_json(f, without_raw=False)[source]¶: Dumps the model as JSON file to a text stream f.

dump_text(f)[source]¶: Dumps the model as human-readable text format to a text stream f.

fix_header()[source]¶: Repairs the header values.

classmethod load_binary(f, validate=True)[source]¶: Loads Jubatus binary model file from binary stream f. When validate is True, the model file format is strictly validated.

classmethod load_json(f)[source]¶: Loads model file saved as JSON file from text stream f.

classmethod predict_format(filename)[source]¶: Loads the model file named filename. Returns binary or json.

transform(service)[source]¶

exception jubakit.model.JubaModelError(msg, e=None)[source]¶

Bases: exceptions.Exception

__init__(msg, e=None)[source]¶

class jubakit.model.RecommenderTransformer(_m)[source]¶

Bases: jubakit.model.GenericTransformer

transform(service)[source]¶

class jubakit.model.RegressionTransformer(_m)[source]¶: Bases: jubakit.model.ClassifierTransformer

exception jubakit.model.UnsupportedTransformationError(service)[source]¶

Bases: exceptions.Exception

__init__(service)[source]¶

jubakit.nearest_neighbor module¶

class jubakit.nearest_neighbor.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Nearest Neighbor service.

classmethod methods()[source]¶

class jubakit.nearest_neighbor.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Nearest Neighbor service.

classmethod from_array(data, ids=None, feature_names=None, static=True)[source]¶

Converts two arrays (data and its associated ids) to Dataset.

data : array of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, ids=None, feature_names=None, static=True)[source]¶

Converts two arrays or a sparse matrix data and its associated id array to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, ids=None, feature_names=None, static=True)[source]¶

Converts a sparse matrix data and its associated id array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
ids : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

get_ids()[source]¶: Returns IDs of each record in the dataset.

class jubakit.nearest_neighbor.NearestNeighbor(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Nearest Neighbor service.

get_all_rows()[source]¶: Returns the list of all row IDs.

classmethod name()[source]¶

neighbor_row_from_datum(dataset, size=10)[source]¶: Returns up to size rows whose datum is most similar to the query datum, together with their distance values.

neighbor_row_from_id(dataset, size=10)[source]¶: Returns up to size rows whose datum is most similar to that of the row with the given id, together with their distance values.

set_row(dataset)[source]¶: Updates the row with the given id using the given row. If a row with the same id already exists, it is overwritten with the new row (note that this behavior is different from that of recommender). Otherwise, a new row entry is created. If the server that manages the row is the same server that received this RPC request, the operation is reflected instantly. If not, the update is reflected after mix.

similar_row_from_datum(dataset, size=10)[source]¶: Returns up to size rows whose datum is most similar to the query datum, together with their similarity values.

similar_row_from_id(dataset, size=10)[source]¶: Returns up to size rows whose datum is most similar to that of the row with the given id, together with their similarity values.

class jubakit.nearest_neighbor.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Nearest Neighbor service.

ID = u'i'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶

Nearest Neighbor schema transforms the row into Datum, its associated ID.

If row_id is not set, a UUID is assigned as the row_id.
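The ID fallback can be sketched as below. The helper `ensure_row_id`, the `id_key` parameter, and the choice of a random (version 4) UUID are assumptions for illustration; the source only states that a UUID is assigned.

```python
import uuid


def ensure_row_id(row, id_key='id'):
    """Return (row_id, features); generate a UUID when the ID is missing."""
    features = dict(row)
    row_id = features.pop(id_key, None)
    if row_id is None:
        # Assumption: a random UUID string serves as the generated ID.
        row_id = str(uuid.uuid4())
    return row_id, features


given_id, _ = ensure_row_id({'id': 'row-1', 'x': 1.0})
generated_id, features = ensure_row_id({'x': 2.0})
```

Rows that already carry an ID keep it, while rows without one receive a fresh identifier each time they are transformed.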

jubakit.recommender module¶

class jubakit.recommender.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Recommender service.

classmethod methods()[source]¶

class jubakit.recommender.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Recommender service.

class jubakit.recommender.Recommender(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Recommender service.

clear_row(dataset)[source]¶: Removes the given rows from the recommendation table.

complete_row_from_datum(dataset)[source]¶: Returns data points from the datum in the recommender model, with missing values completed by predicted values.

complete_row_from_id(dataset)[source]¶: Returns data points from the row id in the recommender model, with missing values completed by predicted values.

decode_row(dataset)[source]¶: Returns data points in the row id.

classmethod name()[source]¶

similar_row_from_datum(dataset, size=10)[source]¶: Returns similar data points from the datum in the recommender model.

similar_row_from_datum_and_rate(dataset, rate=0.1)[source]¶

Returns the top rate of all the rows which are most similar to row. For example, return the top 10% of all the rows when 0.1 is specified as rate.

The rate must be in (0, 1].
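The rate semantics can be illustrated with a small client-side sketch. `top_rate` is a hypothetical helper operating on (row_id, similarity) pairs. Rounding the row count up is an assumption, since the source does not say how the server rounds.

```python
import math


def top_rate(scored_rows, rate):
    """Keep the best `rate` fraction of (row_id, similarity) pairs."""
    if not 0.0 < rate <= 1.0:
        raise ValueError('rate must be in (0, 1]')
    ranked = sorted(scored_rows, key=lambda pair: pair[1], reverse=True)
    # Rounding up is an assumption; the server may round differently.
    count = int(math.ceil(len(ranked) * rate))
    return ranked[:count]


rows = [('row%d' % i, i / 10.0) for i in range(10)]
best = top_rate(rows, 0.2)
```

With ten rows and a rate of 0.2, the sketch keeps the two rows with the highest similarity.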

similar_row_from_datum_and_score(dataset, score=0.8)[source]¶: Returns rows which are most similar to row and have a greater similarity score than score.

similar_row_from_id(dataset, size=10)[source]¶: Returns similar data points from the row id in the recommender model.

similar_row_from_id_and_rate(dataset, rate=0.1)[source]¶

Returns the top rate of all the rows which are most similar to the row id. For example, return the top 10% of all the rows when 0.1 is specified as rate.

The rate must be in (0, 1].

similar_row_from_id_and_score(dataset, score=0.8)[source]¶: Returns rows which are most similar to the row id and have a greater similarity score than score.

update_row(dataset)[source]¶: Updates data points in the recommender model using the given dataset.

class jubakit.recommender.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Recommender service.

ID = u'i'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶: Recommender schema transforms the row into Datum, its associated ID.

jubakit.regression module¶

class jubakit.regression.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Regression service.

classmethod methods()[source]¶

class jubakit.regression.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Regression service.

classmethod from_array(data, targets=None, feature_names=None, static=True)[source]¶

Converts two arrays (data and its associated targets) to Dataset.

data : array of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_data(data, targets=None, feature_names=None, static=True)[source]¶

Converts two arrays or a sparse matrix data and its associated target array to Dataset.

data : array or scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

classmethod from_matrix(data, targets=None, feature_names=None, static=True)[source]¶

Converts a sparse matrix data and its associated target array to Dataset.

data : scipy 2-D sparse matrix of shape [n_samples, n_features]
targets : array of shape [n_samples], optional
feature_names : array of shape [n_features], optional

class jubakit.regression.Regression(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Regression service.

estimate(dataset)[source]¶: Estimates target values of the given dataset using this Regression.

classmethod name()[source]¶

train(dataset)[source]¶: Trains the regression using the given dataset.

classmethod train_and_estimate(config, train_dataset, test_dataset, metric)[source]¶: This is a utility method to perform bulk train-test. Runs a regression using the given config, trains the regression, estimates the test dataset, then returns the calculated metrics.

class jubakit.regression.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Regression service.

TARGET = u't'¶

__init__(mapping, fallback=None)[source]¶

transform(row)[source]¶: Regression schema transforms the row into Datum and its associated target value.

jubakit.shell module¶

class jubakit.shell.JubaShell(host, port, cluster, service, **kwargs)[source]¶

Bases: object

JubaShell provides a shell environment to call Jubatus RPC API.

The interactive interface is provided in cli submodule.

__init__(host, port, cluster, service, **kwargs)[source]¶

Creates a new shell environment using parameters specified.

If service is None, it will be automatically probed.

connect()[source]¶: Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection will not be established until an RPC method is called.

disconnect()[source]¶: Disconnects from the server (if connected).

classmethod get_cli_classes()[source]¶: Returns map of service name to CLI implementation class.

get_client()[source]¶: Returns the client instance.

classmethod get_client_classes()[source]¶: Returns map of service name to Jubatus client class.

get_timeout()[source]¶: Returns the current client-side timeout value.

interact()[source]¶: Starts the interactive shell environment.

is_connected()[source]¶: Returns True if the client exists. Note that its backend TCP connection may already be closed.

classmethod probe_facts(host, port, cluster)[source]¶: Probes the service name and remote server type. Returns a tuple of (service_name, is_proxy).

run(command)[source]¶: Runs a one-shot command.

set_remote(host, port, cluster, service)[source]¶: Switches to the new remote server.

set_timeout(timeout)[source]¶: Sets new client-side timeout value. Existing connection will be discarded.

exception jubakit.shell.JubaShellAssertionError[source]¶: Bases: jubakit.shell.JubaShellException

exception jubakit.shell.JubaShellException[source]¶: Bases: exceptions.Exception

exception jubakit.shell.JubaShellRPCError(msg, host, port, e=None)[source]¶

Bases: jubakit.shell.JubaShellException

__init__(msg, host, port, e=None)[source]¶

jubakit.weight module¶

class jubakit.weight.Config(method=None, parameter=None, converter=None)[source]¶

Bases: jubakit.base.GenericConfig

Configuration to run Weight service.

classmethod methods()[source]¶

class jubakit.weight.Dataset(loader, schema=None, static=None, _data=None)[source]¶

Bases: jubakit.base.BaseDataset

Dataset for Weight service.

class jubakit.weight.Schema(mapping, fallback=None)[source]¶

Bases: jubakit.base.GenericSchema

Schema for Weight service.

class jubakit.weight.Weight(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]¶

Bases: jubakit.base.BaseService

Weight service.

calc_weight(dataset)[source]¶: Returns extracted feature vectors, without modifying the weight model.

classmethod name()[source]¶

update(dataset)[source]¶: Updates the weight using the given dataset and returns extracted feature vectors.