jubakit package

jubakit.anomaly module

jubakit.base module

class jubakit.base.BaseConfig(*args, **kwargs)[source]

Bases: dict

BaseConfig is a convenience class for building new configurations.

__init__(*args, **kwargs)[source]

Creates a new Config with default configuration.

classmethod default()[source]

Returns a new default configuration.

class jubakit.base.BaseDataset(loader, schema=None, static=None, _data=None)[source]

Bases: object

Dataset is an abstract representation of a set of data.

__init__(loader, schema=None, static=None, _data=None)[source]

Defines a new dataset. Datasets are immutable and cannot be modified.

Data will be loaded from the given loader using schema.

When static is True (the default for non-infinite loaders), all data is loaded into memory immediately; otherwise, entries are read one by one from the loader, which may be preferable when processing a large dataset. For “infinite” loaders (such as MQ and Twitter streams), static cannot be set to True. Note that some features (e.g., index access) are not available for non-static datasets, and these may be required by other features such as cross-validation.
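To make the static/non-static distinction concrete, here is a minimal sketch that runs without jubakit. `ToyDataset` and `number_rows` are hypothetical names, not part of the jubakit API, and the loader is modeled as a zero-argument callable for simplicity. The sketch only assumes the behavior described above: a static dataset reads every row into memory and supports index access, while a non-static one pulls rows from the loader on demand and can therefore wrap an infinite source.

```python
from itertools import count, islice


def number_rows(limit=None):
    """Hypothetical loader: yields flat dict rows; infinite when limit is None."""
    source = count() if limit is None else range(limit)
    for i in source:
        yield {"id": i, "value": i * 10}


class ToyDataset:
    """Illustrative only: mimics static vs. streaming loading, not jubakit's code."""

    def __init__(self, loader_factory, static=True):
        self._factory = loader_factory
        self._static = static
        # Static datasets read every row into memory immediately.
        self._data = list(loader_factory()) if static else None

    def is_static(self):
        return self._static

    def __iter__(self):
        # Static: iterate the in-memory list; streaming: pull rows one by one.
        return iter(self._data) if self._static else self._factory()

    def get(self, idx):
        if not self._static:
            raise RuntimeError("index access requires a static dataset")
        return self._data[idx]


static_ds = ToyDataset(lambda: number_rows(5), static=True)
stream_ds = ToyDataset(lambda: number_rows(), static=False)  # infinite source

print(static_ds.get(2))                             # {'id': 2, 'value': 20}
print([row["id"] for row in islice(stream_ds, 3)])  # [0, 1, 2]
```

Trying `static=True` with the infinite `number_rows()` would never return, which is why such loaders must stay non-static.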

convert(func)[source]

Applies the given callable (which is expected to perform batch pre-processing such as shuffling) to all data entries and returns a new immutable Dataset.

get(idx)[source]

Returns the raw entry loaded by Loader.

get_schema()[source]

Returns the Schema for this dataset.

is_static()[source]

Returns True for static datasets.

shuffle(seed=None)[source]

Returns a new immutable Dataset whose records are shuffled.
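The immutability contract of convert() and shuffle() can be sketched as follows. `ImmutableRows` is a hypothetical stand-in, and the assumption that convert() hands func the complete list of entries is taken from the description above rather than from jubakit's source. Each call returns a new object, and passing the same seed reproduces the same order.

```python
import random


class ImmutableRows:
    """Hypothetical sketch: every transformation returns a new object."""

    def __init__(self, rows):
        self._rows = tuple(rows)  # tuple, so the stored order cannot change in place

    def convert(self, func):
        # func receives the whole list of entries (batch pre-processing).
        return ImmutableRows(func(list(self._rows)))

    def shuffle(self, seed=None):
        def _shuffle(rows):
            random.Random(seed).shuffle(rows)  # seeded RNG gives a reproducible order
            return rows
        return self.convert(_shuffle)

    def get(self, idx):
        return self._rows[idx]

    def __len__(self):
        return len(self._rows)


base = ImmutableRows({"x": i} for i in range(10))
a = base.shuffle(seed=42)
b = base.shuffle(seed=42)
print([a.get(i) for i in range(len(a))] == [b.get(i) for i in range(len(b))])  # True
print(base.get(0))  # {'x': 0}: the original is untouched
```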

class jubakit.base.BaseLoader[source]

Bases: object

Loader loads rows from various data sources.

is_infinite()[source]

Returns True if the length of the data source is indeterminate (e.g., MQ).

preprocess(ent)[source]

Preprocesses the given dict-like object into another dict-like object. The default implementation returns the object unchanged. Users can override this method to perform custom processing. You can return None to skip the record.

rows()[source]

Subclasses must override this method and yield each row of the data source as a flat dict-like object. You can yield None to skip the record.
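A minimal loader following the rows()/preprocess() contract described above might look like this. `CsvTextLoader` is hypothetical; in real code it would subclass jubakit.base.BaseLoader, but it is kept standalone here so the sketch runs without jubakit. The `__iter__` driver loop that drops None records is an assumption about how the pieces fit together.

```python
import csv
import io


class CsvTextLoader:
    """Hypothetical loader following the documented BaseLoader contract."""

    def __init__(self, text):
        self._text = text

    def is_infinite(self):
        return False  # a fixed string has a known length

    def rows(self):
        # Yield each row of the data source as a flat dict-like object.
        for row in csv.DictReader(io.StringIO(self._text)):
            yield row

    def preprocess(self, ent):
        # Custom processing: drop rows without a label, normalize the rest.
        if not ent["label"]:
            return None  # None means "skip this record"
        return {"label": ent["label"].strip(), "text": ent["text"].lower()}

    def __iter__(self):
        # Assumed driver loop: skip None from rows() and from preprocess().
        for ent in self.rows():
            if ent is None:
                continue
            processed = self.preprocess(ent)
            if processed is not None:
                yield processed


data = "label,text\nspam,BUY NOW\n,no label here\nham,Hello\n"
print(list(CsvTextLoader(data)))
# [{'label': 'spam', 'text': 'buy now'}, {'label': 'ham', 'text': 'hello'}]
```

Keeping source-specific parsing in rows() and record-level filtering in preprocess() lets the same preprocess() logic be reused across different sources.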

class jubakit.base.BaseSchema(mapping, fallback=None)[source]

Bases: object

Schema defines data types for each key of the data.

BaseSchema defines three fundamental data types.

  • IGNORE: ignores the key (mainly intended for fallback)

  • AUTO: uses the type of the key as its data type

  • INFER: guesses the type of the key from its value; note that this is discouraged, as it may produce unstable results.
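The instability warned about for INFER is easy to reproduce. The guessing rule below (treat anything `float()` accepts as a number) is a hypothetical illustration, not jubakit's actual inference logic; the `"n"` and `"s"` codes are the GenericSchema NUMBER and STRING values listed further down.

```python
def infer_type(value):
    """Hypothetical INFER-style guess: numeric-looking values become NUMBER."""
    try:
        float(value)
        return "n"  # GenericSchema.NUMBER
    except (TypeError, ValueError):
        return "s"  # GenericSchema.STRING


# The same column can be guessed differently depending on which row is seen.
rows = [{"zip": "10001"}, {"zip": "SW1A 1AA"}]
print([infer_type(r["zip"]) for r in rows])  # ['n', 's']
```

Declaring the type explicitly in the mapping avoids this row-dependent drift.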

AUTO = u'.'
IGNORE = u'_'
INFER = u'?'
__init__(mapping, fallback=None)[source]

Defines a Schema. Schema is an immutable object and cannot be modified. mapping is a dict-like object that maps row keys to data types. Optionally, you can assign an alias name to a key so that different loaders can be handled with the same configuration.

classmethod predict(row, typed)[source]

Predicts a Schema from a dict-like row object.

transform(row)[source]

Transforms the row (dict-like) into the data structures required by the corresponding Service.

class jubakit.base.BaseService(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]

Bases: object

Service provides an interface to machine learning features.

__init__(host=u'127.0.0.1', port=9199, cluster=u'', timeout=0)[source]

Creates a new service that connects to an existing server.

clear()[source]

Clears the model.

get_status()[source]

Returns the status of this server. In distributed mode, returns statuses of all members.

load(name, path=None)[source]

Loads the model using name. If path is specified, copies the model file from the local path to the remote location.

classmethod name()[source]

Subclasses (Classifier, NearestNeighbor, etc.) must override this method and return their service name (classifier, nearest_neighbor, etc.).

classmethod run(config, port=None, embedded=False)[source]

Runs a new standalone server or embedded instance and returns the service instance.

save(name, path=None)[source]

Saves the model using name. If path is specified, copies the saved model file to the local path.

shell(**kwargs)[source]

Starts an interactive shell session for this service.

stop()[source]

Stops the backend process, if one exists.

class jubakit.base.GenericConfig(method=None, parameter=None, converter=None)[source]

Bases: jubakit.base.BaseConfig

GenericConfig is a base Config class for generic services whose config data contains converter, method and parameter.

__init__(method=None, parameter=None, converter=None)[source]
add_mecab(name=u'mecab', arg=u'', ngram=1, base=False, include_features=u'*', exclude_features=u'')[source]

Adds MeCab feature extraction to string_types.

clear_converter()[source]

Initializes the converter section of the config with an empty template.

classmethod methods()[source]

Subclasses must override this method and return methods available for this service.
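Because BaseConfig subclasses dict, a config is just nested data. The sketch below shows what a GenericConfig-style dict might contain after clear_converter() and add_mecab() with default arguments. The converter keys and the MeCab plugin entry (`libmecab_splitter.so`, `create`) are assumed to follow the Jubatus converter and MeCab plugin configuration formats; the exact structure jubakit writes, and the AROW method and parameter values, are assumptions for illustration.

```python
import json

# Hypothetical GenericConfig-style dict; all concrete values are assumptions.
config = {
    "method": "AROW",                             # assumed example method
    "parameter": {"regularization_weight": 1.0},  # assumed example parameter
    "converter": {
        # Empty template, roughly what clear_converter() is described as producing.
        "string_filter_types": {}, "string_filter_rules": [],
        "num_filter_types": {}, "num_filter_rules": [],
        "string_types": {}, "string_rules": [],
        "num_types": {}, "num_rules": [],
        "binary_types": {}, "binary_rules": [],
    },
}

# Roughly what add_mecab(name='mecab') with default arguments might register.
config["converter"]["string_types"]["mecab"] = {
    "method": "dynamic",
    "path": "libmecab_splitter.so",
    "function": "create",
    "arg": "",
    "ngram": "1",
    "base": "false",
    "include_features": "*",
    "exclude_features": "",
}

print(json.dumps(config["converter"]["string_types"], indent=2))
```

In the Jubatus converter format, a type defined under string_types takes effect only once a string_rules entry refers to it by name, so the mecab entry alone does not change feature extraction.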

class jubakit.base.GenericSchema(mapping, fallback=None)[source]

Bases: jubakit.base.BaseSchema

GenericSchema is a base Schema class for all engines using Datum.

GenericSchema defines 3 data types:

  • STRING: string features (string_values)
  • NUMBER: numeric features (num_values)
  • BINARY: binary features (binary_values)

BINARY = u'b'
NUMBER = u'n'
STRING = u's'
classmethod predict(row, typed)[source]

Predicts a schema from a dict-like row object.

transform(row)[source]

Transforms the row (a dict-like object) into a Datum. Subclasses that define their own data types should override this method to handle them.
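The following sketch approximates what a Datum-producing transform does with the type codes above. It is hypothetical: a plain dict with `string_values`/`num_values`/`binary_values` lists of (key, value) pairs stands in for the real Datum, and the choices to raise KeyError for unmapped keys without a fallback and to cast NUMBER values with float() are assumptions.

```python
# Type codes as listed above for BaseSchema and GenericSchema.
STRING, NUMBER, BINARY, IGNORE = "s", "n", "b", "_"


def transform_sketch(row, mapping, fallback=None):
    """Hypothetical: split a flat row into Datum-like (key, value) lists."""
    datum = {"string_values": [], "num_values": [], "binary_values": []}
    for key, value in row.items():
        kind = mapping.get(key, fallback)
        if kind is None:
            raise KeyError("no type for key %r and no fallback" % (key,))
        if kind == IGNORE:
            continue  # IGNORE drops the key entirely
        if kind == STRING:
            datum["string_values"].append((key, str(value)))
        elif kind == NUMBER:
            datum["num_values"].append((key, float(value)))  # assumed numeric cast
        elif kind == BINARY:
            datum["binary_values"].append((key, value))  # raw bytes kept as-is
        else:
            raise ValueError("unsupported type code %r" % (kind,))
    return datum


row = {"name": "alice", "age": "31", "id": "u-001"}
print(transform_sketch(row, {"name": STRING, "age": NUMBER}, fallback=IGNORE))
# {'string_values': [('name', 'alice')], 'num_values': [('age', 31.0)], 'binary_values': []}
```

Using IGNORE as the fallback is what lets the unmapped `id` key pass through silently here.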

class jubakit.base.Utils[source]

Bases: object

static softmax(x)[source]

jubakit.classifier module

jubakit.compat module

jubakit.dumb module

jubakit.logger module

jubakit.logger.get_logger(name=None)[source]

Returns the logger. If name is specified, a child logger is returned; otherwise, the default jubakit logger is returned.

This is mainly intended for internal use, but users can also obtain the logger to print their own logs.

jubakit.logger.setup_logger(level=30, f=<open file '<stderr>', mode 'w'>, log_format=u'[%(name)s] %(asctime)s: (%(levelname)s) %(message)s')[source]

Convenience method to set up the logger.
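setup_logger() appears to wrap the standard logging module, judging by its level=30 default (logging.WARNING) and its %-style format string. The sketch below reproduces that setup with the standard library only; the parent logger name `jubakit` and the use of getChild() to mimic get_logger(name) are assumptions.

```python
import io
import logging

# Format string copied from the setup_logger() signature above.
LOG_FORMAT = u'[%(name)s] %(asctime)s: (%(levelname)s) %(message)s'


def setup_logger_sketch(level=logging.WARNING, f=None, log_format=LOG_FORMAT):
    handler = logging.StreamHandler(f)  # f=None makes StreamHandler use sys.stderr
    handler.setFormatter(logging.Formatter(log_format))
    parent = logging.getLogger("jubakit")  # assumed parent logger name
    parent.setLevel(level)
    parent.addHandler(handler)
    return parent


buf = io.StringIO()
setup_logger_sketch(f=buf)
child = logging.getLogger("jubakit").getChild("myapp")  # like get_logger('myapp')
child.info("hidden: below WARNING")  # filtered by the inherited WARNING level
child.warning("visible")             # propagates to the parent's handler
print(buf.getvalue())  # one line containing '[jubakit.myapp]' and '(WARNING) visible'
```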

jubakit.model module

This module provides features to manipulate model files.

exception jubakit.model.InvalidModelFormatError[source]

Bases: exceptions.Exception

class jubakit.model.JubaDump[source]

Bases: object

JubaDump provides a high-level dump of Jubatus models. The jubadump command must be installed.

classmethod dump(data)[source]

Returns the dumped model data structure of the raw model data.

classmethod dump_file(target)[source]

Returns the dumped model data structure of the model file path target.

class jubakit.model.JubaModel[source]

Bases: object

JubaModel provides features to perform low-level manipulation of Jubatus model data structure.

class Container[source]

Bases: jubakit.model.ModelPart

dump(f)[source]
classmethod load(f)[source]
class JubaModel.Header[source]

Bases: jubakit.model.ModelPart

dump(f, checksum=True)[source]
classmethod fields()[source]
classmethod load(f)[source]
class JubaModel.ModelPart[source]

Bases: object

__init__()[source]
dump(f, *args, **kwargs)[source]
dumps(*args, **kwargs)[source]
classmethod fields()[source]

Returns the list of (property_name, data_type, default_value).

get()[source]
classmethod load(f, *args, **kwargs)[source]
classmethod loads(data, *args, **kwargs)[source]
set(record)[source]
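fields() describes each part as a list of (property_name, data_type, default_value) tuples. The hypothetical `ToyHeader` below shows how such a list can drive initialization, serialization, and parsing; the field names and the use of struct format codes as data types are invented for illustration and do not reflect the real Jubatus header layout.

```python
import struct


class ToyHeader:
    """Hypothetical ModelPart-style record driven by fields()."""

    @classmethod
    def fields(cls):
        # (property_name, data_type, default_value); data_type is a struct code here.
        return [("version", "Q", 1), ("crc32", "I", 0), ("length", "Q", 0)]

    def __init__(self):
        for name, _, default in self.fields():
            setattr(self, name, default)

    def dumps(self):
        fmt = ">" + "".join(t for _, t, _ in self.fields())  # big-endian, no padding
        return struct.pack(fmt, *(getattr(self, n) for n, _, _ in self.fields()))

    @classmethod
    def loads(cls, data):
        fmt = ">" + "".join(t for _, t, _ in cls.fields())
        obj = cls()
        for (name, _, _), value in zip(cls.fields(), struct.unpack(fmt, data)):
            setattr(obj, name, value)
        return obj


h = ToyHeader()
h.length = 128
restored = ToyHeader.loads(h.dumps())
print(restored.version, restored.length, len(h.dumps()))  # 1 128 20
```

Declaring the layout once in fields() keeps dumps() and loads() symmetric, since both derive the same format string from it.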
class JubaModel.SystemContainer[source]

Bases: jubakit.model.Container

classmethod fields()[source]
class JubaModel.UserContainer[source]

Bases: jubakit.model.Container

classmethod fields()[source]
JubaModel.__init__()[source]
JubaModel.data()[source]

Returns the actual model data part. This method is a shortcut for self.user.user_data.

JubaModel.dump_binary(f)[source]

Dumps the model as Jubatus binary model file to binary stream f.

JubaModel.dump_json(f, without_raw=False)[source]

Dumps the model as a JSON file to a text stream f.

JubaModel.dump_text(f)[source]

Dumps the model as human-readable text format to a text stream f.

JubaModel.fix_header()[source]

Repairs the header values.

classmethod JubaModel.load_binary(f, validate=True)[source]

Loads Jubatus binary model file from binary stream f. When validate is True, the model file format is strictly validated.

classmethod JubaModel.load_json(f)[source]

Loads a model file saved as JSON from text stream f.

classmethod JubaModel.predict_format(filename)[source]

Loads the model file named filename and returns its format: binary or json.
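The documentation does not say how predict_format() tells the two formats apart. One plausible heuristic, shown below as a hypothetical `predict_format_sketch`, is to try decoding the file as UTF-8 JSON and fall back to binary; the real method may instead inspect header bytes.

```python
import json
import os
import tempfile


def predict_format_sketch(filename):
    """Hypothetical heuristic: 'json' if the file parses as JSON, else 'binary'."""
    with open(filename, "rb") as f:
        data = f.read()
    try:
        json.loads(data.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, ValueError):  # JSONDecodeError subclasses ValueError
        return "binary"


with tempfile.TemporaryDirectory() as d:
    jpath = os.path.join(d, "model.json")
    bpath = os.path.join(d, "model.bin")
    with open(jpath, "w") as f:
        json.dump({"header": {}}, f)
    with open(bpath, "wb") as f:
        f.write(b"\x00\xff binary payload")  # \xff is not valid UTF-8
    formats = (predict_format_sketch(jpath), predict_format_sketch(bpath))

print(formats)  # ('json', 'binary')
```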

exception jubakit.model.JubaModelError(msg, e=None)[source]

Bases: exceptions.Exception

__init__(msg, e=None)[source]

jubakit.recommender module

jubakit.regression module

jubakit.shell module

class jubakit.shell.JubaShell(host, port, cluster, service, **kwargs)[source]

Bases: object

JubaShell provides a shell environment to call Jubatus RPC API.

The interactive interface is provided in cli submodule.

__init__(host, port, cluster, service, **kwargs)[source]

Creates a new shell environment using parameters specified.

If service is None, it will be automatically probed.

connect()[source]

Discards the current connection (if connected) and creates a new client instance. Note that the TCP connection will not be established until an RPC method is called.

disconnect()[source]

Disconnects from the server (if connected).

classmethod get_cli_classes()[source]

Returns a map of service names to CLI implementation classes.

get_client()[source]

Returns the client instance.

classmethod get_client_classes()[source]

Returns a map of service names to Jubatus client classes.

get_timeout()[source]

Returns the current client-side timeout value.

interact()[source]

Starts the interactive shell environment.

is_connected()[source]

Returns True if the client exists. Note that its backend TCP connection may already be closed.

classmethod probe_facts(host, port, cluster)[source]

Probes the service name and remote server type. Returns a tuple of (service_name, is_proxy).

run(command)[source]

Runs a one-shot command.

set_remote(host, port, cluster, service)[source]

Switches to the new remote server.

set_timeout(timeout)[source]

Sets a new client-side timeout value. The existing connection will be discarded.

exception jubakit.shell.JubaShellAssertionError[source]

Bases: jubakit.shell.JubaShellException

exception jubakit.shell.JubaShellException[source]

Bases: exceptions.Exception

exception jubakit.shell.JubaShellRPCError(msg, host, port, e=None)[source]

Bases: jubakit.shell.JubaShellException

__init__(msg, host, port, e=None)[source]

jubakit.weight module