jubakit: Jubatus Toolkit
jubakit is a Python module that makes Jubatus features easy to access. It can be used in conjunction with scikit-learn, which gives you access to features such as cross validation and model evaluation. See the Jubakit Documentation for a detailed description.
Currently, jubakit supports the Classifier, Regression, Anomaly, Recommender, NearestNeighbor, Clustering, Burst, Bandit, and Weight engines.
Install
pip install jubakit
Requirements
- Python 2.7, 3.3, 3.4 or 3.5.
- Jubatus needs to be installed.
- scikit-learn is optional, but it is required for some features such as K-fold cross validation.
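To confirm which of these requirements are present on a machine, a quick check like the following may help. This is a minimal sketch using only the standard library, not part of jubakit: the helper name `check_requirements` is hypothetical, it assumes Python 3.4 or later (for `importlib.util.find_spec` and `shutil.which`), and it assumes that finding the `jubaclassifier` executable on `PATH` is a reasonable sign that Jubatus is installed.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which requirements are available (hypothetical helper)."""
    status = {}
    # Python packages: jubakit itself and the optional scikit-learn
    # (whose import name is `sklearn`). find_spec does not import them.
    for name in ("jubakit", "sklearn"):
        status[name] = importlib.util.find_spec(name) is not None
    # Assumption: a `jubaclassifier` executable on PATH indicates Jubatus.
    status["jubaclassifier"] = shutil.which("jubaclassifier") is not None
    return status


if __name__ == "__main__":
    for name, available in sorted(check_requirements().items()):
        print("{0}: {1}".format(name, "found" if available else "missing"))
```

A missing `sklearn` entry only affects the features that depend on scikit-learn; the other two entries are needed for the Quick Start.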
Quick Start
The following example shows how to train a classifier and classify records using a CSV dataset.
from jubakit.classifier import Classifier, Schema, Dataset, Config
from jubakit.loader.csv import CSVLoader
# Load a CSV file.
loader = CSVLoader('iris.csv')
# Define types for each column in the CSV file.
schema = Schema({
    'Species': Schema.LABEL,
}, Schema.NUMBER)
# Get the shuffled dataset.
dataset = Dataset(loader, schema).shuffle()
# Run the classifier service (`jubaclassifier` process).
classifier = Classifier.run(Config())
# Train the classifier.
for _ in classifier.train(dataset): pass
# Classify using the trained classifier.
for (idx, label, result) in classifier.classify(dataset):
    print("true label: {0}, estimated label: {1}".format(label, result[0][0]))
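The loop above prints one line per record. To summarize the same `(idx, label, result)` tuples as a single number, they can be tallied into an accuracy score. The sketch below is not part of jubakit: `accuracy` is a hypothetical helper, and it assumes, as `result[0][0]` in the Quick Start suggests, that each `result` is a sequence of (label, score) pairs with the top-ranked label first. The sample tuples are made up so that the block runs without a Jubatus server.

```python
def accuracy(results):
    """Fraction of records whose top estimated label equals the true label."""
    total = 0
    correct = 0
    for (idx, label, result) in results:
        total += 1
        # result[0][0] is taken to be the top-ranked estimated label
        # (assumed ordering, following the Quick Start loop).
        if result and result[0][0] == label:
            correct += 1
    return correct / float(total) if total else 0.0


# Made-up tuples in the same shape that the Quick Start loop iterates over.
sample = [
    (0, "setosa", [("setosa", 0.9), ("virginica", 0.1)]),
    (1, "virginica", [("versicolor", 0.6), ("virginica", 0.4)]),
    (2, "versicolor", [("versicolor", 0.7), ("setosa", 0.3)]),
    (3, "setosa", [("setosa", 0.8), ("versicolor", 0.2)]),
]
print("accuracy: {0:.2f}".format(accuracy(sample)))  # prints "accuracy: 0.75"
```

With a running classifier, the same helper could be applied as `accuracy(classifier.classify(dataset))`. Because the Quick Start classifies the same records it trained on, a score obtained that way is an optimistic estimate.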
Examples by Topic
See the example directory for working examples.
Example | Topics | Requires scikit-learn |
---|---|---|
classifier_csv.py | Handling CSV file and numeric features | |
classifier_shogun.py | Handling CSV file and string features | |
classifier_digits.py | Handling toy dataset (digits) | ✓ |
classifier_libsvm.py | Handling LIBSVM file | ✓ |
classifier_kfold.py | K-fold cross validation and metrics | ✓ |
classifier_parameter.py | Finding the best hyperparameters | ✓ |
classifier_hyperopt_tuning.py | Finding the best hyperparameters using hyperopt | ✓ |
classifier_bulk.py | Bulk Train-Test Classifier | |
classifier_twitter.py | Handling Twitter Streams | |
classifier_model_extract.py | Extract contents of Classifier model file | |
classifier_sklearn_wrapper.py | Classification using scikit-learn wrapper | ✓ |
classifier_sklearn_grid_search.py | Grid Search example using scikit-learn wrapper | ✓ |
classifier_tensorboard.py | Visualize a training process using TensorBoard | ✓ |
regression_boston.py | Regression with toy dataset (boston) | ✓ |
regression_csv.py | Regression with CSV file | |
regression_sklearn_wrapper.py | Regression using scikit-learn wrapper | ✓ |
anomaly_auc.py | Anomaly detection and metrics | |
recommender_npb.py | Recommend similar items | |
nearest_neighbor_aaai.py | Search neighbor items | |
clustering_2d.py | Clustering 2-dimensional dataset | |
burst_dummy_stream.py | Burst detection with stream data | |
bandit_slot.py | Multi-armed bandit with slot machine example | |
weight_shogun.py | Tracing fv_converter behavior using Weight | |
weight_model_extract.py | Extract contents of Weight model file | |
License
MIT License
Resources
Introduction to Jubakit (Jubakit の紹介) from kmaehashi