jubakit.loader package

jubakit.loader.array module

class jubakit.loader.array.ArrayLoader(array, feature_names=None)

Bases: jubakit.base.BaseLoader

ArrayLoader is a loader that reads from a 2-d array. It expects row-oriented data, where each inner list is one entry.

For example:

>>> ArrayLoader([[1,2,3], [4,5,6]], ['k1','k2','k3'])

... will load two entries:

  • {'k1': 1, 'k2': 2, 'k3': 3}
  • {'k1': 4, 'k2': 5, 'k3': 6}
__init__(array, feature_names=None)
rows()
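The row-oriented mapping can be sketched in plain Python. This is not jubakit's implementation: `array_to_rows` is a hypothetical helper, and requiring `feature_names` and rejecting ragged rows are choices made only for this sketch.

```python
from typing import Any, Dict, Iterator, List, Sequence


def array_to_rows(array: Sequence[Sequence[Any]],
                  feature_names: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row of a 2-d array (hypothetical helper)."""
    for row in array:
        # Refuse ragged rows instead of silently truncating them via zip.
        if len(row) != len(feature_names):
            raise ValueError('row length does not match feature_names')
        # Pair each column name with the value in the same position.
        yield dict(zip(feature_names, row))


rows = list(array_to_rows([[1, 2, 3], [4, 5, 6]], ['k1', 'k2', 'k3']))
print(rows)  # [{'k1': 1, 'k2': 2, 'k3': 3}, {'k1': 4, 'k2': 5, 'k3': 6}]
```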
class jubakit.loader.array.ZipArrayLoader(arrays=[], feature_names=None, **named_arrays)

Bases: jubakit.base.BaseLoader

ZipArrayLoader zips multiple 1-d arrays of the same length. It expects column-oriented data, where each array holds the values of one feature.

For example:

>>> ZipArrayLoader([[1,4], [2,5], [3,6]], ['k1','k2','k3'])

... or simply:

>>> ZipArrayLoader(k1=[1,4], k2=[2,5], k3=[3,6])

... will load two entries:

  • {'k1': 1, 'k2': 2, 'k3': 3}
  • {'k1': 4, 'k2': 5, 'k3': 6}
__init__(arrays=[], feature_names=None, **named_arrays)
rows()
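Zipping columns into rows is a transpose, which `zip(*columns)` performs directly. The sketch below accepts both calling styles shown above. `zip_arrays_to_rows` is hypothetical, not jubakit's code; it uses an immutable `()` default instead of `[]`, and its length checks are assumptions.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence


def zip_arrays_to_rows(arrays: Sequence[Sequence[Any]] = (),
                       feature_names: Optional[List[str]] = None,
                       **named_arrays: Sequence[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per position across equal-length 1-d arrays."""
    names = list(feature_names or [])
    columns = list(arrays)
    if len(names) != len(columns):
        raise ValueError('feature_names must name every positional array')
    # Keyword arrays use their keyword as the column name.
    for name, column in named_arrays.items():
        names.append(name)
        columns.append(column)
    if len({len(c) for c in columns}) > 1:
        raise ValueError('all arrays must have the same length')
    # zip(*columns) turns columns into rows.
    for values in zip(*columns):
        yield dict(zip(names, values))


positional = list(zip_arrays_to_rows([[1, 4], [2, 5], [3, 6]], ['k1', 'k2', 'k3']))
named = list(zip_arrays_to_rows(k1=[1, 4], k2=[2, 5], k3=[3, 6]))
print(positional == named)  # True
```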

jubakit.loader.chain module

class jubakit.loader.chain.ConcatLoader(*loaders)

Bases: jubakit.base.BaseLoader

ConcatLoader concatenates the rows of multiple loaders.

__init__(*loaders)
rows()
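Assuming concatenation means that rows are yielded loader by loader, in the order the loaders are given, the behavior can be sketched with `itertools.chain`. Plain lists of dicts stand in for loaders here, and `concat_rows` is a hypothetical helper, not jubakit's code.

```python
from itertools import chain
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def concat_rows(*sources: Iterable[Row]) -> Iterator[Row]:
    """Yield every row of the first source, then every row of the next, etc."""
    # chain.from_iterable consumes the sources lazily, one after another.
    return chain.from_iterable(sources)


first = [{'k1': 1}, {'k1': 2}]
second = [{'k1': 3}]
print(list(concat_rows(first, second)))  # [{'k1': 1}, {'k1': 2}, {'k1': 3}]
```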
class jubakit.loader.chain.MergeChainLoader(*loaders)

Bases: jubakit.base.BaseLoader

MergeChainLoader merges multiple loaders.

__init__(*loaders)
rows()
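The docstring does not say how records are merged. One plausible reading, assumed here and not confirmed by the source, is that the loaders are advanced in lockstep and the i-th records are combined into one dict, with later loaders overriding duplicate keys. `merge_rows` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def merge_rows(*sources: Iterable[Row]) -> Iterator[Row]:
    """Combine the i-th row of every source into one dict (assumed semantics)."""
    # zip stops at the shortest source; this truncation is also an assumption.
    for group in zip(*sources):
        merged: Row = {}
        for row in group:
            # Later sources win when the same key appears more than once.
            merged.update(row)
        yield merged


features = [{'k1': 1}, {'k1': 4}]
labels = [{'label': 'a'}, {'label': 'b'}]
print(list(merge_rows(features, labels)))
# [{'k1': 1, 'label': 'a'}, {'k1': 4, 'label': 'b'}]
```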
class jubakit.loader.chain.ValueMapChainLoader(loader, key, mapping)

Bases: jubakit.base.BaseLoader

ValueMapChainLoader maps the value of the specified key in each record loaded from another loader, according to mapping.

__init__(loader, key, mapping)
rows()
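A sketch of the value mapping, with `mapping` assumed to be a dict. How the real loader treats records that lack the key, or values missing from `mapping`, is not documented; this hypothetical `map_value` helper passes them through unchanged, which is an assumption.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

Row = Dict[str, Any]


def map_value(rows: Iterable[Row], key: str,
              mapping: Mapping[Any, Any]) -> Iterator[Row]:
    """Replace row[key] with mapping[row[key]] in each row."""
    for row in rows:
        out = dict(row)  # copy so the source rows are not mutated
        if key in out and out[key] in mapping:
            out[key] = mapping[out[key]]
        # Rows without `key`, or with an unmapped value, pass through as-is
        # (assumed behavior).
        yield out


rows = [{'label': 0, 'x': 1.0}, {'label': 1, 'x': 2.0}]
print(list(map_value(rows, 'label', {0: 'neg', 1: 'pos'})))
# [{'label': 'neg', 'x': 1.0}, {'label': 'pos', 'x': 2.0}]
```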

jubakit.loader.core module

class jubakit.loader.core.LineBasedFileLoader(filename, *args, **kwargs)

Bases: jubakit.loader.core.LineBasedStreamLoader

Loader to process a line-oriented text file.

__init__(filename, *args, **kwargs)
class jubakit.loader.core.LineBasedStreamLoader(f, close=False)

Bases: jubakit.base.BaseLoader

Loader to process a line-oriented text stream. You can override the preprocess method to split each row into fields.

__init__(f, close=False)
rows()
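The preprocess override pattern can be illustrated with a self-contained stand-in. `LineStream` and `KeyValueStream` are hypothetical classes, not jubakit's; in particular, the assumption that the hook receives a dict with the raw line under a `'line'` key and returns the record to emit is made up for this sketch, so check the jubakit source for the real hook signature.

```python
import io
from typing import IO, Any, Dict, Iterator

Row = Dict[str, Any]


class LineStream:
    """Hypothetical stand-in: yields one record per line of a text stream."""

    def __init__(self, f: IO[str]) -> None:
        self._f = f

    def preprocess(self, row: Row) -> Row:
        # Default hook: pass the record through unchanged.
        return row

    def rows(self) -> Iterator[Row]:
        for line in self._f:
            # Strip the trailing newline before handing the line to the hook.
            yield self.preprocess({'line': line.rstrip('\n')})


class KeyValueStream(LineStream):
    """Overrides preprocess to split 'key=value' lines into fields."""

    def preprocess(self, row: Row) -> Row:
        key, _, value = row['line'].partition('=')
        return {'key': key, 'value': value}


stream = io.StringIO('a=1\nb=2\n')
print(list(KeyValueStream(stream).rows()))
# [{'key': 'a', 'value': '1'}, {'key': 'b', 'value': '2'}]
```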

jubakit.loader.csv module

class jubakit.loader.csv.CSVLoader(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)

Bases: jubakit.base.BaseLoader

Loader to process CSV files.

__init__(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)

Creates a new loader that processes CSV files.

You can optionally specify the fieldnames option. If fieldnames is not specified (the default) or is specified as True, the first line of the CSV is used as column names. If fieldnames is specified as False, sequential column names are generated automatically, like ['c0', 'c1', ...]. If fieldnames is a list, it is used as the column names.

Any other positional or keyword arguments are passed to the underlying csv.DictReader.

>>> loader = CSVLoader('dataset.tsv', fieldnames=False, encoding='cp932', delimiter='\t')
rows()
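The three fieldnames modes described above can be mimicked with the standard library's csv.DictReader. `read_csv_rows` is a hypothetical helper that reads from an in-memory string rather than a file and ignores encoding; the 'c0', 'c1', ... naming follows the description above, and the rest is a sketch rather than CSVLoader's code.

```python
import csv
import io
from typing import Any, Dict, Iterator, List, Optional, Union


def read_csv_rows(text: str,
                  fieldnames: Optional[Union[bool, List[str]]] = None,
                  **reader_kwargs: Any) -> Iterator[Dict[str, str]]:
    """Yield dict rows following the fieldnames rules described above."""
    f = io.StringIO(text)
    if fieldnames is None or fieldnames is True:
        # With fieldnames=None, DictReader takes names from the first line.
        names = None
    elif fieldnames is False:
        # Read one record only to count columns, then rewind the buffer.
        first = next(csv.reader(f, **reader_kwargs), [])
        names = ['c{0}'.format(i) for i in range(len(first))]
        f.seek(0)
    else:
        # An explicit list: every line, including the first, is data.
        names = list(fieldnames)
    return csv.DictReader(f, fieldnames=names, **reader_kwargs)


data = 'x\ty\n1\t2\n'
print([dict(r) for r in read_csv_rows(data, delimiter='\t')])
# [{'x': '1', 'y': '2'}]
print([dict(r) for r in read_csv_rows(data, fieldnames=False, delimiter='\t')])
# [{'c0': 'x', 'c1': 'y'}, {'c0': '1', 'c1': '2'}]
```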

jubakit.loader.odbc module

class jubakit.loader.odbc.ODBCLoader

Bases: jubakit.base.BaseLoader

Loader to process ODBC data sources.

jubakit.loader.sparse module

class jubakit.loader.sparse.SparseMatrixLoader(matrix, feature_names=None)

Bases: jubakit.base.BaseLoader

SparseMatrixLoader is a loader that reads from a scipy.sparse 2-d matrix. Zero entries are ignored.

__init__(matrix, feature_names=None)
rows()
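To show what "zero entries are ignored" means, here is a sketch that walks a matrix in compressed sparse row (CSR) form and emits only non-zero values. It takes the three CSR arrays as plain lists so it runs without scipy; a scipy.sparse.csr_matrix exposes the same arrays as its `indptr`, `indices`, and `data` attributes. `csr_to_rows` is hypothetical, and keying columns by `feature_names` mirrors the constructor argument by assumption.

```python
from typing import Any, Dict, Iterator, List, Sequence


def csr_to_rows(indptr: Sequence[int], indices: Sequence[int],
                data: Sequence[Any],
                feature_names: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per matrix row containing only its non-zero entries."""
    for r in range(len(indptr) - 1):
        row = {}
        # Stored entries of row r live in data[indptr[r]:indptr[r + 1]].
        for k in range(indptr[r], indptr[r + 1]):
            if data[k] != 0:  # explicitly stored zeros are skipped as well
                row[feature_names[indices[k]]] = data[k]
        yield row


# Matrix [[1, 0, 2], [0, 0, 3]] in CSR form.
print(list(csr_to_rows([0, 2, 3], [0, 2, 2], [1, 2, 3], ['a', 'b', 'c'])))
# [{'a': 1, 'c': 2}, {'c': 3}]
```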

jubakit.loader.twitter module

class jubakit.loader.twitter.TwitterOAuthHandler(**kwargs)

Bases: object

Handles the authentication required to access the Twitter Streaming API.

__init__(**kwargs)

Authentication information must be specified as follows:

>>> TwitterOAuthHandler(
...   consumer_key='XXXXXXXXXXXXXXXXXXXX',
...   consumer_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_token='XXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
... )

If some of the keys are not specified, the corresponding environment variables (TWITTER_CONSUMER_KEY, etc.) are used automatically.

You can obtain your keys by registering your app at https://apps.twitter.com/

get()
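The environment-variable fallback can be sketched as follows. Only TWITTER_CONSUMER_KEY is named in the text above; the other three variable names are guessed by analogy and are assumptions, as is the hypothetical helper `resolve_twitter_keys` and its choice to raise on a missing key.

```python
import os
from typing import Dict, Mapping, Optional

# Only TWITTER_CONSUMER_KEY is documented; the other names are assumed.
ENV_NAMES = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_secret': 'TWITTER_ACCESS_SECRET',
}


def resolve_twitter_keys(env: Optional[Mapping[str, str]] = None,
                         **kwargs: str) -> Dict[str, str]:
    """Use explicit keyword values first, then fall back to the environment."""
    env = os.environ if env is None else env
    keys = {}
    for name, var in ENV_NAMES.items():
        value = kwargs.get(name) or env.get(var)
        if not value:
            # Raising here is a choice of this sketch, not documented behavior.
            raise ValueError('missing {0} (or ${1})'.format(name, var))
        keys[name] = value
    return keys


fake_env = {'TWITTER_CONSUMER_SECRET': 's', 'TWITTER_ACCESS_TOKEN': 't',
            'TWITTER_ACCESS_SECRET': 'a'}
print(resolve_twitter_keys(fake_env, consumer_key='k')['consumer_key'])  # k
```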
class jubakit.loader.twitter.TwitterStreamLoader(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)

Bases: jubakit.base.BaseLoader

Loader to process the Twitter Stream. Loads statuses only; other types of messages, such as direct messages and warnings, are ignored.

The tweepy and jq packages must be installed to use this loader.

FILTER = u'filter'
FIREHOSE = u'firehose'
SAMPLE = u'sample'
SITE = u'site'
STATUS_KEYS = [u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count']
USER = u'user'
__init__(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)
is_infinite()
rows()
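The entries in keys (and STATUS_KEYS) are jq paths such as '.user.name'. The loader relies on the jq package to evaluate them; the sketch below handles only the simple dotted form, without jq, to show how each path selects a field from a status object. `extract_path` and `extract` are hypothetical, support none of jq's other syntax, and keying the output record by the path string is an assumption. The status dict is made-up sample data.

```python
from typing import Any, Dict, Iterable, Optional


def extract_path(obj: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a simple jq-style path like '.user.name'; None if absent."""
    current: Any = obj
    for part in path.lstrip('.').split('.'):
        if not isinstance(current, dict) or part not in current:
            return None  # jq likewise yields null for a missing object key
        current = current[part]
    return current


def extract(obj: Dict[str, Any], paths: Iterable[str]) -> Dict[str, Any]:
    """Build one flat record keyed by path (assumed record layout)."""
    return {p: extract_path(obj, p) for p in paths}


status = {'id_str': '1', 'text': 'hello', 'user': {'name': 'alice'}}
print(extract(status, ['.text', '.user.name', '.user.lang']))
# {'.text': 'hello', '.user.name': 'alice', '.user.lang': None}
```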