jubakit.loader package

jubakit.loader.array module

class jubakit.loader.array.ArrayLoader(array, feature_names=None)

Bases: jubakit.base.BaseLoader

ArrayLoader is a loader that reads from a 2-d array. It expects row-oriented data.

For example:

>>> ArrayLoader([[1,2,3], [4,5,6]], ['k1','k2','k3'])

… will load two entries:

  • {'k1': 1, 'k2': 2, 'k3': 3}
  • {'k1': 4, 'k2': 5, 'k3': 6}
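
The row-to-record mapping described above can be sketched in plain Python. This is an illustrative stand-in rather than jubakit's implementation; the helper name `rows_from_2d_array` is hypothetical.

```python
def rows_from_2d_array(array, feature_names):
    """Yield one dict per row, pairing each value with its column name."""
    for row in array:
        # zip() pairs the i-th feature name with the i-th value of the row.
        yield dict(zip(feature_names, row))


entries = list(rows_from_2d_array([[1, 2, 3], [4, 5, 6]], ['k1', 'k2', 'k3']))
print(entries)
```

Each inner list is treated as one record, which is what "row-oriented" means here.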
__init__(array, feature_names=None)
rows()
class jubakit.loader.array.ZipArrayLoader(arrays=[], feature_names=None, **named_arrays)

Bases: jubakit.base.BaseLoader

ZipArrayLoader zips multiple 1-d arrays of the same length. It expects column-oriented data.

For example:

>>> ZipArrayLoader([[1,4], [2,5], [3,6]], ['k1','k2','k3'])

… or simply:

>>> ZipArrayLoader(k1=[1,4], k2=[2,5], k3=[3,6])

… will load two entries:

  • {'k1': 1, 'k2': 2, 'k3': 3}
  • {'k1': 4, 'k2': 5, 'k3': 6}
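
The column-oriented case can be sketched the same way. The code below is a plain-Python illustration, not jubakit's implementation; `rows_from_columns` is a hypothetical helper, and raising `ValueError` on unequal lengths is an assumption made for this sketch.

```python
def rows_from_columns(arrays=(), feature_names=None, **named_arrays):
    """Yield one dict per position across equally long 1-d arrays."""
    # Positional arrays are paired with feature_names; keyword arrays
    # carry their own names.
    columns = dict(zip(feature_names or [], arrays))
    columns.update(named_arrays)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        # Assumption of this sketch: mismatched lengths are an error.
        raise ValueError('all arrays must have the same length')
    names = list(columns)
    # zip(*columns) transposes the columns into rows.
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


by_position = list(rows_from_columns([[1, 4], [2, 5], [3, 6]], ['k1', 'k2', 'k3']))
by_name = list(rows_from_columns(k1=[1, 4], k2=[2, 5], k3=[3, 6]))
print(by_position == by_name)
```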
__init__(arrays=[], feature_names=None, **named_arrays)
rows()

jubakit.loader.chain module

class jubakit.loader.chain.ConcatLoader(*loaders)

Bases: jubakit.base.BaseLoader

ConcatLoader is a loader that concatenates multiple loaders.
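
A minimal sketch of concatenation, assuming it means yielding all records of the first loader, then all records of the next, in the order given. Plain lists stand in for loaders, and `concat_rows` is a hypothetical helper, not part of jubakit.

```python
from itertools import chain


def concat_rows(*loaders):
    """Yield every record of each loader in turn, in argument order."""
    return chain.from_iterable(loaders)


first = [{'k1': 1}, {'k1': 2}]
second = [{'k1': 3}]
combined = list(concat_rows(first, second))
print(combined)
```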

__init__(*loaders)
rows()
class jubakit.loader.chain.MergeChainLoader(*loaders)

Bases: jubakit.base.BaseLoader

MergeChainLoader merges multiple loaders.
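
The description does not say how records are merged. One plausible reading, sketched below as an assumption, is that the n-th record of every loader is combined into a single record. In this sketch (not jubakit's code) iteration stops at the shortest input and later loaders overwrite earlier ones on key collisions; `merge_rows` is a hypothetical helper.

```python
def merge_rows(*loaders):
    """Combine the n-th record of each loader into one dict (assumed semantics)."""
    for records in zip(*loaders):
        merged = {}
        for record in records:
            # Later loaders win when the same key appears twice.
            merged.update(record)
        yield merged


features = [{'x': 1}, {'x': 2}]
labels = [{'label': 'a'}, {'label': 'b'}]
merged_rows = list(merge_rows(features, labels))
print(merged_rows)
```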

__init__(*loaders)
rows()
class jubakit.loader.chain.ValueMapChainLoader(loader, key, mapping)

Bases: jubakit.base.BaseLoader

ValueMapChainLoader is a loader that maps the value of the specified key in each record loaded from another loader.
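
The idea can be sketched as follows, assuming `mapping` is dict-like. This is an illustration only: `map_value` is a hypothetical helper, and the handling of values missing from the mapping (here a `KeyError`) is an assumption.

```python
def map_value(loader, key, mapping):
    """Replace record[key] with mapping[record[key]] in every record."""
    for record in loader:
        mapped = dict(record)  # copy so the source record is left untouched
        mapped[key] = mapping[mapped[key]]
        yield mapped


source = [{'id': 1, 'label': 0}, {'id': 2, 'label': 1}]
mapped_rows = list(map_value(source, 'label', {0: 'negative', 1: 'positive'}))
print(mapped_rows)
```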

__init__(loader, key, mapping)
rows()

jubakit.loader.core module

class jubakit.loader.core.LineBasedFileLoader(filename, *args, **kwargs)

Bases: jubakit.loader.core.LineBasedStreamLoader

Loader to process a line-oriented text file.

__init__(filename, *args, **kwargs)
class jubakit.loader.core.LineBasedStreamLoader(f, close=False)

Bases: jubakit.base.BaseLoader

Loader to process a line-oriented text stream. You can override the preprocess method to split each row into fields.
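
The override pattern can be sketched with a self-contained stand-in class. `LineStream` and `TabSeparatedStream` are hypothetical and do not use jubakit; in particular, the shape of the record passed to `preprocess` (a dict with 'number' and 'line' keys) is an assumption of this sketch.

```python
import io


class LineStream:
    """Hypothetical stand-in for a line-oriented stream loader."""

    def __init__(self, f):
        self._f = f

    def preprocess(self, entry):
        # Default hook: return the raw record unchanged.
        return entry

    def rows(self):
        for number, line in enumerate(self._f):
            # Assumed record layout: line number plus the line text.
            yield self.preprocess({'number': number, 'line': line.rstrip('\n')})


class TabSeparatedStream(LineStream):
    def preprocess(self, entry):
        # Split each line into named fields.
        name, value = entry['line'].split('\t')
        return {'name': name, 'value': int(value)}


records = list(TabSeparatedStream(io.StringIO('a\t1\nb\t2\n')).rows())
print(records)
```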

__init__(f, close=False)
rows()

jubakit.loader.csv module

class jubakit.loader.csv.CSVLoader(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)

Bases: jubakit.base.BaseLoader

Loader to process CSV files.

__init__(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)

Creates a new loader that processes CSV files.

You can optionally pass the fieldnames option. If fieldnames is not specified (the default) or is True, the first line of the CSV is used for column names. If fieldnames is False, sequential column names are generated automatically, like ['c0', 'c1', …]. If fieldnames is a list, it is used as the column names.

Any other optional or keyword arguments are passed to the underlying csv.DictReader.

>>> loader = CSVLoader('dataset.tsv', fieldnames=False, encoding='cp932', delimiter='\t')
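
The three fieldnames modes can be illustrated with the standard csv module alone. The sketch below is not CSVLoader's implementation: `read_csv_rows` is a hypothetical helper, it reads from an open text stream instead of a filename, and encoding handling is omitted.

```python
import csv
import io


def read_csv_rows(f, fieldnames=None, **kwargs):
    """Yield dict records from a text stream, mimicking the fieldnames modes."""
    if fieldnames is None or fieldnames is True:
        # The first line supplies the column names.
        for row in csv.DictReader(f, **kwargs):
            yield dict(row)
    elif fieldnames is False:
        # No header line: generate c0, c1, ... for every row.
        for row in csv.reader(f, **kwargs):
            yield {'c{0}'.format(i): value for i, value in enumerate(row)}
    else:
        # Caller-supplied names; every line is treated as data.
        for row in csv.DictReader(f, fieldnames=list(fieldnames), **kwargs):
            yield dict(row)


data = 'x\ty\n1\t2\n'
from_header = list(read_csv_rows(io.StringIO(data), delimiter='\t'))
generated = list(read_csv_rows(io.StringIO(data), fieldnames=False, delimiter='\t'))
explicit = list(read_csv_rows(io.StringIO(data), fieldnames=['a', 'b'], delimiter='\t'))
print(from_header, generated, explicit)
```

Values are returned as strings, as the csv module does; extra keyword arguments such as delimiter are passed straight through to the reader.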
rows()

jubakit.loader.odbc module

class jubakit.loader.odbc.ODBCLoader

Bases: jubakit.base.BaseLoader

Loader to process ODBC data sources.

jubakit.loader.postgresql module

class jubakit.loader.postgresql.PostgreSQLAuthHandler(**kwargs)

Bases: object

Handles authentication required to access PostgreSQL.

__init__(**kwargs)

Authentication information must be specified as follows:

>>> PostgreSQLAuthHandler(
...   user='XXXXXXXXXXXXXXXXXXXX',
...   password='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   host='XXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   port='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
... )

Any other connection parameter supported by this loader can be passed as a keyword. The complete list of supported parameters is given in the PostgreSQL documentation (https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS).
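
For orientation, libpq accepts connection parameters as a keyword/value string such as `host=localhost port=5432`. The sketch below shows one way keyword arguments could be rendered into that form. `build_conninfo` is a hypothetical helper, and whether PostgreSQLAuthHandler builds such a string internally is an assumption, not something stated here.

```python
def build_conninfo(**kwargs):
    """Render keyword arguments as a libpq keyword/value connection string."""
    parts = []
    for key, value in sorted(kwargs.items()):
        # libpq allows single-quoted values; backslashes and single quotes
        # inside them are escaped with a backslash.
        escaped = str(value).replace('\\', '\\\\').replace("'", "\\'")
        parts.append("{0}='{1}'".format(key, escaped))
    return ' '.join(parts)


conninfo = build_conninfo(dbname='test', user='postgres', host='localhost', port='5432')
print(conninfo)
```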

get()
class jubakit.loader.postgresql.PostgreSQLoader(auth, table, **kwargs)

Bases: jubakit.base.BaseLoader

Loader to process columns of PostgreSQL.

This loader loads data from a PostgreSQL table. The example below accesses the 'test' table of the 'test' database.

Example:

    from jubakit.loader.postgresql import PostgreSQLoader
    from jubakit.loader.postgresql import PostgreSQLAuthHandler

    auth = PostgreSQLAuthHandler(dbname='test', user='postgres', password='postgres', host='localhost', port='5432')

    loader = PostgreSQLoader(auth, table='test')
    for row in loader:
        print(row)

    # {'id': 1, 'num': 100, 'data': 'abcdef'}
    # {'id': 2, 'num': 200, 'data': 'ghijkl'}
    # {'id': 3, 'num': 300, 'data': 'mnopqr'}

__init__(auth, table, **kwargs)
rows()

jubakit.loader.sparse module

class jubakit.loader.sparse.SparseMatrixLoader(matrix, feature_names=None)

Bases: jubakit.base.BaseLoader

SparseMatrixLoader is a loader that reads from a scipy.sparse 2-d matrix. Zero entries are ignored.
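
What "zero entries are ignored" means for the resulting records can be sketched without scipy. In the illustration below a dense list of lists stands in for the sparse matrix; `rows_skipping_zeros` is a hypothetical helper, not jubakit's implementation.

```python
def rows_skipping_zeros(matrix, feature_names):
    """Yield one dict per row containing only the nonzero entries."""
    for row in matrix:
        yield {
            name: value
            for name, value in zip(feature_names, row)
            if value != 0  # zero entries produce no key at all
        }


sparse_rows = list(rows_skipping_zeros([[1, 0, 3], [0, 0, 6]], ['k1', 'k2', 'k3']))
print(sparse_rows)
```

Records can therefore have different key sets from row to row.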

__init__(matrix, feature_names=None)
rows()

jubakit.loader.twitter module

class jubakit.loader.twitter.TwitterOAuthHandler(**kwargs)

Bases: object

Handles authentication required to access the Twitter Streaming API.

__init__(**kwargs)

Authentication information must be specified as follows:

>>> TwitterOAuthHandler(
...   consumer_key='XXXXXXXXXXXXXXXXXXXX',
...   consumer_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_token='XXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
... )

If some keys are not specified, environment variables (TWITTER_CONSUMER_KEY etc.) are used automatically.

You can get your key by registering your app on: https://apps.twitter.com/
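
The environment-variable fallback can be sketched as a small lookup. `resolve_credential` is a hypothetical helper, not part of jubakit; only TWITTER_CONSUMER_KEY is named in this documentation, so the sketch uses that variable alone and passes a fake environment for demonstration.

```python
import os


def resolve_credential(explicit, env_name, environ=None):
    """Return the explicit value if given, else fall back to the environment."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    return environ.get(env_name)


# A fake environment keeps the example independent of the real process state.
fake_env = {'TWITTER_CONSUMER_KEY': 'key-from-env'}
from_env = resolve_credential(None, 'TWITTER_CONSUMER_KEY', fake_env)
from_arg = resolve_credential('key-from-arg', 'TWITTER_CONSUMER_KEY', fake_env)
print(from_env, from_arg)
```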

get()
class jubakit.loader.twitter.TwitterStreamLoader(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)

Bases: jubakit.base.BaseLoader

Loader to process the Twitter stream. Loads statuses only; other types of messages, such as direct messages and warnings, are ignored.

The tweepy and jq packages must be installed to use this loader.
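
The default keys are jq-style paths into the status JSON, for example `.user.id`. The sketch below imitates simple dotted paths in plain Python to show what such keys select. It does not use jq, `extract` is a hypothetical helper, and using the path itself as the output key is an assumption of this sketch.

```python
def extract(status, keys):
    """Look up simple dotted paths such as '.user.id' in a nested dict."""
    record = {}
    for key in keys:
        value = status
        for part in key.lstrip('.').split('.'):
            # Missing fields resolve to None instead of raising.
            value = value.get(part) if isinstance(value, dict) else None
        record[key] = value
    return record


status = {'id_str': '1', 'text': 'hello', 'user': {'id': 42, 'name': 'alice'}}
record = extract(status, ['.id_str', '.text', '.user.id', '.lang'])
print(record)
```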

FILTER = u'filter'
FIREHOSE = u'firehose'
SAMPLE = u'sample'
SITE = u'site'
STATUS_KEYS = [u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count']
USER = u'user'
__init__(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)
is_infinite()
rows()