jubakit.loader package
jubakit.loader.array module
- class jubakit.loader.array.ArrayLoader(array, feature_names=None)
Bases: jubakit.base.BaseLoader
ArrayLoader is a loader that reads from a 2-d array. It expects row-oriented data.
For example:
>>> ArrayLoader([[1,2,3], [4,5,6]], ['k1','k2','k3'])
… will load two entries:
- {'k1': 1, 'k2': 2, 'k3': 3}
- {'k1': 4, 'k2': 5, 'k3': 6}
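The row-oriented mapping described above can be sketched in plain Python. This is an illustration of the idea only, not jubakit's implementation; the helper name `rows_to_entries` is hypothetical, and the sketch assumes feature names are always supplied.

```python
def rows_to_entries(array, feature_names):
    """Pair every row of a 2-d array with the given feature names.

    Hypothetical helper for illustration; it does not use jubakit.
    """
    entries = []
    for row in array:
        # Each row becomes one entry: names and values are matched by position.
        entries.append(dict(zip(feature_names, row)))
    return entries


entries = rows_to_entries([[1, 2, 3], [4, 5, 6]], ['k1', 'k2', 'k3'])
for entry in entries:
    print(entry)
```

Each inner list is treated as one record, which is why the data is called row-oriented.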
- class jubakit.loader.array.ZipArrayLoader(arrays=[], feature_names=None, **named_arrays)
Bases: jubakit.base.BaseLoader
ZipArrayLoader zips multiple 1-d arrays that have the same length. It expects column-oriented data.
For example:
>>> ZipArrayLoader([[1,4], [2,5], [3,6]], ['k1','k2','k3'])
… or simply:
>>> ZipArrayLoader(k1=[1,4], k2=[2,5], k3=[3,6])
… will load two entries:
- {'k1': 1, 'k2': 2, 'k3': 3}
- {'k1': 4, 'k2': 5, 'k3': 6}
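The column-oriented case can be sketched the same way. The helper below is hypothetical and is not jubakit's code; it assumes that either positional arrays with feature names or keyword arrays are given (not both), and that arrays of unequal length should be rejected.

```python
def columns_to_entries(arrays=(), feature_names=None, **named_arrays):
    """Zip equal-length 1-d arrays (columns) into per-record dicts.

    Hypothetical helper for illustration; it does not use jubakit.
    """
    if named_arrays:
        # Keyword form: the keyword names serve as feature names.
        feature_names = list(named_arrays)
        arrays = [named_arrays[name] for name in feature_names]
    if len(set(len(column) for column in arrays)) > 1:
        raise ValueError('all arrays must have the same length')
    # zip(*arrays) turns columns into rows.
    return [dict(zip(feature_names, values)) for values in zip(*arrays)]


by_position = columns_to_entries([[1, 4], [2, 5], [3, 6]], ['k1', 'k2', 'k3'])
by_keyword = columns_to_entries(k1=[1, 4], k2=[2, 5], k3=[3, 6])
print(by_position == by_keyword)
```

Both call styles describe the same three columns, so they produce the same two entries.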
jubakit.loader.chain module
- class jubakit.loader.chain.ConcatLoader(*loaders)
Bases: jubakit.base.BaseLoader
ConcatLoader is a loader that concatenates multiple loaders.
- class jubakit.loader.chain.MergeChainLoader(*loaders)
Bases: jubakit.base.BaseLoader
MergeChainLoader merges multiple loaders.
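The difference between concatenating and merging can be sketched with plain iterables of dicts. The sketch makes assumptions the reference above does not confirm: that merging combines the records found at the same position in each loader, that later loaders win on key conflicts, and that iteration stops at the shortest input. The function names are hypothetical.

```python
import itertools


def concat_rows(*loaders):
    """Yield all rows of the first source, then the second, and so on."""
    return itertools.chain(*loaders)


def merge_rows(*loaders):
    """Yield one combined row per position across all sources (assumed semantics)."""
    for rows in zip(*loaders):
        merged = {}
        for row in rows:
            merged.update(row)  # assumption: later sources override earlier keys
        yield merged


left = [{'k1': 1}, {'k1': 4}]
right = [{'k2': 2}, {'k2': 5}]

concatenated = list(concat_rows(left, right))
merged = list(merge_rows(left, right))
print(concatenated)
print(merged)
```

Concatenation grows the number of records, while merging (under these assumptions) keeps the record count and grows the number of keys per record.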
jubakit.loader.core module
- class jubakit.loader.core.LineBasedFileLoader(filename, *args, **kwargs)
Bases: jubakit.loader.core.LineBasedStreamLoader
Loader to process a line-oriented text file.
jubakit.loader.csv module
- class jubakit.loader.csv.CSVLoader(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)
Bases: jubakit.base.BaseLoader
Loader to process CSV files.
- __init__(filename, fieldnames=None, encoding=u'utf-8', *args, **kwargs)
Creates a new loader that processes CSV files.
You can optionally give the fieldnames option. If fieldnames is not specified (the default) or is specified as True, the first line of the CSV is used for column names. If fieldnames is specified as False, sequential column names are automatically generated like ['c0', 'c1', …]. If fieldnames is a list, it is used as column names.
Any other optional or keyword arguments are passed to the underlying csv.DictReader.
>>> loader = CSVLoader('dataset.tsv', fieldnames=False, encoding='cp932', delimiter='\t')
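The three fieldnames modes described above can be sketched with the standard csv module alone. This is a minimal illustration of the documented behavior, not CSVLoader's actual code; the function name `read_csv_rows` is hypothetical, and it reads from a string rather than a file to stay self-contained.

```python
import csv
import io


def read_csv_rows(text, fieldnames=None, **kwargs):
    """Parse CSV text, choosing column names according to `fieldnames`.

    Hypothetical helper; extra keyword arguments go to the csv reader.
    """
    stream = io.StringIO(text, newline='')
    if fieldnames is None or fieldnames is True:
        # The first line supplies the column names.
        return [dict(row) for row in csv.DictReader(stream, **kwargs)]
    if fieldnames is False:
        # No header line: generate sequential names c0, c1, ...
        return [
            {'c{}'.format(i): value for i, value in enumerate(row)}
            for row in csv.reader(stream, **kwargs)
        ]
    # An explicit list of names; every line is treated as data.
    reader = csv.DictReader(stream, fieldnames=list(fieldnames), **kwargs)
    return [dict(row) for row in reader]


tsv = 'a\tb\n1\t2\n'
print(read_csv_rows(tsv, delimiter='\t'))
print(read_csv_rows(tsv, fieldnames=False, delimiter='\t'))
print(read_csv_rows(tsv, fieldnames=['x', 'y'], delimiter='\t'))
```

Note that with fieldnames=False or an explicit list, the first line is returned as an ordinary record.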
jubakit.loader.odbc module
- class jubakit.loader.odbc.ODBCLoader
Bases: jubakit.base.BaseLoader
Loader to process ODBC data sources.
jubakit.loader.postgresql module
- class jubakit.loader.postgresql.PostgreSQLAuthHandler(**kwargs)
Bases: object
Handles authentication required to access PostgreSQL.
- __init__(**kwargs)
Authentication information must be specified as follows:
>>> PostgreSQLAuthHandler(
...   user='XXXXXXXXXXXXXXXXXXXX',
...   password='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   host='XXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   port='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
... )
Any other connection parameter supported by this loader can be passed as a keyword. The complete list of supported parameters is given in the PostgreSQL documentation (https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS).
- class jubakit.loader.postgresql.PostgreSQLoader(auth, table, **kwargs)
Bases: jubakit.base.BaseLoader
Loader to process columns of PostgreSQL.
This loader loads data from a PostgreSQL table. The example below accesses the "test" table of the "test" database.
- Example:
from jubakit.loader.postgresql import PostgreSQLoader
from jubakit.loader.postgresql import PostgreSQLAuthHandler

auth = PostgreSQLAuthHandler(dbname='test', user='postgres', password='postgres', host='localhost', port='5432')

loader = PostgreSQLoader(auth, table='test')
for row in loader:
    print(row)

# {'id': 1, 'num': 100, 'data': 'abcdef'}
# {'id': 2, 'num': 200, 'data': 'ghijkl'}
# {'id': 3, 'num': 300, 'data': 'mnopqr'}
jubakit.loader.sparse module
jubakit.loader.twitter module
- class jubakit.loader.twitter.TwitterOAuthHandler(**kwargs)
Bases: object
Handles authentication required to access the Twitter Streaming API.
- __init__(**kwargs)
Authentication information must be specified as follows:
>>> TwitterOAuthHandler(
...   consumer_key='XXXXXXXXXXXXXXXXXXXX',
...   consumer_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_token='XXXXXXXX-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
...   access_secret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
... )
If some of the keys are not specified, environment variables (TWITTER_CONSUMER_KEY etc.) will automatically be used.
You can get your keys by registering your app at: https://apps.twitter.com/
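The environment-variable fallback can be sketched as follows. Only TWITTER_CONSUMER_KEY is named in the text above; the sketch assumes the remaining variables follow the same pattern (TWITTER_ plus the upper-cased keyword), which is not confirmed here. The function `resolve_credentials` is hypothetical and is not part of jubakit.

```python
import os

# Keyword names taken from the example above.
CREDENTIAL_KEYS = ('consumer_key', 'consumer_secret', 'access_token', 'access_secret')


def resolve_credentials(environ=None, **kwargs):
    """Return credentials, reading unset keys from environment variables."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key in CREDENTIAL_KEYS:
        value = kwargs.get(key)
        if value is None:
            # Assumed naming scheme: TWITTER_<KEY IN UPPER CASE>.
            value = environ.get('TWITTER_' + key.upper())
        resolved[key] = value
    return resolved


fake_env = {'TWITTER_CONSUMER_KEY': 'from-env', 'TWITTER_ACCESS_TOKEN': 'token-env'}
credentials = resolve_credentials(environ=fake_env, access_token='explicit')
print(credentials)
```

Explicit keyword arguments take precedence in this sketch; a key missing from both places resolves to None.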
- class jubakit.loader.twitter.TwitterStreamLoader(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)
Bases: jubakit.base.BaseLoader
Loader to process the Twitter Stream. Loads statuses only; other types of messages, such as direct messages and warnings, are ignored.
The tweepy and jq packages must be installed to use this loader.
- FILTER = u'filter'
- FIREHOSE = u'firehose'
- SAMPLE = u'sample'
- SITE = u'site'
- STATUS_KEYS = [u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count']
- USER = u'user'
- __init__(auth=None, mode=u'sample', keys=[u'.id_str', u'.text', u'.lang', u'.favorite_count', u'.retweet_count', u'.timestamp_ms', u'.user.id', u'.user.name', u'.user.screen_name', u'.user.description', u'.user.lang', u'.user.statuses_count', u'.user.friends_count', u'.user.followers_count', u'.user.favourites_count', u'.user.listed_count'], count=None, **kwargs)
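The entries in keys look like jq-style paths into a status object (for example, .user.id). As a rough illustration of what such paths select, the sketch below walks simple dotted paths through a nested dict in plain Python. It does not use the jq package, handles only plain dotted paths, and assumes the path string itself is used as the output key; none of this is confirmed to match the loader's behavior. The function name `extract_keys` and the sample status are hypothetical.

```python
def extract_keys(status, keys):
    """Flatten a nested status dict by following simple dotted paths."""
    row = {}
    for key in keys:
        value = status
        # '.user.id' -> ['user', 'id']
        for part in key.lstrip('.').split('.'):
            value = value.get(part) if isinstance(value, dict) else None
        row[key] = value  # assumption: keep the path string as the column name
    return row


status = {
    'id_str': '1',
    'text': 'hello',
    'user': {'id': 42, 'screen_name': 'example'},
}
row = extract_keys(status, ['.id_str', '.text', '.user.id', '.user.lang'])
print(row)
```

Paths that do not exist in the status resolve to None here instead of raising an error.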