Distributed Mode

In this page, we describe how to setup Jubatus distributed in multiple nodes.

We recommend trying this tutorial after you experience Basic Tutorial in standalone mode.

Distributed Mode

You can run Jubatus in a distributed environment using ZooKeeper and Jubatus proxies.

Setup ZooKeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Jubatus in cluster mode uses ZooKeeper to manage Jubatus servers and proxies in cluster environment.

Run ZooKeeper server like this:

$ /path/to/zookeeper/bin/zkServer.sh start
JMX enabled by default
Using config: /path/to/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ...

Here we assume that ZooKeeper is running on localhost:2181. You can change it in the zoo.cfg file.

Register configuration file to ZooKeeper

In distributed environment, register configuration file on the local file system to ZooKeeper using jubaconfig.

$ jubaconfig --cmd write --zookeeper=localhost:2181 --file config.json --name tutorial --type classifier

Jubatus Proxy

Jubatus proxies proxy RPC requests from clients to servers. In distributed environment, make RPC requests from clients to proxies, not directly to servers.

Jubatus proxies are provided for each Jubatus servers. For the classifier, jubaclassifier_proxy is the corresponding proxy.

$ jubaclassifier_proxy --zookeeper=localhost:2181 --rpc-port=9198

Now jubaclassifier_proxy started listening on TCP port 9198 for RPC requests.

Join Jubatus Servers to Cluster

To start Jubatus servers in cluster mode, give --name and --zookeeper option when executing servers. Server processes started with same name belongs to the same cluster and they collaborate with one another.

If you want to start multiple server processes on the same machine, please note that you must change the port for each processes.

$ jubaclassifier --rpc-port=9180 --name=tutorial --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9181 --name=tutorial --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9182 --name=tutorial --zookeeper=localhost:2181 &

When Jubatus servers are started in cluster mode, they create a node in ZooKeeper system. You can verify that three server processes are registered to ZooKeeper system by using ZooKeeper client.

$ /path/to/zookeeper/bin/zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /jubatus/actors/classifier/tutorial/nodes

Run Tutorial

Run the tutorial program again, but this time we use options to specify port to connect to proxies instead of servers. In cluster mode, you also need to specify the cluster name when making RPC request to proxies.

$ python tutorial.py --server_port=9198 --name=tutorial

Note that you can use the same client code for both standalone mode and distributed mode.

Cluster Management in Jubatus

Jubatus has a mechanism to centrally manage various processes. In this tutorial, you will execute some processes on each server as shown in the following table.

IP Address Processes Terminal jubaclassifier - 1 jubaclassifier - 2 jubaclassifier - 3 jubaclassifier_proxy/client - 1 jubaclassifier_proxy/client - 2 jubaclassifier_proxy/client - 3 ZooKeeper - 1 ZooKeeper - 2 ZooKeeper - 3

For the best practices, see Cluster Administration Guide.

ZooKeepers & Jubatus Proxies

Start ZooKeeper servers (make sure you configure an ensemble between them).

[]$ bin/zkServer.sh start
[]$ bin/zkServer.sh start
[]$ bin/zkServer.sh start

Start jubaclassifier_proxy processes. jubaclassifier_proxy uses TCP port 9199 by default.

[]$ jubaclassifier_proxy --zookeeper,,
[]$ jubaclassifier_proxy --zookeeper,,
[]$ jubaclassifier_proxy --zookeeper,,

Jubavisor: Process Management Agent

jubavisor is an agent process that manages server processes.

jubavisor can manage each Jubatus server processes by receiving RPC requests from jubactl, a controller command. jubavisor uses TCP port 9198 by default.

[]$ jubavisor --zookeeper,, --daemon
[]$ jubavisor --zookeeper,, --daemon
[]$ jubavisor --zookeeper,, --daemon

Now send commands from jubactl to jubavisor.

[]$ jubactl -c start  --server=jubaclassifier --type=classifier --name=tutorial --zookeeper=,,
[]$ jubactl -c status --server=jubaclassifier --type=classifier --name=tutorial --zookeeper=,,
active jubaclassifier_proxy members:
active jubavisor members:
active tutorial members:

From members list, you can see the server is running. Now run clients simultaneously, from multiple hosts.

[]$ python tutorial.py --name=tutorial --server_ip
[]$ python tutorial.py --name=tutorial --server_ip
[]$ python tutorial.py --name=tutorial --server_ip

You can also stop instance of Jubatus server from jubactl.

[]$ jubactl -c stop --server=jubaclassifier --type=classifier --name=tutorial --zookeeper=,,