Using Code Generators

Developing a machine learning algorithm with the Jubatus framework starts with writing an IDL (RPC interface definition). With jenerator, the code generator bundled with Jubatus, you can generate each component (server, proxy, and clients for each language) from the IDL. This lets framework users concentrate on implementing algorithms.

Flow of Development

  1. Define the RPC interface of the service in IDL.
  2. Generate the code for the server, proxy, common data structures and clients (C++/Python/Ruby/Java) from the IDL with jenerator.
  3. Implement the user-defined server class behind the interface and, if necessary, the mix operation.

Use the skeleton project to get started.

Why We Use IDL

Without IDL, every new learning algorithm requires the same definitions to be repeated in six C++ source files: the header and implementation of the client, of the proxy, and of the server. Keeping them consistent by hand turned every API change into a hotbed of bugs.

With IDL, users can build a system by following the flow above. Currently, all algorithms (recommender, classifier, regression, stat and graph) define their interfaces with jenerator.

Composition of Files

A machine learning system built on the Jubatus framework consists of the following files (where NAME is the name of the service):

  • NAME_serv.cpp: Implementation of the server (edit the template generated by jenerator)
  • NAME_serv.hpp: Header file for NAME_serv.cpp (edit the template generated by jenerator)
  • NAME_impl.cpp: The main function, the RPC interface definition for the server, and registration of the RPC methods (automatically generated by jenerator)
  • NAME_proxy.cpp: Implementation of the proxy (automatically generated by jenerator)
  • NAME_client.hpp: Implementation of the client used for server-to-server communication (automatically generated by jenerator)
  • NAME_types.hpp: Structures and type information (automatically generated by jenerator)

jenerator: The Code Generator

The RPC interface is defined with MessagePack-IDL. In addition to MessagePack-IDL’s original syntax, each method of the RPC service must carry annotations so that Jubatus servers and proxies can be generated.

Annotations are interpreted by jenerator but treated as comments by MessagePack-IDL. Therefore, the same IDL file can be passed to MessagePack-IDL to generate clients with the same interface.

The annotation syntax for each method is as follows:

  • Each method must have three annotations, each starting with #@, which specify the “routing”, “lock type” and “aggregation method”, in that order.
  • The “routing” annotation defines how the Jubatus proxy forwards requests. Three methods (cht, broadcast and random) are available, covering the distribution patterns used in typical machine learning tasks.
    • cht means that the request is distributed using consistent hashing. Methods annotated with cht must take at least one argument, a string that is used as the consistent hashing key. The replication level of updated data is 2 by default; you can change it by specifying a level, e.g. #@cht(1).
    • broadcast means that the request is broadcast to all servers in the cluster.
    • random means that the request is forwarded to one randomly chosen server in the cluster.
  • The “lock type” annotation defines whether the request reads or writes data; its value must be one of analysis, update or nolock.
    • With analysis, the server holds a read lock on the data, so multiple threads can access it simultaneously.
    • With update, the server holds a write lock on the data, so that it can be updated safely.
    • With nolock, no locks are acquired in the server.
  • The “aggregation” annotation defines how the results of an API call from multiple servers are aggregated. The available aggregators are defined in aggregators.hpp.
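To make cht routing more concrete, here is a minimal C++ sketch of a consistent-hash ring that picks the servers responsible for a key under a given replication level. The class hash_ring and its details (such as eight virtual points per server) are hypothetical and are not Jubatus's own implementation; the sketch only illustrates why a given key always reaches the same servers and why #@cht(2) touches two of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical consistent-hash ring (not Jubatus's own cht class) that picks
// the servers responsible for a key, the way #@cht(n) routing does.
class hash_ring {
 public:
  // Place each server at several virtual points on the ring to spread keys.
  explicit hash_ring(const std::vector<std::string>& servers,
                     std::size_t points_per_server = 8) {
    for (const std::string& server : servers) {
      for (std::size_t i = 0; i < points_per_server; ++i) {
        ring_[hash_(server + "#" + std::to_string(i))] = server;
      }
    }
  }

  // Walk clockwise from the key's position and collect up to `replicas`
  // distinct servers; fewer are returned if the cluster is smaller.
  std::vector<std::string> find(const std::string& key,
                                std::size_t replicas) const {
    assert(replicas >= 1);
    std::vector<std::string> result;
    if (ring_.empty()) return result;
    auto it = ring_.lower_bound(hash_(key));
    for (std::size_t step = 0;
         step < ring_.size() && result.size() < replicas; ++step, ++it) {
      if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
      if (std::find(result.begin(), result.end(), it->second) ==
          result.end()) {
        result.push_back(it->second);
      }
    }
    return result;
  }

 private:
  std::hash<std::string> hash_;
  std::map<std::size_t, std::string> ring_;  // ring position -> server
};
```

With three servers, find(key, 2) returns two distinct servers, and repeated calls with the same key return the same pair. Under this scheme, that determinism is what lets a later read for a key reach a server that received the earlier write.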

The void type cannot be used as the return type of a method. If no return value is needed, use a placeholder type such as int or bool.

Here is an example of MessagePack-IDL with annotations:

message entry {
  0: string key
  1: string value
  2: int version
}

service kvs {
  #@cht(2) #@update #@pass
  int put(0: string key, 1: string value)

  #@cht(2) #@analysis #@pass
  entry get(0: string key)

  #@cht(2) #@update #@pass
  int del(0: string key, 1: int version)

  #@broadcast #@update #@pass
  int clear()
}
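The annotation lines in this example follow a regular pattern: three #@ tokens giving routing, lock type and aggregator, in that order. As an illustration of those rules only, the following C++ sketch parses one such line into a struct. jenerator itself is written in OCaml and does its own parsing; method_annotation and parse_annotations are hypothetical names, and the aggregator name is deliberately not validated because the list of available aggregators lives in aggregators.hpp.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of one method's annotations (not jenerator's types).
struct method_annotation {
  std::string routing;     // "cht", "broadcast" or "random"
  int replication = 0;     // replication level, only meaningful for cht
  std::string lock_type;   // "analysis", "update" or "nolock"
  std::string aggregator;  // e.g. "pass", "all_and", "merge" (not checked)
};

// Parse a line such as "#@cht(2) #@update #@pass": exactly three
// annotations, in the order routing, lock type, aggregation.
method_annotation parse_annotations(const std::string& line) {
  std::istringstream in(line);
  std::string token[3];
  std::string extra;
  if (!(in >> token[0] >> token[1] >> token[2]) || (in >> extra)) {
    throw std::invalid_argument("expected exactly three annotations");
  }
  for (std::string& t : token) {
    if (t.rfind("#@", 0) != 0) {
      throw std::invalid_argument("annotation must start with #@");
    }
    t = t.substr(2);  // strip the "#@" prefix
  }

  method_annotation result;
  const std::string& routing = token[0];
  if (routing == "cht") {
    result.routing = "cht";
    result.replication = 2;  // default replication level
  } else if (routing.rfind("cht(", 0) == 0 && routing.back() == ')') {
    result.routing = "cht";
    result.replication = std::stoi(routing.substr(4, routing.size() - 5));
  } else if (routing == "broadcast" || routing == "random") {
    result.routing = routing;
  } else {
    throw std::invalid_argument("unknown routing: " + routing);
  }

  const std::string& lock = token[1];
  if (lock != "analysis" && lock != "update" && lock != "nolock") {
    throw std::invalid_argument("unknown lock type: " + lock);
  }
  result.lock_type = lock;
  result.aggregator = token[2];
  return result;
}
```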

The following server RPC methods are automatically appended to each service by jenerator:

#@random #@analysis #@pass
string get_config()

#@broadcast #@analysis #@all_and
bool save(0: string id)

#@broadcast #@update #@all_and
bool load(0: string id)

#@broadcast #@analysis #@merge
map<string, map<string, string> > get_status()
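The aggregation annotations above can be read as binary functions that combine the replies of two servers, which a proxy folds over all replies. The following sketch is written in that spirit but is not copied from aggregators.hpp: the namespace agg_sketch, the helper aggregate and the status_map alias are hypothetical, and merge assumes each server reports its get_status entries under its own key, so keys do not collide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace agg_sketch {

// Shape of a get_status() reply: server key -> (status name -> value).
using status_map = std::map<std::string, std::map<std::string, std::string>>;

// #@pass: keep one reply and discard the rest.
template <typename T>
T pass(const T& lhs, const T& /*rhs*/) {
  return lhs;
}

// #@all_and: true only if every server reported success (as for save/load).
inline bool all_and(bool lhs, bool rhs) { return lhs && rhs; }

// #@merge: union of two maps; on a duplicate key the left-hand entry wins.
template <typename K, typename V>
std::map<K, V> merge(const std::map<K, V>& lhs, const std::map<K, V>& rhs) {
  std::map<K, V> result = lhs;
  result.insert(rhs.begin(), rhs.end());  // insert() skips existing keys
  return result;
}

// Fold a binary aggregator over the replies from all servers.
template <typename T, typename F>
T aggregate(const std::vector<T>& replies, F combine) {
  assert(!replies.empty());
  T result = replies.front();
  for (std::size_t i = 1; i < replies.size(); ++i) {
    result = combine(result, replies[i]);
  }
  return result;
}

}  // namespace agg_sketch
```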

Building jenerator

To build jenerator, you need OCaml 4.02.1 or later (with findlib), extlib, OMake, OUnit and ppx_deriving. We recommend using OPAM to set up the OCaml environment; OPAM 1.2 or later is required to install the modules jenerator depends on. If you want to install OPAM from source, ocamlbrew is useful.

$ opam switch 4.02.1
$ eval `opam config env`
$ opam install ocamlfind extlib omake ounit ppx_deriving
$ cd jubatus/tools/jenerator
$ omake
$ sudo omake install

omake install installs jenerator as /usr/local/bin/jenerator (the path may vary depending on your environment). To install jenerator to another directory, set the PREFIX environment variable.

$ PREFIX=/path/to/install omake install

You can also run the built jenerator binary directly without installing it.

Generating Server/Proxy from IDL

Assuming the example above is saved as kvs.idl, you can generate the code as follows:

$ jenerator -l server -o . -n jubatus -t kvs.idl

See the jenerator documentation for its detailed usage.

Implementing Server

kvs_impl.cpp constructs a server instance using the class kvs_serv, which you need to define in kvs_serv.hpp and kvs_serv.cpp. You can start from the templates generated by jenerator (kvs_serv.tmpl.{cpp,hpp}).

The main function is implemented in kvs_impl.cpp, so users don’t have to write it. Command-line options are the same for all servers built on the Jubatus framework; run the server with the --help option to list them.
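As an illustration of how the kvs annotations relate to server code, the following is a hypothetical, framework-free kvs_serv that keeps its data in memory. It omits the Jubatus base classes, configuration and mix support, and it takes locks explicitly only to show what #@analysis (shared read lock) and #@update (exclusive write lock) mean. The version semantics of put, get and del are assumptions made for this sketch, since the IDL does not specify them. std::shared_mutex requires C++17.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical C++ counterpart of `message entry` from kvs.idl.
struct entry {
  std::string key;
  std::string value;
  int version = 0;  // assumption: 0 means "not stored"
};

// Hypothetical in-memory kvs_serv (no Jubatus framework classes).
class kvs_serv {
 public:
  // #@update: exclusive lock. Returns the new version (1 on first put).
  int put(const std::string& key, const std::string& value) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    entry& e = data_[key];
    e.key = key;
    e.value = value;
    e.version += 1;
    return e.version;
  }

  // #@analysis: shared lock, so concurrent readers do not block each other.
  entry get(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = data_.find(key);
    if (it == data_.end()) return entry{key, "", 0};
    return it->second;
  }

  // #@update: delete only if the caller saw the current version
  // (an assumed optimistic-concurrency rule). Returns 1 if deleted, else 0.
  int del(const std::string& key, int version) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    auto it = data_.find(key);
    if (it == data_.end() || it->second.version != version) return 0;
    data_.erase(it);
    return 1;
  }

  // #@broadcast #@update: each server clears its own data.
  int clear() {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    data_.clear();
    return 0;  // placeholder value, because IDL methods cannot return void
  }

 private:
  mutable std::shared_mutex mutex_;  // mutable so get() can lock it
  std::map<std::string, entry> data_;
};
```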

Mixable Class

TBD.

Implementing Proxy

There is nothing to implement: just compile kvs_proxy.cpp, generated by jenerator, to get the proxy.

kvs_proxy.cpp contains only a main function, which registers, for each RPC method, a functor that forwards requests and aggregates the responses.
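The following hypothetical sketch shows the idea of one functor per RPC method: a table maps each method name to a function that forwards the call according to its routing (broadcast or random) and aggregates the replies. proxy_sketch, backend and the int-only signatures are simplifications for illustration and do not reflect the code jenerator actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an RPC connection to one backend server:
// it receives a method name and returns that server's reply.
using backend = std::function<int(const std::string& method)>;

class proxy_sketch {
 public:
  explicit proxy_sketch(std::vector<backend> servers)
      : servers_(std::move(servers)) {}

  // Handlers capture `this`, so copying the proxy would leave them dangling.
  proxy_sketch(const proxy_sketch&) = delete;
  proxy_sketch& operator=(const proxy_sketch&) = delete;

  // #@broadcast: call every server and fold the replies with `aggregate`.
  void register_broadcast(const std::string& method,
                          std::function<int(int, int)> aggregate) {
    handlers_[method] = [this, method, aggregate]() {
      assert(!servers_.empty());
      int result = servers_[0](method);
      for (std::size_t i = 1; i < servers_.size(); ++i) {
        result = aggregate(result, servers_[i](method));
      }
      return result;
    };
  }

  // #@random: forward the call to one randomly chosen server.
  void register_random(const std::string& method) {
    handlers_[method] = [this, method]() {
      assert(!servers_.empty());
      std::uniform_int_distribution<std::size_t> pick(0, servers_.size() - 1);
      return servers_[pick(rng_)](method);
    };
  }

  // Dispatch an incoming call to the functor registered for its method.
  int call(const std::string& method) {
    auto it = handlers_.find(method);
    assert(it != handlers_.end() && "method not registered");
    return it->second();
  }

 private:
  std::vector<backend> servers_;
  std::map<std::string, std::function<int()>> handlers_;
  std::mt19937 rng_{42};  // fixed seed only to keep the sketch reproducible
};
```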