Using Code Generators¶
Developing a machine learning algorithm with the Jubatus framework starts with writing an IDL (RPC interface definition).
From the IDL, the code generator jenerator (a tool bundled with Jubatus) generates each component: the server, the proxy and a client for each language.
Thanks to the generator, framework users can concentrate on implementing the algorithm itself.
Flow of Development¶
- Define the RPC interfaces that the service should have using IDL.
- Generate code for the server, proxy, common data structures and clients (C++/Python/Ruby/Java) from the IDL with jenerator.
- Implement the interface of the user-defined class and, if necessary, the mix operation.
Use the skeleton project to get started.
Why We Use IDL¶
Without an IDL, the same definitions would have to be repeated in six C++ source files every time a new learning algorithm is added: the header and implementation of the client, of the proxy and of the server. This would become a “hotbed” of bugs every time the API changed.
With an IDL, users can build a system by following the flow above.
Currently all algorithms (recommender, classifier, regression, stat and graph) define their interfaces with jenerator.
Composition of Files¶
A machine learning system that uses the Jubatus framework consists of the following files (where NAME is the name of the service).
- NAME_serv.cpp: Implementation of the server (edit the template generated by jenerator)
- NAME_serv.hpp: Header file for NAME_serv.cpp (edit the template generated by jenerator)
- NAME_impl.cpp: main function, RPC interface definition for the server and registration of RPC methods (automatically generated by jenerator)
- NAME_proxy.cpp: Implementation of the proxy (automatically generated by jenerator)
- NAME_client.hpp: Implementation of the client to be used in server-to-server communication (automatically generated by jenerator)
- NAME_types.hpp: Structures and type information (automatically generated by jenerator)
jenerator: The Code Generator¶
The RPC interface is defined with MessagePack-IDL. In addition to MessagePack-IDL’s original syntax, each method of an RPC service must carry annotations so that Jubatus servers and proxies can be generated.
The annotations are interpreted by our code generator, jenerator, but MessagePack-IDL ignores them as comments.
Therefore, clients with the same interface can still be generated by MessagePack-IDL.
The syntax of the annotations for each method is as follows.
- Each method must have 3 annotations, each starting with #@, that specify “routing”, “lock type” and “aggregation method”, in this order.
- The “routing” annotation defines how the Jubatus Proxy proxies requests. Three methods (cht, broadcast or random) are available, which cover the distribution methods used in typical machine learning tasks.
  - cht means that the request is distributed by using Consistent Hashing. Methods annotated with cht must take at least 1 argument, which is a string that is used as the key for consistent hashing. The replication level of updated data is 2 by default. You can change the replication level by specifying it like #@cht(1).
  - broadcast means that the request is broadcast to all servers in the cluster.
  - random means that the request is proxied to one randomly chosen server in the cluster.
- The “lock type” annotation defines whether the request reads or writes data; the value must be one of analysis, update or nolock.
  - With analysis, data is locked in the server with a read lock and can be accessed by multiple threads simultaneously.
  - With update, data is locked with a write lock, so that the data can be updated safely.
  - With nolock, no locks are acquired in the server.
- The “aggregation” annotation defines how to aggregate the results of an API call from multiple servers. The available aggregators are defined in aggregators.hpp.
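The three routing modes can be illustrated with a small, self-contained sketch. This is not the Jubatus proxy code: the class name ring, its member functions and the use of std::hash are all assumptions made for illustration, and the real proxy relies on its own hash function and cluster membership handling. The sketch only shows the idea that cht selects a fixed set of servers for a given key (2 by default), broadcast selects all servers, and random selects one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical consistent-hash ring: one point per server, keyed by hash.
class ring {
 public:
  explicit ring(const std::vector<std::string>& servers) : servers_(servers) {
    for (std::size_t i = 0; i < servers.size(); ++i) {
      points_[std::hash<std::string>()(servers[i])] = servers[i];
    }
  }

  // cht: walk clockwise from hash(key) and collect up to `replicas` servers.
  std::vector<std::string> route_cht(const std::string& key,
                                     std::size_t replicas = 2) const {
    assert(replicas > 0);
    std::vector<std::string> out;
    std::map<std::size_t, std::string>::const_iterator it =
        points_.lower_bound(std::hash<std::string>()(key));
    while (out.size() < replicas && out.size() < points_.size()) {
      if (it == points_.end()) it = points_.begin();  // wrap around the ring
      out.push_back(it->second);
      ++it;
    }
    return out;
  }

  // broadcast: every server receives the request.
  std::vector<std::string> route_broadcast() const { return servers_; }

  // random: one server chosen uniformly at random.
  template <typename Rng>
  std::vector<std::string> route_random(Rng& rng) const {
    std::vector<std::string> out;
    if (servers_.empty()) return out;
    std::uniform_int_distribution<std::size_t> pick(0, servers_.size() - 1);
    out.push_back(servers_[pick(rng)]);
    return out;
  }

 private:
  std::vector<std::string> servers_;
  std::map<std::size_t, std::string> points_;
};
```

With such a ring, the same key always maps to the same servers as long as the membership does not change, which is why cht suits per-key updates and lookups.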
The void type cannot be used as the return type of a method.
If no return value is needed, use a placeholder type such as int or bool.
Here is an example of MessagePack-IDL with annotations.
message entry {
  0: string key
  1: string value
  2: int version
}
service kvs {
  #@cht(2) #@update #@pass
  int put(0: string key, 1: string value)
  #@cht(2) #@analysis #@pass
  entry get(0: string key)
  #@cht(2) #@update #@pass
  int del(0: string key, 1: int version)
  #@broadcast #@update #@pass
  int clear()
}
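To make the annotation format concrete, the following sketch splits an annotation line such as the ones in the kvs service into its routing, lock type and aggregation parts. It is not how jenerator parses IDL files; the struct method_annotation and the function parse_annotation are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of the three annotations on one method.
struct method_annotation {
  std::string routing;     // "cht", "broadcast" or "random"
  int replication;         // only meaningful for cht; defaults to 2
  std::string lock_type;   // "analysis", "update" or "nolock"
  std::string aggregator;  // e.g. "pass", "all_and", "merge"
};

method_annotation parse_annotation(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> tokens;
  std::string tok;
  while (in >> tok) {
    if (tok.compare(0, 2, "#@") != 0) {
      throw std::invalid_argument("annotation must start with #@: " + tok);
    }
    tokens.push_back(tok.substr(2));
  }
  if (tokens.size() != 3) {
    throw std::invalid_argument("expected exactly 3 annotations");
  }

  method_annotation a;
  a.replication = 2;
  a.routing = tokens[0];
  // Optional replication level, as in "cht(1)".
  std::string::size_type open = a.routing.find('(');
  if (open != std::string::npos) {
    std::string::size_type close = a.routing.find(')', open);
    if (close == std::string::npos) {
      throw std::invalid_argument("missing ')' in routing annotation");
    }
    a.replication =
        std::atoi(a.routing.substr(open + 1, close - open - 1).c_str());
    a.routing = a.routing.substr(0, open);
    if (a.replication <= 0) {
      throw std::invalid_argument("replication level must be positive");
    }
  }
  if (a.routing != "cht" && a.routing != "broadcast" && a.routing != "random") {
    throw std::invalid_argument("unknown routing: " + a.routing);
  }

  a.lock_type = tokens[1];
  if (a.lock_type != "analysis" && a.lock_type != "update" &&
      a.lock_type != "nolock") {
    throw std::invalid_argument("unknown lock type: " + a.lock_type);
  }

  a.aggregator = tokens[2];
  assert(a.replication > 0);  // holds on every path that reaches this point
  return a;
}
```

For instance, parsing "#@cht(2) #@update #@pass" under these assumptions gives routing cht with replication level 2, lock type update and aggregator pass.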
The following RPC methods for the server are automatically appended to each service by jenerator:
#@random #@analysis #@pass
string get_config()
#@broadcast #@analysis #@all_and
bool save(0: string id)
#@broadcast #@update #@all_and
bool load(0: string id)
#@broadcast #@analysis #@merge
map<string, map<string, string> > get_status()
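The aggregators named in these annotations (pass, all_and and merge) can be pictured as binary functions that are folded over the responses from each server. The sketch below is an illustration under that assumption, not a copy of aggregators.hpp: the namespace sketch, the helper aggregate and the exact signatures are hypothetical, and the rule that merge keeps the left value on duplicate keys is a choice made here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// pass: keep the first response and ignore the other one.
template <typename T>
T pass(const T& lhs, const T&) {
  return lhs;
}

// all_and: the call succeeds only if every server reported success.
inline bool all_and(bool lhs, bool rhs) {
  return lhs && rhs;
}

// merge: union of two maps; on duplicate keys the left value is kept.
template <typename K, typename V>
std::map<K, V> merge(const std::map<K, V>& lhs, const std::map<K, V>& rhs) {
  std::map<K, V> out(lhs);
  out.insert(rhs.begin(), rhs.end());  // insert() never overwrites a key
  return out;
}

// Fold a non-empty list of per-server responses with a binary aggregator.
template <typename T, typename F>
T aggregate(const std::vector<T>& responses, F combine) {
  assert(!responses.empty());
  T acc = responses[0];
  for (std::size_t i = 1; i < responses.size(); ++i) {
    acc = combine(acc, responses[i]);
  }
  return acc;
}

}  // namespace sketch
```

Read this way, save and load report true only when every server succeeded, while get_status combines the per-server status maps into a single map.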
Building jenerator¶
You need OCaml >= 4.02.1 (with findlib), extlib, OMake, OUnit and ppx_deriving to build jenerator.
We recommend using OPAM to set up the OCaml environment.
OPAM version 1.2 or later is required to install the modules that jenerator depends on.
If you want to install OPAM from source, ocamlbrew is useful.
$ opam switch 4.02.1
$ eval `opam config env`
$ opam install ocamlfind extlib omake ounit ppx_deriving
$ cd jubatus/tools/jenerator
$ omake
$ sudo omake install
omake install installs jenerator as /usr/local/bin/jenerator (the path may vary depending on your environment).
If you want to install jenerator to another directory, use the PREFIX environment variable.
$ PREFIX=/path/to/install omake install
You can also use the built jenerator binary directly without installing it.
Implementing Server¶
kvs_impl.cpp constructs a server instance by using the class kvs_serv.
You need to define this class in kvs_serv.hpp and kvs_serv.cpp.
You can start from the templates (kvs_serv.tmpl.{cpp,hpp}) generated by jenerator.
The main function is implemented in kvs_impl.cpp, so users don’t have to implement it.
Command line options are the same for all servers that use the Jubatus framework.
Run the server with the --help option to list them.
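As a rough picture of the logic a user would write for the kvs service, here is a plain C++ sketch of the four methods from the IDL example. It deliberately avoids all Jubatus headers and does not reproduce the generated template: the class name kvs_store_sketch is hypothetical, and the version semantics (put increments a per-key version, del removes the entry only when the version matches) are assumptions, since the IDL does not define them.

```cpp
#include <cassert>
#include <map>
#include <string>

// Mirrors the "entry" message of the IDL example.
struct entry {
  std::string key;
  std::string value;
  int version;
};

// Hypothetical stand-in for the logic behind the user-defined server class.
class kvs_store_sketch {
 public:
  // put: store the value, bump the per-key version and return it.
  int put(const std::string& key, const std::string& value) {
    entry& e = data_[key];  // a new entry starts with version 0
    e.key = key;
    e.value = value;
    ++e.version;
    assert(e.version > 0);  // versions only ever increase from 0
    return e.version;
  }

  // get: return the stored entry, or an empty entry with version 0.
  entry get(const std::string& key) const {
    std::map<std::string, entry>::const_iterator it = data_.find(key);
    if (it == data_.end()) {
      entry none;
      none.key = key;
      none.version = 0;
      return none;
    }
    return it->second;
  }

  // del: remove the entry only if the version matches; 1 if removed, else 0.
  int del(const std::string& key, int version) {
    std::map<std::string, entry>::iterator it = data_.find(key);
    if (it == data_.end() || it->second.version != version) return 0;
    data_.erase(it);
    return 1;
  }

  // clear: remove everything and return the number of removed entries.
  int clear() {
    int removed = static_cast<int>(data_.size());
    data_.clear();
    return removed;
  }

 private:
  std::map<std::string, entry> data_;
};
```

Every method returns a value because void is not allowed as a return type in the IDL. Locking is absent here on purpose, on the reading that the analysis and update annotations describe the locks taken in the server.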
Mixable Class¶
TBD.
Implementing Proxy¶
You have nothing to implement; just compile kvs_proxy.cpp, generated by jenerator, and you will get the proxy.
kvs_proxy.cpp contains only a main function, which registers, for each RPC method, a functor that proxies requests and aggregates responses.