CREATE MODEL
Syntax:
CREATE { RECOMMENDER | CLASSIFIER | ANOMALY } MODEL
model_name [ ({ label | id }: id_col) ] AS
col_spec [ WITH convert_function ] [, ... ]
CONFIG 'json_string'
where: col_spec = wildcard | col_name
Examples:
jubaql> CREATE CLASSIFIER MODEL cls (label: label) AS
name WITH unigram
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)
jubaql> CREATE RECOMMENDER MODEL reco (id: 名前) AS *
CONFIG '{"method": "inverted_index",
"parameter": {}}'
CREATE MODEL (started)
jubaql> CREATE CLASSIFIER MODEL test (label: country) AS
name WITH bigram,
photo WITH jpeganalyze
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)
Explanation
CREATE MODEL
defines a Jubatus model to be used for training. It is assumed that the training data is well-typed, tabular (row/column-shaped) data.
model_name
is a user-defined string that will identify this model later on.
label | id
must be label for a CLASSIFIER model and id for a RECOMMENDER model. The clause must be omitted for an ANOMALY model.
id_col
is the name of the column whose value will become either the id parameter of the update_row(id, row) RPC method (RECOMMENDER) or the label of the labeled datum passed to the train(data) RPC method (CLASSIFIER).
col_spec
points to one or more columns that will be converted with either a Jubatus built-in function (if one exists) or a previously defined FEATURE FUNCTION named convert_function. If no conversion function is specified, it defaults to num for numeric values and str for anything else. A col_spec can have one of the following forms:
- It can be a single column name. In that case, convert_function must be a unary function and will be called with the value of that column.
- It can be a column wildcard of the form *, *suffix or prefix*, and then means all columns with a name that matches that wildcard and that have not been mentioned in any previous clause. In that case, convert_function must be a unary function and will be called for every matching column with the value of that column.
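The column-selection rules above can be modeled with a short Python sketch. It is an illustration under stated assumptions, not JubaQL's implementation: a col_spec is represented as a (pattern, convert_function) pair, with None standing for an omitted WITH clause; columns picked up by an earlier wildcard count as "mentioned"; the label/id clause is ignored, since this section does not say whether it counts as a mention; and a Python bool is treated as non-numeric when choosing the default converter. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (column name or wildcard, convert_function); None means no WITH clause was given.
ColSpec = Tuple[str, Optional[str]]


def wildcard_matches(pattern: str, name: str) -> bool:
    """Match the three wildcard forms: *, *suffix and prefix*."""
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    raise ValueError(f"unsupported wildcard form: {pattern!r}")


def resolve_col_specs(specs: Sequence[ColSpec],
                      columns: Sequence[str]) -> List[ColSpec]:
    """Expand col_specs, in order, into (column, convert_function) pairs."""
    resolved: List[ColSpec] = []
    mentioned = set()
    for pattern, func in specs:
        if "*" in pattern:
            # A wildcard only picks up columns not mentioned by a previous clause.
            for name in columns:
                if name not in mentioned and wildcard_matches(pattern, name):
                    resolved.append((name, func))
                    mentioned.add(name)
        else:
            # A single column name is taken as-is; a missing column only
            # surfaces later, when a non-empty batch is processed.
            resolved.append((pattern, func))
            mentioned.add(pattern)
    return resolved


def default_converter(value: object) -> str:
    """Default when no WITH clause is given: num for numbers, str otherwise."""
    # bool is a subclass of int in Python; treating it as non-numeric is an assumption.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "num"
    return "str"


# Roughly corresponds to: ... AS name WITH bigram, *
print(resolve_col_specs([("name", "bigram"), ("*", None)], ["name", "age", "photo"]))
# [('name', 'bigram'), ('age', None), ('photo', None)]
```

Because name is mentioned by the first clause, the trailing wildcard only adds the remaining columns, which are then converted with the num/str default.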
json_string
is a JSON configuration string, as it would normally be contained in the file passed to Jubatus at startup. However, it should not contain a "converter" part.
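To illustrate the rule about the "converter" part, here is a minimal Python sketch of a client-side check on a CONFIG string. It assumes the string must parse as a JSON object and that JubaQL builds the converter section itself from the col_spec and WITH clauses; check_model_config is a hypothetical helper, not part of JubaQL, whose own validation may differ.

```python
import json
from typing import Any, Dict


def check_model_config(json_string: str) -> Dict[str, Any]:
    """Parse a CONFIG string and reject a "converter" section.

    Hypothetical helper for illustration only; not part of JubaQL.
    """
    config = json.loads(json_string)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(config, dict):
        raise ValueError("CONFIG must be a JSON object")
    if "converter" in config:
        # Assumption: JubaQL generates the converter section from the
        # col_spec / WITH clauses, so a user-supplied one is rejected here.
        raise ValueError('CONFIG must not contain a "converter" part')
    return config


# The CONFIG string from the CLASSIFIER examples passes the check.
cfg = check_model_config('{"method": "AROW", "parameter": {"regularization_weight": 1.0}}')
print(cfg["method"])  # AROW
```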
After a CREATE MODEL statement has been processed successfully, the user can use the specified model_name in other statements.
Notes
- It is not specified whether the Jubatus instance will be launched right away or later. Therefore, the successful execution of this command only indicates that the syntax is correct; it does not say anything about whether startup was successful.
- Feature functions return Map[String, Any], where the Any part should actually be a numeric type or a string. The map key becomes part of the key of the Jubatus datum. For example, if a function named product is fed with values from the column height and returns Map("val" -> 80), the Jubatus datum will have an entry in num_values that looks like "product#height#val": 80.
- When a column that is referenced as label/id or in a conversion specification does not exist in the (inferred or explicitly declared) schema of a batch of the input stream and the batch is non-empty, UPDATE MODEL or CREATE STREAM FROM ANALYZE processing of that batch, and therefore the whole process, will fail after retrying spark.task.maxFailures times. An empty batch with a mismatching schema does not cause a failure, though.
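The key scheme from the first note and the empty-batch rule from the last note can be modeled with a small Python sketch. It assumes that string results go to the datum's string_values and numeric ones to num_values, that any other result type is an error, and that a batch schema is just a list of column names; datum_entries and check_batch are hypothetical names, not JubaQL or Jubatus APIs, and Spark's retry behaviour (spark.task.maxFailures) is not modeled.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


def datum_entries(func_name: str, column: str,
                  result: Mapping[str, Any]) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Turn a feature function's result map into (num_values, string_values)."""
    num_values: Dict[str, float] = {}
    string_values: Dict[str, str] = {}
    for key, value in result.items():
        datum_key = f"{func_name}#{column}#{key}"  # e.g. product#height#val
        if isinstance(value, str):
            string_values[datum_key] = value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Jubatus stores numeric datum values as doubles.
            num_values[datum_key] = float(value)
        else:
            raise TypeError(f"{datum_key}: expected a number or a string, "
                            f"got {type(value).__name__}")
    return num_values, string_values


def check_batch(referenced: Sequence[str], schema: Sequence[str], batch_size: int) -> None:
    """Fail for a non-empty batch whose schema lacks a referenced column."""
    if batch_size == 0:
        return  # an empty batch with a mismatching schema is accepted
    missing = [c for c in referenced if c not in schema]
    if missing:
        raise KeyError(f"columns missing from batch schema: {missing}")


print(datum_entries("product", "height", {"val": 80}))
# ({'product#height#val': 80.0}, {})
```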