CREATE MODEL
Syntax:
CREATE { RECOMMENDER | CLASSIFIER | ANOMALY } MODEL
model_name [ ({ label | id }: id_col) ] AS
col_spec [ WITH convert_function ] [, ... ]
CONFIG 'json_string'
where: col_spec = wildcard | col_name
Examples:
jubaql> CREATE CLASSIFIER MODEL cls (label: label) AS
name WITH unigram
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)
jubaql> CREATE RECOMMENDER MODEL reco (id: 名前) AS *
CONFIG '{"method": "inverted_index",
"parameter": {}}'
CREATE MODEL (started)
jubaql> CREATE CLASSIFIER MODEL test (label: country) AS
name WITH bigram,
photo WITH jpeganalyze
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)
Explanation
CREATE MODEL defines a Jubatus model to be used for training. The data used for training is assumed to be well-typed, row-column-shaped data.
- model_name is a user-defined string that will identify this model later on.
- label | id must be label for a CLASSIFIER model and id for a RECOMMENDER model. The clause must be omitted for an ANOMALY model.
- id_col is the name of the column whose value will become the id parameter of the update_row(id, row) RPC method or the label of the labeled datum passed to the train(data) RPC method, depending on the model type.
- col_spec points to one or multiple columns that will be converted with either a Jubatus built-in function (if one exists) or a previously defined FEATURE FUNCTION named convert_function. If a conversion function is not specified, it defaults to num for numeric values and str for anything else. A col_spec can have one of the following forms:
  - It can be a single column name. In that case, convert_function must be a unary function and will be called with the value of that column.
  - It can be a column wildcard of the form *, *suffix or prefix* and then means all columns with a name that matches that wildcard description and that have not been mentioned in any previous clause. In that case, convert_function must be a unary function and will be called for every matching column with the value of that column.
- json_string is a JSON configuration string like it would normally be contained in the file passed to Jubatus at startup. However, it should not contain a "converter" part.
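The col_spec rules above can be illustrated with a small Python sketch. It is not JubaQL's implementation: the function names (matches, default_converter, resolve) are hypothetical, and two points the text leaves open are treated as assumptions. The default converter is chosen from the Python type of the value, and the label/id column is not handled specially.

```python
def matches(spec: str, column: str) -> bool:
    """Return True if `column` is selected by `spec` (a name or a wildcard)."""
    if spec == "*":
        return True
    if spec.startswith("*"):            # form: *suffix
        return column.endswith(spec[1:])
    if spec.endswith("*"):              # form: prefix*
        return column.startswith(spec[:-1])
    return column == spec               # form: single column name


def default_converter(value) -> str:
    """Default from the text: num for numeric values, str for anything else.

    Assumption: booleans are not treated as numeric here.
    """
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "num" if is_number else "str"


def resolve(row: dict, clauses: list) -> list:
    """Return (column, converter_name) pairs for one row.

    `clauses` holds (col_spec, convert_function or None) pairs in the order
    they appear after AS. A wildcard skips columns that an earlier clause
    has already mentioned.
    """
    taken = set()
    plan = []
    for spec, func in clauses:
        is_wildcard = spec.startswith("*") or spec.endswith("*")
        for column, value in row.items():
            if not matches(spec, column):
                continue
            if is_wildcard and column in taken:
                continue
            taken.add(column)
            plan.append((column, func or default_converter(value)))
    return plan


row = {"name": "Alice", "age": 31, "photo_front": "f.jpg", "photo_back": "b.jpg"}
clauses = [("name", "unigram"), ("photo_*", "jpeganalyze"), ("*", None)]
for column, converter in resolve(row, clauses):
    print(column, "->", converter)
```

In this example the trailing * clause only picks up age, because name and the two photo_ columns were already claimed by earlier clauses.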
After a CREATE MODEL statement has been processed successfully, the user can use the specified model_name in other statements.
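Before a configuration is passed in the CONFIG clause, it can be checked on the client side against the constraint on json_string described above. The following Python sketch is only an illustration: check_config is a hypothetical helper, and treating a "converter" entry as a hard error is an assumption, since the text only says the string should not contain one.

```python
import json


def check_config(json_string: str) -> dict:
    """Parse a CONFIG string and flag a top-level "converter" part."""
    config = json.loads(json_string)  # raises ValueError on malformed JSON
    if not isinstance(config, dict):
        raise ValueError("CONFIG must be a JSON object")
    if "converter" in config:
        # Assumption: reject instead of silently dropping the entry.
        raise ValueError('CONFIG should not contain a "converter" part')
    return config


config = check_config(
    '{"method": "AROW", "parameter": {"regularization_weight": 1.0}}'
)
print(config["method"])
```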
Notes
- It is not specified whether the Jubatus instance will be launched right away or later. Therefore, the successful execution of this command only indicates that the syntax is correct; it does not say anything about whether startup was successful.
- Feature functions return Map[String, Any], where the Any part should actually be a numeric type or a string. The map key will become a part of the key for the Jubatus datum. For example, if a function with the name product is fed with values from the column height and returns Map("val" -> 80), then the Jubatus datum will have an entry in num_values that looks like: "product#height#val": 80.
- When a column that is referenced as label/id or in a conversion specification does not exist in the (inferred or explicitly declared) schema of a batch of the input stream and the batch is non-empty, UPDATE MODEL or CREATE STREAM FROM ANALYZE processing of that batch, and therefore the whole process, will fail after retrying spark.task.maxFailures times. An empty batch with a mismatching schema does not cause a failure, though.
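The key scheme from the note on feature functions can be sketched in Python as follows. The helper name datum_entries is hypothetical. Only the num_values case is stated in the text; placing string results into string_values under the same key pattern is an assumption.

```python
def datum_entries(function_name: str, column: str, result: dict):
    """Build datum entries keyed as function#column#key.

    Returns a (string_values, num_values) pair of dicts.
    """
    string_values, num_values = {}, {}
    for key, value in result.items():
        datum_key = f"{function_name}#{column}#{key}"
        if isinstance(value, str):
            # Assumption: string results use the same key pattern.
            string_values[datum_key] = value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            num_values[datum_key] = value
        else:
            raise TypeError(f"unsupported value type for {datum_key!r}")
    return string_values, num_values


strings, nums = datum_entries("product", "height", {"val": 80})
print(nums)
```

For the example from the note, nums contains the single entry "product#height#val": 80 and strings is empty.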