# CREATE MODEL

Syntax:

CREATE { RECOMMENDER | CLASSIFIER | ANOMALY } MODEL
model_name [ ({ label | id }: id_col) ] AS
col_spec [ WITH convert_function ] [, ... ]
CONFIG 'json_string'

where: col_spec = wildcard | col_name


Examples:

jubaql> CREATE CLASSIFIER MODEL cls (label: label) AS
name WITH unigram
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)

jubaql> CREATE RECOMMENDER MODEL reco (id: 名前) AS *
CONFIG '{"method": "inverted_index",
"parameter": {}}'
CREATE MODEL (started)

jubaql> CREATE CLASSIFIER MODEL test (label: country) AS
name WITH bigram,
photo WITH jpeganalyze
CONFIG '{"method": "AROW",
"parameter": {"regularization_weight" : 1.0}}'
CREATE MODEL (started)


## Explanation

CREATE MODEL defines a Jubatus model to be used for training. The training data is assumed to be well-typed and tabular (organized in rows and columns).

• model_name is a user-defined string that will identify this model later on.
• label | id must be label for a CLASSIFIER model and id for a RECOMMENDER model. The clause must be omitted for an ANOMALY model.
• id_col is the name of the column whose value will become the id parameter of the update_row(id, row) RPC method or the label of the labeled datum passed to the train(data) RPC method, depending on the model type.
• col_spec points to one or multiple columns that will be converted with either a Jubatus built-in function (if one exists) or a previously defined FEATURE FUNCTION named convert_function. If a conversion function is not specified, it defaults to num for numeric values and str for anything else. A col_spec can have one of the following forms:
  • A single column name. In that case, convert_function must be a unary function and will be called with the value of that column.
  • A column wildcard of the form *, *suffix or prefix*. It selects all columns whose name matches the wildcard and that have not been mentioned in any previous clause. In that case, convert_function must be a unary function and will be called once for every matching column with the value of that column.
• json_string is a JSON configuration string of the kind normally contained in the configuration file passed to Jubatus at startup. However, it should not contain a "converter" part.
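
The column selection rules above can be illustrated with a small Python sketch. It is not JubaQL's implementation: the function names `matches` and `resolve` are hypothetical, and the schema is modelled as a plain dict that records whether each column is numeric, which is an assumption made only for this illustration. It also does not address how the label/id column is treated by a wildcard.

```python
def matches(spec, column):
    """Return True if column matches a col_spec (plain name, *, *suffix or prefix*)."""
    if spec == "*":
        return True
    if spec.startswith("*"):
        return column.endswith(spec[1:])
    if spec.endswith("*"):
        return column.startswith(spec[:-1])
    return column == spec


def resolve(col_specs, schema):
    """Assign a converter name to each selected column.

    col_specs: list of (spec, convert_function or None) in clause order.
    schema: dict mapping column name -> True if the column is numeric
            (hypothetical representation, for illustration only).
    """
    assigned = {}
    for spec, func in col_specs:
        for column, is_numeric in schema.items():
            # A column already claimed by an earlier clause is skipped.
            if column in assigned or not matches(spec, column):
                continue
            # Without an explicit function: num for numeric values, str otherwise.
            assigned[column] = func or ("num" if is_numeric else "str")
    return assigned


schema = {"name": False, "name_kana": False, "height": True, "weight": True}
col_specs = [("name", "unigram"), ("name*", "bigram"), ("*", None)]
print(resolve(col_specs, schema))
```

In this example, `name` is claimed by the first clause, so the later `name*` wildcard only picks up `name_kana`, and the final `*` covers the remaining numeric columns with the default `num` conversion.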

After a CREATE MODEL statement has been processed successfully, the user can use the specified model_name in other statements.
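
If an existing Jubatus configuration file is to be reused for the json_string argument described above, its "converter" part has to be dropped first. The following is a minimal Python sketch of that step; the helper name `to_config_string` and the sample configuration are hypothetical. Quoting or escaping rules for embedding the result in a JubaQL statement are not covered here.

```python
import json


def to_config_string(jubatus_config_text):
    """Return the JSON config re-serialized without its "converter" part."""
    config = json.loads(jubatus_config_text)
    # json_string should not contain a "converter" part, so drop it if present.
    config.pop("converter", None)
    return json.dumps(config)


# Hypothetical file content, used only for illustration.
file_text = """
{
  "method": "AROW",
  "converter": {"string_rules": [], "num_rules": []},
  "parameter": {"regularization_weight": 1.0}
}
"""
print(to_config_string(file_text))
```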

## Notes

• It is not specified whether the Jubatus instance will be launched right away or later. Therefore, the successful execution of this command only indicates that the syntax is correct; it does not say anything about whether startup was successful.
• Feature functions return Map[String, Any], where each value should be a numeric type or a string. The map key becomes part of the key in the Jubatus datum. For example, if a function named product is fed with values from the column height and returns Map("val" -> 80), the Jubatus datum will have an entry in num_values that looks like: "product#height#val": 80.
• When a column that is referenced as label/id or in a conversion specification does not exist in the (inferred or explicitly declared) schema of a non-empty batch of the input stream, UPDATE MODEL or CREATE STREAM FROM ANALYZE processing of that batch, and therefore the whole process, will fail after retrying spark.task.maxFailures times. An empty batch with a mismatching schema does not cause a failure.
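
The datum key naming described in the note on feature functions (function name, column name and map key joined by `#`) can be sketched in Python as follows. The helper `to_datum_entries` is hypothetical and is not part of JubaQL or Jubatus. That string results end up in string_values under the same key scheme is an assumption; the text above only shows the numeric case.

```python
def to_datum_entries(function_name, column, result):
    """Split a feature function result into (num_values, string_values) dicts."""
    num_values, string_values = {}, {}
    for key, value in result.items():
        # Key scheme from the note above: function#column#key
        datum_key = f"{function_name}#{column}#{key}"
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            num_values[datum_key] = value
        elif isinstance(value, str):
            # Assumption: strings use the same key scheme in string_values.
            string_values[datum_key] = value
        else:
            raise TypeError(f"unsupported value type for {datum_key!r}")
    return num_values, string_values


num_values, string_values = to_datum_entries("product", "height", {"val": 80})
print(num_values)
```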