Frequently Asked Questions (FAQs)
mecab_splitter_create.trivial does not pass the unit test?
Check your MeCab dictionary and ensure that your mecab command accepts the UTF-8 charset.
- How do I install jubadump?
jubadump can be installed via binary packages.
- DEB packages (Ubuntu):
jubadump is included in
- RPM packages (RHEL): Run sudo yum install jubadump.
On other environments, you need to build it from source.
- How do I update the Jubatus RPM package (RHEL)?
Run the following command:
$ sudo yum update jubatus
oniguruma will not be updated automatically even if the jubatus package is updated. Run yum update for such packages if needed.
- How do I update the Jubatus DEB package (Ubuntu)?
Run the following commands:
$ sudo apt-get update
$ sudo apt-get install jubatus
- How do I update the Jubatus client?
See Jubatus Wiki: Installing and Updating Clients for instructions.
- How do I install through a proxy?
Binary Package (Apt) in Ubuntu Environment
An error occurs when the proxy is not configured for apt. Insert the line below into /etc/apt/apt.conf.
$ sudo vi /etc/apt/apt.conf
Acquire::http::Proxy "http://username:firstname.lastname@example.org:port/";
Python Client (pip)
Errors like the following may appear when a proxy is required. In this case, specify the proxy option when you execute the command.
Cannot fetch index base URL http://pypi.python.org/simple/
Could not find any downloads that satisfy the requirement jubatus
No distributions at all found for jubatus
Storing complete log in /home/jubatus/.pip/pip.log
$ sudo pip --proxy=http://username:email@example.com:port/ install jubatus
The installation is complete when output like the following appears.
Successfully installed jubatus msgpack-rpc-python msgpack-python tornado
Cleaning up...
Ruby Client (RubyGems)
Set the environment variable as shown here before installation.
export http_proxy=http://username:firstname.lastname@example.org:port/
- How do I develop in Java with the client library?
When developing a Jubatus client in Java, it is convenient to use the skeleton project (a template for an Eclipse project) published on GitHub <https://github.com/jubatus/jubatus-java-skelton>. Follow the instructions below to use the Java skeleton for your development.
- Start Eclipse, select [File]>[Import].
- Select [Git] > [Projects from Git], click the [Next] button.
- Select [URI], click the [Next] button.
- Input “https://github.com/jubatus/jubatus-java-skelton.git” into [URI], click the [Next] button.
- Proceed through the remaining dialogs, and click the [Finish] button.
Once the import is finished, Maven will download the Jubatus client library automatically. Under the src/main/java directory (default package), there is a simple program, Client.java, which uses the Jubatus recommender function.
- When using python client, “got socket.error: [Errno 99] Cannot assign requested address” (or kind of
sudo /sbin/sysctl -w net.ipv4.tcp_tw_recycle=1
- I’ve got an exception with message “1” from the Jubatus client library. Why?
The version of the client library you installed is not compatible with the version of the Jubatus server you are connecting to. See the Jubatus Wiki: Client Compatibility and Documentation for the compatibility information.
Technically, the error “1” means “no such method on RPC server”.
- I’ve got an exception with message “2” from the Jubatus client library. Why?
This is a type mismatch error between clients and servers.
A common mistake is using an integer instead of a float in the values of num_values. Always cast the values in num_values to float. If you are using literals like 10, replace them with 10.0. Another common mistake is assigning NULL to objects like vector.
This error may also occur if the version of the client library you installed is not compatible with the version of the Jubatus server you are connecting to. Check out the Jubatus Wiki: Client Compatibility and Documentation for the compatibility information.
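The float-casting advice above can be sketched in Python as follows. This is a minimal illustration, not part of the Jubatus client: to_num_values is a hypothetical helper name, and the (name, value) pair layout is an assumption about how you hold num_values before passing them to the client.

```python
def to_num_values(features):
    """Convert a {name: number} mapping into (name, float) pairs.

    Hypothetical helper: casting every value with float() ensures that an
    integer literal such as 10 is sent as 10.0.
    """
    return [(str(key), float(value)) for key, value in features.items()]


raw = {"age": 10, "height": 172.5}  # "age" is an int and would cause a type mismatch
num_values = to_num_values(raw)
print(num_values)  # [('age', 10.0), ('height', 172.5)]
```

Casting at one place, just before the data is handed to the client, avoids having to audit every literal in the calling code.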
- Client library occasionally throws RPC timeout errors; it seems that servers automatically disconnect clients. Why?
Jubatus servers automatically close connections when the idle timeout (given by the command line parameter jubatus_server -t) expires. You need to retry the RPC call to re-establish the connection. Please refer to RPC Error Handling for how to handle RPC errors, including timeout errors caused by the server’s automatic session closing.
To disable this auto-disconnect feature, set jubatus_server -t to 0, which means “no timeout”. In this case, clients must explicitly close the TCP connection using get_client(). Alternatively, set a timeout sufficiently longer than the client’s connection lifetime.
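A retry around an RPC call might look like the following sketch. It is generic Python under stated assumptions: call_with_retry and FlakyCall are hypothetical names, and the exception type actually raised by your client library is not given here, so it is passed in through retry_on (the built-in TimeoutError stands in for it in the demonstration).

```python
import time


def call_with_retry(rpc_call, retry_on=(Exception,), max_retries=3, wait_secs=0.0):
    """Call rpc_call(); on a listed error, wait and try again.

    Gives up and re-raises after max_retries retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return rpc_call()
        except retry_on:
            if attempt == max_retries:
                raise  # surface the last error to the caller
            time.sleep(wait_secs)


class FlakyCall:
    """Stand-in for an RPC whose first attempt hits a closed idle connection."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("connection closed by server")
        return "ok"


flaky = FlakyCall()
result = call_with_retry(flaky, retry_on=(TimeoutError,))
```

In real code, rpc_call would be a small function or lambda wrapping the client method you want to invoke.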
- How can I confirm that MIX operations are performed when Jubatus works in distributed mode?
Information about the MIX operations is recorded in the log files of the Jubatus servers, which look like the following.
I0218 06:01:49.587540 3845 linear_mixer.cpp:173] starting mix:
I0218 06:01:49.703693 3845 linear_mixer.cpp:231] mixed with 3 servers in 0.112371 secs, 8 bytes (serialized data) has been put.
I0218 06:01:49.705159 3845 linear_mixer.cpp:185] .... 22th mix done.
I0218 06:03:15.502995 3845 linear_mixer.cpp:173] starting mix:
I0218 06:03:15.642297 3845 linear_mixer.cpp:231] mixed with 3 servers in 0.137258 secs, 8 bytes (serialized data) has been put.
I0218 06:03:15.644685 3845 linear_mixer.cpp:185] .... 23th mix done.
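If you want to check MIX activity programmatically, a log line of the form shown above can be picked apart with a regular expression. This is a sketch that assumes the log format stays exactly as in the sample; parse_mix_lines is a hypothetical helper, not a Jubatus tool.

```python
import re

# Matches the "mixed with N servers in X secs, B bytes" part of a log line.
MIX_RE = re.compile(
    r"mixed with (?P<servers>\d+) servers in (?P<secs>[\d.]+) secs, "
    r"(?P<bytes>\d+) bytes"
)


def parse_mix_lines(lines):
    """Return (servers, seconds, bytes) for every line that reports a finished MIX."""
    results = []
    for line in lines:
        match = MIX_RE.search(line)
        if match:
            results.append(
                (int(match["servers"]), float(match["secs"]), int(match["bytes"]))
            )
    return results


sample = [
    "I0218 06:01:49.587540 3845 linear_mixer.cpp:173] starting mix:",
    "I0218 06:01:49.703693 3845 linear_mixer.cpp:231] mixed with 3 servers "
    "in 0.112371 secs, 8 bytes (serialized data) has been put.",
    "I0218 06:01:49.705159 3845 linear_mixer.cpp:185] .... 22th mix done.",
]
mixes = parse_mix_lines(sample)
```

An empty result over a long stretch of log would suggest that no MIX completed in that period.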
- Is it appropriate to run all processes, including jubaclassifier, jubaclassifier_proxy, the client and ZooKeeper, on a single server, even in distributed mode?
No problem. However, compared with an environment where each process has its own dedicated server, the overall performance may decrease. In addition, we recommend an odd number of ZooKeeper servers for a better ensemble.
- What’s the difference between Jubatus Keeper and Proxy?
Keeper was renamed to Proxy in version 0.5.0. The role of proxies is the same as the role of keepers in 0.4.x and earlier.
- In the Classifier/Regression learning process, will the learned model differ between the two training methods below?
- Input the training data into Jubatus in a batch. (Bulk learning; the train method is called only once.)
- Call the train method each time a piece of training data is learned.
No, there is no difference in the final trained model.
- jubaanomaly only outputs 1.0 or infinity. Why?
This may be related to the scale of the input data, which can prevent nearest neighbor search from working properly.
jubaanomaly (as the LOF algorithm) depends on euclid LSH, which has many parameters related to the scale. If the scale is too large compared to the settings, LSH-based nearest neighbor search fails and the LOF model does not provide reasonable scores.
You may avoid this situation by using the following techniques.
- 1: Normalize each feature value
Nearest neighbor search is affected by differences in the scales of the features. It is better to normalize all of the feature values (to the range from 0.0 to 1.0) or standardize them (to have a standard deviation of about 1.0).
- 2: Change parameters for the underlying euclid LSH
In particular, we recommend trying several values for the most important parameter, bin_width.
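The two scaling options in technique 1 can be sketched in plain Python as follows. The function names and the sample data are illustrative assumptions; the standardization here also shifts the mean to zero, which is the usual convention but is not stated above.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into the range 0.0 to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to a population standard deviation of 1.0."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0]  # one feature, observed over four samples
normalized = min_max_normalize(heights_cm)
standardized = standardize(heights_cm)
```

For streaming data the minimum, maximum, mean and deviation are not known in advance, so in practice they would have to be estimated from an initial sample or from domain knowledge.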
- jubaanomaly outputs an extremely large score. Why?
When many points are concentrated on a certain point, the LOF score can be extremely large. Floating point rounding errors cause this problem. In this case, in Jubatus 1.0.2 and earlier, jubaanomaly outputs inf because of division by zero. In Jubatus 1.0.3 and later, to make the problem easier to identify, jubaanomaly avoids division by zero and outputs large numbers (about 10e9-10e10) instead of inf.
- Why does jubaanomaly get slow after adding many samples?
jubaanomaly (as the LOF algorithm) depends on iterations of nearest neighbor search, and its default configuration uses euclid LSH for speed-up. However, updating the internal state of the LOF model still takes quadratic time at worst with respect to the number of samples ever added. For more details, please refer to the original paper [Breunig2000].
- How can I avoid such a slowdown?
You can control the trade-off between speed and accuracy by using the following techniques.
- 1: Use unlearner
From version 0.6.0, users can use unlearner to set an upper limit on stored data. If more data than the limit is registered, Jubatus will delete old data. This improves the speed of Jubatus at the cost of accuracy. You can select the unlearning strategy from the following:
lru will delete old data in the order of registration.
random will delete old data at random.
- 2: Modify the baseline euclid LSH for lower accuracy and faster computation
By reducing the values of (euclid) LSH parameters such as bin_width, you can make nearest neighbor computation faster with lower accuracy, in which case some of the nearest samples might be ignored. This may affect the final anomaly score in comparison with the ground truth, in which everything is computed in a batch-processing manner.
- 3: Use calc_score for just obtaining the anomaly score
The add function actually appends the sample to the nearest neighbor storage, updates the LOF model, and calculates its LOF value. On the other hand, the calc_score function just computes an LOF value for the input sample based on the current LOF model, which works much faster. If you can assume that the data distribution is almost stable, we recommend using only add at the early stage to build a valid LOF model as early as possible, say, until 1000 samples are stored in the storage. Then you can switch between the two functions, with more frequent calc_score. For example, it would work fine and much faster with the ratio
- 4: Decrease reverse_nearest_neighbor_num
This also reduces the computation time for LOF. However, the number should not be smaller than
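The switching strategy described in technique 3 can be sketched as follows. Everything here is an assumption for illustration: StubAnomalyClient is a stand-in, not the real Jubatus client (whose return types may differ), score_stream is a hypothetical helper, and add_every=10 is an arbitrary value rather than a recommended ratio.

```python
class StubAnomalyClient:
    """Stand-in exposing add/calc_score; not the real Jubatus client."""

    def __init__(self):
        self.stored = 0

    def add(self, sample):
        self.stored += 1  # the sample is kept and the model is updated
        return 1.0  # placeholder score

    def calc_score(self, sample):
        return 1.0  # placeholder score; the model is left unchanged


def score_stream(client, samples, warmup=1000, add_every=10):
    """Use add() for the first `warmup` samples, then add() only every
    `add_every`-th sample and calc_score() for the rest."""
    scores = []
    for i, sample in enumerate(samples):
        if i < warmup or (i - warmup) % add_every == 0:
            scores.append(client.add(sample))
        else:
            scores.append(client.calc_score(sample))
    return scores


client = StubAnomalyClient()
scores = score_stream(client, range(1100), warmup=1000, add_every=10)
```

Keeping an occasional add after the warm-up lets the model follow slow drift in the data while most samples take the cheaper calc_score path.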
- How is ‘jubatus’ pronounced?
Please do not run the ‘say’ command on Mac OS.