Nearest Neighbor¶
- See IDL definition for detailed specification.
 - See Algorithms for detailed description of algorithms used in this server.
 
Configuration¶
Configuration is given as a JSON file. We show each field below:
- 
method Specify algorithm for nearest neighbor. You can use these algorithms.
Value Method "lsh"Use Locality Sensitive Hashing based on cosine similarity. "minhash"Use MinHash. [Ping2010] "euclid_lsh"Use LSH based on cosine similarity for nearest neighbor search with Euclidean distance. 
- 
parameter Specify parameters for the algorithm. Its format differs for each
method.- common
 threads: The number of threads execute random_projection and search. The bigger it is, query latency becomes smaller because data is divided into several parts and processed by multiple threads in parallel. This option has been added since 0.9.0. Single thread is used in before versions. (Integer)
The behavior of this option varies as below:
threads< 0- threads is set to the number of logical CPU cores
 
threads= 0- The same behavior as 
threadsis set to 1. 
- The same behavior as 
 - 1 <= 
threads<= The number of logical cores of CPU- The number of threads is set to 
threads. 
 - The number of threads is set to 
 - The number of logical cores of CPU < 
threads.- The number of threads is set to the number of logical CPU cores. In addtion, data points are divided into threads parts.
 
 
note:
threadscan be omitted (It works with 1 thread) .- lsh
 hash_num: Bit length of hash values. The bigger it is, the more accurate results you can get, but the more memory is required. (Integer)
- Range: 1 <= 
hash_num 
- Range: 1 <= 
 - minhash
 hash_num: Bit length of hash values. The bigger it is, the more accurate results you can get, but the more memory is required. (Integer)
- Range: 1 <= 
hash_num 
- Range: 1 <= 
 - euclid_lsh
 hash_num: Bit length of hash values. The bigger it is, the more accurate results you can get, but the more memory is required. (Integer)
- Range: 1 <= 
hash_num 
- Range: 1 <= 
 
- 
converter Specify configuration for data conversion. Its format is described in Data Conversion.
- Example:
 { "method": "lsh", "parameter" : { "hash_num" : 64 }, "converter" : { "string_filter_types": {}, "string_filter_rules":[], "num_filter_types": {}, "num_filter_rules": [], "string_types": {}, "string_rules":[ {"key" : "*", "type" : "str", "sample_weight":"bin", "global_weight" : "bin"} ], "num_types": {}, "num_rules": [ {"key" : "*", "type" : "num"} ] } }
Data Structures¶
Methods¶
- 
service 
nearest_neighbor¶ - 
bool 
set_row(0: string id, 1: datum d)¶ Parameters: - id – row ID
 - row – 
datumfor the row 
Returns: True if this function updates models successfully
Updates the row whose id is
idwith givenrow. If the row with the sameidalready exists, the row is overwritten withrow(note that this behavior is different from that of recommender). Otherwise, new row entry will be created. If the server that manages the row and the server that received this RPC request are same, this operation is reflected instantly. If not, update operation is reflected after mix.
- 
list<id_with_score> 
neighbor_row_from_id(0: string id, 1: uint size)¶ Parameters: - id – row ID in the nearest neighbor search table
 - size – number of rows to be returned
 
Returns: row IDs that are the nearest to the row
idand their distance valuesReturns
sizerows (at maximum) that have most similardatumtoidand their distance values.
- 
list<id_with_score> 
neighbor_row_from_datum(0: datum query, 1: uint size)¶ Parameters: - query – 
datumfor nearest neighbor search - size – number of rows to be returned
 
Returns: row IDs that are the nearest to
queryand their distance valuesReturns
sizerows (at maximum) of whichdatumare most similar toqueryand their distance values.- query – 
 
- 
list<id_with_score> 
similar_row_from_id(0: string id, 1: uint ret_num)¶ Parameters: - id – row ID in the nearest neighbor search table
 - ret_num – number of rows to be returned
 
Returns: row IDs that are the nearest to the row
idand their similarity valuesReturns
ret_numrows (at maximum) that have most similardatumtoidand their similarity values.
- 
list<id_with_score> 
similar_row_from_datum(0: datum query, 1: uint ret_num)¶ Parameters: - query – 
datumfor nearest neighbor search - ret_num – number of rows to be returned
 
Returns: row IDs that are the nearest to
queryand their similarity valuesReturns
ret_numrows (at maximum) of whichdatumare most similar toqueryand their similarity values.- query – 
 
- 
list<string> 
get_all_rows()¶ Returns: list of all row IDs Returns the list of all row IDs.
- 
bool