Nearest Neighbor
- See IDL definition for detailed specification.
- See Algorithms for detailed description of algorithms used in this server.
Configuration
Configuration is given as a JSON file. We show each field below:
- method
  Specifies the algorithm used for nearest neighbor search. The following values are available:
  - "lsh": Use Locality Sensitive Hashing based on cosine similarity.
  - "minhash": Use MinHash. [Ping2010]
  - "euclid_lsh": Use LSH based on cosine similarity for nearest neighbor search with Euclidean distance.
- parameter
  Specifies parameters for the algorithm. The format differs for each method.
  - common
    threads: The number of threads used to execute random_projection and search. Larger values reduce query latency, because the data is divided into several parts that are processed by multiple threads in parallel. This option was added in 0.9.0; earlier versions always use a single thread. (Integer)
    The behavior of this option varies as follows:
    - threads < 0: The number of threads is set to the number of logical CPU cores.
    - threads = 0: The same behavior as when threads is set to 1.
    - 1 <= threads <= the number of logical CPU cores: The number of threads is set to threads.
    - the number of logical CPU cores < threads: The number of threads is set to the number of logical CPU cores. In addition, data points are divided into threads parts.
    Note: threads can be omitted, in which case a single thread is used.
  - lsh
    hash_num: Bit length of hash values. The bigger it is, the more accurate the results, but the more memory is required. (Integer)
    - Range: 1 <= hash_num
  - minhash
    hash_num: Bit length of hash values. The bigger it is, the more accurate the results, but the more memory is required. (Integer)
    - Range: 1 <= hash_num
  - euclid_lsh
    hash_num: Bit length of hash values. The bigger it is, the more accurate the results, but the more memory is required. (Integer)
    - Range: 1 <= hash_num
- converter
  Specifies the configuration for data conversion. Its format is described in Data Conversion.
- Example:
  {
    "method": "lsh",
    "parameter": {
      "hash_num": 64
    },
    "converter": {
      "string_filter_types": {},
      "string_filter_rules": [],
      "num_filter_types": {},
      "num_filter_rules": [],
      "string_types": {},
      "string_rules": [
        {"key": "*", "type": "str", "sample_weight": "bin", "global_weight": "bin"}
      ],
      "num_types": {},
      "num_rules": [
        {"key": "*", "type": "num"}
      ]
    }
  }
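The configuration fields and the threads rules above can be sketched in Python. This is only an illustration, not part of the server or any client library: the helper names effective_threads and build_config are hypothetical, and placing threads next to hash_num inside parameter is an assumption based on the "common" heading. The converter block is copied from the example above.

```python
import json

# Method names listed in the configuration section.
METHODS = ("lsh", "minhash", "euclid_lsh")


def effective_threads(threads, logical_cores):
    """Hypothetical helper: worker-thread count implied by the documented rules."""
    if threads < 0:
        # Negative values select the number of logical CPU cores.
        return logical_cores
    if threads == 0:
        # Zero behaves like threads = 1.
        return 1
    # Values above the core count are capped at the core count.
    return min(threads, logical_cores)


def build_config(method, hash_num, threads=None):
    """Hypothetical helper: build a configuration dict shaped like the example."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if not isinstance(hash_num, int) or hash_num < 1:
        raise ValueError("hash_num must be an integer >= 1")
    parameter = {"hash_num": hash_num}
    if threads is not None:
        # Assumption: threads lives in "parameter" alongside hash_num.
        parameter["threads"] = threads
    return {
        "method": method,
        "parameter": parameter,
        "converter": {
            "string_filter_types": {},
            "string_filter_rules": [],
            "num_filter_types": {},
            "num_filter_rules": [],
            "string_types": {},
            "string_rules": [
                {"key": "*", "type": "str",
                 "sample_weight": "bin", "global_weight": "bin"}
            ],
            "num_types": {},
            "num_rules": [{"key": "*", "type": "num"}],
        },
    }


if __name__ == "__main__":
    # Print a configuration that could be saved as the JSON file.
    print(json.dumps(build_config("lsh", 64, threads=4), indent=2))
```

Omitting the threads argument leaves the field out of the output, which matches the note that threads can be omitted.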
Data Structures
Methods
- service nearest_neighbor
  - bool set_row(0: string id, 1: datum d)
    Parameters:
    - id – row ID
    - d – datum for the row
    Returns: True if this function updates models successfully
    Updates the row whose ID is id with the given d. If a row with the same id already exists, it is overwritten with d (note that this behavior differs from that of recommender). Otherwise, a new row entry is created. If the server that manages the row is the same server that received this RPC request, the operation is reflected instantly. If not, the update is reflected after mix.
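The overwrite semantics of set_row can be illustrated with a small in-memory stand-in. This is a toy model, not the server or a client library: the class ToyRowTable and its get_row helper are hypothetical, a datum is modelled as a plain dict of numeric features, and the mix step of a distributed setup is not modelled at all.

```python
class ToyRowTable:
    """Toy in-memory stand-in for the row table (hypothetical, single process)."""

    def __init__(self):
        self._rows = {}

    def set_row(self, row_id, d):
        # An existing row is replaced as a whole; otherwise a new row is created.
        self._rows[row_id] = dict(d)
        return True

    def get_row(self, row_id):
        # Hypothetical helper for inspection; not part of the documented service.
        return dict(self._rows[row_id])

    def get_all_rows(self):
        # Mirrors get_all_rows: the list of all row IDs.
        return list(self._rows)


table = ToyRowTable()
table.set_row("a", {"x": 1.0, "y": 2.0})
table.set_row("a", {"x": 5.0})  # overwrites row "a"; the old "y" value is gone
table.set_row("b", {"y": 3.0})  # creates a new row
```

After these calls the table holds two rows, and row "a" contains only the features passed in the second call.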
  - list<id_with_score> neighbor_row_from_id(0: string id, 1: uint size)
    Parameters:
    - id – row ID in the nearest neighbor search table
    - size – number of rows to be returned
    Returns: row IDs that are the nearest to the row id and their distance values
    Returns at most size rows whose datum is most similar to that of the row id, together with their distance values.
  - list<id_with_score> neighbor_row_from_datum(0: datum query, 1: uint size)
    Parameters:
    - query – datum for nearest neighbor search
    - size – number of rows to be returned
    Returns: row IDs that are the nearest to query and their distance values
    Returns at most size rows whose datum is most similar to query, together with their distance values.
  - list<id_with_score> similar_row_from_id(0: string id, 1: uint ret_num)
    Parameters:
    - id – row ID in the nearest neighbor search table
    - ret_num – number of rows to be returned
    Returns: row IDs that are the nearest to the row id and their similarity values
    Returns at most ret_num rows whose datum is most similar to that of the row id, together with their similarity values.
  - list<id_with_score> similar_row_from_datum(0: datum query, 1: uint ret_num)
    Parameters:
    - query – datum for nearest neighbor search
    - ret_num – number of rows to be returned
    Returns: row IDs that are the nearest to query and their similarity values
    Returns at most ret_num rows whose datum is most similar to query, together with their similarity values.
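The difference between the neighbor_* and similar_* methods is the kind of score that is returned: a distance (smaller means closer) or a similarity (larger means closer). The following is a minimal brute-force sketch of that distinction, not the server's hashing-based search. The function names, the use of exact Euclidean distance and cosine similarity, and the dict representation of a datum are all assumptions made for illustration; the actual scores depend on the configured method.

```python
import math


def euclidean_distance(a, b):
    # Missing features are treated as 0.0.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def cosine_similarity(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbor_rows(rows, query, size):
    """At most `size` (id, distance) pairs, smallest distance first."""
    scored = [(row_id, euclidean_distance(d, query)) for row_id, d in rows.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:size]


def similar_rows(rows, query, ret_num):
    """At most `ret_num` (id, similarity) pairs, largest similarity first."""
    scored = [(row_id, cosine_similarity(d, query)) for row_id, d in rows.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:ret_num]


rows = {
    "a": {"x": 1.0, "y": 0.0},
    "b": {"x": 0.0, "y": 1.0},
    "c": {"x": 2.0, "y": 0.1},
}
query = {"x": 1.0, "y": 0.0}

nearest = neighbor_rows(rows, query, 2)   # ids in order: "a", "c"
similar = similar_rows(rows, query, 10)   # only 3 rows exist, so 3 results
```

Requesting more rows than the table holds returns every row, which corresponds to the "at most" wording in the method descriptions.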
  - list<string> get_all_rows()
    Returns: list of all row IDs
    Returns the list of all row IDs.
-
bool