Python¶
Here we explain the sample program of Graph in Python.
Source_code¶
In this sample program, we will explain 1) how to configure the learning-algorithms that used by Jubatus, with the example file ‘train_route.json’; 2) how to create the graph with ‘create_graph.py’, and how to learn the training data and calculate the shortest path with the example file ‘search_route.py’. Here are the source codes.
train_route.json
1 2 3 4 5 6 7 | {
"method": "graph_wo_index",
"parameter": {
"damping_factor" : 0.9,
"landmark_num" : 256
}
}
|
create_graph.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | #!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 compatibility:
from __future__ import unicode_literals
try:
# for Python 3
import urllib.request as urllib
except:
# for Python 2
import urllib
from xml.dom import minidom, Node
from jubatus.graph import client, types
host = '127.0.0.1'
port = 9199
instance_name = ''
stations = {}
class station_join():
def __init__(self, station1, station2):
self.station1 = station1
self.station2 = station2
def get_station_join(line_cd):
'''
Create array of `station_join` that represents which stations are connected.
In this example, we are connecting stations BY NAME, i.e., two stations
with the same name are regarded as connected and can be transferred to
each other.
'''
join_list = []
url = 'http://www.ekidata.jp/api/n/' + str(line_cd) + '.xml'
dom = minidom.parse(urllib.urlopen(url))
for join_node in dom.getElementsByTagName('station_join'):
for join_node_child in join_node.childNodes:
if join_node_child.nodeType == Node.TEXT_NODE:
continue
name = join_node_child.nodeName
value = join_node_child.childNodes.item(0).nodeValue
if name == 'station_name1':
station_name1 = value
elif name == 'station_name2':
station_name2 = value
join_list += [ station_join(station_name1, station_name2) ]
return join_list
def create_graph(c, join_list):
for join in join_list:
# Create nodes for stations.
s1_node_id = add_station(c, join.station1)
s2_node_id = add_station(c, join.station2)
# Create bi-directional edge between two nodes.
edge_1 = types.Edge({}, s1_node_id, s2_node_id)
edge_2 = types.Edge({}, s2_node_id, s1_node_id)
c.create_edge(s1_node_id, edge_1)
c.create_edge(s2_node_id, edge_2)
# Comment-out this line if you're running in distributed mode.
c.update_index()
def add_station(c, name):
if name in stations:
node_id = stations[name]
else:
node_id = c.create_node()
c.update_node(node_id, {'name': name})
stations[name] = node_id
return node_id
def print_stations():
for station in sorted(stations.keys(), key=lambda k: int(stations[k])):
print ("{0}\t{1}".format(stations[station], station))
if __name__ == '__main__':
# Create jubagraph client.
c = client.Graph(host, port, instance_name)
# Prepare query.
pq = types.PresetQuery([], [])
c.add_shortest_path_query(pq)
# Register stations in each line.
# Do not add too much lines to prevent causing heavy load to the API server.
create_graph(c, get_station_join(11302)) # 山手線
create_graph(c, get_station_join(11312)) # 中央線
# Print station IDs; you need the ID to search route.
print ("=== Station IDs ===")
print_stations()
|
search_route.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | #!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 compatibility:
from __future__ import unicode_literals
import sys
from jubatus.graph import client, types
host = '127.0.0.1'
port = 9199
instance_name = ''
def search_route(from_id, to_id):
c = client.Graph(host, port, instance_name)
pq = types.PresetQuery([], [])
spreq = types.ShortestPathQuery(from_id, to_id, 100, pq)
stations = c.get_shortest_path(spreq)
print ("Pseudo-Shortest Path (hops) from {0} to {1}:".format(from_id, to_id))
for station in stations:
node = c.get_node(station)
station_name = ''
if 'name' in node.property:
station_name = node.property['name']
print (" {0}\t{1}".format(station, station_name))
if __name__ == '__main__':
if len(sys.argv) < 2:
print ("Usage: {0} from_station_id to_station_id".format(sys.argv[0]))
sys.exit(1)
search_route(str(sys.argv[1]), str(sys.argv[2]))
|
Explanation¶
train_route.json
The configuration information is given by the JSON unit. Here is the meaning of each JSON field.
- method
- Specify the algorithm used in graph mining. Currently, In this example, we use the graph without indexing, so we specify it “graph_wo_index”.
- parameter
- Specify the parameters to be passed to the algorithm. We specify two parameter here, “damping_factor” and “landmark_num”. “damping_factor” is the damping factor used in PageRank calculation. It adjusts scores for nodes with differenct degrees.The bigger it is, the more sensitive to graph structure PageRank score is, but the larger biases it causes. In the original paper, 0.85 is good. “landmark_num” is used for shortest path calculation. The bigger it is, more accurate value you can get, but the more memory is required.
create_graph.py
create_graph.py generates a graph composed of Yamanote-line and Chuou-line. The client program in Graph will use the ‘Graph’ class defined in ‘jubatus.graph’. Here are the 5 methods used in the sample.
- Connect to Jubatus Server
Connect to Jubatus Server (Line 74).
Setting the IP addr, RPC port of Jubatus Server and the unique name for task identification in Zookeeper.
- Regist the preset query
The ‘add_shortest_path_query’ method must be registered beforehand. Therefore, the ‘PresetQuery’ is made (Line 77) and registed by ‘add_shortest_path_query’ (Line 78).
- Generate the graph
Make the graph composed of Yamanote-line and Chuou-line. Firstly, the method [create_graph] is called at (Line 82-83). The first argument in [create_graph] is the Graph made in Step. 1. The second argument is the return value from method [get_station_join].
Method [get_station_join] makes the combination list of two neighbor stations. The station XML file is downloaded from Web (Line 29-30). Contents of the XML file likes below. In this sample program, we ignore the factor of ‘distance’, and only consider the connections between stations. So, the values in <station_name1>, <station_name2> are not used in the program.
<ekidata version="ekidata.jp station_join api 1.0"> <station_join> <station_cd1>1131231</station_cd1> <station_cd2>1131232</station_cd2> <station_name1>Nichi-Hachioji</station_name1> <station_name2>Takao</station_name2> <lat1>35.656621</lat1> <lon1>139.31264</lon1> <lat2>35.642026</lat2> <lon2>139.282288</lon2> </station_join> <station_join> <station_cd1>1131230</station_cd1> <station_cd2>1131231</station_cd2> <station_name1>Hachioji</station_name1> <station_name2>Nichi-Hachioji</station_name2> <lat1>35.655555</lat1> <lon1>139.338998</lon1> <lat2>35.656621</lat2> <lon2>139.31264</lon2> </station_join> <station_join> <station_cd1>1131229</station_cd1> <station_cd2>1131230</station_cd2> <station_name1>Toyota</station_name1> <station_name2>Hachioji</station_name2> <lat1>35.659502</lat1> <lon1>139.381495</lon1> <lat2>35.655555</lat2> <lon2>139.338998</lon2> </station_join> -Snip-
Now, we input the value of <station_cd1> in the XML file into the instance variable ‘station1’ in [StationJoin] class, and the value of <station_cd2> in to ‘station2’. The number of instance created in [StationJoin] is the same as the number of <station_join> tags, and they are sotred in the ArrayList that created at Line 28 (Line 31-41).
Next, we make the graph by using the list created above (Line 44-57). The method [create_graph] performs the following task.
- Add station information and ID.
- Insert node into graph. Here, a node means a station. (eg. Shinagawa, Ochanomizu, Tokyo, etc.)
- Create links between the added two neighbor stations
- Make the bi-link between the registed station to its neighbor stations. Here, a link means a route. (eg. Harajuku <-> Shibuya, etc.)
- 3-1. Add station information and ID.
Method [add_station] is called (Line 47-48), to add every pair of neighboring nodes <station1, station2> in to the graph. Method [add_station] will check the map of ‘stations’. If the map contains the specified station, the station_id will be returned; Otherwise, a new node is created, and its ID is returned after storing the nodeID and station name into the ‘stations’ map (Line 59-66). Mehods [create_node] and [update_node] in Graph regist the new node. first, [create_node] method is called and the returned value is the nodeId. After that, a node is added into the graph. Then, we regist the key-value <name, “station name”> into the ‘property’ (Line 63). Finally, [update_node] method updates the ‘property’ with the node created at Line 63 (Line 64).
- 3-2. Create links between the added two neighbor stations
After adding the two neighbor stations by method [addStation], we create the bi-links between station1 and station2 (Line 51-54). Method [create_edge] is used to create the bi-links. The second argument means the start node’s ID. The third argument is an edge instance, which has the nodeID of both start and end nodes of the edge.
The [update_index] method in Line 57 is used for locally Mix operation, do not use it in distributed environment.
- Show the stations
In step 3-1, station name and station ID(nodeID) are stored into the “stations”. Here, we output the stations names by the ascending order of their IDs (Line 68-70).
search_route.py
‘search_route.py’ finds the shortest path between every 2 stations from the graph that made by create_graph.py. The method it used is the “get_shortest_path”.
- Connect to Jubatus Server
Connect to Jubatus Server (Line 12).
Setting the IP addr, RPC port of Jubatus Server and the unique name for task identification in Zookeeper.
- Prepare the query
Prepare the query for the shortest path calculation (Line 14-15). Create the ShortestPathQuery required by the [get_shortest_path] method (Line 15). Store the start node’s & end node’s nodeIDs into the first & second arguments in the ‘types.ShortestPathQuery’. The third argument is the number of ‘maxhop’, the search process will be truncated if it fails to find the route within the specified number of ‘maxhop’. Also note, the query should be registed by “add_shortest_path_query” beforehand.
- Calculate the shortes path
By specifying the “ShortestPathQuery” that created in Step.2, [get_shortest_path] method will find the shortest path (Line 16).
- Show the results
Show the ID of stations that on the shortes path calculated in Step 3 (Line 18-24).
Run the sample program¶
- At Jubatus Server
start “jubagraph” process.
$ jubagraph --configpath train_route.json
- At Jubatus Client
- Create graph
Make the railway route graph.
$ ./create_graph.py === Station IDs === 0 Shinagawa 1 Osaki 4 Tamachi ... 139 Nagano 144 Yotsuya 147 Ochanomizu
Output of the station name, and their station ID (node ID on graph).
- Search the shortest path
Search the shortest path between 2 stations.
$ ./search_route.py 0 144 Pseudo-Shortest Path (hops) from 0 to 144: 0 Shinagawa 4 Tamachi 7 Hamamatsucho 10 Shinbashi 13 Yurakucho 16 Tokyo 19 Kanda 147 Ochanomizu 144 Yotsuya