Graph-based Natural Language Processing and Information Retrieval

ReneeBK

1794

收藏 2015-06-13

Graph-based Natural Language Processing and Information Retrieval

By: Rada Mihalcea; Dragomir Radev

Publisher: Cambridge University Press

Pub. Date: April 11, 2011

Print ISBN: 978-0-521-89613-9

Pages in Print Edition: 202

Subscriber Rating: [0 Ratings]

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

全部回复

ReneeBK

2015-6-13 12:13:03

Depth-First Graph Traversal

Figure 2.2. Depth-first traversal of a graph

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

ReneeBK

2015-6-13 12:14:37

Breadth-First Graph Traversal

Figure 2.4. Breadth-first traversal of a graph.

Similar to the depth-first traversal algorithm, the running time of the breadth-first traversal is O(|E| + |V|).

The depth-first and breadth-first traversal strategies also can be used to search for a node in the graph. These strategies are referred to as depth-first search andbreadth-first search algorithms, respectively. Basically, the traversal algorithm is used until the target node is reached, at which point the algorithm stops.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

ReneeBK

2015-6-13 12:16:11

Minimum Spanning Trees

Minimum spanning trees have the following properties: (1) As shown in Figure 2.5, it is not unusual for a graph to have several minimum spanning trees. However, even if a graph admits several minimum spanning trees, all of the trees will have the same total weight. (2) For every edge not included in the minimum spanning tree, adding the edge to the tree results in the construction of a cycle. Moreover, the newly added edge will be a maximum-weight edge in that cycle. (3) The number of edges in a minimum spanning tree is equal to the number of nodes in the graph minus 1.

Several algorithms can be applied to identify the minimum spanning trees of a graph, including Prim’s algorithm. Briefly, the algorithm starts with an arbitrary node (referred to as the “root”). Next, at each iteration, it branches out from the tree constructed so far by choosing an edge that has the minimum weight among all the edges that could be attached. The algorithm adds to the tree the vertex that is associated with that edge. Thus, vertices in the graph can be divided into three disjoint categories: (1) tree vertices, which are the vertices belonging to the tree constructed so far; (2) fringe vertices, which are the vertices that are not part of the tree but are adjacent to some vertex in the tree; and (3) unseen vertices, which are all of the other vertices in the graph. Prim’s algorithm is described in the pseudocode in Figure 2.6.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

ReneeBK

2015-6-13 12:18:47

Shortest-Path Algorithms
Given a graph G and a source node N in the graph, the shortest-path problem is the problem of finding the shortest path from N to any other vertex in the graph. In the case of unweighted graphs, the length of a path is calculated as the number of edges on the path. In the case of weighted graphs, the length is calculated as the sum of the weights of all edges on the path. The algorithm used for the case of weighted graphs also is referred to as Dijkstra’s algorithm.

Figure 2.7. Shortest paths starting with the source node C.

The pseudocode for finding the shortest paths from a source node to all other nodes in an unweighted graph is shown in Figure 2.8.

Figure 2.8. Pseudo-code algorithm for finding the shortest path in unweighted graphs.

When the shortest path must be calculated between each pair of vertices in the graph, other algorithms such as the Floyd–Warshall algorithm can be used, which minimizes the number of path comparisons by using dynamic programming.

The shortest-path algorithm also can be applied to weighted graphs, with small modifications. Primarily, the path is calculated as the sum of weights of the edges; the length of the shortest path to a node can be updated if a new shorter path is found.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

ReneeBK

2015-6-13 12:22:28

Random Walks

Imagine a person, Paul, standing on the corner of 43rd Street and Madison Avenue in Midtown Manhattan during a blackout (Figures 2.19 and 2.20). Paul is a few blocks away from his home on the corner of 47th Street and Madison Avenue. However, in the dark, he cannot see anything and wanders aimlessly from one street corner to another, going up and down Madison Avenue. At each step, he has a 50 percent chance of going up the avenue and a 50 percent chance of going down. His goal is to get home safely rather than fall in the open manhole located at the corner of 42nd street and Madison Avenue. We assume that the random walk will stop when one of two things happens: Paul either ends up in the manhole or he arrives home safely. It can be proven mathematically that for a given probability p, he eventually will end in one of those two places with a probability no larger than p.

Figure 2.19. Map of Midtown Manhattan.

Figure 2.20. Random walk with 6 steps (ϕ to N = 5). For simplicity, node labels starting at zero are used.

If he is unlucky, Paul’s very first step will take him to the manhole. So, the probability of this event (let us call it M) is at least 0.5. If he is lucky, Paul will move away from the manhole in the first step; however, if after that he makes two more steps toward the manhole, he will fall into it. The probability of this happening is 0.5 ∗ 0.5 ∗ 0.5 = 0.53 = 0.125. Obviously, there are many other sequences of steps that lead to the event M. As can be seen, the probability of the event H = 1 – M (i.e., arriving home safely) seems rather small. How can we compute it?

Clearly, the value of pH is a function of the starting point. In one extreme case, Paul would start at 47th Street. In that case, he is already safely at home, so the probability pH(47) = 1. In the other extreme case, he starts at 42nd Street and falls immediately into the hole, resulting in pH(42) = 0. How can we find the value ofPH(i) for an arbitrary street number i? This is an example of a random-walk model in which a walker takes random steps on the graph G, with the walk being modeled as a Markov process – that is, the decision on what edge to follow is solely based on the vertex where the walker is currently located. Under certain conditions, this model converges to a stationary distribution of probabilities, associated with vertices in the graph. Chapter 5 provides more details on random-walk algorithms, and specifically on the PageRank algorithm used in information retrieval.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

点击查看更多内容…

扫码加我 拉你入群

Depth-First Graph Traversal

扫码加我 拉你入群

Breadth-First Graph Traversal

扫码加我 拉你入群

Minimum Spanning Trees

扫码加我 拉你入群

扫码加我 拉你入群

扫码加我 拉你入群

扫码加我 拉你入群

扫码加我 拉你入群

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群

扫码加我拉你入群