2019-05-10
Dynamic Programming: Tangled Dependencies and Memoization
The core technique of DP, when applied to algorithm design, is caching. You decompose your problem recursively/inductively just like before - but you allow overlap between the subproblems. This means that a plain recursive solution could easily reach each base case an exponential number of times; however, by caching these results, this exponential waste can be trimmed away, and the result is usually both an impressively efficient algorithm and a greater insight into the problem.

Commonly, DP algorithms turn the recursive formulation upside down, making it iterative and filling out some data structure (such as a multidimensional array) step by step. Another option - one I think is particularly suited to high-level languages such as Python - is to implement the recursive formulation directly but to cache the return values. If a call is made more than once with the same arguments, the result is simply returned directly from the cache. This is known as memoization.
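A minimal way to sketch memoization in Python (the `memo` decorator name and the Fibonacci example are illustrative, not taken from any particular listing):

```python
from functools import wraps

def memo(func):
    """Wrap func so each distinct argument tuple is computed only once."""
    cache = {}
    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memo
def fib(n):
    # Plain recursion would hit each subproblem exponentially often;
    # with the cache, each value of n is computed exactly once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075
```

In modern Python, `functools.lru_cache` (or `functools.cache`) gives the same behavior without writing the decorator yourself.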

8.1 Don’t Repeat Yourself
Avoid recomputing overlapping subproblems.
A useful idea when working with dynamic programming: decompose the problem by conditioning on whether some element is included. That is, we get one recursive call if the element is included and another if it isn’t.


8.2 Canonical DP Problems
#1 Shortest Paths in Directed Acyclic Graphs
At the core of dynamic programming lies the idea of sequential decision problems. Each choice you make leads to a new situation, and you need to find the best sequence of choices that gets you to the situation you want. This is similar to how greedy algorithms work - it’s just that they rely on which choice looks best right now, while in general, you have to be less myopic and take future effects into consideration.

The prototypical sequential decision problem is finding your way from one node to another in a directed, acyclic graph. We represent the possible states of our decision process as individual nodes. The out-edges represent the possible choices we can make in each state. The edges have weights, and finding an optimal set of choices is equivalent to finding a shortest path.

NOTE: VARIETIES OF DAG SHORTEST PATH
Although the basic algorithm is the same, there are many ways of finding the shortest path in a DAG and, by extension, solving most DP problems. You could do it recursively, with memoization, or you could do it iteratively, with relaxation. For the recursion, you could start at the first node, try various “next steps,” and then recurse on the remainder, or if your graph representation permits, you could look at the last node and try “previous steps” and recurse on the initial part. The former is usually much more natural, while the latter corresponds more closely to what happens in the iterative version.
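The recursive, memoized variant can be sketched like this (the graph representation - a dict-of-dicts `W`, with `W[u][v]` giving the weight of edge u -> v - and the function names are my own assumptions, not any specific listing):

```python
from functools import lru_cache

def rec_dag_sp(W, s, t):
    """Shortest-path length from s to t in the DAG W, by memoized recursion."""
    @lru_cache(maxsize=None)
    def d(u):
        if u == t:
            return 0
        # Try each out-edge ("next step") and recurse on the remainder.
        return min((W[u][v] + d(v) for v in W[u]), default=float('inf'))
    return d(s)

# A tiny example DAG:
W = {'a': {'b': 2, 'f': 9}, 'b': {'c': 1, 'd': 2}, 'c': {'d': 7},
     'd': {'e': 2, 'f': 3}, 'e': {'f': 4}, 'f': {}}
print(rec_dag_sp(W, 'a', 'f'))  # 7 (a -> b -> d -> f)
```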

For the iterative version, there are two choices:
  • You can relax the edges out of each node (in topologically sorted order)
  • or you can relax all edges into each node. The latter more obviously yields a correct result but requires access to nodes by following edges backward. This isn’t as far-fetched as it seems when you’re working with an implicit DAG in some non-graph problem. (For example, in the longest increasing subsequence problem, discussed later in this chapter, looking at all backward “edges” can be a useful perspective.)

Outward relaxation, called reaching, is exactly equivalent when you relax all edges. As explained, once you get to a node, all its in-edges will have been relaxed anyway. However, with reaching, you can do something that’s hard in the recursive version (or relaxing in-edges): pruning. If, for example, you’re interested only in finding all nodes that are within a distance r, you can skip any node that has distance estimate greater than r. You will still need to visit every node, but you can potentially ignore lots of edges during the relaxation. This won’t affect the asymptotic running time, though (Exercise 8-6).
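The iterative version with reaching might look like this (again assuming the dict-of-dicts representation; `topsort` here is a plain Kahn-style sort written for the sketch, not a library routine):

```python
def topsort(W):
    """Topologically sort the DAG W (Kahn's algorithm)."""
    indeg = {u: 0 for u in W}
    for u in W:
        for v in W[u]:
            indeg[v] += 1
    queue = [u for u in W if indeg[u] == 0]
    order = []
    while queue:
        u = queue.pop()
        order.append(u)
        for v in W[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

def dag_sp(W, s, t):
    """Shortest s -> t distance by relaxing each node's out-edges (reaching)."""
    d = {u: float('inf') for u in W}
    d[s] = 0
    for u in topsort(W):
        if u == t:
            break  # every in-edge of t has already been relaxed
        for v in W[u]:
            d[v] = min(d[v], d[u] + W[u][v])  # relax the edge u -> v
    return d[t]

W = {'a': {'b': 2, 'f': 9}, 'b': {'c': 1, 'd': 2}, 'c': {'d': 7},
     'd': {'e': 2, 'f': 3}, 'e': {'f': 4}, 'f': {}}
print(dag_sp(W, 'a', 'f'))  # 7
```

Pruning would go inside the outer loop: skip any node u whose current estimate d[u] already exceeds the radius of interest.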

Note that finding the shortest paths in a DAG is surprisingly similar to, for example, finding the longest path, or even counting the number of paths between two nodes in a DAG. The latter problem is just what we did with Pascal’s triangle earlier; the same approach would work for an arbitrary DAG.
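Counting paths uses exactly the same recursive skeleton, only with sum in place of min (a sketch, with the same assumed dict-of-dicts representation):

```python
from functools import lru_cache

def count_paths(W, s, t):
    """Number of distinct s -> t paths in the DAG W."""
    @lru_cache(maxsize=None)
    def c(u):
        if u == t:
            return 1
        return sum(c(v) for v in W[u])
    return c(s)

W = {'a': {'b': 2, 'f': 9}, 'b': {'c': 1, 'd': 2}, 'c': {'d': 7},
     'd': {'e': 2, 'f': 3}, 'e': {'f': 4}, 'f': {}}
print(count_paths(W, 'a', 'f'))  # 5
```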

#2 Longest Increasing Subsequence
A lot - perhaps the majority - of DP problems won’t have anything to do with (explicit) graphs. In these cases, you’ll have to sniff out the DAG or sequential decision process yourself. Or perhaps it’ll be easier to think of it in terms of recursive decomposition and ignore the whole DAG structure.

Let’s go straight for the induction, and we can think more in graph terms later. To do the induction (or recursive decomposition), we need to define our subproblems - one of the main challenges of many DP problems. In many sequence-related problems, it can be useful to think in terms of prefixes - that we’ve figured out all we need to know about a prefix and that the inductive step is to figure things out for another element. In this case, that might mean that we’d found the longest increasing subsequence for each prefix, but that’s not informative enough. We need to strengthen our induction hypothesis so we can actually implement the inductive step. Let’s try, instead, to find the longest increasing subsequence that ends at each given position.

If we already know how to find this for the first k positions, how can we find it for position k + 1? Once we’ve gotten this far, the answer is pretty straightforward: We just look at the previous positions whose elements are smaller than the current one. Among those, we choose the one that is at the end of the longest subsequence. A direct recursive implementation will give us exponential running time, but once again, memoization gets rid of the exponential redundancy, as shown in Listing 8-5. Once again, I’ve focused on finding the length of the solution; extending the code to find the actual subsequence isn’t all that hard.
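Listing 8-5 itself isn’t reproduced in these notes, but a memoized sketch of the same idea (the function names are mine) might look like this:

```python
from functools import lru_cache

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    @lru_cache(maxsize=None)
    def end_at(k):
        # Best increasing subsequence ending exactly at position k:
        # extend the best earlier position holding a smaller element.
        return 1 + max((end_at(j) for j in range(k) if seq[j] < seq[k]),
                       default=0)
    return max((end_at(k) for k in range(len(seq))), default=0)

print(lis_length([3, 1, 0, 2, 4]))  # 3 (for example 1, 2, 4)
```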

NOTE: There is more than one way to view most DP problems. Sometimes you want to focus on the recursive decomposition and induction; sometimes you’d rather try to sniff out some DAG structure; sometimes, yet again, it can pay to look at what’s right there in front of you. In this case, that would be the sequence.

When solving problems using DP, we use recursive decomposition or inductive thinking. You still need to show that an optimal or correct global solution depends on optimal or correct solutions to your subproblems (optimal substructure, or the principle of optimality). The main difference from divide and conquer is just that you’re allowed to have overlapping subproblems. You might even say that you should look for a decomposition with overlap, because eliminating that overlap (with memoization) is what will give you an efficient solution. In addition to the perspective of “recursive decomposition with overlap,” you can often see DP problems as sequential decision problems or as looking for special (for example, shortest or longest) paths in a DAG. These perspectives are all equivalent, but they can fit various problems differently.

#3 Sequence Comparison
Comparing sequences for similarity is a crucial problem in much of molecular biology and bioinformatics, where the sequences involved are generally DNA, RNA, or protein sequences.

There are several ways of comparing sequences, many of which are more similar than one might think. For example, consider the problem of finding the longest common subsequence (LCS) between two sequences and finding the edit distance between them. The LCS problem is similar to the longest increasing subsequence problem - except that we’re no longer looking for an increasing subsequence. We’re looking for subsequences that also occur in a second sequence. (For example, the LCS of Starwalker and Starbuck is Stark.) The edit distance (also known as Levenshtein distance) is the minimum number of editing operations (insertions, deletions, or replacements) needed to turn one sequence into another. (For example, the edit distance between enterprise and deuteroprism is 4.) If we disallow replacements, the two are actually equivalent: The longest common subsequence is the part that stays the same when editing one sequence into the other with as few edits as possible, and every other character in either sequence must be inserted or deleted. Thus, if the lengths of the sequences are m and n and the length of the longest common subsequence is k, the edit distance without replacements is m + n - 2k.

As for all DP problems, the key is to design a set of subproblems that we can relate to each other (that is, a recursive decomposition with tangled dependencies). It can often help to think of the set of subproblems as being parametrized by a set of indexes or the like. These will then be our induction variables. In this case, we can work with prefixes of the sequences (just like we worked with prefixes of a single sequence in the longest increasing subsequence problem). Any pair of prefixes (identified by their lengths) gives rise to a subproblem, and we want to relate them in a subproblem graph (that is, a dependency DAG).

Let’s say our sequences are a and b. As with inductive thinking in general, we start with two arbitrary prefixes, identified by their lengths i and j. What we need to do is relate the solution to this problem to some other problems, where at least one of the prefixes is smaller. Intuitively, we’d like to temporarily chop off some elements from the end of either sequence, solve the resulting problem by our inductive hypothesis, and stick the elements back on. If we stick with weak induction (reduction by one) along either sequence, we get three cases: Chop the last element from a, from b, or from both. If we remove an element from just one sequence, it’s excluded from the LCS. If we drop the last from both, however, what happens depends on whether the two elements are equal or not. If they are, we can use them to extend the LCS by one! (If not, they’re of no use to us.)
L(i, j) = 0                              if i = 0 or j = 0
L(i, j) = 1 + L(i-1, j-1)                if a[i] = b[j]
L(i, j) = max{L(i-1, j), L(i, j-1)}      otherwise
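Translating the recurrence directly into memoized Python (a sketch; since i and j are prefix lengths, the elements compared are a[i-1] and b[j-1] in zero-based indexing):

```python
from functools import lru_cache

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    @lru_cache(maxsize=None)
    def L(i, j):
        if i == 0 or j == 0:
            return 0
        if a[i - 1] == b[j - 1]:            # matching last elements extend the LCS
            return 1 + L(i - 1, j - 1)
        return max(L(i - 1, j), L(i, j - 1))
    return L(len(a), len(b))

print(lcs_length("Starwalker", "Starbuck"))  # 5 ("Stark")
```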

This recursive decomposition can easily be seen as a dynamic decision process (do we chop off an element from the first sequence, from the second, or from both?), which can be represented as a DAG (see Figure 8-5). We start in the node represented by the full sequences, and we try to find the longest path back to the node representing two empty prefixes. It’s important to be clear about what the “longest path” is here, though - that is, what the edge weights are. The only time we can extend the LCS (which is our goal) is when we chop off two identical elements, represented by the DAG edges that are diagonal when the nodes are placed in a grid, as in Figure 8-5. These edges, then, have a weight of one, while the other edges have a weight of zero.

#4 The Knapsack problem
The knapsack problem is framed in different terms: We have a set of items that we want to take with us, each with a certain weight and value; however, our knapsack has a maximum capacity (an upper bound on the total weight), and we want to maximize the total value we get.
  • The knapsack problem involves a set of objects, each of which has a weight and a value, and the knapsack has a capacity.
    • Stuff the knapsack with objects such that (1) the total weight is less than or equal to the capacity, and (2) the total value is maximized. Let’s say that object i has weight w[i] and value v[i].

  • There are two important cases of the integer knapsack problem - the bounded and unbounded cases.
    • The bounded case assumes we have a fixed number of objects in each category.
    • The unbounded case lets us use as many as we want.
      • This problem fits the pattern: need to somehow define the subproblems, relate them to each other recursively, and then make sure we compute each subproblem only once (by using memoization, implicitly or explicitly). The “unboundedness” of the problem means that it’s a bit hard to restrict the objects we can use, using the common “in or out” idea (although we’ll use that in the bounded version). Instead, we can simply parametrize our subproblems using - that is, use induction over - the knapsack capacity.
        • If we say that m(r) is the maximum value we can get with a (remaining) capacity r, each value of r gives us a subproblem. The recursive decomposition is based on either using or not using the last unit of the capacity. If we don’t use it, we have m(r) = m(r-1). If we do use it, we have to choose the right object to use. If we choose object i (provided it will fit in the remaining capacity), we would have m(r) = v[i] + m(r - w[i]), because we’d add the value of i, but we’d also have used up a portion of the remaining capacity equal to its weight.

      • We can (once again) think of this as a decision process: We can choose whether to use the last capacity unit, and if we do use it, we can choose which object to add. Because we can choose any way we want, we simply take the maximum over all possibilities.
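A sketch of the unbounded case under these assumptions (object i has weight w[i] and value v[i]; the function name is mine):

```python
from functools import lru_cache

def unbounded_knapsack(w, v, c):
    """Max value achievable with capacity c, unlimited copies of each object."""
    @lru_cache(maxsize=None)
    def m(r):
        if r == 0:
            return 0
        best = m(r - 1)                  # leave the last capacity unit unused
        for i in range(len(w)):
            if w[i] <= r:                # or place any object that still fits
                best = max(best, v[i] + m(r - w[i]))
        return best
    return m(c)

print(unbounded_knapsack((2, 3), (3, 5), 7))  # 11 (two of object 0, one of object 1)
```

Note that the recursion depth grows with c, so large capacities would call for the iterative (relaxation-style) formulation instead.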



Summary
This chapter covered dynamic programming, or DP, which is used when the subproblem dependencies get tangled (that is, we have overlapping subproblems) and a straight divide-and-conquer solution would give an exponential running time. The term dynamic programming was originally applied to a class of sequential decision problems but is now used primarily for the solution technique, where some form of caching is performed so that each subproblem need be computed only once. One way of implementing this is to add caching directly to a recursive function that embodies the recursive decomposition (that is, the induction step) of the algorithm design; this is called memoization. It can often be useful to invert the memoized recursive implementation, though, turning it into an iterative one.
