2013-11-10
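In MATLAB, changing a for loop to parfor is all it takes to run the loop in parallel, which is simple and efficient. However, parfor often fails with errors, for example because the loop body does not satisfy the requirement that iterations be independent of one another. The blog post below covers the errors that commonly occur and how to resolve them.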

http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
Using parfor Loops: Getting Up and Running

Today I’d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Sarah has previously written about speeding up code from a customer to get acceptable performance. She again will be writing about speeding up MATLAB applications, but this time her focus will be on using the parallel computing tools.

Introduction

I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of the more commonly used functions in these tools: the parfor-loop.

This post will focus on getting a parallel code using parfor up and running. Performance will not be addressed in this post. I will assume that the reader has a basic knowledge of the parfor-loop construct. Loren has a very nice introduction to using parfor in one of her previous posts. There are also some nice introductory videos.

Note for clarity: Since Loren's introductory post, the toolbox used for parallel computing has changed names from the Distributed Computing Toolbox to the Parallel Computing Toolbox. These are not two separate toolboxes.

Method

In some cases, you may only need to change a for-loop to a parfor-loop to get your code running in parallel. However, in other cases you may need to slightly alter the code so that parfor can work. I decided to show a few examples highlighting the main challenges that one might encounter. I have separated these examples into four encompassing categories:

  • Independence
  • Globals and Transparency
  • Classification
  • Uniqueness

Background on parfor-loops

In a parfor-loop (just like in a standard for-loop) a series of statements known as the loop body is iterated over a range of values. However, when using a parfor-loop the iterations run not on the client MATLAB machine but in parallel on MATLAB workers.

Each worker has its own unique workspace. So, the data needed to do these calculations is sent from the client to workers, and the results are sent back to the client and pieced together. The cool thing about parfor is that this data transfer is handled for the user. When MATLAB gets to the parfor-loop, it statically analyzes the body of the parfor-loop and determines what information goes to which worker and what variables will be returning to the client MATLAB. Understanding this concept will become important when understanding why particular constraints are placed on the use of parfor.

Opening the matlabpool

Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel. I will be opening up the matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my MATLAB has been installed).

if matlabpool('size') == 0 % checking to see if my pool is already open
    matlabpool open 2
end

Starting matlabpool using the 'local' configuration ... connected to 2 labs.

Note: The 'size' option was new in R2008b.
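For readers on later MATLAB releases, where parpool took over the role of matlabpool, the same "open a pool only if none is open" check might look roughly like the sketch below. It assumes the Parallel Computing Toolbox is installed and that the default cluster profile allows two workers; the variable name pool is only illustrative.

```matlab
% Sketch for newer releases: gcp('nocreate') returns the current pool,
% or an empty array if no pool is open (it does not start one).
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(2);   % assumption: the default profile allows 2 workers
end
fprintf('Pool has %d workers\n', pool.NumWorkers);
```

The matching clean-up in those releases would be delete(gcp('nocreate')) rather than matlabpool close.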

Independence

The parfor-loop is designed for task-parallel types of problems where each iteration of the loop is independent of every other iteration. This is a critical requirement for using a parfor-loop. Let's see an example of when each iteration is not independent.

type dependentLoop.m

% Example of a dependent for-loop
a = zeros(1,10);
parfor it = 1:10
    a(it) = someFunction(a(it-1));
end

Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent and will not work with the parfor construct. M-Lint can either be accessed via the editor or command line. In this case, I use the command line and have defined a simple function displayMlint so that the display is compact.

output = mlint('dependentLoop.m');
displayMlint(output)

The PARFOR loop cannot run due to the way variable 'a' is used.
In a PARFOR loop, variable 'a' is indexed in different ways, potentially causing dependencies between iterations.

Sometimes loops are intrinsically or unavoidably dependent, and therefore parfor is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop to eliminate the dependency or separate it from the main time-consuming calculation.
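As a rough illustration of separating the dependency from the time-consuming work, the sketch below assumes that each iteration consists of a costly but independent computation plus a cheap step that needs the previous result. The expression sum(sqrt(1:it)) is only a hypothetical stand-in for the costly part, and the variable names are illustrative. The costly part moves into a parfor-loop, and the dependent part stays in an ordinary for-loop.

```matlab
n = 10;
partial = zeros(1, n);

% Independent part: iteration "it" reads and writes only partial(it).
% sum(sqrt(1:it)) is a hypothetical stand-in for an expensive calculation.
parfor it = 1:n
    partial(it) = sum(sqrt(1:it));
end

% Dependent part: a cheap running total that needs a(it-1),
% so it is kept in a regular for-loop on the client.
a = zeros(1, n);
a(1) = partial(1);
for it = 2:n
    a(it) = a(it-1) + partial(it);
end
```

Whether such a split is possible depends entirely on the calculation; a recurrence whose expensive step itself needs the previous result cannot be treated this way.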

Globals and Transparency

All variables within the body of a parfor-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB is statically analyzing the loops to figure out what data goes to what worker and what data comes back, this seems like an understandable restriction.

Therefore, the following commands cannot be used within the body of a parfor-loop: evalc, eval, evalin, and assignin. load can also not be used unless the output of load is assigned to a variable name. It is possible to use the above functions within a function called by parfor, due to the fact that the function has its own workspace. I have found that this is often the easiest workaround for the transparency issue.
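The load case can be sketched as follows. This is a minimal example under a few assumptions: the MAT-files are throwaway files created with tempname, the variable names (files, totals, S, x) are illustrative, and the workers can see the same file system as the client, which holds for a local pool but not necessarily for a remote cluster.

```matlab
% Create three small MAT-files in an ordinary for-loop
% (save is called here, outside the parfor body).
files = cell(1, 3);
for k = 1:3
    files{k} = [tempname '.mat'];
    x = k * (1:4);
    save(files{k}, 'x');
end

totals = zeros(1, 3);
parfor k = 1:3
    % The output of load is assigned to S, so no variable appears in the
    % workspace without being named in the text of the loop body.
    S = load(files{k});
    totals(k) = sum(S.x);
end

% Remove the temporary files.
for k = 1:3
    delete(files{k});
end
```

Writing load(files{k}) without an output argument inside the loop body would instead create x implicitly, which is exactly what the transparency rule forbids.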

Additionally, you cannot define global variables or persistent variables within the body of the parfor loop. I would also suggest being careful with the use of globals since changes in global values on workers are not automatically reflected in local global values.

Classification

A detailed description of the classification of variables in a parfor-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is passed between client and worker and the different ways it is used within the body of the parfor-loop.

Challenges with Classification

Challenges often arise when first converting for-loops to parfor-loops due to issues with this classification. A common issue is the conversion of nested for-loops, where sliced variables are not indexed appropriately.

Sliced variables are variables where each worker is calculating on a different part of that variable. Therefore, sliced variables are sliced or divided amongst the workers. Sliced variables are used to prevent unneeded data transfer from client to worker.
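To make the classification more concrete, here is a small sketch in which each variable plays a different role. The names (data, scale, rowSums, total, s) are illustrative only; the category names in the comments follow the terminology used in the MATLAB documentation on parfor variable classification.

```matlab
n = 8;
data = reshape(1:24, n, 3);  % sliced input: iteration ix reads only data(ix,:)
scale = 2;                   % broadcast: the whole value is sent to every worker
rowSums = zeros(n, 1);       % sliced output: iteration ix writes only rowSums(ix)
total = 0;                   % reduction: accumulated across iterations

parfor ix = 1:n
    s = scale * sum(data(ix, :));  % s is a temporary, local to each iteration
    rowSums(ix) = s;
    total = total + s;
end
```

Only the rows of data that a worker actually needs are transferred to it, whereas scale is sent in full to every worker; this is the transfer saving that slicing is meant to provide.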

Using parfor with Nested for-Loops

The loop below is nested and encounters some of the restrictions placed on parfor for sliced variables.

type parforNestTry.m

A1 = zeros(10,10);
parfor ix = 1:10
    for jx = 1:10
        A1(ix, jx) = ix + jx;
    end
end

output = mlint('parforNestTry.m');
displayMlint(output);

The PARFOR loop cannot run due to the way variable 'A1' is used.
Valid indices for 'A1' are restricted in PARFOR loops.

In this case, A1 is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows parfor to easily distribute the right part of the variable to the right workers.

First-level indexing, in general, refers to indexing within the first set of parentheses or braces. This is explained in more detail in the same section as classification in the documentation.

One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end.
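A couple of index forms that satisfy this rule are sketched below. The arrays B and C and the index variable k are illustrative names, not taken from the original post.

```matlab
n = 5;
k = 2;                        % a non-loop variable used as an index
B = zeros(n + 1, 3);
parfor ix = 1:n
    % First index: loop counter plus a constant.
    % Second index: a non-loop counter variable.
    B(ix + 1, k) = ix;
end

C = zeros(n, 3);
parfor ix = 1:n
    % First index: the loop counter. Second index: a colon.
    C(ix, :) = ix * [1 2 3];
end
```

In both loops exactly one first-level index involves the loop counter, so MATLAB can work out which slice of the array belongs to which iteration.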

In this case, A1 has a loop counter variable for both first-level indices (ix and jx).

The solution is to make sure that a loop counter variable is used for only one of the indices of A1, and to make the other index a colon. To implement this, the results of the inner loop can be saved to a temporary variable, which is then assigned to the desired variable outside the inner loop.

A2 = zeros(10,10);
parfor ix = 1:10
    myTemp = zeros(1,10);
    for jx = 1:10
        myTemp(jx) = ix + jx;
    end
    A2(ix,:) = myTemp;
end

You can also solve this issue by using cells. Since jx is now in the second level of indexing, it can be a loop counter variable.

A3 = cell(10,1);
parfor ix = 1:10
    for jx = 1:10
        A3{ix}(jx) = ix + jx;
    end
end
A3 = cell2mat(A3);

I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result in A3 using more memory due to the additional memory requirements for cells. The call to cell2mat also adds additional processing time.

A similar technique can be used for several levels of nested for-loops.
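For instance, with two inner loops the temporary variable simply becomes a matrix. The sketch below extends the temporary-variable approach to a three-dimensional array; A4 and kx are illustrative names, and the reshape call is there only to make the shape of the assigned slice explicit.

```matlab
A4 = zeros(4, 3, 2);
parfor ix = 1:4
    myTemp = zeros(3, 2);            % collects the results of both inner loops
    for jx = 1:3
        for kx = 1:2
            myTemp(jx, kx) = ix + jx + kx;
        end
    end
    % Only ix appears as a loop counter in the first-level indices of A4;
    % the remaining indices are colons.
    A4(ix, :, :) = reshape(myTemp, 1, 3, 2);
end
```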

Uniqueness

Doing Machine Specific Calculations

There is a way, while using parfor-loops, to determine which machine you are on and to run machine-specific instructions within the loop. You might want to do this if, for example, different machines keep their data files in different directories and you need to make sure you are in the right one. Do be careful when making code machine-specific, since it will be harder to port.

% Getting the machine host name
[~,hostname] = system('hostname');

% If the loop iterations are the same as the size of matlabpool, the
% command is run once per worker.
parfor ix = 1:matlabpool('size')
    [~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);
parfor ix = 1:matlabpool('size')
    [~,myhost] = system('hostname');
    if strcmp(myhost,checkhost)
        display('On Machine 1')
    else
        display('NOT on Machine 1')
    end
end

On Machine 1
On Machine 1

In my case, since I am running locally, all of the workers are on the same machine.

Here's the same code running on a non-local cluster.

matlabpool close
matlabpool open speedy
parfor ix = 1:matlabpool('size')
    [~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);
parfor ix = 1:matlabpool('size')
    [~,myhost] = system('hostname');
    if strcmp(myhost,checkhost)
        display('On Machine 1')
    else
        display('NOT on Machine 1')
    end
end

Sending a stop signal to all the labs ... stopped.
Starting matlabpool using the 'speedy' configuration ... connected to 16 labs.
On Machine 1
On Machine 1
On Machine 1
NOT on Machine 1
On Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1

Note: The ~ feature is new in R2009b and discussed as a new feature in one of Loren's previous blog posts.

Doing Worker Specific Calculations

I would suggest using the new spmd functionality to do worker specific calculations. For more information about spmd, check out the documentation.
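A minimal sketch of what worker-specific code inside spmd might look like is given below. It relies on labindex and numlabs, the worker index and worker count available inside an spmd block in the releases this post targets (newer releases recommend other names for them); the message text and the variable msg are purely illustrative.

```matlab
spmd
    % labindex is the index of the worker running this copy of the block;
    % numlabs is the total number of workers executing it.
    if labindex == 1
        msg = sprintf('Worker %d of %d: running the worker-1 branch', labindex, numlabs);
    else
        msg = sprintf('Worker %d of %d: running the default branch', labindex, numlabs);
    end
    disp(msg)
end
```

Unlike the iterations of a parfor-loop, the body of an spmd block runs once on every worker, so branching on the worker index is well defined.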

Clean up

matlabpool close



