Adding a loop to check a condition while matching two datasets

2012-06-20

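I have a sub-dataset in which house_id comes from dataset A and is unique, with no duplicates. BestMatch, however, is drawn from dataset B and can repeat. The algorithm for finding BestMatch searches dataset B for the ID closest to each house_id, so the same ID from B may be chosen more than once.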
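For reference, the unconstrained matching described above can be sketched in Python. This is a minimal illustration, not the poster's SAS code. It assumes each record is a hypothetical `(id, city, features)` tuple, that candidates are restricted to the same city (as the SAS step does with `by city`), and that "closest" means squared Euclidean distance, the same formula as `Distance` in the SAS step. The toy data shows how one B ID can end up matched to every A record.

```python
from collections import Counter


def nearest_match(a_rows, b_rows):
    """For each A record, return the id of the closest B record in the same city.

    a_rows, b_rows: lists of (record_id, city, features) tuples (hypothetical layout).
    There is no limit on reuse, so one B id can be matched many times.
    """
    matches = {}
    for house_id, city, feats in a_rows:
        best_id, best_dist = None, None
        for b_id, b_city, b_feats in b_rows:
            if b_city != city:
                continue  # only compare within the same city
            dist = sum((x - y) ** 2 for x, y in zip(feats, b_feats))
            if best_dist is None or dist < best_dist:
                best_id, best_dist = b_id, dist
        matches[house_id] = best_id
    return matches


a = [(f"A{k}", "X", (float(k),)) for k in range(5)]
b = [("B1", "X", (0.0,)), ("B2", "X", (10.0,))]
m = nearest_match(a, b)
print(Counter(m.values()))  # Counter({'B1': 5}): B1 is reused for all five A records
```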

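Now I need to add a constraint: if an ID from dataset B has appeared more than 3 times, that ID should no longer be included among the candidates for the next distance calculation. How can I implement this?
Below is roughly how I find the shortest distance between the two datasets. How should I modify it to add this constraint?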

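/* For each city, record the first and last row numbers of that city in B */
Data RowLookUp(Keep = city StartRow EndRow) ;
Retain StartRow ;
Set B ;   /* B must be sorted by city; _N_ is then the row number in B */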
by city ;
if first.city then StartRow = _N_ ;
if last.city then do ;
      EndRow = _N_ ;
      Output ;
end ;
run ;
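The RowLookUp step works because B is sorted by city, so each city's rows form one contiguous block and `_N_` equals the row number. A rough Python equivalent is sketched below, using `itertools.groupby` in place of the BY-group `first.`/`last.` flags. The function name `city_row_ranges` is hypothetical; it returns 1-based, inclusive ranges like StartRow/EndRow.

```python
from itertools import groupby


def city_row_ranges(b_cities):
    """Return {city: (start_row, end_row)}, 1-based and inclusive.

    b_cities: the city column of B, already sorted by city, so that
    each city's rows are contiguous (the same precondition as BY city).
    """
    ranges = {}
    row = 1
    for city, group in groupby(b_cities):
        n = sum(1 for _ in group)           # number of rows in this BY group
        ranges[city] = (row, row + n - 1)   # (StartRow, EndRow)
        row += n
    return ranges


print(city_row_ranges(["Austin", "Austin", "Boston", "Chicago", "Chicago", "Chicago"]))
# {'Austin': (1, 2), 'Boston': (3, 3), 'Chicago': (4, 6)}
```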

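Data Match (Keep = House_id BestMatch BestDistance) ;
Array RQn(2:&NumQs) q2-q&NumQs ;     /* attributes of the current A record */
Array DQn(2:&NumQs) dq2-dq&NumQs ;   /* attributes of the candidate B record */
Retain BestMatch BestDistance ;
Merge A(in=inA) RowLookup(in=inLookup) ;
by city ;
if inA and inLookup ;   /* skip cities that do not appear in both A and B */

/* Compare the A record only with the rows of B from the same city */
do RowNum = StartRow to EndRow ;
      Set B Point=RowNum ;
      Distance = 0 ;
      do i = 2 to &NumQs ;
            Distance = Distance + ((RQn(i) - DQn(i))**2) ;
            /* Stop summing once this candidate cannot beat the best so far
               (the first candidate in the city is always taken as the initial best) */
            if Distance ge BestDistance then do ;
                     if RowNum ne StartRow then i = &NumQs+1 ;
            end ;
      end ;
      if RowNum eq StartRow or Distance lt BestDistance then do ;
            BestDistance = Distance ;
            BestMatch = DHouse ;
      end ;
end ;
Output ;
run ;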
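One possible way to add the limit, sketched here in Python rather than SAS, is to keep a usage counter for every row of B, skip rows whose counter has reached the cap while scanning a city's rows, and increment the counter of the row that wins. The sketch assumes that "more than 3 times" means each B ID may be matched at most 3 times (`max_uses=3`), that records are hypothetical `(id, city, features)` tuples with B sorted by city, and it keeps the same early-exit idea as the SAS loop. An A record whose city has no remaining candidates gets `None`.

```python
def capped_match(a_rows, b_rows, max_uses=3):
    """Greedy nearest-neighbour matching where each B row is used at most max_uses times.

    a_rows: (house_id, city, features); b_rows: (b_id, city, features), sorted by city.
    Returns a list of (house_id, best_id, best_dist); best_id is None when every
    candidate in the city is already used up (or the city is absent from B).
    """
    # Per-city row ranges, like RowLookUp (0-based, end exclusive here).
    ranges = {}
    for idx, (_, city, _) in enumerate(b_rows):
        start, _end = ranges.get(city, (idx, idx))
        ranges[city] = (start, idx + 1)

    use_count = [0] * len(b_rows)  # one usage counter per row of B
    results = []
    for house_id, city, feats in a_rows:
        best_row, best_dist = None, None
        start, end = ranges.get(city, (0, 0))  # city absent from B: empty range
        for row in range(start, end):
            if use_count[row] >= max_uses:
                continue  # this B row has hit the cap: exclude it
            dist = 0.0
            for x, y in zip(feats, b_rows[row][2]):
                dist += (x - y) ** 2
                if best_dist is not None and dist >= best_dist:
                    break  # cannot beat the current best, stop summing
            if best_dist is None or dist < best_dist:
                best_row, best_dist = row, dist
        if best_row is not None:
            use_count[best_row] += 1  # count the use only for the winning row
            results.append((house_id, b_rows[best_row][0], best_dist))
        else:
            results.append((house_id, None, None))
    return results


a = [(f"A{k}", "X", (float(k),)) for k in range(5)]
b = [("B1", "X", (0.0,)), ("B2", "X", (10.0,))]
res = capped_match(a, b, max_uses=3)
for r in res:
    print(r)
# ('A0', 'B1', 0.0)
# ('A1', 'B1', 1.0)
# ('A2', 'B1', 4.0)
# ('A3', 'B2', 49.0)
# ('A4', 'B2', 36.0)
```

Because the matching is greedy, the result depends on the order in which A is processed: earlier house_ids get first pick of the closest B rows. In the SAS data step the counter would presumably map to a `_temporary_` array (its elements are retained across iterations) indexed by RowNum, checked before `Set B Point=RowNum` and incremented for the winning row after the loop; that requires keeping the winning row number as well as DHouse.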

Any help from the experts here would be greatly appreciated.