| fndnmefull | fndnme |
| 宝康灵活配置证券投资基金 | 华宝兴业宝康灵活 |
| 宝康消费品证券投资基金 | 华宝兴业消费品 |
| 宝康债券投资基金 | 华宝兴业宝康债券 |
| 宝盈策略增长股票型证券投资基金 | 宝盈策略增长 |
| 宝盈泛沿海区域增长股票证券投资基金 | 宝盈泛沿海增长 |
| 宝盈鸿利收益证券投资基金 | 宝盈鸿利收益 |
| 宝盈增强收益债券型证券投资基金 | 宝盈增强收益AB |
| 宝盈资源优选股票型证券投资基金 | 宝盈资源优选 |
sungmoo posted on 2010-8-25 15:02
* Assume every fund in dataset B has a matching short name in dataset A, and every short name in A has a fund code.
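use B,clear
g g=1                      // flag the rows that come from B
ren fndnmefull fndnme      // put B's full names in the same variable as A's short names
append using A
sort fndnme g              // a short name that is a prefix of a full name sorts ahead of it
* take the code from the preceding row if every character of its name occurs in this row's name
replace fndcd=fndcd[_n-1] if indexnot(fndnme[_n-1],fndnme)==0
keep if g==1               // keep only B's rows
drop g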
* Note: some funds in the dataset B as given have no matching short name in A.
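The match above hinges on `indexnot()`, so here is a minimal sketch of what that function tests, using made-up ASCII strings rather than the fund names. `indexnot(s1, s2)` returns the position of the first character of `s1` that does not occur anywhere in `s2`, or 0 if every character of `s1` occurs in `s2`. It is a character-membership test, not a substring test, so order is ignored. As far as I know it compares bytes, so with multi-byte Chinese text it can be looser than it looks.

```stata
* toy strings, for illustration only
display indexnot("abc", "xaybzc")   // 0: a, b and c all occur in the second string
display indexnot("abd", "xaybzc")   // 3: "d" (position 3) does not occur in it
display indexnot("cba", "abc")      // 0: order does not matter
```

Because the test is this loose, the code above only compares each name with the row directly before it after sorting, rather than with every name in A.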
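voodoo posted on 2010-8-30 22:07
Here is the solution I put together from the suggestions of forum members sungmoo and ctx5518 (it does look a bit complicated ;-)). Corrections from more experienced users are welcome.
// Overall approach: 1. identify a matching rule → 2. implement it in Stata → 3. save the matched / still-unmatched records → 4. repeat in the next loop ... → finally, "manual" verification and handling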
set more off
// Back up the original files so they are not overwritten
copy A.dta _A.dta
copy B.dta _B.dta
capture program drop findrule
program findrule
    use A, clear
    merge 1:1 _n using B, nogen
    br
end
capture program drop statamatch
program statamatch
    args rule
    // 2. Stata implementation
    use B, clear
    quietly levelsof new, local(strnew)   // all candidate names built from B
    tempfile b
    save `b'
    use A.dta, clear
    gen new = ""
    if `rule' == 1 { // strict rule: A's short name is a substring of the candidate
        foreach x of local strnew {
            replace new = "`x'" if strpos("`x'", fndnme)>0
        }
    }
    else if `rule' == 2 { // slack rule: same characters and same ending, first hit wins
        foreach x of local strnew {
            replace new = "`x'" if indexnot(fndnme, "`x'")==0 & ///
                strpos("`x'", substr(fndnme, -4, .))>0 & missing(new)
        }
    }
    merge m:1 new using `b'
    // 3. Save matched and still-unmatched records
    preserve
    keep if _merge == 3                   // matched pairs
    keep fndcd fndnme fndnmefull
    append using matched
    save matched, replace
    restore, preserve
    keep if _merge == 2                   // B records still unmatched
    keep fndnmefull
    sort fndnmefull
    save B, replace
    restore
    keep if _merge == 1                   // A records still unmatched
    keep fndcd fndnme
    sort fndnme
    save A, replace
    // 4. Browse
    use matched, clear
    br
end
copy _A.dta A.dta, replace
copy _B.dta B.dta, replace
* 0. Create an empty matched.dta
clear
save matched, replace emptyok
* 1. Loop #1
findrule
use B, clear
gen new = subinstr(fndnmefull, "证券投资基金", "", .) // strip the "证券投资基金" (securities investment fund) suffix
save B, replace
statamatch 1 // strict rule
* 2. Loop #2
findrule
use B, clear
gen new = "基金" + fndnmefull
save B, replace
statamatch 1 // match closed-end funds, strict rule
* 3. Loop #3
findrule
use A, clear
drop if strmatch(fndnme, "基金*") // drop closed-end funds that could cause confusion
save A, replace
use B, clear
gen new = subinstr(fndnmefull, "摩根士丹利华鑫", "大摩", .) // "摩根士丹利华鑫" -> "大摩"
replace new = subinstr(new, "宝康", "华宝兴业宝康", .)
replace new = subinstr(new, "德盛", "国联安", .)
replace new = subinstr(new, "普天", "鹏华普天", .)
replace new = subinstr(new, "华泰柏瑞", "友邦华泰", .)
save B, replace
statamatch 2 // slack rule
* 4. manual handling
use B, clear
ren fndnmefull fndnme
append using A, gen(A)
sort fndnme
br
// Some of the fund codes below had to be looked up with a search engine such as Google or Baidu
replace fndcd = "050001.OF" if fndnme == "博时价值增长证券投资基金"
replace fndcd = "020008.OF" if fndnme == "国泰金鹿保本增值混合证券投资基金"
replace fndcd = "020006.OF" if fndnme == "国泰金象保本增值混合证券投资基金"
replace fndcd = "150001.OF" if fndnme == "国投瑞银瑞福分级股票型证券投资基金"
replace fndcd = "519011.OF" if fndnme == "海富通精选证券投资基金"
replace fndcd = "040002.OF" if fndnme == "华安MSCI中国A股指数增强型证券投资基金"
replace fndcd = "240012.OF" if fndnme == "华宝兴业增强收益债券型证券投资基金"
// ...... ......
keep if A == 0
keep fndnme fndcd
ren fndnme fndnmefull
append using matched
sort fndnmefull fndnme
duplicates tag fndnmefull, gen(tag)
br if tag
drop tag
bysort fndnmefull (fndnme): keep if _n == _N
duplicates tag fndcd, gen(tag)
br if tag
// handle these accordingly ......
save matched, replace
use matched, clear
merge 1:1 fndnmefull using _B, nogen assert(matched)
// DONE!
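To see how the strict and slack rules in statamatch differ, here is a small sketch on made-up ASCII names (not real fund data). The variables strict and slack are hypothetical and introduced only for this illustration. The conditions are the same `strpos()` and `indexnot()`/`substr()` tests used in the program.

```stata
* toy data, for illustration only
clear
input str40 fndnmefull str20 fndnme
"BaoYing Strategy Growth Fund" "BaoYing Strategy"
"BaoYing Strategy Growth Fund" "BaoYing Growth"
"BaoYing Strategy Growth Fund" "HuaBao Growth"
end
* strict rule: the short name must appear as a contiguous substring
gen byte strict = strpos(fndnmefull, fndnme) > 0
* slack rule: every character of the short name occurs in the full name,
* and the last 4 bytes of the short name appear as a substring
gen byte slack = indexnot(fndnme, fndnmefull) == 0 & ///
    strpos(fndnmefull, substr(fndnme, -4, .)) > 0
list, noobs
```

Row 1 passes both rules, row 2 passes only the slack rule, and row 3 passes neither. One caveat, stated as an assumption: `substr(fndnme, -4, .)` counts bytes, which presumably meant the last two Chinese characters under the 2-byte encoding in use when this was posted. Under Stata 14 or later with UTF-8 data, `usubstr(fndnme, -2, .)` together with `ustrpos()` would probably be the closer equivalent.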