2012-09-05
Bounty: 10 forum coins. Status: solved.
Take the following English passage as an example:
It is Teacher's Day today. On this special occasion I would like to extend my heartfelt congratulations to all teachers, "Happy Teacher's Day!"    Of all teachers who have taught me since my early childhood, the most unforgettable one is my first English teacher in college, Ms. Zhang. It is she who has aroused my keen interest in the learning of English and helped me realize the importance of self-reliance. Born into a poor farmer's family in a mountainous area and educated in relatively primitive surroundings, I found myself lagging far behind in the first class in college, which happened to be Ms. Zhang's English class. I was really discouraged and frustrated, so I decided to drop out. Ms. Zhang was so keenly insightful that she had noticed my embarrassment in class. After class, she called me into the Teacher's Room and discussed the situation with me, earnestly and kindly, citing the example of Robinson Crusoe to motivate me to go ahead in spite of all kinds of difficulties. "Be a man and rely on yourself," she nudged me. The next time we met, she brought me a simplified version of Robinson Crusoe and recommended that I finish reading it in a week and write a book report. Under her consistent and patient guidance, not only has my English been greatly improved, but my confidence and courage enhanced considerably.    "Rely on yourself and be a man," Ms. Zhang's inspiring words have been echoing in my mind. I will work harder and try my utmost to lay a solid foundation for my future career. Only by so doing can I repay Ms. Zhang's kindness and live up to her expectations of me, that is, to become a useful person and contribute to society.


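The passage contains all kinds of characters. If I want to count how many times each of the 26 English letters (ignoring case) appears in it, and the frequency of each letter, can SAS do this? If so, how? If an expert is willing to write the program, I would be very grateful. I don't have many forum coins, so the bounty is only 10. This is mainly for discussion and learning.
Second question: can SAS list all the distinct words that appear in the passage, and count how many times each word occurs?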

Best answer: ziyenano
All replies
2012-9-5 12:40:44
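/* Read the whole text into one observation. '@' does not occur in the
   text, so using it as the delimiter keeps blanks from splitting the line. */
data ex;
infile "e:\x.txt" delimiter='@' lrecl=5000;
input x : $5000.;
run;

/* Question 1: count each of the 26 letters, ignoring case. */
data ex1;
set ex;
length name $1;
total=length(compress(x,,'ka'));  /* number of letters in the text */
do i=1 to 26;
name=substr('abcdefghijklmnopqrstuvwxyz',i,1);
count=count(lowcase(x),name);
freq=count/total;
output;
end;
drop x i total;
run;

/* Question 2: read one blank-delimited word at a time (@@ holds the line),
   then lowercase it and strip punctuation. */
data ex2;
infile "e:\x.txt" lrecl=5000;
input x : $20. @@;
x=compress(lowcase(x),,'p');
if x ne '';  /* drop tokens that were punctuation only */
run;

proc sql;
create table ex3 as
select x, count(*) as n from ex2 group by x;
quit;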
That is the general idea; some details may still need polishing.
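As a possible alternative for the letter question, here is a minimal sketch that puts one letter per row and lets PROC FREQ report counts and percentages. It assumes the same file path as above; the dataset name letters and the variables line and letter are made up for illustration. Letters that never occur in the text will simply be absent from the table.

```sas
/* One observation per letter, all lowercased. */
data letters;
  infile "e:\x.txt" lrecl=5000 truncover;
  length line $5000 letter $1;
  input line $char5000.;
  /* 'k' keeps instead of removes, 'a' selects alphabetic characters */
  line = lowcase(compress(line, , 'ka'));
  do i = 1 to lengthn(line);   /* lengthn is 0 for an empty line */
    letter = substr(line, i, 1);
    output;
  end;
  keep letter;
run;

/* Frequency and Percent columns give the count and the share of each letter. */
proc freq data=letters;
  tables letter / nocum;
run;
```

Because every input line is processed separately, this version should not depend on the text being stored on a single line.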
2012-9-5 16:40:42
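ziyenano posted on 2012-9-5 15:26
data ex;
infile "e:\x.txt" delimiter='@' lrecl=2000;
input x:$5000.;
Expert, what does delimiter='@' in the infile statement mean? It looks as if it forces SAS to keep reading into the next line of data. But if I put the data in cards, it no longer works. Why is that? Also, even with the txt file, if the data are split into several paragraphs, there is no guarantee that all of the data end up in one observation. I would appreciate your guidance.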
2012-9-5 16:50:32
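ziyenano posted on 2012-9-5 15:26
data ex;
infile "e:\x.txt" delimiter='@' lrecl=2000;
input x:$5000.;
One more thing, expert: is there a way to remove punctuation only at the beginning and end of a string, without removing it from the middle?
For example, in student's the punctuation mark in the middle should not be removed.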
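One way this might be done is with a Perl regular expression through PRXCHANGE, assuming the PRX functions are available in your SAS release. The sketch below is only an illustration: the dataset name words, the variables raw and word, and the sample tokens are all made up. Note that \W matches any character other than a letter, digit, or underscore, so it is slightly broader than "punctuation".

```sas
data words;
  length raw word $20;
  input raw $;
  /* Delete runs of non-word characters at the start (^\W+) or the end
     (\W+$) only; characters in the middle of the token are left alone.
     -1 means "replace every match". */
  word = prxchange('s/^\W+|\W+$//', -1, trim(raw));
  word = lowcase(word);
  datalines;
"Happy
student's
Day!"
me,
;
run;

proc print data=words;
run;
```

With these sample tokens, the apostrophe inside student's is kept while the surrounding quotes, exclamation mark, and comma are dropped. The same assignment could replace the COMPRESS call in the word-reading step if interior apostrophes should survive.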
2012-9-5 17:01:33
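I thought about it casually and got something working. The program generates 26 datasets, which is really unnecessary, but I couldn't be bothered to change it!
After reading the other replies, I see there is still a lot that could be optimized, and that there are many SAS functions I didn't even know existed. How sad!

[Attached screenshot: QQ截图20120905165411.jpg]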

2012-9-5 17:02:11
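In the txt file the data are on a single line. With @ as the delimiter, blanks no longer separate values, so the whole line is read into one observation. It should also work with cards: set the delimiter to @ or to any other character, as long as it does not occur in the text.
If the txt file contains several paragraphs, you can use line pointers to read them into one observation.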
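For the multi-paragraph case, here is a minimal sketch of a different technique from the line-pointer controls mentioned above (/ and #n suit a known number of lines): it appends each line's _INFILE_ buffer to a retained variable and writes one observation at end of file. The file path is the one used earlier in the thread; the 5000-character limit is an assumption and longer texts would be truncated.

```sas
data ex;
  infile "e:\x.txt" lrecl=5000 end=eof;
  length x $5000;
  retain x;                    /* keep x across iterations of the step */
  input;                       /* load the next line into _infile_ */
  x = catx(' ', x, _infile_);  /* append it, separated by one blank */
  if eof then output;          /* write a single observation at the end */
run;
```

CATX skips blank arguments, so empty lines between paragraphs should not add extra separators.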