2016-10-08
Creating Sample Datasets – Exercises
How to generate random data

Creating sample data is a common task performed in many different scenarios.

R has several base functions that make the sampling process quite easy and fast.

Below is an explanation of the main functions used in this set of exercises:

1. set.seed() – R draws samples from a pseudo-random number generator, so each call normally returns different values. set.seed() fixes the generator's starting state, so any random-related function run after it reproduces exactly the same sample each time.

2. sample() – Sampling function. Its arguments are:
x – the vector of values to sample from
size – the number of items to draw
replace – whether a value may be drawn more than once (TRUE) or not (FALSE, the default)
prob – optional probability weights for the elements of x

3. seq()/seq.Date() – Create a sequence of values/dates, ranging from a ‘start’ to an ‘end’ value.

4. rep() – Repeat a value/vector n times.

5. rev() – Reverse the order of the values in a vector.

You can get more details on each function by typing ‘?’ before its name, e.g. ?sample.
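A quick tour of these functions, as a sketch; the seed (42), the vectors and the dates below are arbitrary illustrative values, not part of the exercises.

```r
# set.seed(): re-seeding before each call reproduces the same draw
set.seed(42)
a <- sample(1:10, 3)
set.seed(42)
b <- sample(1:10, 3)
identical(a, b)                           # TRUE

# sample(): draw with replacement, using probability weights
sample(c("H", "T"), 5, replace = TRUE, prob = c(0.7, 0.3))

# seq() / seq.Date(): numeric and date sequences
seq(0, 1, by = 0.25)                      # 0.00 0.25 0.50 0.75 1.00
seq.Date(as.Date("2020-01-15"), by = "month", length.out = 3)
# "2020-01-15" "2020-02-15" "2020-03-15"

# rep(): repeat the whole vector, or each element in turn
rep(c(0, 1), times = 2)                   # 0 1 0 1
rep(c(0, 1), each = 2)                    # 0 0 1 1

# rev(): reverse the order of the elements
rev(1:5)                                  # 5 4 3 2 1
```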

Answers to the exercises are given below.
If you have different solutions, feel free to post them.

Exercise 1
1. Set seed with value 1235
2. Create a Bernoulli sample of 100 ‘fair coin’ flips.
Populate a variable called fair_coin with the sample results.

Exercise 2
1. Set seed with value 2312
2. Create a sample of 10 integers, based on a vector ranging from 8 through 19.
Allow the sample to have repeated values.
Populate a variable called hourselect1 with the sample results.

Exercise 3
1. Create a vector variable called probs with the following probabilities:
‘0.05,0.08,0.16,0.17,0.18,0.14,0.08,0.06,0.03,0.03,0.01,0.01’
2. Make sure the sum of the vector equals 1.

Exercise 4
1. Set seed with value 1976
2. Create a sample of 10 integers, based on a vector ranging from 8 through 19.
Allow the sample to have repeated values and use the probabilities defined in the previous exercise.
Populate a variable called hourselect2 with the sample results.

Exercise 5
Let’s prepare the variables for a biased coin:
1. Populate a variable called coin with 5 zeros in a row followed by 5 ones in a row.
2. Populate a variable called probs with the value ‘0.08’ repeated 5 times, followed by the value ‘0.12’ repeated 5 times.
3. Make sure the sum of probabilities on probs variable equals 1.

Exercise 6
1. Set seed with value 345124
2. Create a biased sample of length 100, drawn with replacement from the coin vector and using the probs vector as probabilities.
Populate a variable called biased_coin with the sample results.

Exercise 7
Compare the sums of the values in fair_coin and biased_coin.

Exercise 8
1. Create a ‘Date’ variable called startDate with value 9th of February 2010 and a second ‘Date’ variable called endDate with value 9th of February 2005
2. Create a descending sequence containing the 9th of every month between those two dates. Populate a variable called seqDates with the sequence of dates.

Exercise 9
Reverse the sequence of dates created in the previous exercise so that it is in ascending order, and store it in a variable called RevSeqDates.

Exercise 10
1. Set seed with value 10
2. Create a sample of 20 unique values from the RevSeqDates vector.


Answers


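Below are the solutions to these exercises on creating a sample dataset. The printed results come from the original 2016 post, which predates R 3.6.0. Since R 3.6.0, sample() uses a different default algorithm (sample.kind = "Rejection"), so the same seeds can give different draws; running RNGkind(sample.kind = "Rounding") first restores the old behaviour.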

####################
##   Exercise 1   ##
####################
set.seed(1235)
# 100 draws from {0, 1}, with replacement and equal weights
fair_coin <- sample(c(0, 1), 100, replace = TRUE)

####################
##   Exercise 2   ##
####################
set.seed(2312)
# 10 hours between 8 and 19, repeats allowed
hourselect1 <- sample(8:19, 10, replace = TRUE)
hourselect1
##  [1] 14 19 16 18 13 15  8 10 10 16
####################
##   Exercise 3   ##
####################
probs <- c(0.05, 0.08, 0.16, 0.17, 0.18, 0.14, 0.08, 0.06, 0.03, 0.03, 0.01, 0.01)
sum(probs)
## [1] 1
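The printed 1 is rounded for display. Decimal fractions such as 0.01 have no exact binary representation, so a literal test like sum(probs) == 1 may return FALSE even when the weights are correct. A tolerance-based check is a safer way to "make sure" the sum is 1; here is a sketch using all.equal(), whose default tolerance is about 1.5e-8. Note that sample() itself treats prob as weights that need not sum to one, so this check is for your own verification rather than a requirement of sample().

```r
probs <- c(0.05, 0.08, 0.16, 0.17, 0.18, 0.14,
           0.08, 0.06, 0.03, 0.03, 0.01, 0.01)

# Show more digits than the default 7, to expose any rounding error
print(sum(probs), digits = 17)

# Exact comparison: may be FALSE because of floating-point rounding
sum(probs) == 1

# Tolerance-based comparison; isTRUE() also handles the character
# message that all.equal() returns when the values differ
isTRUE(all.equal(sum(probs), 1))   # TRUE
```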
####################
##   Exercise 4   ##
####################
set.seed(1976)
# same range as Exercise 2, but weighted by probs
hourselect2 <- sample(8:19, 10, replace = TRUE, prob = probs)
hourselect2
##  [1] 15 11 12 15 12  9 14 12 10  9
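The effect of prob is hard to see in only 10 draws. A sketch with many more draws (100,000 here, an arbitrary choice) shows the observed share of each hour settling close to the weights in probs, with hours 10 to 13 dominating as in hourselect2.

```r
probs <- c(0.05, 0.08, 0.16, 0.17, 0.18, 0.14,
           0.08, 0.06, 0.03, 0.03, 0.01, 0.01)

set.seed(1976)
draws <- sample(8:19, 1e5, replace = TRUE, prob = probs)

# Observed share of each hour, in the order 8..19
# (factor levels keep hours that were never drawn as zero counts)
observed <- as.vector(table(factor(draws, levels = 8:19))) / length(draws)

# Side-by-side comparison of target weights and observed shares
round(rbind(expected = probs, observed = observed), 3)
```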
####################
##   Exercise 5   ##
####################
coin <- rep(c(0, 1), each = 5)
coin
##  [1] 0 0 0 0 0 1 1 1 1 1
probs <- rep(c(0.08, 0.12), each = 5)
sum(probs)
## [1] 1
####################
##   Exercise 6   ##
####################
set.seed(345124)
biased_coin <- sample(coin, 100, replace = TRUE, prob = probs)

####################
##   Exercise 7   ##
####################
sum(fair_coin)
## [1] 52
sum(biased_coin)
## [1] 63
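The comparison in Exercise 7 is easier to read against the expected values. With the biased weights, a single draw lands on 1 with probability 5 × 0.12 = 0.6 (versus 0.5 for the fair coin), so the expected sums over 100 draws are 60 and 50. The sketch below computes them; the binomial standard deviation of about 5 is included only for scale, and suggests that the observed 52 and 63 are consistent with those expectations.

```r
coin  <- rep(c(0, 1), each = 5)
probs <- rep(c(0.08, 0.12), each = 5)

# Chance that a single draw returns 1: total weight on the 1s in coin
p_one <- sum(probs[coin == 1])

n <- 100
# Expected number of 1s (= expected sum) in 100 draws
expected <- c(fair = n * 0.5, biased = n * p_one)
expected                                   # fair 50, biased 60

# Binomial standard deviation of each sum, for scale
round(sqrt(n * c(0.5, p_one) * (1 - c(0.5, p_one))), 1)   # 5.0 4.9
```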
####################
##   Exercise 8   ##
####################
startDate <- as.Date("2010-02-09")
endDate <- as.Date("2005-02-09")
# step backwards one month at a time, from startDate down to endDate
seqDates <- seq.Date(startDate, endDate, by = "-1 month")

####################
##   Exercise 9   ##
####################
RevSeqDates <- rev(seqDates)

####################
##   Exercise 10  ##
####################
set.seed(10)
sample(RevSeqDates, 20, replace = FALSE)
##  [1] "2007-08-09" "2006-08-09" "2007-03-09" "2008-06-09" "2005-06-09"
##  [6] "2006-02-09" "2006-05-09" "2006-04-09" "2007-10-09" "2006-12-09"
## [11] "2007-11-09" "2007-06-09" "2005-07-09" "2009-03-09" "2006-06-09"
## [16] "2006-09-09" "2005-04-09" "2006-01-09" "2006-07-09" "2008-01-09"
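Exercises 8 and 9 build a descending sequence and then reverse it. As an alternative sketch, swapping the start and end dates and stepping forward by "month" yields the ascending sequence in a single call; the last line checks that both routes give the same dates.

```r
startDate <- as.Date("2010-02-09")
endDate   <- as.Date("2005-02-09")

seqDates    <- seq.Date(startDate, endDate, by = "-1 month")  # descending
RevSeqDates <- rev(seqDates)                                   # ascending

# Same ascending sequence built directly, from the earlier date forward
ascending <- seq.Date(endDate, startDate, by = "month")

length(ascending)              # 61 monthly dates, Feb 2005 .. Feb 2010
all(ascending == RevSeqDates)
```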






All replies
2016-10-9 17:26:52
Copied from R-bloggers

2016-10-9 23:48:17
stzhao posted on 2016-10-9 17:26
Copied from R-bloggers
Could you share the blog link?

2016-10-10 01:22:53
日新少年 posted on 2016-10-9 23:48
Could you share the blog link?
https://www.r-bloggers.com/creating-sample-datasets-exercises/

Here is the link.
