How to remove duplicate data for further testing

2010-07-23

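Question: In an event study, I generate cumulative abnormal returns (CAR) with the commands below. Afterwards, every row of a given id holds the same value, so the CAR is repeated. How can I keep only one CAR per id so I can run a t-test and similar tests? Thanks!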
gen abnormal_return = ret - predicted_return if event_window==1
* bysort sorts by id first; egen's total() is the current name for sum()
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0

All replies

t818

2010-7-23 03:06:39

Can egen be made to output just one cumulative abnormal return per id directly? Thanks!
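egen always fills a variable across all observations, so it cannot return a single row per id by itself. One common workaround is to mark one observation per id with egen's tag() function and restrict the test to those rows. A minimal sketch, assuming a hypothetical toy panel (the data values and the marker name onetag are made up for illustration; only the variable names from the question are reused):

```stata
* Hypothetical toy panel: 3 firms, 3 days each (dif = days relative to the event)
clear
input id dif ret predicted_return event_window
1 -2  0.03 0.01 1
1 -1  0.02 0.01 1
1  5  0.01 0.01 0
2 -2 -0.01 0.00 1
2 -1  0.02 0.01 1
2  5  0.00 0.00 0
3 -2  0.04 0.02 1
3 -1  0.01 0.00 1
3  5  0.01 0.01 0
end

* The two steps from the question (egen's total() is the current name for sum())
gen abnormal_return = ret - predicted_return if event_window==1
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0

* tag() marks exactly one observation per id among rows with a nonmissing car;
* rows excluded by -if- are set to 0 (not missing), so "if onetag" is safe
egen byte onetag = tag(id) if !missing(car)

* One CAR per firm, then a one-sample t-test of mean CAR = 0
list id car if onetag
ttest car == 0 if onetag
```

Because nothing is dropped, the same panel can still be used to compute CARs for other windows.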

houquan

2010-7-23 08:53:52

help collapse
---------------------------------------------------------------------------------------------------------------------------------------------------------

Title

[D] collapse -- Make dataset of summary statistics

Syntax

      collapse clist [if] [in] [weight] [, options]

where clist is either

      [(stat)] varlist [ [(stat)] ... ]

      [(stat)] target_var=varname [target_var=varname ...] [ [(stat)] ...]

or any combination of the varlist or target_var forms, and stat is one of

      mean        means (default)
      median      medians
      p1          1st percentile
      p2          2nd percentile
      ...         3rd-49th percentiles
      p50         50th percentile (same as median)
      ...         51st-97th percentiles
      p98         98th percentile
      p99         99th percentile
      sd          standard deviations
      semean      standard error of the mean (sd/sqrt(n))
      sebinomial  standard error of the mean, binomial (sqrt(p(1-p)/n))
      sepoisson   standard error of the mean, Poisson (sqrt(mean))
      sum         sums
      rawsum      sums, ignoring optionally specified weight
      count       number of nonmissing observations
      max         maximums
      min         minimums
      iqr         interquartile range
      first       first value
      last        last value
      firstnm     first nonmissing value
      lastnm      last nonmissing value

If stat is not specified, mean is assumed.

options          description
---------------------------------------------------------------------------------------------------------------------------------------------------
Options
   by(varlist)   groups over which stat is to be calculated
   cw            casewise deletion instead of all possible observations

+  fast          do not restore the original dataset should the user press Break; programmer's command
---------------------------------------------------------------------------------------------------------------------------------------------------
+ fast is not shown in the dialog box.
varlist and varname in clist may contain time-series operators; see tsvarlist.
aweights, fweights, iweights, and pweights are allowed; see weight, and see Weights below.  pweights may not be used with sd, semean, sebinomial,
   or sepoisson.  iweights may not be used with semean, sebinomial, or sepoisson.  aweights may not be used with sebinomial or sepoisson.

Menu

Data > Create or change data > Other variable-transformation commands > Make dataset of means, medians, etc.

Description

collapse converts the dataset in memory into a dataset of means, sums, medians, etc.  clist must refer to numeric variables exclusively.

Note: See [D] contract if you want to collapse to a dataset of frequencies.

Options


by(varlist) specifies the groups over which the means, etc., are to be calculated.  If this option is not specified, the resulting dataset will
      contain 1 observation.  If it is specified, varlist may refer to either string or numeric variables.

cw specifies casewise deletion.  If cw is not specified, all possible observations are used for each calculated statistic.

The following option is available with collapse but is not shown in the dialog box:

fast specifies that collapse not restore the original dataset should the user press Break.  fast is intended for use by programmers.

Weights

collapse allows all four weight types; the default is aweights.  Weight normalization impacts only the sum, count, sd, semean, and sebinomial
statistics.

Here are the definitions for count and sum with weights:

   count:
      unweighted:                 _N, the number of physical observations
      aweight:                    _N, the number of physical observations
      fweight, iweight, pweight:  sum(w_j), the sum of user-specified weights
   sum:
      unweighted:                 sum(x_j), the sum of the variable
      aweight:                    sum(v_j*x_j); v_j = weights normalized to sum to _N
      fweight, iweight, pweight:  sum(w_j*x_j); w_j = user-supplied weights

The sd statistic with weights returns the bias-corrected standard deviation, which is based on the factor sqrt(N/(N-1)), where N is the number of
observations. Statistics sd, semean, sebinomial, and sepoisson are not allowed with pweighted data.  Otherwise, the statistic is changed by the
weights through the computation of the count (N), as outlined above.

For instance, consider a case in which there are 25 physical observations in the dataset and a weighting variable that sums to 57.  In the
unweighted case, the weight is not specified, and N = 25.  In the analytically weighted case, N is still 25; the scale of the weight is irrelevant.
In the frequency-weighted case, however, N = 57, the sum of the weights.

The rawsum statistic with aweights ignores the weight, with one exception:  observations with zero weight will not be included in the sum.
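The count and sum definitions above can be checked on a tiny dataset. This is a sketch with hypothetical toy data (x and w are made-up variables); the numbers in the comments are worked out by hand from the definitions in this help text, not copied from Stata output.

```stata
* Hypothetical toy data: x = 1, 2, 3 with integer weights w = 1, 1, 2 (sum of w = 4)
clear
input x w
1 1
2 1
3 2
end

* aweights: weights are rescaled to sum to _N (= 3), so v = 0.75, 0.75, 1.5
* and sum = 0.75*1 + 0.75*2 + 1.5*3 = 6.75; rawsum ignores the weights (6);
* count is the number of physical observations (3)
preserve
collapse (sum) sum_aw=x (rawsum) rawsum_aw=x (count) n_aw=x [aw=w]
list
restore

* fweights: sum uses the raw weights (1*1 + 1*2 + 2*3 = 9), count = sum(w) = 4
collapse (sum) sum_fw=x (count) n_fw=x [fw=w]
list
```

The preserve/restore pair is needed because collapse replaces the data in memory.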

Examples

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college
      . describe
      . list

Create dataset containing the 25th percentile of gpa for each year
      . collapse (p25) gpa [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear

Create dataset containing the mean and median of gpa and hour for each year, and store median of gpa and hour in medgpa and medhour, respectively
      . collapse (mean) gpa hour (median) medgpa=gpa medhour=hour [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear

Create dataset containing the count of gpa and hour and the minimums of gpa and hour, and store the minimums in mingpa and minhour, respectively
      . collapse (count) gpa hour (min) mingpa=gpa minhour=hour [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear
      . replace gpa = . in 2/4

Create dataset containing the mean of gpa and hour for each year, but ignore all observations that have missing values when calculating the means
      . collapse (mean) gpa hour [fw=number], by(year) cw

List the result
      . list
-----------------------------------------------------------------------------------------------------------------------------------------------------

Also see

Manual:  [D] collapse

   Help:  [D] contract, [D] egen, [D] statsby, [R] summarize
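Applied to the question above, collapse reduces the panel to one row per id before the t-test. A minimal sketch, assuming the same hypothetical toy panel as in the thread's variable names (the data values are made up). (firstnm) takes the first nonmissing CAR in each id; since CAR is constant within id, (mean) would give the same number.

```stata
* Hypothetical toy panel: 3 firms, 3 days each
clear
input id dif ret predicted_return event_window
1 -2  0.03 0.01 1
1 -1  0.02 0.01 1
1  5  0.01 0.01 0
2 -2 -0.01 0.00 1
2 -1  0.02 0.01 1
2  5  0.00 0.00 0
3 -2  0.04 0.02 1
3 -1  0.01 0.00 1
3  5  0.01 0.01 0
end
gen abnormal_return = ret - predicted_return if event_window==1
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0

* collapse overwrites the data in memory, so keep a copy of the panel
preserve
collapse (firstnm) car, by(id)
assert _N == 3          // one row per firm
list
ttest car == 0
restore                 // back to the full panel
```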

ajun685

2010-7-23 11:00:30

* sort on car within id so equal CARs sit together and missing values come last
bysort id (car): drop if car == car[_n+1]
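A different way to drop the repeats is Stata's duplicates command, which keeps the first occurrence of each distinct combination of the listed variables. This is an alternative technique, not the one posted above; the sketch assumes the same hypothetical toy panel (made-up values).

```stata
* Hypothetical toy panel: 3 firms, 3 days each
clear
input id dif ret predicted_return event_window
1 -2  0.03 0.01 1
1 -1  0.02 0.01 1
1  5  0.01 0.01 0
2 -2 -0.01 0.00 1
2 -1  0.02 0.01 1
2  5  0.00 0.00 0
3 -2  0.04 0.02 1
3 -1  0.01 0.00 1
3  5  0.01 0.01 0
end
gen abnormal_return = ret - predicted_return if event_window==1
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0

* keep one row per (id, car) pair; force is required when a varlist is given
duplicates drop id car, force
* rows outside the CAR window have a missing car; drop them as well
drop if missing(car)
* confirm that exactly one row per firm remains
isid id
ttest car == 0
```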

t818

2010-7-23 13:37:32

Thanks!
I found that the following avoids dropping any observations, which makes it easier to compute returns over other windows later:
* group by firm (id), not by the value of car, and sort on car within id
bysort id (car): replace car = . if car == car[_n+1]
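Because this approach keeps every observation, a second window can be computed on the same panel afterwards. A sketch under stated assumptions: the toy data are hypothetical, and the post-event window [0, +1] and the variable car_post are illustrative choices, not part of the original post.

```stata
* Hypothetical toy panel: 3 firms, days -1, 0, +1 around the event
clear
input id dif ret predicted_return event_window
1 -1  0.03 0.01 1
1  0  0.02 0.01 1
1  1  0.01 0.00 1
2 -1 -0.01 0.00 1
2  0  0.02 0.01 1
2  1  0.00 0.01 1
3 -1  0.04 0.02 1
3  0  0.01 0.00 1
3  1  0.02 0.01 1
end
gen abnormal_return = ret - predicted_return if event_window==1

* Pre-event window, as in the thread; then blank out repeats within each firm.
* Sorting on car inside id puts missing values last, so one nonmissing car survives.
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0
bysort id (car): replace car = . if car == car[_n+1]

* Nothing was dropped, so a second (hypothetical) window uses the same rows
bysort id: egen car_post = total(abnormal_return) if dif>=0 & dif<=1
bysort id (car_post): replace car_post = . if car_post == car_post[_n+1]

* ttest ignores missing values, so each firm contributes one CAR per window
ttest car == 0
ttest car_post == 0
```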

dandan36956

2011-6-2 20:44:44

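Hello to the original poster. I also need to use the event-study method soon, but I have not found any systematic literature or books on it, and this is my first encounter with the method. Could you give me some pointers, for example recommend some articles or other materials? Many thanks!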
