Basic description
Think of merge as being master + using = merged result.
Call the dataset in memory the master dataset, and the dataset on disk the using dataset. This way
we have general names that are not dependent on individual datasets.
Suppose we have two datasets,
master in memory    on disk in file filename
  +----------+            +----------+
  | id   age |            | id   wgt |
  |----------|            |----------|
  |  1    22 |            |  1   130 |
  |  2    56 |            |  2   180 |
  |  5    17 |            |  4   110 |
  +----------+            +----------+
We would like to join together the age and weight information. We notice that the id variable
identifies unique observations in both datasets: if you tell me the id number, then I can tell you
the one observation that contains information about that id. This is true for both the master and
the using datasets.
Because id uniquely identifies observations in both datasets, this is a 1:1 merge. We can bring in
the dataset from disk by typing
. merge 1:1 id using filename
   in memory        in filename.dta
     master      +       using       =      merged result
  +----------+        +----------+        +----------------+
  | id   age |        | id   wgt |        | id   age   wgt |
  |----------|        |----------|        |----------------|
  |  1    22 |        |  1   130 |        |  1    22   130 |   (matched)
  |  2    56 |        |  2   180 |        |  2    56   180 |   (matched)
  |  5    17 |        |  4   110 |        |  5    17     . |   (master only)
  +----------+        +----------+        |  4     .   110 |   (using only)
                                          +----------------+
The original data in memory are called the master data. The data in filename.dta are called the
using data. After merge, the merged result is left in memory. The id variable is called the key
variable. Stata jargon is that the datasets were merged on id.
The observations for id==1 existed in both the master and using datasets and so were combined in the
merged result. The same occurred for id==2. For id==5 and id==4, however, no matches were found, so
each became a separate observation in the merged result. Each observation in the merged result
therefore came from one of three possible sources:
     numeric      equivalent
      code           word        description
    ----------------------------------------------------------------
        1           master       originally appeared in master only
        2           using        originally appeared in using only
        3           match        originally appeared in both
    ----------------------------------------------------------------
merge encodes this information into new variable _merge, which merge adds to the merged result:
   in memory        in filename.dta
     master      +       using       =           merged result
  +----------+        +----------+        +-------------------------+
  | id   age |        | id   wgt |        | id   age   wgt   _merge |
  |----------|        |----------|        |-------------------------|
  |  1    22 |        |  1   130 |        |  1    22   130        3 |
  |  2    56 |        |  2   180 |        |  2    56   180        3 |
  |  5    17 |        |  4   110 |        |  5    17     .        1 |
  +----------+        +----------+        |  4     .   110        2 |
                                          +-------------------------+
Note: Above we show the master and using data sorted by id before merging; this was for
illustrative purposes. The dataset resulting from a 1:1 merge will have the same data, regardless
of the sort order of the master and using datasets.
The formal definition for merge behavior is the following: Start with the first observation of the
master. Find the corresponding observation in the using data, if there is one. Record the matched
or unmatched result. Proceed to the next observation in the master dataset. When you finish
working through the master dataset, work through unused observations from the using data. By
default, unmatched observations are kept in the merged data, whether they come from the master
dataset or the using dataset.
Remember this formal definition. It will serve you well.
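For concreteness, here is one way to reproduce the age-and-weight example from scratch. The filename
wgtdata.dta is just a placeholder, and the data lines following each input command are typed without
the dot prompt:

. clear
. input id wgt
  1 130
  2 180
  4 110
  end
. save wgtdata
. clear
. input id age
  1 22
  2 56
  5 17
  end
. merge 1:1 id using wgtdata
. list

The listing shows id, age, wgt, and _merge with the values displayed in the merged result above.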
1:1 merges
The example shown above is called a 1:1 merge, because the key variable uniquely identified each
observation in each of the datasets.
A variable or variable list uniquely identifies the observations if each distinct value of the
variable(s) corresponds to one observation in the dataset.
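In practice, you can check this before merging. For example, with the age data above in memory, isid
exits with an error if the listed variable(s) do not uniquely identify the observations, and
duplicates report tallies how many times each value of the key occurs:

. isid id
. duplicates report id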
In some datasets, multiple variables are required to identify the observations. Imagine data
obtained by observing patients at specific points in time so that variables pid and time, taken
together, identify the observations. Below we have two such datasets and run a 1:1 merge on pid
and time,
. merge 1:1 pid time using filename
        master          +         using         =             merged result
  +-----------------+      +-----------------+      +-------------------------------+
  | pid   time   x1 |      | pid   time   x2 |      | pid   time   x1   x2   _merge |
  |-----------------|      |-----------------|      |-------------------------------|
  |  14      1    0 |      |  14      1    7 |      |  14      1    0    7        3 |
  |  14      2    0 |      |  14      2    9 |      |  14      2    0    9        3 |
  |  14      4    0 |      |  16      1    2 |      |  14      4    0    .        1 |
  |  16      1    1 |      |  16      2    3 |      |  16      1    1    2        3 |
  |  16      2    1 |      |  17      1    5 |      |  16      2    1    3        3 |
  |  17      1    0 |      |  17      2    2 |      |  17      1    0    5        3 |
  +-----------------+      +-----------------+      |  17      2    .    2        2 |
                                                    +-------------------------------+
This is a 1:1 merge because the combination of the values of pid and time uniquely identifies
observations in both datasets.
There is nothing about a 1:1 merge that implies that all, or even any, of the observations match.
Above, five observations matched, one observation was only in the master (subject 14 at time 4), and
another was only in the using (subject 17 at time 2).
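Here is a sketch that reproduces this example, with x2data.dta as a placeholder name for the using
file:

. clear
. input pid time x2
  14 1 7
  14 2 9
  16 1 2
  16 2 3
  17 1 5
  17 2 2
  end
. save x2data
. clear
. input pid time x1
  14 1 0
  14 2 0
  14 4 0
  16 1 1
  16 2 1
  17 1 0
  end
. merge 1:1 pid time using x2data
. list

The resulting _merge values are 3 for the five matches, 1 for subject 14 at time 4, and 2 for
subject 17 at time 2.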
m:1 merges
In an m:1 merge, the key variable or variables uniquely identify the observations in the using
data, but not necessarily in the master data. Suppose you had person-level data within regions and
you wished to bring in regional data. Here is an example:
. merge m:1 region using filename
        master         +        using        =            merged result
  +-----------------+      +-------------+      +-------------------------------+
  | id   region   a |      | region    x |      | id   region   a    x   _merge |
  |-----------------|      |-------------|      |-------------------------------|
  |  1        2  26 |      |      1   15 |      |  1        2  26   13        3 |
  |  2        1  29 |      |      2   13 |      |  2        1  29   15        3 |
  |  3        2  22 |      |      3   12 |      |  3        2  22   13        3 |
  |  4        3  21 |      |      4   11 |      |  4        3  21   12        3 |
  |  5        1  24 |      +-------------+      |  5        1  24   15        3 |
  |  6        5  20 |                           |  6        5  20    .        1 |
  +-----------------+                           |  .        4   .   11        2 |
                                                +-------------------------------+
To bring in the regional information, we need to merge on region. The values of region uniquely
identify observations in the using data, but region is not an identifier in the master data.
We show the merged dataset sorted by id because this makes it easier to see how the merged dataset
was constructed. For each observation in the master data, merge finds the corresponding observation
in the using data and adds the values of the using variables to that master observation.
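Here is a sketch that reproduces this example, with regiondata.dta as a placeholder name for the
region-level file:

. clear
. input region x
  1 15
  2 13
  3 12
  4 11
  end
. save regiondata
. clear
. input id region a
  1 2 26
  2 1 29
  3 2 22
  4 3 21
  5 1 24
  6 5 20
  end
. merge m:1 region using regiondata
. sort id
. list

Sorting by id before listing displays the result in the order shown above; the observation
contributed by region 4, which has a missing id, sorts to the end.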
1:m merges
1:m merges are similar to m:1, except that now the key variables identify unique observations in
the master dataset. Any datasets that can be merged using an m:1 merge may be merged using a 1:m
merge by reversing the roles of the master and using datasets. Here is the same example as used
previously, with the master and using datasets reversed:
. merge 1:m region using filename
      master       +          using         =             merged result
  +-------------+      +-----------------+      +--------------------------------+
  | region    x |      | id   region   a |      | region    x   id    a   _merge |
  |-------------|      |-----------------|      |--------------------------------|
  |      1   15 |      |  1        2  26 |      |      1   15    2   29        3 |
  |      2   13 |      |  2        1  29 |      |      1   15    5   24        3 |
  |      3   12 |      |  3        2  22 |      |      2   13    1   26        3 |
  |      4   11 |      |  4        3  21 |      |      2   13    3   22        3 |
  +-------------+      |  5        1  24 |      |      3   12    4   21        3 |
                       |  6        5  20 |      |      4   11    .    .        1 |
                       +-----------------+      |      5    .    6   20        2 |
                                                +--------------------------------+
This merged result is identical to the merged result in the previous section, except for the sort
order and the contents of _merge. This time, we show the merged result sorted by region rather
than id. Reversing the roles of the files causes a reversal in the 1s and 2s for _merge: where
_merge was previously 1, it is now 2, and vice versa. These exchanged _merge values reflect the
reversed roles of the master and using data.
For each observation in the master data, merge found the corresponding observation(s) in the using
data and then wrote down the matched or unmatched result. Once the master observations were
exhausted, merge wrote down any observations from the using data that were never used.
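Continuing the sketch from the previous section, the same merged result can be produced with the
roles reversed; persondata.dta is a placeholder name for the person-level file, which is now the
using data:

. clear
. input id region a
  1 2 26
  2 1 29
  3 2 22
  4 3 21
  5 1 24
  6 5 20
  end
. save persondata
. clear
. input region x
  1 15
  2 13
  3 12
  4 11
  end
. merge 1:m region using persondata
. sort region
. list

Apart from the order of observations within each region, which is not guaranteed, this reproduces
the merged result shown above.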
m:m merges
m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched
within equal values of the key variable(s), with the first observation being matched to the first;
the second, to the second; and so on. If the master and using have an unequal number of
observations within the group, then the last observation of the shorter group is used repeatedly to
match with subsequent observations of the longer group. Thus the result of an m:m merge depends on
the current sort order of the data -- something that should never happen.
Because m:m merges are such a bad idea, we are not going to show you an example. If you think that
you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or
m:1 merge. Tips for this are given in Troubleshooting m:m merges below.
Sequential merges
In a sequential merge, there are no key variables. Observations are matched solely on their
observation number:
. merge 1:1 _n using filename
  master    +     using     =        merged result
  +----+          +----+          +------------------+
  | x1 |          | x2 |          | x1   x2   _merge |
  |----|          |----|          |------------------|
  | 10 |          |  7 |          | 10    7        3 |
  | 30 |          |  2 |          | 30    2        3 |
  | 20 |          |  1 |          | 20    1        3 |
  |  5 |          |  9 |          |  5    9        3 |
  +----+          |  3 |          |  .    3        2 |
                  +----+          +------------------+
In the example above, the using data are longer than the master, but that could be reversed. In
most cases where sequential merges are appropriate, the datasets are expected to be of equal
length, and you should type
. merge 1:1 _n using filename, assert(match) nogen
Sequential merges, like m:m merges, are dangerous. Both depend on the current sort order of the
data.
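Here is a sketch of the sequential merge shown above, with seqdata.dta as a placeholder name for the
using file:

. clear
. input x2
  7
  2
  1
  9
  3
  end
. save seqdata
. clear
. input x1
  10
  30
  20
  5
  end
. merge 1:1 _n using seqdata
. list

Had the two datasets contained the same number of observations, adding the assert(match) and
nogenerate options, as shown above, would both verify that every observation matched and omit
_merge from the result.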