2019-08-14
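Values such as .r and .d in a numeric variable are excluded from statistics (sum). How do they differ from the missing value ., and what do they mean?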
All replies
2019-8-15 07:35:33
Look up the help for missing (type: help missing).
The manual should cover this in more detail.

Title

    [U] 12.2.1 Missing values


Description

    Stata has 27 numeric missing values:

        ., the default, which is called the "system missing value" or sysmiss

    and

        .a, .b, .c, ..., .z, which are called the "extended missing values".

    Numeric missing values are represented by large positive values.  The ordering is

                    all nonmissing numbers < . < .a < .b < ... < .z

    Thus, the expression age > 60 is true if variable age is greater than 60 or
    missing.

    To exclude missing values, ask whether the value is less than ".".  For instance,

        . list if age > 60 & age < .

    To specify missing values, ask whether the value is greater than or equal to ".".
    For instance,

        . list if age >=.

    Stata has one string missing value, which is denoted by "" (blank).
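
The ordering rule above can be checked on a tiny dataset. This is a minimal sketch; the variable name age and its four values are made up for illustration.

```stata
* Hypothetical data: two real ages, one sysmiss, one extended missing value
clear
input age
45
72
.
.a
end

* Counts 3: the value 72 plus both missing values, because missing sorts above every number
count if age > 60

* Counts 1: only the value 72
count if age > 60 & age < .

* Counts 2: the two missing values
count if age >= .
```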


Remarks

    More details concerning missing values and their treatment in Stata are provided
    under the following headings:

        Overview
        Expressions
        Operators
        Functions
        Matrices
        Useful commands
        Value labels
        Estimation commands
        Technical note:  checking if a value is missing


    Overview

    1.  Stata supports different types of numeric missing values that can be used to
        specify different reasons that a value is unknown.  The most frequently used
        missing value ., referred to as sysmiss, is nearly always generated by Stata
        when it cannot assign a specific value.  The 26 extended missing values .a,
        .b, ..., .z are available to users requiring more elaborate tracking of
        missing values.

        Empty strings are treated as missing values of type string.

    2.  Numeric missing values are represented by large positive values.  This means
        that an expression such as income > 100 evaluates to true for missing values
        of the variable income, as well as to those that are greater than 100.  Also,
        the simple expression if varname evaluates to true for all nonzero values of
        varname, including missing values.

    3.  The ordering of missing values is

                    all nonmissing numbers < . < .a < .b < ... < .z

    4.  Most Stata statistical commands deal with missing values by disregarding
        observations with one or more missing values (called "listwise deletion" or
        "complete cases only").


    Expressions

    Expressions occur in many places in Stata (see [P] syntax and exp).  For example,

        . generate newvarname = exp

    evaluates the expression exp for each observation of the variable newvarname.
    Observations of newvarname are set to missing if exp evaluates to missing.

    Expressions are also used to restrict a command's operation to a subset of the
    observations.  For instance,

        . summarize varname if exp

    summarizes varname by using all observations for which exp evaluates to true (not
    zero), including observations for which exp evaluates to missing.
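
Both behaviors can be seen with the auto dataset that ships with Stata, in which rep78 is missing for 5 of the 74 cars. This is only an illustrative sketch; the new variable name rep_sq is an assumption.

```stata
sysuse auto, clear

* The expression is missing wherever rep78 is missing,
* so Stata reports "(5 missing values generated)"
generate rep_sq = rep78^2

* rep78 is never zero, and missing counts as nonzero (true),
* so this summarizes price over all 74 observations
summarize price if rep78

* Excluding missing values explicitly leaves 69 observations
summarize price if rep78 < .
```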


    Operators

    The relational operators (see operators) interpret missing values as large
    positive numbers (see above). All the following thus evaluate to true

                73 < .        . == .        .a == .a
                .a != .       .a < .b       .a <= .b

    whereas all the following evaluate to false

                73 >= .       . == .a       . > .a

    The numerical operators (+ etc) return missing if any of their arguments are
    missing.
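
These comparisons can be verified interactively with display, which prints 1 for true and 0 for false. A short sketch:

```stata
* Relational operators treat missing values as large positive numbers
* displays 1
display 73 < .
* displays 1
display .a < .b
* displays 0
display . == .a
* displays 0
display 73 >= .

* Numerical operators propagate missing: displays .
display 2 + .
```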


    Functions

    Stata has a few special functions for dealing with missing values:

        missing()        returns 1 (meaning true) if any of its arguments, numeric or
                         string, evaluates to missing and 0 (meaning false) otherwise.

        mi()             is a shorthand for missing().

        matmissing(K)    returns 1 (meaning true) if any elements of the matrix K are
                         missing and 0 (meaning false) otherwise.

    Some Stata functions interpret . in a special way.  For instance, the function
    inrange(x,a,b) returns 1 if x belongs in the interval [a,b].  This function
    interprets a==. as -infinity and b==. as +infinity.  These special interpretations
    are discussed in functions.

    Other Stata functions return missing (.) if one or more of the arguments are
    missing or invalid.
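
The functions described above can be tried directly. The following is a small sketch; the matrix name K follows the help text, and its contents are invented.

```stata
* missing() is true for sysmiss, extended missing values, and empty strings
* each of the next three displays 1
display missing(.)
display missing(.z)
display missing("")

* With several arguments, one missing argument is enough: displays 1
display missing(5, .a)

* mi() is shorthand for missing(): displays 0
display mi(5)

* inrange() treats a missing bound as open-ended
* displays 1 (no lower bound)
display inrange(5, ., 10)
* displays 1 (no upper bound)
display inrange(5, 1, .)
* displays 0 (a missing x is never in range)
display inrange(., 1, 10)

* An invalid argument yields missing: displays .
display log(-1)

* matmissing() checks a whole matrix: displays 1
matrix K = (1, . \ 3, 4)
display matmissing(K)
```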


    Matrices

    Matrices may contain all types of missing values.  The matrix operators (see
    matrix operators)

                -     negate
                '     transpose

                \     row join
                ,     column join
                +     add
                -     subtract
                *     multiply (including multiply by scalar)
                /     division by scalar
                #     Kronecker product

    generate missing values elementwise.

    In the matrix product C=A*B, C[i,j] is missing if row i of A or column j of B
    contains a missing value.

    Matrix division by scalar C=A/b is not allowed if the scalar b is a missing value.
    Otherwise, missing values in matrix A generate missing values in C elementwise.

    Like the list command, the matrix list command has a nodotz option to display
    extended missing value .z as a blank string rather than as ".z".


    Useful commands

    ----------------------------------------------------------------------------------
    mvencode            changes missing values into numeric values
    mvdecode            changes numeric values into missing values
    codebook            provides extensive information about variables, including the
                          occurrence of simple and extended missing values
    misstable           tabulates missing values
    egen, rownonmiss()  number of nonmissing values in a varlist
    egen, rowmiss()     number of missing values in a varlist
    recode              recodes a variable, optionally into a new variable, with
                          special facilities to recode missing values.
    mi                  multiple imputation of missing values
    xtdescribe          describes participation patterns in panel data
    ----------------------------------------------------------------------------------
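
As an illustration of a few of these commands, the sketch below assumes a hypothetical survey variable income in which -9 and -8 were used as numeric codes for two different reasons for nonresponse. The data and the code-to-letter mapping are assumptions, not taken from any real dataset.

```stata
clear
input id income
1 50000
2 -9
3 -8
4 62000
end

* Turn the numeric codes into extended missing values
mvdecode income, mv(-9 = .r \ -8 = .d)

* Report how many values are sysmiss and how many are extended missing
misstable summarize income

* Count missing values per observation across the listed variables
egen nmiss = rowmiss(income)

* Reverse the conversion if numeric codes are needed again
mvencode income, mv(.r = -9 \ .d = -8)
```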


    Value labels

    It is possible to define value labels for the extended missing values .a to .z,
    but not for sysmiss (.).  These value labels show up in the same way as value labels
    for nonmissing values.  See [D] label.
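
Labeling is how a dataset usually documents what codes such as .r and .d mean. The sketch below uses a made-up variable q1 and made-up label text; the meanings "Refused" and "Don't know" are a convention chosen for this example, not something Stata defines.

```stata
clear
input q1
1
2
.r
.d
.
end

* Extended missing values can carry labels; sysmiss (.) cannot
label define q1lbl 1 "Yes" 2 "No" .r "Refused" .d "Don't know"
label values q1 q1lbl

* The missing option makes tabulate show missing values as their own rows
tabulate q1, missing
```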


    Estimation commands

    Most Stata commands ignore observations that are missing in one or more of the
    variables referred to in the command.  For instance, the regression command
    regress disregards all observations that have a missing value for the dependent
    variable or missing values for any of the independent variables.  This method is
    known as "listwise deletion", "complete cases only", etc.  It is statistically
    appropriate only if the missing values are "at random".  In an if or weight
    expression to a command, the expressions will be evaluated, and the missing values
    will be processed using the operators and function() logic.

    Stata commands that can treat multiple observations as being related to one
    observational unit (for example, observations from a panel in xt models, episodes
    in st models) ignore specific observations from the "group", namely, those that
    have missing values.
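
Listwise deletion is easy to observe with the auto dataset shipped with Stata, where rep78 is missing for 5 of the 74 cars. A minimal sketch:

```stata
sysuse auto, clear

* Counts 5 observations with rep78 missing
count if missing(rep78)

* regress silently drops those 5 observations
regress price mpg rep78

* The estimation sample size is 69, not 74
display e(N)
```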


    Technical note:  checking if a value is missing

    You might think you can test whether an expression or variable exp is missing with
    the expression exp==. alone.  Remember, however, that Stata has 27 different missing
    values (., .a, .b, ..., .z).

    exp==. means that the expression exp equals a specific missing value, namely,
    sysmiss (.).  exp==. returns false if exp equals one of the extended missing-value
    types such as .a or .z.  To test whether exp is missing, that is, equals either .
    or one of the extended missing values, one should use the expression

        exp >= .
    or
        missing(exp)

    which can be abbreviated to

        mi(exp)

    To test whether exp is not missing, use one of the following forms:

        exp < .
        !missing(exp)
        !mi(exp)

    An advantage of the last two forms is that the missing functions missing() and
    mi() allow multiple (numeric or string) arguments to test whether any of the
    arguments is missing.
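
The difference between these tests shows up as soon as a variable holds an extended missing value. A small sketch with a made-up variable x:

```stata
clear
input x
1
.
.a
end

* Counts 1: only sysmiss matches, .a is overlooked
count if x == .

* Both of these count 2: sysmiss and .a
count if x >= .
count if missing(x)

* Counts 1: the single nonmissing value
count if !missing(x)
```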
2024-7-5 18:56:46
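In Stata, `.r` and `.d` are not special system codes. They are two of the 26 extended missing values `.a` through `.z`. Like the system missing value `.`, they take no part in calculations, which is why `summarize` leaves them out.

- `.r` has no meaning built into Stata. Whoever prepared the dataset chose it; in survey data it often stands for "refused".
  
- `.d` likewise has no built-in meaning; in survey data it often stands for "don't know".

The difference from the ordinary missing value `.` is the information carried. Stata itself generates `.` when it cannot assign a value, and `.` says nothing about why the value is unknown. Extended missing values are assigned by the data provider to record the reason a value is missing, so they are excluded from analysis in the same way while still preserving that reason.

In a numeric variable, `.r` and `.d` also sort above `.`, so a test such as `x == .` does not catch them. Use `x >= .` or `missing(x)` to catch every kind of missing value.

For example, a numeric survey variable may hold `.r` for some respondents and `.d` for others. Before analysis it is worth knowing which is which, because the reason for nonresponse can affect how the results are interpreted.

In short, when a numeric variable contains `.r` or `.d`, check the dataset's documentation, codebook, or value labels to learn what the codes mean, then decide whether to keep the distinction or convert them to the ordinary missing value `.`.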
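
In Stata, such codes can be inspected and, if the distinction is not needed, collapsed to the ordinary missing value. The following is a sketch under assumptions: the variable name income and its values are hypothetical.

```stata
clear
input income
30000
.r
.d
45000
end

* summarize uses only the 2 nonmissing values
summarize income

* Show how often each missing code occurs
tabulate income, missing

* Collapse both codes to sysmiss once the reason is no longer needed
replace income = . if income == .r | income == .d
```

Whether to collapse the codes is a judgment call. Keeping them costs nothing in estimation, since commands exclude all missing values alike.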

This text was generated by the CAIE academic large language model.