2006-08-17

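Sorry to bother you all. I have only just started with nonparametric estimation and still understand very little of it. I would like to ask a question:

When doing nonparametric estimation in S-PLUS, can the bandwidth and the kernel function be chosen automatically by the software, or do I have to compute them myself as the textbooks describe?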


All replies
2006-11-23 20:20:00
Same question here...

2007-3-20 22:02:00

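I haven't been here for a while, and it seems nobody has paid attention to this question. Apparently not many people care about nonparametric problems, so I have no one to ask or learn from, and there is a lot I still can't figure out.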

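Still, I can now answer the question I asked: the software can indeed choose it automatically. I mainly used penalized least-squares (spline) estimation, and there the smoothing parameter can be selected automatically. I haven't used kernel estimation, so I don't know whether it also selects automatically, but it probably does.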
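The poster worked in S-PLUS; assuming R behaves comparably, here is a minimal R sketch of a penalized least-squares (cubic smoothing) spline whose smoothing parameter is picked automatically. When none of spar, lambda or df is supplied, smooth.spline() selects it by generalized cross-validation (the default, cv = FALSE) or by ordinary leave-one-out cross-validation (cv = TRUE). The simulated data are hypothetical.

```r
# Simulated data (hypothetical example): a noisy sine curve
set.seed(1)
x <- sort(runif(100, 0, 2 * pi))
y <- sin(x) + rnorm(100, sd = 0.3)

# No spar/lambda/df given, so the smoothing parameter is chosen automatically
fit_gcv <- smooth.spline(x, y)             # cv = FALSE (default): GCV
fit_cv  <- smooth.spline(x, y, cv = TRUE)  # ordinary leave-one-out CV

# Selected penalty and the equivalent degrees of freedom of each fit
c(lambda = fit_gcv$lambda, df = fit_gcv$df)
c(lambda = fit_cv$lambda,  df = fit_cv$df)

# Evaluate the GCV-selected curve on a regular grid
pred <- predict(fit_gcv, x = seq(0, 2 * pi, length.out = 50))
```

The equivalent degrees of freedom (df) is often an easier number to interpret than lambda itself: a value near 2 means an almost straight line, larger values mean a wigglier curve.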

Disclaimer: I really don't understand nonparametrics well; if I'm wrong, experts, please correct me.


2007-3-21 08:25:00

Yes, it can be chosen automatically, and you can specify which method is used to compute the bandwidth.
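A minimal R sketch of choosing the bandwidth method, using simulated bimodal data (a hypothetical example): each rule that density()'s bw argument accepts by name corresponds to a bw.* function in R's stats package, and the value actually used is returned in the $bw component. The last line writes out Silverman's rule of thumb, which bw.nrd0 implements.

```r
set.seed(42)
x <- c(rnorm(150), rnorm(50, mean = 4))   # hypothetical bimodal sample

# Each bandwidth rule can be called directly ...
rules <- c(nrd0 = bw.nrd0(x), nrd = bw.nrd(x),
           ucv  = bw.ucv(x),  bcv = bw.bcv(x), SJ = bw.SJ(x))
print(round(rules, 3))

# ... or selected by name inside density(); the chosen value is stored in $bw
d_default <- density(x)              # bw = "nrd0" (the default)
d_sj      <- density(x, bw = "SJ")   # Sheather-Jones
c(default = d_default$bw, SJ = d_sj$bw)

# Silverman's rule of thumb behind "nrd0", written out by hand
0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)
```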

Below is the R help page for density estimation:

Kernel Density Estimation

Description

The (S3) generic function density computes kernel density estimates. Its default method does so with the given kernel and bandwidth for univariate observations.

Usage

density(x, ...)

## Default S3 method:
density(x, bw = "nrd0", adjust = 1,
        kernel = c("gaussian", "epanechnikov", "rectangular",
                   "triangular", "biweight", "cosine", "optcosine"),
        weights = NULL, window = kernel, width,
        give.Rkern = FALSE,
        n = 512, from, to, cut = 3, na.rm = FALSE, ...)

Arguments

x: the data from which the estimate is to be computed.
bw: the smoothing bandwidth to be used. The kernels are scaled such that this is the standard deviation of the smoothing kernel. (Note this differs from the reference books cited below, and from S-PLUS.)
bw can also be a character string giving a rule to choose the bandwidth. See bw.nrd.
The specified (or computed) value of bw is multiplied by adjust.
adjust: the bandwidth used is actually adjust*bw. This makes it easy to specify values like “half the default” bandwidth.
kernel, window: a character string giving the smoothing kernel to be used. This must be one of "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine", with default "gaussian", and may be abbreviated to a unique prefix (single letter).
"cosine" is smoother than "optcosine", which is the usual “cosine” kernel in the literature and almost MSE-efficient. However, "cosine" is the version used by S.
weights: numeric vector of non-negative observation weights, hence of the same length as x. The default NULL is equivalent to weights = rep(1/nx, nx), where nx is the length of (the finite entries of) x[].
width: this exists for compatibility with S; if given, and bw is not, it will set bw to width if this is a character string, or to a kernel-dependent multiple of width if this is numeric.
give.Rkern: logical; if true, no density is estimated, and the “canonical bandwidth” of the chosen kernel is returned instead.
n: the number of equally spaced points at which the density is to be estimated. When n > 512, it is rounded up to the next power of 2 for efficiency reasons (fft).
from, to: the left- and right-most points of the grid at which the density is to be estimated.
cut: by default, the values of from and to are cut bandwidths beyond the extremes of the data. This allows the estimated density to drop to approximately zero at the extremes.
na.rm: logical; if TRUE, missing values are removed from x. If FALSE, any missing values cause an error.
...: further arguments for (non-default) methods.
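An R sketch (simulated data, illustrative values) exercising several of the arguments above: adjust rescales the selected bandwidth, kernel names can be abbreviated, explicit equal weights reproduce the unweighted estimate, and the default grid runs from cut bandwidths below the smallest observation to cut bandwidths above the largest.

```r
set.seed(7)
x <- rnorm(100)                      # hypothetical sample

# adjust multiplies the bandwidth: "half the default" bandwidth
d_half <- density(x, adjust = 0.5)

# kernel names may be abbreviated to a unique prefix ("e" = "epanechnikov")
d_epa <- density(x, bw = 0.4, kernel = "e")

# explicit equal weights reproduce the unweighted estimate
h <- 0.4
d_unw <- density(x, bw = h)
d_w   <- density(x, bw = h, weights = rep(1 / length(x), length(x)))

# default grid: n = 512 points from min(x) - cut*bw to max(x) + cut*bw (cut = 3)
range(d_unw$x)
```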

Details

The algorithm used in density.default disperses the mass of the empirical distribution function over a regular grid of at least 512 points and then uses the fast Fourier transform to convolve this approximation with a discretized version of the kernel and then uses linear approximation to evaluate the density at the specified points.
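A simplified R sketch of the steps described above (linear binning onto a regular grid, then FFT convolution with a discretized kernel), assuming a Gaussian kernel and equal weights. fft_kde is a hypothetical helper, not the actual density.default code; because it evaluates directly on the output grid, the final linear-interpolation step is not needed here.

```r
# Hypothetical helper: binned Gaussian KDE computed with the FFT
fft_kde <- function(x, bw, n = 512, cut = 3) {
  lo <- min(x) - cut * bw
  hi <- max(x) + cut * bw
  grid  <- seq(lo, hi, length.out = n)
  delta <- (hi - lo) / (n - 1)

  # 1. Linear binning: split each observation's mass between the two
  #    neighbouring grid points, in proportion to its distance from them
  pos  <- (x - lo) / delta + 1
  left <- floor(pos)
  frac <- pos - left
  mass <- numeric(n)
  for (k in seq_along(x)) {
    mass[left[k]]     <- mass[left[k]] + (1 - frac[k])
    mass[left[k] + 1] <- mass[left[k] + 1] + frac[k]
  }
  mass <- mass / length(x)

  # 2. Kernel at lags 0, 1, ..., n-1, then -n, ..., -1 (in units of delta);
  #    zero-padding to length 2n prevents wrap-around in the circular convolution
  lags <- c(0:(n - 1), -(n:1)) * delta
  kern <- dnorm(lags, sd = bw)

  # 3. Convolution via FFT (R's inverse fft is unnormalised, hence / (2n))
  conv <- Re(fft(fft(c(mass, numeric(n))) * fft(kern), inverse = TRUE)) / (2 * n)
  list(x = grid, y = pmax(conv[1:n], 0))
}

set.seed(3)
x <- rnorm(200)
h <- bw.nrd0(x)
est <- fft_kde(x, h)

# Direct O(n * grid) evaluation for comparison
direct <- sapply(est$x, function(g) mean(dnorm(g, mean = x, sd = h)))
max(abs(est$y - direct))   # small binning error
```

The FFT replaces the direct sum over all data points at every grid point with a fixed-size convolution, which is why the cost grows only slowly with the sample size once the data have been binned.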

The statistical properties of a kernel are determined by sig^2 (K) = int(t^2 K(t) dt) which is always = 1 for our kernels (and hence the bandwidth bw is the standard deviation of the kernel) and R(K) = int(K^2(t) dt).
MSE-equivalent bandwidths (for different kernels) are proportional to sig(K) R(K) which is scale invariant and for our kernels equal to R(K). This value is returned when give.Rkern = TRUE. See the examples for using exact equivalent bandwidths.
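A short R check of the two kernel constants, assuming density() behaves as documented above: give.Rkern = TRUE returns R(K) for the chosen kernel (the data argument is not used), and with bw = 1 a kernel placed at the single point 0 should have standard deviation 1. The point 0, the grid limits and the checked kernels are illustrative choices.

```r
kernels <- c("gaussian", "epanechnikov", "rectangular", "triangular",
             "biweight", "cosine", "optcosine")

# R(K) = integral of K(t)^2 dt for each kernel scaled to unit standard deviation
RK <- sapply(kernels, function(k) density(0, kernel = k, give.Rkern = TRUE))
round(RK, 4)

# Gaussian check by numerical integration; closed form is 1 / (2 * sqrt(pi))
integrate(function(t) dnorm(t)^2, -Inf, Inf)$value

# With bw = 1 the kernel itself (estimate from the single point 0) has sd 1
d  <- density(0, bw = 1, kernel = "epanechnikov", from = -4, to = 4)
dx <- diff(d$x[1:2])
sqrt(sum(d$x^2 * d$y) * dx)   # Riemann-sum approximation of the kernel's sd
```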

Infinite values in x are assumed to correspond to a point mass at +/-Inf and the density estimate is of the sub-density on (-Inf, +Inf).
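To see the sub-density behaviour, the R sketch below adds one infinite value to 99 finite ones (simulated, hypothetical data). The estimate of the finite part should then integrate to about 99/100 rather than 1; the integral is approximated by a simple Riemann sum over the returned grid.

```r
set.seed(11)
x <- c(rnorm(99), Inf)        # hypothetical sample with one infinite value
d <- density(x, bw = 0.3)     # numeric bw; the Inf is excluded from the finite part

# Total mass of the estimate on the grid: about 0.99, not 1
sum(d$y) * diff(d$x[1:2])
```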

Value

If give.Rkern is true, the number R(K), otherwise an object with class "density" whose underlying structure is a list containing the following components.

x: the n coordinates of the points where the density is estimated.
y: the estimated density values.
bw: the bandwidth used.
n: the sample size after elimination of missing values.
call: the call which produced the result.
data.name: the deparsed name of the x argument.
has.na: logical, for compatibility (always FALSE).

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (for S version).

Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.

Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683–690.

Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.


2007-3-21 16:49:00
In S it can also be chosen automatically; of course, you can also program it yourself. It mainly depends on what you need.

2007-3-25 16:11:00

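In the theory of nonparametric regression and density estimation, choosing the smoothing parameter is both a very practical problem and a very tricky one.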

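In general, the smoothing parameter is chosen by cross-validation (CV) or generalized cross-validation (GCV) (rendered in Chinese as 交互验证 and 广义交互验证). In essence, these methods look for the value of the smoothing parameter that gives the best predictive performance.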
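As a concrete illustration of cross-validation in nonparametric regression, a minimal R sketch (an assumption, not the poster's code) of leave-one-out CV for a Nadaraya-Watson kernel smoother: each y_i is predicted from all the other observations, and the bandwidth with the smallest mean squared prediction error is kept. nw_loocv, the Gaussian kernel, the simulated data and the bandwidth grid are all illustrative.

```r
# Hypothetical helper: leave-one-out CV score of a Nadaraya-Watson
# (local constant) smoother with a Gaussian kernel of bandwidth h
nw_loocv <- function(x, y, h) {
  W <- dnorm(outer(x, x, "-") / h)   # W[i, j] = K((x_i - x_j) / h)
  diag(W) <- 0                        # leave observation i out of its own fit
  fit <- drop(W %*% y) / rowSums(W)   # leave-one-out predictions at each x_i
  mean((y - fit)^2)                   # mean squared prediction error
}

set.seed(5)
x <- sort(runif(150, 0, 1))
y <- sin(2 * pi * x) + rnorm(150, sd = 0.2)

# Grid search for the bandwidth with the smallest CV score
h_grid <- seq(0.01, 0.3, by = 0.005)
cv <- sapply(h_grid, function(h) nw_loocv(x, y, h))
h_best <- h_grid[which.min(cv)]
h_best
```

Zeroing the diagonal of the weight matrix gives all n leave-one-out fits without refitting n times, so one CV score costs O(n^2) kernel evaluations in this sketch.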

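For many nonparametric regression and density estimation problems, the difficulty with cross-validation is the heavy computation, especially when the sample size n is large.

Cubic spline estimation has linear cost, i.e. O(n), and so does GCV, which is why statistical software such as S-PLUS and R provides automatic selection of the smoothing parameter.

Kernel estimation, however, costs O(n^3). Worse still, because of the nature of the problem, GCV cannot be used, and CV costs O(n^4). This is unacceptable to statisticians and applied users alike.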

Therefore, smoothing parameter selection for kernel estimation will never be found in any standard statistical software.
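For comparison, a minimal R sketch of least-squares (unbiased) cross-validation for a Gaussian kernel density estimate. With the Gaussian kernel the criterion has a closed form, and each evaluation needs O(n^2) pairwise kernel values. R's stats package also ships data-based selectors of its own: bw.ucv (unbiased cross-validation), bw.bcv (biased cross-validation) and bw.SJ (Sheather-Jones), which density() uses when bw is "ucv", "bcv" or "SJ". The helper names int_fhat_sq and lscv, the bandwidth grid and the simulated data are illustrative assumptions.

```r
# Hypothetical helpers: exact LSCV for a Gaussian-kernel density estimate
int_fhat_sq <- function(x, h) {
  # integral of fhat(t)^2 dt: two N(0, h^2) kernels convolve to N(0, 2 h^2)
  sum(dnorm(outer(x, x, "-"), sd = sqrt(2) * h)) / length(x)^2
}
lscv <- function(x, h) {
  n <- length(x)
  K <- dnorm(outer(x, x, "-"), sd = h)
  loo <- (rowSums(K) - dnorm(0, sd = h)) / (n - 1)  # fhat_{-i}(x_i)
  int_fhat_sq(x, h) - 2 * mean(loo)                 # LSCV(h)
}

set.seed(9)
x <- rnorm(200)
h_grid <- seq(0.05, 1, by = 0.01)
h_lscv <- h_grid[which.min(sapply(h_grid, function(h) lscv(x, h)))]

# R's own data-based selectors, for comparison
c(grid_lscv = h_lscv, ucv = bw.ucv(x), bcv = bw.bcv(x), SJ = bw.SJ(x))
```

bw.ucv minimizes essentially the same criterion but on binned data, so its value may differ slightly from the grid search above.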

