Original source: http://www.ics.uci.edu/~ihler/code/kde.html
==============================================================================
MATLAB KDE Class Description & Specification
==============================================================================
The KDE class is a general Matlab class for k-dimensional kernel density
estimation. It is written in a mix of Matlab ".m" files and MEX/C++ code.
Thus, to use it you will need to be able to compile C++ code for Matlab.
Note that the default compiler for Windows does *not* support C++, so you
will need GCC under Linux, or GCC or Visual C++ for Windows. Bloodshed
(http://www.bloodshed.net) supplies a nice development environment along
with the MinGW (http://www.mingw.org) compiler. See the page
http://gnumex.sourceforge.net/ for help setting up MEX with MinGW.
Kernels supported are: Gaussian, Epanetchnikov (truncated quadratic),
and Laplacian (double exponential).
For multivariate density estimates, the code supports product kernels --
kernels which are products of the kernel function in each dimension.
For example, for Gaussian kernels this is equivalent to requiring a
diagonal covariance.
It can also support non-uniform kernel bandwidths -- i.e. bandwidths
which vary over kernel centers.
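For instance, a minimal sketch of a variable-bandwidth estimate, assuming
the constructor accepts an Ndim x Npts bandwidth matrix (consistent with
the shape returned by getBW, listed below); the values are placeholders:
pts = rand(2,100);            % 100 kernel centers in 2D
bws = .01 + .1*rand(2,100);   % a separate 2D bandwidth per center
p   = kde( pts, bws );        % non-uniform bandwidth Gaussian KDE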
The implementation uses "kd-trees", a hierarchical representation for
point sets which caches sufficient statistics about point locations etc.
in order to achieve potential speedups in computation. For the Epanetchnikov
kernel this can translate into speedups with no loss of precision; but for
kernels with infinite support it provides an approximation tolerance level,
which allows tradeoffs between evaluation quality and computation speed.
In particular, we implement Alex Gray's "dual tree" evaluation algorithm;
see [Gray and Moore, "Very Fast Multivariate Kernel Density Estimation
via Computational Geometry", in Proceedings, Joint Stat. Meeting 2003] for
more details. This gives a tolerance parameter which is a percent error
(relative to the exact, N^2 computation) on the value at any evaluated
point. In general, "tolerance" parameters in the Matlab code / notes refer
to this percent tolerance. This percentage error translates to an absolute
additive error on the mean log-likelihood, for example.
An exception to this is the gradient calculation functions, which calculate
using an absolute tolerance value. This is due to the difficulty of finding
a percentage bound when the function calculated is not strictly positive.
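For example, a brief sketch of evaluation with and without the percent
tolerance (the data and the 1% setting below are arbitrary choices):
p  = kde( rand(2,1000), .05 );  % 2D Gaussian KDE
x  = rand(2,200);               % points at which to evaluate
pE = evaluate(p, x);            % evaluation with the default tolerance
pA = evaluate(p, x, .01);       % allow up to ~1% error at each point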
We have also recently implemented the so-called Improved Fast Gauss
Transform, described in [Yang, Duraiswami, and Gumerov, "Improved Fast
Gauss Transform", submitted to the SIAM Journal on Scientific Computing].
This often performs MUCH faster than the dual tree algorithm mentioned
above, but the error bounds which control the computation are often quite
loose, and somewhat unwieldy (for example, it is difficult to obtain the
fractional error bounds provided & used by the dual tree methods and other
functions in the KDE toolbox). Thus for the moment we have left the IFGT
separate, with alternate controls for computational complexity (see below,
and the file "evalIFGT.m").
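A minimal usage sketch (the order N = 5 below is an arbitrary choice; see
"evalIFGT.m" for guidance on selecting it):
p  = kde( rand(2,1000), .05 );  % uniform-BW Gaussian kernels (required)
x  = rand(2,200);
pI = evalIFGT(p, x, 5);         % 5th-order improved fast Gauss transform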
==============================================================================
Getting Started
==============================================================================
Unzip the KDE class to a directory called @kde.
Compile the MEX functions. This can be done by running "makemex" from
inside matlab, in the "@kde/mex" directory. If this fails, make sure that
MEX / C++ compilation works. The KDE toolbox is tested in Matlab R13, but
apparently has problems in R12; I'm planning to investigate this.
NOTE: MS Visual C++ has a bug in dealing with "static const" variables; I
think there is a patch available, or you can change these to #defines.
Operate from the class' parent directory, or add it to your MATLAB path
(e.g. if you unzip to "myhome/@kde", cd in matlab to the "myhome" dir,
or add it to the path.)
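For example, a hedged sketch of the setup steps ("myhome" is a placeholder
for wherever you unzipped the class):
cd myhome/@kde/mex   % from within matlab
makemex              % compile the MEX/C++ sources
cd ../..             % work from the class' parent directory
addpath('myhome')    % ...or put the parent dir on the path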
Objects of type KDE may be created by e.g.
p = kde( rand(2,1000), [.05;.03] );  % Gaussian kernel, 2D
                                     % BW = .05 in dim 1, .03 in dim 2.
p = kde( rand(2,1000), .05, ones(1,1000) );  % Same as above, but uniform BW
                                             % and specifying weights
p = kde( rand(2,1000), .05, ones(1,1000), 'Epanetchnikov');  % Quadratic kernel
                                             % Just 'E' or 'e' also works
p = kde( rand(2,1000), 'rot' );      % Gaussian kernel, 2D,
                                     % BW chosen by "rule of thumb" (below)
To see the kernel shape types, you can use:
plot( -3:.01:3, evaluate(kde(0,1,1,T), -3:.01:3) );  % where T = 'G', 'E', or 'L'
Kernel sizes may be selected automatically using e.g.
p = ksize(p, 'lcv');   % 1D likelihood-based search for BW
p = ksize(p, 'rot');   % "Rule of Thumb"; Silverman '86 / Scott '92
p = ksize(p, 'hall');  % Plug-in type estimator
Density estimates may be visualized using e.g.
plot(p);
or
mesh(hist(p));
See help kde/plot and help kde/hist for more information.
Also, the demonstration programs @kde/examples/demo_kde_#.m may be helpful.
==============================================================================
KDE Matlab class definition
==============================================================================
The following is a simple list of all accessible functions for the KDE class.
Constructors:
=====================================================
kde( )                     : empty kde
kde( kde )                 : re-construct kde from points, weights, bw, etc.
kde( points, bw )          : construct Gauss kde with weights 1/N
kde( points, bw, weights ) : construct Gaussian kde
kde( points, bw, weights, type ) : potentially non-Gaussian
marginal( kde, dim )       : marginalize to the given dimensions
condition( kde, dim, A )   : marginalize to ~dim and weight by K(x_i(dim),a(dim))
resample( kde, [kstype] )  : draw N samples from kde & use to construct a new kde
reduce( kde, ... )         : construct a "reduced" density estimate (fewer points)
joinTrees( t1, t2 )        : make a new tree with t1 and t2 as
                             the children of a new root node
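A brief sketch of a few of these in use (the dimension indices and sizes
below are arbitrary, and resample's optional kstype is left at its default):
p  = kde( rand(3,500), 'rot' );  % 3D Gaussian KDE, rule-of-thumb BW
pm = marginal(p, [1 2]);         % marginal over dimensions 1 and 2
pr = resample(p);                % draw samples & build a new KDE from them
pj = joinTrees(p, pr);           % one tree with p and pr as children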
Accessors: (data access, extremely limited or no processing req'd)
=====================================================
getType(kde)          : return the kernel type of the KDE ('Gaussian', etc)
getBW(kde,index)      : return the bandwidth assoc. with x_i (Ndim x length(index))
adjustBW              : set the bandwidth(s) of the KDE (by reference!)
                        Note: cannot change from a uniform -> non-uniform bandwidth
ksize                 : automatic bandwidth selection via a number of methods
    LCV               : 1D search using max leave-one-out likelihood criterion
    HALL              : Plug-in estimator with good asymptotics; MISE criterion
    ROT, MSP          : Fast standard-deviation based methods; AMISE criterion
    LOCAL             : Like LCV, but makes BW propto k-th NN distance (k=sqrt(N))
getPoints(kde)        : Ndim x Npoints array of kernel locations
adjustPoints(p,delta) : shift points of P by delta (by reference!)
getWeights            : [1 x Npts] array of kernel weights
adjustWeights         : set kernel weights (by reference!)
rescale(kde,alpha)    : rescale a KDE by the (vector) alpha
getDim                : get the dimension of the data
getNpts               : get the # of kernel locations
getNeff               : "effective" # of kernels (accounts for non-uniform weights)
sample(P,Np,KSType)   : draw Np new samples from P and set BW according to KSType
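For instance, a small sketch of the accessors (the shift amount is
arbitrary, and delta is assumed to match the point array's size; note the
by-reference semantics):
pts = getPoints(p);                   % Ndim x Npoints kernel locations
w   = getWeights(p);                  % 1 x Npts kernel weights
adjustPoints(p, .1*ones(size(pts)));  % shift every center by .1 (in place)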
Display: (visualization / description)
=====================================================
plot(kde...)  : plot the specified dimensions of the KDE locations
hist(kde...)  : discretize the kde at uniform bin lengths
display       : text output describing the KDE
double        : boolean evaluation of the KDE (non-empty)
Statistics: (useful stats & operations on a kde)
=====================================================
covar                  : find the (weighted) covariance of the kernel centers
mean                   : find the (weighted) mean of the kernel centers
modes                  : (attempt to) find the modes of the distribution
knn(kde, points, k)    : find the k nearest neighbors of each of
                         points in kde
entropy                : estimate the entropy of the KDE
    ??? Perhaps allow specifying alternate entropy estimates? Distance, etc?
kld                    : estimate divergence between two KDEs
ise                    : eval/estimate integrated square difference between two KDEs
evaluate(kde, x[,tol]) : evaluate KDE at a set of points x
evaluate(p, p2 [,tol]) : "" "", x = p2.pts (if we've already built a tree)
evalIFGT(kde, x, N)    : same as above, but use the (very fast) Nth order
evalIFGT(p, p2, N)       improved fast Gauss transform. Req's uniform-BW
                         Gaussian kernels.
evalAvgLogL(kde, x)    : compute mean( log( evaluate(kde, x) ))
evalAvgLogL(kde, kde2) : "" "" but use the weights of kde2
evalAvgLogL(kde)       : self-eval; leave-one-out option
llGrad(p,q)            : find the gradient of log-likelihood for p
                         evaluated at the points of q
llHess(p,q)            : find the Hessian of log-likelihood of p at q
entropyGrad(p)         : estimate gradient of entropy (uses llGrad)
miGrad(p,dim)          : "" for mutual information between p(dim), p(~dim)
klGrad(p1,p2)          : estimate gradient direction of KL-divergence
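A short sketch of a few of the statistics functions (the argument order for
kld, i.e. which density is the reference, is an assumption; see its help):
p = kde( rand(2,1000), 'rot' );
q = kde( rand(2,1000), 'rot' );
H = entropy(p);          % entropy estimate of p
D = kld(p, q);           % divergence estimate between p and q
L = evalAvgLogL(p, q);   % mean log-likelihood of q's points under p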
Mixture products: (NBP stuff)
=====================================================
GAUSSIAN KERNELS ONLY
productApprox         : accessor for other product methods
prodSampleExact       : sample N points exactly (N^d computation)
prodSampleEpsilon     : kd-tree epsilon-exact sampler
prodSampleGibbs1      : seq. index gibbs sampler
prodSampleGibbs2      : product of experts gibbs sampler
prodSampleGibbsMS1    : multiresolution version of GS1
prodSampleGibbsMS2    : multiresolution version of GS2
prodSampleImportance  : importance sampling
prodSampleImportGauss : gaussian importance sampling
productExact          : exact computation (N^d kernel centers)
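A hedged sketch of taking a mixture product via productApprox; the exact
argument list here is an assumption based on the method names above, so
consult "productApprox.m" for the real signature:
p1 = kde( randn(1,100), .1 );    % two 1D Gaussian KDEs
p2 = kde( randn(1,100)+1, .1 );
pp = productApprox( 100, {p1,p2}, {}, {}, 'exact' );  % ~p1*p2, 100 samples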