Original source: http://www.ics.uci.edu/~ihler/code/kde.html
==============================================================================
MATLAB KDE Class Description & Specification
==============================================================================
The KDE class is a general Matlab class for k-dimensional kernel density
estimation. It is written in a mix of Matlab ".m" files and MEX/C++ code.
Thus, to use it you will need to be able to compile C++ code for Matlab.
Note that the default compiler for Windows does *not* support C++, so you
will need GCC under Linux, or GCC or Visual C++ for Windows. Bloodshed
(http://www.bloodshed.net) supplies a nice development environment along
with the MinGW (http://www.mingw.org) compiler. See the page
http://gnumex.sourceforge.net/ for help setting up MEX with MinGW.
Kernels supported are: Gaussian, Epanetchnikov (truncated quadratic),
and Laplacian (double exponential).
For multivariate density estimates, the code supports product kernels --
kernels which are products of the kernel function in each dimension.
For example, for Gaussian kernels this is equivalent to requiring a
diagonal covariance.
It can also support non-uniform kernel bandwidths -- i.e. bandwidths
which vary over kernel centers.
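For instance, a minimal sketch of a variable-bandwidth estimate, assuming
the constructor accepts an Ndim x Npts bandwidth matrix (consistent with
the shape returned by getBW, listed below); the values are placeholders:
pts = rand(2,100);            % 100 kernel centers in 2D
bws = .01 + .1*rand(2,100);   % a separate 2D bandwidth per center
p   = kde( pts, bws );        % non-uniform bandwidth Gaussian KDE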
The implementation uses "kd-trees", a hierarchical representation for
point sets which caches sufficient statistics about point locations etc.
in order to achieve potential speedups in computation. For the Epanetchnikov
kernel this can translate into speedups with no loss of precision; but for
kernels with infinite support it provides an approximation tolerance level,
which allows tradeoffs between evaluation quality and computation speed.
In particular, we implement Alex Gray's "dual tree" evaluation algorithm;
see [Gray and Moore, "Very Fast Multivariate Kernel Density Estimation
via Computational Geometry", in Proceedings, Joint Stat. Meeting 2003] for
more details. This gives a tolerance parameter which is a percent error
(relative to the exact, N^2 computation) on the value at any evaluated
point. In general, "tolerance" parameters in the Matlab code / notes refer
to this percent tolerance. This percentage error translates to an absolute
additive error on the mean log-likelihood, for example.
An exception to this is the gradient calculation functions, which calculate
using an absolute tolerance value. This is due to the difficulty of finding
a percentage bound when the function calculated is not strictly positive.
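For example, a brief sketch of evaluation with and without the percent
tolerance (the data and the 1% setting below are arbitrary choices):
p  = kde( rand(2,1000), .05 );  % 2D Gaussian KDE
x  = rand(2,200);               % points at which to evaluate
pE = evaluate(p, x);            % evaluation with the default tolerance
pA = evaluate(p, x, .01);       % allow up to ~1% error at each point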
We have also recently implemented the so-called Improved Fast Gauss
Transform, described in [Yang, Duraiswami, and Gumerov, "Improved Fast
Gauss Transform", submitted to the SIAM Journal on Scientific Computing].
This often performs MUCH faster than the dual tree algorithm mentioned
above, but the error bounds which control the computation are often quite
loose, and somewhat unwieldy (for example, it is difficult to obtain the
fractional error bounds provided & used by the dual tree methods and other
functions in the KDE toolbox). Thus for the moment we have left the IFGT
separate, with alternate controls for computational complexity (see below,
and the file "evalIFGT.m").
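A minimal usage sketch (the order N = 5 below is an arbitrary choice; see
"evalIFGT.m" for guidance on selecting it):
p  = kde( rand(2,1000), .05 );  % uniform-BW Gaussian kernels (required)
x  = rand(2,200);
pI = evalIFGT(p, x, 5);         % 5th-order improved fast Gauss transform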
==============================================================================
Getting Started
==============================================================================
Unzip the KDE class to a directory called @kde.
Compile the MEX functions. This can be done by running "makemex" from
inside matlab, in the "@kde/mex" directory. If this fails, make sure that
MEX / C++ compilation works. The KDE toolbox is tested in Matlab R13, but
apparently has problems in R12; I'm planning to investigate this.
NOTE: MS Visual C++ has a bug in dealing with "static const" variables; I
think there is a patch available, or you can change these to #defines.
Operate from the class' parent directory, or add it to your MATLAB path
(e.g. if you unzip to "myhome/@kde", cd in matlab to the "myhome" dir,
or add it to the path.)
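For example, a hedged sketch of the setup steps ("myhome" is a placeholder
for wherever you unzipped the class):
cd myhome/@kde/mex   % from within matlab
makemex              % compile the MEX/C++ sources
cd ../..             % work from the class' parent directory
addpath('myhome')    % ...or put the parent dir on the path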
Objects of type KDE may be created by e.g.
p = kde( rand(2,1000), [.05;.03] );  % Gaussian kernel, 2D
                                     % BW = .05 in dim 1, .03 in dim 2.
p = kde( rand(2,1000), .05, ones(1,1000) );  % Same as above, but uniform BW
                                             % and specifying weights
p = kde( rand(2,1000), .05, ones(1,1000), 'Epanetchnikov');  % Quadratic kernel
                                             % Just 'E' or 'e' also works
p = kde( rand(2,1000), 'rot' );      % Gaussian kernel, 2D,
                                     % BW chosen by "rule of thumb" (below)
To see the kernel shape types, you can use:
plot( -3:.01:3, evaluate(kde(0,1,1,T), -3:.01:3) );  % where T = 'G', 'E', or 'L'
Kernel sizes may be selected automatically using e.g.
p = ksize(p, 'lcv');   % 1D likelihood-based search for BW
p = ksize(p, 'rot');   % "Rule of Thumb"; Silverman '86 / Scott '92
p = ksize(p, 'hall');  % Plug-in type estimator
Density estimates may be visualized using e.g.
plot(p);
or
mesh(hist(p));
See help kde/plot and help kde/hist for more information.
Also, the demonstration programs @kde/examples/demo_kde_#.m may be helpful.
==============================================================================
KDE Matlab class definition
==============================================================================
The following is a simple list of all accessible functions for the KDE class.
Constructors:
=====================================================
kde( )                     : empty kde
kde( kde )                 : re-construct kde from points, weights, bw, etc.
kde( points, bw )          : construct Gauss kde with weights 1/N
kde( points, bw, weights ) : construct Gaussian kde
kde( points, bw, weights, type ) : potentially non-Gaussian
marginal( kde, dim )       : marginalize to the given dimensions
condition( kde, dim, A )   : marginalize to ~dim and weight by K(x_i(dim),a(dim))
resample( kde, [kstype] )  : draw N samples from kde & use to construct a new kde
reduce( kde, ... )         : construct a "reduced" density estimate (fewer points)
joinTrees( t1, t2 )        : make a new tree with t1 and t2 as
                             the children of a new root node
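A brief sketch of a few of these in use (the dimension indices and sizes
below are arbitrary, and resample's optional kstype is left at its default):
p  = kde( rand(3,500), 'rot' );  % 3D Gaussian KDE, rule-of-thumb BW
pm = marginal(p, [1 2]);         % marginal over dimensions 1 and 2
pr = resample(p);                % draw samples & build a new KDE from them
pj = joinTrees(p, pr);           % one tree with p and pr as children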
Accessors: (data access, extremely limited or no processing req'd)
=====================================================
getType(kde)          : return the kernel type of the KDE ('Gaussian', etc)
getBW(kde,index)      : return the bandwidth assoc. with x_i (Ndim x length(index))
adjustBW              : set the bandwidth(s) of the KDE (by reference!)
                        Note: cannot change from a uniform -> non-uniform bandwidth
ksize                 : automatic bandwidth selection via a number of methods
    LCV               : 1D search using max leave-one-out likelihood criterion
    HALL              : Plug-in estimator with good asymptotics; MISE criterion
    ROT, MSP          : Fast standard-deviation based methods; AMISE criterion
    LOCAL             : Like LCV, but makes BW propto k-th NN distance (k=sqrt(N))
getPoints(kde)        : Ndim x Npoints array of kernel locations
adjustPoints(p,delta) : shift points of P by delta (by reference!)
getWeights            : [1 x Npts] array of kernel weights
adjustWeights         : set kernel weights (by reference!)
rescale(kde,alpha)    : rescale a KDE by the (vector) alpha
getDim                : get the dimension of the data
getNpts               : get the # of kernel locations
getNeff               : "effective" # of kernels (accounts for non-uniform weights)
sample(P,Np,KSType)   : draw Np new samples from P and set BW according to KSType
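For instance, a small sketch of the accessors (the shift amount is
arbitrary, and delta is assumed to match the point array's size; note the
by-reference semantics):
pts = getPoints(p);                   % Ndim x Npoints kernel locations
w   = getWeights(p);                  % 1 x Npts kernel weights
adjustPoints(p, .1*ones(size(pts)));  % shift every center by .1 (in place)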
Display: (visualization / description)
=====================================================
plot(kde...)  : plot the specified dimensions of the KDE locations
hist(kde...)  : discretize the kde at uniform bin lengths
display       : text output describing the KDE
double        : boolean evaluation of the KDE (non-empty)
Statistics: (useful stats & operations on a kde)
=====================================================
covar                  : find the (weighted) covariance of the kernel centers
mean                   : find the (weighted) mean of the kernel centers
modes                  : (attempt to) find the modes of the distribution
knn(kde, points, k)    : find the k nearest neighbors of each of
                         points in kde
entropy                : estimate the entropy of the KDE
    ??? Perhaps allow specifying alternate entropy estimates? Distance, etc?
kld                    : estimate divergence between two KDEs
ise                    : eval/estimate integrated square difference between two KDEs
evaluate(kde, x[,tol]) : evaluate KDE at a set of points x
evaluate(p, p2 [,tol]) : "" "", x = p2.pts (if we've already built a tree)
evalIFGT(kde, x, N)    : same as above, but use the (very fast) Nth order
evalIFGT(p, p2, N)       improved fast Gauss transform. Req's uniform-BW
                         Gaussian kernels.
evalAvgLogL(kde, x)    : compute mean( log( evaluate(kde, x) ))
evalAvgLogL(kde, kde2) : "" "" but use the weights of kde2
evalAvgLogL(kde)       : self-eval; leave-one-out option
llGrad(p,q)            : find the gradient of log-likelihood for p
                         evaluated at the points of q
llHess(p,q)            : find the Hessian of log-likelihood of p at q
entropyGrad(p)         : estimate gradient of entropy (uses llGrad)
miGrad(p,dim)          : "" for mutual information between p(dim), p(~dim)
klGrad(p1,p2)          : estimate gradient direction of KL-divergence
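A short sketch of a few of the statistics functions (the argument order for
kld, i.e. which density is the reference, is an assumption; see its help):
p = kde( rand(2,1000), 'rot' );
q = kde( rand(2,1000), 'rot' );
H = entropy(p);          % entropy estimate of p
D = kld(p, q);           % divergence estimate between p and q
L = evalAvgLogL(p, q);   % mean log-likelihood of q's points under p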
Mixture products: (NBP stuff)
=====================================================
GAUSSIAN KERNELS ONLY
productApprox         : accessor for other product methods
prodSampleExact       : sample N points exactly (N^d computation)
prodSampleEpsilon     : kd-tree epsilon-exact sampler
prodSampleGibbs1      : seq. index gibbs sampler
prodSampleGibbs2      : product of experts gibbs sampler
prodSampleGibbsMS1    : multiresolution version of GS1
prodSampleGibbsMS2    : multiresolution version of GS2
prodSampleImportance  : importance sampling
prodSampleImportGauss : gaussian importance sampling
productExact          : exact computation (N^d kernel centers)
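A hedged sketch of taking a mixture product via productApprox; the exact
argument list here is an assumption based on the method names above, so
consult "productApprox.m" for the real signature:
p1 = kde( randn(1,100), .1 );    % two 1D Gaussian KDEs
p2 = kde( randn(1,100)+1, .1 );
pp = productApprox( 100, {p1,p2}, {}, {}, 'exact' );  % ~p1*p2, 100 samples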