Maybe you can get some ideas from the code below:
GAUSS code example for multivariate/bivariate kernel density estimation
-----------------------------------------------------
new;
/* C21Mult.gss (Example 21.1.4)
**
** The purpose of the program is to demonstrate the calculation of kernel density
** estimates for multivariate probability distributions. The program generates a
** random sample of data from a bivariate Normal distribution and plots the kernel
** estimate of the true underlying PDF. The user can choose either a normal or
** Epanechnikov kernel function as well as the sample size (n) and the parameters
** of the normal distribution. */
/* Load the graphics library (must be loaded before calling graphset) */
library pgraph;

/* Set some graphics and program parameters */
cls;
graphset;
begin:
print "Multivariate Kernel Density Estimation";
print;
print " The purpose of the program is to demonstrate the calculation of kernel density";
print "estimates for multivariate probability distributions. The program generates a";
print "random sample of data from a bivariate Normal probability distribution where";
print "the user provides the parameters of the distribution and also chooses the";
print "sample size. Then for a 2-dimensional grid of values, the kernel density estimate";
print "of the population distribution is calculated based on a choice of kernel function";
print "and an estimate of optimal h. The user can choose either a normal kernel or";
print "an Epanechnikov kernel. The data is standardized to have unit variances, zero";
print "covariance, and zero means, and so what is being estimated is a centered and scaled";
print "version of the original population density. The centering and scaling is accomplished";
print "using estimates of the sample mean vector and sample covariance matrix.";
print;
print " Next, the user will choose either a normal or Epanechnikov kernel to use in";
print "calculating the kernel density estimate. The bandwidth used in the calculation";
print "will be an estimate of the optimal bandwidth presented in Equation (21.1.23) using";
print "a N(0, eye(2)) reference distribution.";
print;
print "Press any key to continue ...";
waitc;
kernelchoice:
print "Please enter your choice of kernel function:";
print "  (Please choose either 1 or 2)";
print;
print "1. Epanechnikov   (2/pi)*(1 - u'u) for u'u <= 1";
print "2. Gaussian       (1/(2*pi))*exp(-0.5*u'u)";
print;
kern = int(con(1,1));
if kern < 1 or kern > 2;
  cls;
  print "Please choose either 1 or 2.";
  goto kernelchoice;
endif;
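For readers without GAUSS, here is a rough NumPy sketch of the two bivariate kernels offered above (the function names `epanechnikov2` and `gaussian2` are mine, not from the book):

```python
import numpy as np

def epanechnikov2(u):
    """Bivariate Epanechnikov kernel: (2/pi)*(1 - u'u) for u'u <= 1, else 0.
    u is an (n, 2) array of points; returns an (n,) array of kernel values."""
    s = np.sum(u * u, axis=1)                      # squared norms u'u
    return (2.0 / np.pi) * (1.0 - s) * (s <= 1.0)

def gaussian2(u):
    """Bivariate standard normal kernel: (1/(2*pi))*exp(-0.5*u'u)."""
    s = np.sum(u * u, axis=1)
    return np.exp(-0.5 * s) / (2.0 * np.pi)
```

At the origin the Epanechnikov kernel evaluates to 2/pi (about 0.637) and the Gaussian kernel to 1/(2*pi) (about 0.159); outside the unit disk the Epanechnikov kernel is exactly zero.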
ncode:
print "What sample size would you like to use?";
print "  (Choose a value between 25 and 1000)";
n = con(1,1);
if n < 25 or n > 1000 or not n/int(n) == 1;
  cls;
  print "Please choose an integer between 25 and 1000.";
  goto ncode;
endif;
print; print "Estimating the Optimal Bandwidth"; print; print " In calculating the optimal bandwidth based on Equation (21.1.23), we need the value"; print "of the integral, over 2-dimensional real space, of the squared kernel function. You chose";
if kern == 1; print "the Epanechinikov kernel. The value of the integral of the square of this kernel equals 0.4217."; else; print "the Normal kernel. The value of the integral of the square of this kernel equals 0.0796."; endif;
print "This integral value could be calculated using one of the intquad2() procedures in GAUSS."; print "The integral value, over 2-dimensional real space, of the squared trace of the hessian"; print "matrix of the bivariate standard normal reference density (see Equation (21.1.23)) is 0.1592."; print "Also, the value of sigma2 associated with the kernel function (the sigma2 with a"; print "subscript K in Equation (21.1.12)), can be found using the intquad2() procedures,";
if kern == 1; print "as well. This takes the value 0.1509 for the bivariate Epanechnikov kernel function."; else; print "as well. This is simply 1 in the case of the bivariate Gaussian (standard normal) kernel."; endif; print; print " Therefore, making these substitutions into the definition of the optimal bandwidth";
if kern == 1; print "yields 2.48*n^(-1/6) for the optimal bandwidth value."; else; print "yields simply n^(-1/6) for the optimal bandwidth value."; endif;
print; print "Press any key to continue ..."; waitc;
print; print " We now use the kernel function chosen by the user and the estimate of the"; print "optimal bandwidth to define a kernel density estimate of the true N(0, eye(2))"; print "(standardized) probability density function."; print; print " The kernel density procedure Mkernel(xo, x, h, ktype) (see the procedure at"; print "the end of this program) is used to generate the kernel density estimate on a grid"; print "of evaluation points, and then the kernel density estimate is graphed along with"; print "a graph of the true N(0, eye(2)) (standardized) population density. The grid of"; print "evaluation points on which the graph is based is defined to span plus (+) or"; print "minus (-) two standard deviations around the distribution means. Given the"; print "standardized nature of the population distribution, the grid is defined on the"; print "Cartesian product set (-2, 2) X (-2, 2)."; print; print " The user can experiment with different kernel choices and sample sizes to"; print "examine the effects these changes on the kernel density estimates."; print; print " We now ask the user for the parameters of the bivariate normal distribution. We"; print "then generate bivariate random sample outcomes from this distribution, standardize"; print "the outcomes by subtracting their sample mean and premultiplying by the inverse of"; print "the transpose of the Cholesky factor for the estimated sample covariance matrix, and"; print "then calculate a kernel density estimate from the standardized data. Finally, GAUSS"; print "plots the results."; print; print "Press any key to continue ..."; waitc;
print; print "Parameters of the bivariate normal density function"; print; print " Please enter the two mean values for the bivariate"; print "normal random variable (X1, X2):"; print; means = con(2,1); print;
variances: print " Now enter the two variances for the bivariate random variables:"; vars = con(2,1);
if vars[1] <= 0 or vars[2] <= 0; cls; print "Variances must be positive! Please choose new variance values."; goto variances; endif; print;
correlation: print ""; print " Finally, choose the value of the correlation between X1 and X2:"; print " ( Please choose a value in the interval [-.9 , .9] )"; print ""; corr = con(1,1); if corr > .9 or corr < -.9; print "Your choice is outside of the designated range."; goto correlation; endif;
print; print /flush "Please wait -- the kernel density plot is being generated ...";
/* Form the covariance matrix */
cov = corr*sqrt(prodc(vars));
cov = (vars[1]~cov) | (cov~vars[2]);
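In case the GAUSS concatenation syntax is unfamiliar, the same covariance matrix can be built in NumPy like this (function name mine):

```python
import numpy as np

def cov_matrix(variances, corr):
    """2x2 covariance matrix from two variances and a correlation:
    the off-diagonal element is corr * sqrt(var1 * var2)."""
    c = corr * np.sqrt(variances[0] * variances[1])
    return np.array([[variances[0], c], [c, variances[1]]])
```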
EstimationLoop:
/* Generate the sample and define the evaluation grid. Note that the
** standard normal draws are postmultiplied by the Cholesky factor of cov
** (chol(cov)'chol(cov) = cov), so xraw has the requested covariance matrix. */
x = rndn(n,2);
xraw = (means') + x*chol(cov);
grid1 = seqa(-2, 4/50, 51);
grid2 = seqa(-2, 4/50, 51);

/* Tabulate the true standardized N(0, eye(2)) density on the grid */
TrueGraph = zeros(2601,1);
count = 1;
for i (1, 51, 1);
  for j (1, 51, 1);
    TrueGraph[count] = pdfn(grid1[i])*pdfn(grid2[j]);
    count = count + 1;
  endfor;
endfor;
TrueGraph = reshape(TrueGraph', 51, 51);

/* Standardize the data using the sample mean vector and sample covariance matrix */
SampleMeans = meanc(xraw);
SampleCov = vcx(xraw);
xtrans = (xraw - SampleMeans')*inv(chol(SampleCov));
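The standardization step (subtract the sample means, then postmultiply by the inverse of the Cholesky factor of the sample covariance matrix) can be sketched in NumPy as follows. Note that GAUSS's chol() returns an upper-triangular factor C with C'C = S, while np.linalg.cholesky returns the lower factor L with LL' = S, so inv(L') plays the role of inv(C) here (function name mine):

```python
import numpy as np

def standardize(xraw):
    """Center the data and transform it to have identity sample covariance."""
    xc = xraw - xraw.mean(axis=0)
    S = xc.T @ xc / (len(xraw) - 1)     # sample covariance matrix
    L = np.linalg.cholesky(S)           # lower triangular, L @ L.T == S
    return xc @ np.linalg.inv(L.T)      # result has sample covariance eye(2)
```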
/* Now, generate the kernel density estimates of the population PDF */
if kern == 1;
  h = 2.48*n^(-1/6);
else;
  h = n^(-1/6);
endif;
/* Initialize the graph and counter variables */
KernelGraph = zeros(2601,1);
count = 1;

for i (1, 51, 1);
  for j (1, 51, 1);
    xo = grid1[i]~grid2[j];
    KernelGraph[count] = Mkernel(xo, xtrans, h, kern);
    count = count + 1;
  endfor;
endfor;
KernelGraph = reshape(KernelGraph', 51, 51);
/* Plot the surface of the true and estimated densities */
graphset;
_paxht = 0.25;
_ptitlht = 0.2;
_plwidth = 5;
xlabel("x1-values");
ylabel("x2-values");
zlabel("Density");

title("True Standardized Density");
surface(grid1', grid2, TrueGraph);

if kern == 1;
  title("Kernel Function: Epanechnikov, h = " $+ ftos(h,"%*.*lf",3,2));
else;
  title("Kernel Function: Gaussian, h = " $+ ftos(h,"%*.*lf",3,2));
endif;
surface(grid1', grid2, KernelGraph);
locate(22,1);
print " Please close the Graphs by CLICKing X in the upper right";
print "corner of the graphics windows and press any key to continue ... ";
waitc;

print;
print "Would you like to choose a different kernel or sample size?";
print "  Please enter Yes or No ...";
ans = cons;
if (ans $== "Yes") or (ans $== "Y") or (ans $== "y") or (ans $== "yes");
  cls;
  goto begin;
endif;

ndpclex;
end;
/* Procedure for the two-dimensional kernel function
**
** The required inputs are:
**   xo     the (1 x 2) point of evaluation
**   x      the (n x 2) matrix of observations
**   h      bandwidth
**   ktype  kernel selection (1 = Epanechnikov, 2 = Gaussian)
*/
proc Mkernel(xo, x, h, ktype);
  local kernvals, u, points;

  if h <= 0;
    print "Error: Bandwidth must be positive";
    retp(0);
  elseif ktype < 1 or ktype > 2 or ktype/int(ktype) /= 1;
    print "Error: The kernel TYPE must be either 1 or 2";
    retp(0);
  elseif cols(x) /= 2;
    print "Error: This procedure requires bivariate data.";
    retp(0);
  endif;
  u = (xo - x)/h;
  points = sumc((u.*u)');   /* points is an n x 1 vector; each element is the
                            ** squared (scaled) distance between an observation
                            ** and the evaluation point xo */
  if ktype == 1;
    /* only observations whose scaled distance to xo is at most 1
    ** receive nonzero weight in the kernel average */
    kernvals = (2/pi)*(1 - points).*(points .<= 1);
  else;
    /* the normal kernel weights every observation,
    ** regardless of its distance to xo */
    kernvals = (1/(2*pi))*exp(-0.5*points);
  endif;
  retp(sumc(kernvals)/(rows(x)*h*h));
endp;
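For comparison, here is a line-by-line NumPy translation of the Mkernel() procedure (my own sketch, not the book's code; it raises errors instead of printing them):

```python
import numpy as np

def mkernel(xo, x, h, ktype):
    """Bivariate kernel density estimate at the point xo.
    xo: length-2 evaluation point; x: (n, 2) data matrix; h: bandwidth;
    ktype: 1 = Epanechnikov, 2 = Gaussian."""
    x = np.asarray(x)
    if h <= 0 or ktype not in (1, 2) or x.shape[1] != 2:
        raise ValueError("need h > 0, ktype in {1, 2}, and bivariate data")
    u = (np.asarray(xo) - x) / h        # (n, 2) scaled deviations from xo
    s = np.sum(u * u, axis=1)           # squared distances u'u
    if ktype == 1:
        # only observations within unit scaled distance of xo get weight
        k = (2.0 / np.pi) * (1.0 - s) * (s <= 1.0)
    else:
        # the Gaussian kernel weights every observation
        k = np.exp(-0.5 * s) / (2.0 * np.pi)
    return k.sum() / (len(x) * h * h)
```

With a single observation sitting exactly at xo, the estimate is K(0)/h^2, e.g. 2/pi for the Epanechnikov kernel with h = 1.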
[This post was last edited by the author on 2005-11-16 21:17:55]