This paper studies parameter estimation for a nonparametric regression model relating observations of a dependent variable to a function of independent variables. Three smoothing techniques are considered for estimating the unknown function: the smoothing spline, the penalized spline, and the B-spline. Each method controls its smoothing performance on the data through a smoothing parameter selected by cross-validation. We compare the methods by fitting the nonparametric regression model to simulated and real data. The simulated nonlinear data are generated from two different models, each a mathematical function with errors drawn from a statistical distribution. According to the results, the smoothing spline, penalized spline, and B-spline methods all fit nonlinear data well, as judged by hypothesis tests for estimator bias; however, the penalized spline attains the minimum mean square error on both models. As real data, we use measurements from a light detection and ranging (LIDAR) experiment, with the distance travelled before the light is reflected back to its source as the independent variable and the logarithm of the ratio of received light from two laser sources as the dependent variable. In terms of the mean square error of the fit, the penalized spline again attains the minimum value.
In statistical modelling, regression analysis is a statistical process for estimating the relationship between dependent and independent variables in terms of a regression function. However, regression analysis requires assumptions about the underlying regression function to be met; if an inappropriate assumption is imposed, the results can be misleading. Nonparametric regression offers a way to analyze data that do not satisfy the assumptions of classical regression analysis. It can be viewed as scatter-diagram smoothing that depicts the relationship between the dependent and independent variables. With a single independent variable this is called scatterplot smoothing, and it enhances the visual appearance of the plot to help the eye pick out the trend.
Smoothing techniques are methods for estimating the unknown trend (smoothing estimator) of a nonparametric regression model. Popular smoothing techniques include the smoothing spline [1, 2], the penalized spline [3, 9], and the B-spline [4]. Estimation in these methods depends on a smoothing parameter, which controls the trade-off between fidelity to the data and roughness of the function. The smoothing spline (SS) estimates a natural polynomial spline by minimizing a penalized sum of squares that depends on the smoothing parameter. The penalized spline (PS) smoother is obtained by minimizing over a truncated power basis, or a low-rank thin-plate spline, again depending on the smoothing parameter. The B-spline (BS) is similar in concept to the smoothing and penalized splines; its piecewise-polynomial basis can be obtained from the truncated power functions by repeated differencing.
In this paper, we introduce the nonparametric regression model in Section 2 and use the smoothing spline, penalized spline, and B-spline methods to estimate its unknown function in Section 3. In Sections 4 and 5, we apply these methods to simulated and real data. The conclusion is presented in Section 6.
2. The nonparametric regression model
The nonparametric regression model consists of a smooth function f of the independent variable (modelled as a cubic spline, i.e., a piecewise polynomial), an error process ε_i, and observations y_i of the dependent variable:

y_i = f(x_i) + ε_i,  i = 1, …, n.

The error process ε_i is assumed to follow the normal distribution with mean zero and variance one.
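A data set from this model can be generated as follows; the true function f used here is an illustrative stand-in for the simulation models of Section 4, and the uniform design for x matches the simulation setup described there.

```python
import numpy as np

def simulate_data(n, f, seed=0):
    """Draw (x_i, y_i) from y_i = f(x_i) + e_i with e_i ~ N(0, 1),
    x_i uniform on [0, 1] as in the simulation design of Section 4."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))   # sorted for later spline fitting
    y = f(x) + rng.normal(0.0, 1.0, n)      # standard normal errors
    return x, y

# Illustrative true function; the paper's models 1 and 2 are not reproduced here.
x, y = simulate_data(100, lambda u: np.sin(2 * np.pi * u))
```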
3. Method of smoothing techniques
The following smoothing techniques describe how the unknown function of the nonparametric regression model is estimated.
3.1 Smoothing spline method
Wahba [1] defined the natural polynomial spline as a real-valued function on an interval constructed with the aid of so-called knots t_1 < t_2 < … < t_n. Estimation below is carried out over the class of m-th order splines on this domain.
The natural measure of the roughness of the curve f is the quadratic penalty function

J_m(f) = ∫ (f^(m)(t))² dt,

where f^(m) is the m-th derivative of f with respect to t.
Consider the simple nonparametric regression model. The estimate of f minimizes, over the class of spline functions, the penalized sum of squares

Σ_{i=1}^{n} (y_i − f(x_i))² + λ ∫ (f^(m)(t))² dt,

where λ > 0 denotes the smoothing parameter. In this study we emphasize m = 2, the so-called natural cubic spline, which is the case most commonly considered in the statistical literature.
The natural cubic spline f is specified by its values and second derivatives at the knots, f_i = f(t_i) and γ_i = f″(t_i), with γ_1 = γ_n = 0 under the natural boundary conditions.

Let f denote the vector (f_1, …, f_n)′ and let γ denote the vector (γ_2, …, γ_{n−1})′.

The condition for f and γ to define a natural cubic spline depends on two matrices Q and R, built from the knot spacings h_i = t_{i+1} − t_i, for i = 1, …, n − 1. Q is the n × (n − 2) matrix whose nonzero entries, for j = 2, …, n − 1, are

q_{j−1,j} = 1/h_{j−1},  q_{jj} = −1/h_{j−1} − 1/h_j,  q_{j+1,j} = 1/h_j.

Matrix R is a symmetric tridiagonal (n − 2) × (n − 2) matrix with elements, for j = 2, …, n − 1,

r_{jj} = (h_{j−1} + h_j)/3,  r_{j,j+1} = r_{j+1,j} = h_j/6.

From Q and R, the penalty matrix K can be decomposed as

K = Q R^{−1} Q′.

The roughness penalty then satisfies

∫ (f″(t))² dt = γ′Rγ = f′Kf.

To illustrate, the penalized criterion can be written in matrix form, as introduced by Green and Silverman [2], as the penalized residual sum of squares (RSS)

S(f) = (y − f)′(y − f) + λ f′Kf,

where y = (y_1, …, y_n)′. Since K is non-negative definite, S(f) has a unique minimum, and the smoothing spline estimator is obtained as

f̂ = (I + λK)^{−1} y.
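The construction above can be sketched directly in code. The following NumPy sketch builds Q, R, and K = QR^{−1}Q′ for given knots and evaluates f̂ = (I + λK)^{−1}y at the knots; the knots, test signal, and λ are illustrative choices, not the paper's settings.

```python
import numpy as np

def natural_spline_penalty(t):
    """Build the penalty matrix K = Q R^{-1} Q' of the natural cubic
    spline with knots t_1 < ... < t_n (Green-Silverman construction)."""
    n = len(t)
    h = np.diff(t)                 # knot spacings h_i = t_{i+1} - t_i
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):      # interior knots t_2, ..., t_{n-1}
        c = j - 1                  # 0-based column of Q / row of R
        Q[j - 1, c] = 1.0 / h[j - 1]
        Q[j, c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, c] = 1.0 / h[j]
        R[c, c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[j] / 6.0
    return Q @ np.linalg.solve(R, Q.T)   # K = Q R^{-1} Q'

def smoothing_spline_fit(t, y, lam):
    """Smoothing spline values at the knots: f_hat = (I + lam*K)^{-1} y."""
    K = natural_spline_penalty(t)
    return np.linalg.solve(np.eye(len(t)) + lam * K, y)

# Illustrative use: smooth noisy observations taken at the knots themselves.
t = np.linspace(0.0, 1.0, 30)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.3, 30)
fhat = smoothing_spline_fit(t, y, lam=1e-4)
```

A useful sanity check on K is that the penalty vanishes on constants and linear trends, since their second derivative is zero.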
In this paper, we select the smoothing parameter using the method of generalized cross-validation (GCV) suggested by Wahba [5] and Craven and Wahba [6]. In practice, this step can be implemented with the smooth.spline function in the software R.
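A Python analogue of R's smooth.spline is SciPy's make_smoothing_spline (available in SciPy 1.10+), which selects λ by GCV when no value is supplied; the data below are illustrative.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Cubic smoothing spline; lam=None triggers GCV selection of lambda in SciPy.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))            # illustrative data
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 100)

spline = make_smoothing_spline(x, y, lam=None)     # lambda selected by GCV
fitted = spline(x)
mse = np.mean((fitted - y) ** 2)
```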
3.2 Penalized spline method
Consider knots κ_1 < κ_2 < … < κ_K in the range of the interval [a, b], where a < κ_1 and κ_K < b. These locations are known as knots, and κ_1, …, κ_K are called interior knots.
A regression spline can be constructed using the p-th degree truncated power basis with knots κ_1, …, κ_K:

1, x, x², …, x^p, (x − κ_1)_+^p, …, (x − κ_K)_+^p,

where (x − κ_k)_+^p denotes the p-th power of the positive part of x − κ_k, with u_+ = max(u, 0). The first p + 1 basis functions are polynomials of degree up to p, and the others are the truncated power functions of degree p. A regression spline can then be expressed as

f(x) = β_0 + β_1 x + ⋯ + β_p x^p + Σ_{k=1}^{K} u_k (x − κ_k)_+^p,

where β_0, …, β_p, u_1, …, u_K are the unknown coefficients, estimated by minimizing a suitable loss.
The penalized spline estimates the unknown smooth function f from the truncated power basis by minimizing the penalized least-squares criterion

Σ_{i=1}^{n} (y_i − f(x_i))² + λ β′Dβ,

where β = (β_0, …, β_p, u_1, …, u_K)′ collects the coefficients and D is diagonal with ones in the positions of u_1, …, u_K and zeros elsewhere; that is, only the coefficients of the truncated power functions are penalized, so that a reasonably large number of knots can be used.
In this case, we focus on m = 2, the natural cubic spline case, also called the low-rank thin-plate spline, which represents f as

f(x) = β_0 + β_1 x + Σ_{k=1}^{K} u_k |x − κ_k|³,

where (β_0, β_1, u_1, …, u_K)′ is the vector of regression coefficients and κ_1, …, κ_K are fixed knots. The number of knots K can be selected by a cross-validation method or by information-theoretic criteria (e.g., AIC or BIC).
This class of penalized spline smoothers may also be expressed as

f̂ = C(C′C + λD)^{−1} C′y,

where C is the design matrix of the spline basis, D is the penalty matrix acting only on the knot coefficients, and λ ≥ 0 is the smoothing parameter. In practice, the penalized spline smoothers can be fitted using the SemiPar package in the software R.
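The ridge-type solve above can be sketched as follows, assuming the cubic truncated power basis with equally spaced knots (both illustrative choices); the estimator solves (C′C + λD)β = C′y directly.

```python
import numpy as np

def penalized_spline_fit(x, y, knots, lam, degree=3):
    """Penalized-spline fit with a truncated power basis (sketch).

    C = [1, x, ..., x^p, (x - k_1)_+^p, ..., (x - k_K)_+^p]; the penalty
    matrix D is zero on the polynomial block and identity on the knot
    block, so only the truncated-power coefficients are shrunk.
    Solves min ||y - C b||^2 + lam * b'Db  =>  b = (C'C + lam*D)^{-1} C'y.
    """
    poly = np.vander(x, degree + 1, increasing=True)          # 1, x, ..., x^p
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** degree
    C = np.hstack([poly, trunc])
    D = np.diag([0.0] * (degree + 1) + [1.0] * len(knots))
    b = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return C @ b, b

# Illustrative data, knot grid, and smoothing parameter.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 200)
knots = np.linspace(0.05, 0.95, 20)
fitted, coef = penalized_spline_fit(x, y, knots, lam=1e-4)
```

In practice the truncated power basis can be ill-conditioned; library implementations usually work with an orthogonalized or B-spline basis instead.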
3.3 B-spline method

B-splines are attractive as basis functions for nonparametric regression with a univariate independent variable. De Boor [10] gave a recursive algorithm that computes a B-spline of a given degree from B-splines of lower degree on piecewise polynomial functions.
The B-spline basis functions of degree q are evaluated from those of degree q − 1 through the recursion

B_{j,q}(x) = ((x − t_j)/(t_{j+q} − t_j)) B_{j,q−1}(x) + ((t_{j+q+1} − x)/(t_{j+q+1} − t_{j+1})) B_{j+1,q−1}(x),

where B_{j,q} denotes the j-th B-spline basis function of degree q with knots t_1 < t_2 < ⋯, augmented by auxiliary boundary knots, and B_{j,0}(x) = 1 if t_j ≤ x < t_{j+1} and 0 otherwise. Each B-spline is non-zero over a domain spanned by at most q + 2 knots. In this case, we focus on order 4, i.e., the cubic B-spline (degree 3), whose basis expansion is

f(x) = Σ_j α_j B_{j,3}(x).

The nonparametric regression model can then be written in terms of B-splines as

y_i = Σ_j α_j B_{j,3}(x_i) + ε_i.

In matrix form, this is the linear model

y = Bα + ε,

where B is the design matrix with entries B_{ij} = B_{j,3}(x_i). The B-spline estimator is obtained from the least-squares problem

α̂ = (B′B)^{−1} B′y.
B-splines with penalties (P-splines) were studied by Eilers and Marx [4], who advocate equally spaced knots instead of knots placed at the order statistics of the independent variable. The penalized B-spline coefficients are estimated as

α̂ = (B′B + λ D_d′D_d)^{−1} B′y,

where D_d is a banded matrix corresponding to the d-th order difference penalty, i.e., D_d α = Δ^d α. The fitted cubic B-spline values are f̂ = Bα̂. The smoothing parameter λ is chosen by minimizing the ordinary cross-validation function or the generalized cross-validation (GCV) function.
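The Eilers-Marx P-spline fit and its GCV score can be sketched as follows, using SciPy's BSpline.design_matrix to build a cubic basis on equally spaced knots; the knot count and λ grid are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_design(x, n_segments=20, degree=3):
    """Cubic B-spline design matrix on equally spaced knots, extended
    'degree' intervals beyond the data range (Eilers-Marx setup)."""
    h = (x.max() - x.min()) / n_segments
    t = np.linspace(x.min() - degree * h, x.max() + degree * h,
                    n_segments + 2 * degree + 1)
    return BSpline.design_matrix(x, t, degree).toarray()

def pspline_fit(B, y, lam, diff_order=2):
    """Solve (B'B + lam * D'D) a = B'y, D the d-th order difference matrix."""
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    return B @ np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)

def gcv(B, y, lam, diff_order=2):
    """Generalized cross-validation score n * RSS / (n - tr(H))^2."""
    n = len(y)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    H = B @ np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T)  # hat matrix
    rss = np.sum((y - H @ y) ** 2)
    return n * rss / (n - np.trace(H)) ** 2

# Illustrative use: pick lambda on a small grid by minimizing GCV.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 150))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 150)
B = pspline_design(x)
best = min([1e-3, 1e-2, 1e-1, 1.0, 10.0], key=lambda l: gcv(B, y, l))
fit = pspline_fit(B, y, best)
```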
4. Simulation study

To assess the performance of the smoothing techniques, nonlinear data are simulated from two models whose independent variables are drawn from the uniform distribution. Each model constructs a curve from a mathematical function, against which the fitted curves are compared. Figures 1 and 2 show the scatter plots of x and y for models 1 and 2 with sample sizes of 50, 100, 200, and 300.
Next, the estimate f̂ of f is computed by the smoothing spline (SS), penalized spline (PS), and B-spline (BS) methods and used to evaluate the bias and mean square error (MSE) of f̂:

Bias(f̂(x)) = E[f̂(x)] − f(x),  MSE(f̂(x)) = E[(f̂(x) − f(x))²].
The data are generated and the model refitted 500 times. A t-statistic is adopted to test whether the mean bias equals zero, i.e., whether the estimator is unbiased. Tables 1 and 2 present summary statistics for the smoothing estimators obtained from the three methods. The third and fourth columns give the sample mean and standard deviation of the biases; the next two columns give the lower and upper bounds of the 95% confidence interval for the mean bias. The last two columns list the t-statistic and the p-value for the hypothesis test (H_0: mean bias = 0); rejecting H_0 indicates that the SS, PS, or BS estimator is biased. Histograms of the bias estimators of SS, PS, and BS are presented in Figs 3-5 for model 1 and Figs 6-8 for model 2.
From Tables 1 and 2, judging by the p-values, the SS, PS, and BS methods provide asymptotically unbiased estimates of f for nearly all sample sizes in both models. The p-values in the two tables thus indicate that the SS, PS, and BS smoothing methods all fit this class of nonlinear data well. From the histograms it is apparent that the spread of the biases shrinks as the sample size increases, producing increasingly peaked (leptokurtic) distributions. The average MSE answers the final question of which smoothing method is the best estimator: Table 3 reports the average MSE over the 500 fits for both models, and the PS method attains the minimum average MSE for all sample sizes and both models.
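The simulation protocol can be sketched as follows for the smoothing spline (the other methods are analogous); the true function, sample size, and replication count are illustrative stand-ins for the paper's models and its 500 replications.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import make_smoothing_spline

def bias_mse_study(f, n=100, reps=100, seed=0):
    """Refit a smoothing spline on fresh data 'reps' times, recording the
    average bias and MSE of f_hat per replication, then t-test
    H0: E[bias] = 0 (unbiasedness), as in Tables 1 and 2."""
    rng = np.random.default_rng(seed)
    biases, mses = [], []
    for _ in range(reps):
        x = np.sort(rng.uniform(0.0, 1.0, n))
        y = f(x) + rng.normal(0.0, 1.0, n)     # N(0, 1) errors as in Section 2
        fhat = make_smoothing_spline(x, y)(x)  # lambda chosen by GCV
        biases.append(np.mean(fhat - f(x)))
        mses.append(np.mean((fhat - f(x)) ** 2))
    tstat, pval = stats.ttest_1samp(biases, 0.0)
    return np.mean(biases), np.mean(mses), tstat, pval

# Illustrative run with a stand-in true function and a reduced replication count.
bias, mse, tstat, pval = bias_mse_study(lambda u: np.sin(2 * np.pi * u),
                                        n=60, reps=50)
```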
[Table 3 layout: Sample sizes | Methods | Model 1 | Model 2]
5. Application to real data
In this section, we apply the SS, PS, and BS smoothing methods developed in the previous section to real data. We use a data frame consisting of 221 observations from a light detection and ranging (LIDAR) experiment. It contains the distance travelled before the light is reflected back to its source (independent variable) and the logarithm of the ratio of received light from two laser sources (dependent variable), as shown in the plot in Fig. 9.
After fitting the model, the fitted values are overlaid on the plot of the LIDAR data. It can be seen that the SS and PS follow the bulk of the data more closely than the BS method, which is reflected in the MSE values: SS 0.006016, PS 0.006010, and BS 0.009288. The minimum MSE is attained by the PS, which is close to the SS, consistent with the results in Table 3.
6. Conclusion

In this paper, we applied the SS, PS, and BS smoothing techniques to nonparametric regression models. Through a Monte Carlo simulation study, we evaluated the smoothing estimators of the SS, PS, and BS methods. In hypothesis tests based on the p-value, the fitted values supported the null hypothesis of unbiasedness, showing that the smoothing estimators work reasonably well for all methods; however, the PS method attains the minimum average MSE.
Acknowledgments

This work was supported by the Faculty of Science Fund, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand.
References

[1] Wahba G. Spline Models for Observational Data. SIAM: Philadelphia; 1990.
[2] Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall: London; 1994.
[3] Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press: New York; 2003.
[4] Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science 11(2) (1996), 89-102.
[5] Wahba G. A survey of some smoothing problems and the method of generalized cross-validation for solving them. In Proceedings of the Conference on the Application of Statistics, 1976, pp. 507-523.
[6] Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik 31 (1979), 377-403.
[7] Eubank RL. Spline Smoothing and Nonparametric Regression. Marcel Dekker: New York; 1988.
[8] Eubank RL. Nonparametric Regression and Spline Smoothing. Marcel Dekker: New York; 1999.
[9] Ruppert D, Carroll RJ. Spatially-adaptive penalties for spline fitting. Australian and New Zealand Journal of Statistics 42 (2000), 205-224.
[10] De Boor C. A Practical Guide to Splines. Springer: Berlin; 1978.