Data-driven Bandwidth Selection for the Gaussian Kernel

Usage

compute_bw_gaussian(Y)

Arguments

Y: A numeric matrix of dimension (n, p). Each column holds the observed trajectory of one variable, and each row corresponds to one observation time point (obs_time in the Examples).

Value

A list of length p, where each element is a named list of the form list(bandwidth = <value>), containing the selected bandwidth for the corresponding variable.
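Because each bandwidth is wrapped in its own named list, it is often convenient to flatten the result into a plain numeric vector of length p. The sketch below does this with base R's vapply(). Here `bw` is a hypothetical stand-in for the value returned by compute_bw_gaussian(Y), hand-built with placeholder numbers so the snippet runs without the package.

```r
# `bw` stands in for compute_bw_gaussian(Y); the numbers are placeholders,
# not output from the package.
bw <- list(list(bandwidth = 0.5), list(bandwidth = 2))

# Pull the `bandwidth` field out of each element: one number per column of Y
bw_vec <- vapply(bw, function(el) el$bandwidth, numeric(1))
bw_vec
#> [1] 0.5 2.0
```

With the package loaded, the same vapply() call can be applied directly to compute_bw_gaussian(Y).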

Details

For each variable (column of Y), the bandwidth is set to the median of the pairwise distances among its n sample points (the median heuristic). When the number of possible pairs is large, the median is approximated by Monte Carlo: 1,000 pairs are drawn at random, and the median of their distances is used instead. This implementation follows the bandwidth selection strategy of the references below.
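The procedure above can be sketched in base R as follows. This is an illustration, not the package's internal code: the function names median_bw_sketch() and compute_bw_sketch() are hypothetical, and so is the switch-over threshold `max_exact_pairs`, because this page does not say at what number of pairs the Monte Carlo approximation takes over. Only the 1,000 sampled pairs (`n_mc`) come from the description. For a scalar variable the distance between two points is taken as |y_i - y_j|.

```r
# Hypothetical sketch of the median heuristic with Monte Carlo fallback.
# `max_exact_pairs` is an ASSUMED threshold; the documented behaviour only
# says "when the number of possible pairs is large".
median_bw_sketch <- function(y, n_mc = 1000, max_exact_pairs = 5000) {
  n <- length(y)
  n_pairs <- n * (n - 1) / 2
  if (n_pairs <= max_exact_pairs) {
    # Exact: dist() returns all n(n-1)/2 values |y_i - y_j|, i < j
    d <- as.vector(dist(y))
  } else {
    # Monte Carlo: draw n_mc random pairs of distinct indices (i, j)
    i <- sample.int(n, n_mc, replace = TRUE)
    j <- sample.int(n - 1, n_mc, replace = TRUE)
    j <- j + (j >= i)  # shift j past i so that j != i
    d <- abs(y[i] - y[j])
  }
  median(d)
}

# Apply column by column and return the same list shape described in Value
compute_bw_sketch <- function(Y, ...) {
  lapply(seq_len(ncol(Y)), function(k) {
    list(bandwidth = median_bw_sketch(Y[, k], ...))
  })
}

# Pairwise distances of (0, 1, 3) are 1, 3, 2, so the median is 2
median_bw_sketch(c(0, 1, 3))
```

Sampling j from n - 1 values and shifting it past i keeps every Monte Carlo pair made of two distinct points without a rejection loop.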

References

Mukherjee, S., Zhou, D. X., & Shawe-Taylor, J. (2006). Learning coordinate covariances via gradients. Journal of Machine Learning Research, 7(3).

Yang, L., Lv, S., & Wang, J. (2016). Model-free variable selection in reproducing kernel Hilbert space. Journal of Machine Learning Research, 17(82), 1-24.

Examples

set.seed(1)
obs_time <- seq(0, 1, length.out = 10)
Y <- cbind(sin(2 * pi * obs_time), cos(4 * pi * obs_time)) + 0.1 * matrix(rnorm(20), 10, 2)  # each col is a variable
compute_bw_gaussian(Y)
#> [[1]]
#> [[1]]$bandwidth
#> [1] 0.7237973
#> 
#> 
#> [[2]]
#> [[2]]$bandwidth
#> [1] 0.9341025
#> 
#>
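This page does not state how the selected bandwidth enters the Gaussian kernel. The sketch below assumes the common parameterization k(x, u) = exp(-(x - u)^2 / (2 h^2)); other conventions (for example, dividing by h^2 instead of 2 h^2) are also in use, so check the package's kernel functions before relying on it. The helper gaussian_kernel_matrix() is hypothetical and builds the n x n Gram matrix for one variable.

```r
# Hypothetical helper, ASSUMING k(x, u) = exp(-(x - u)^2 / (2 * h^2))
gaussian_kernel_matrix <- function(y, h) {
  d2 <- outer(y, y, "-")^2  # squared pairwise differences, n x n
  exp(-d2 / (2 * h^2))
}

y <- c(0, 0.5, 1)
K <- gaussian_kernel_matrix(y, h = 1)
K[1, 3]  # exp(-1 / 2), since y[1] and y[3] are 1 apart
```

With the package loaded, one would pass the bandwidth for column k, bw[[k]]$bandwidth from compute_bw_gaussian(Y), as `h` together with Y[, k].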